Updating multiple machines on low bandwidth

Posted by Steve on Fri 16 Jun 2006 at 11:40

It is common to want to update multiple machines running Debian GNU/Linux whilst minimizing the bandwidth used for downloading packages and updates. There are several solutions to this problem, and here we'll look at one of them: apt-proxy.

In my home setup I have three machines, all running Debian's unstable distribution, sid. It is wasteful to have each of these machines download the latest packages from the network, especially considering that each host has an almost identical list of installed packages.

One of the simplest solutions is to set up a caching proxy server through which each host fetches its packages. A package is then downloaded from the network only the first time it is requested; when the next two machines request the same package it is served from the cache, using no external bandwidth at all!

Several such proxies are included in the Debian distribution; the one I like best is the apt-proxy package.

Installing the package upon a single host is very straightforward:

root@itchy:~# apt-get install apt-proxy

Once installed you can configure the software by editing the file /etc/apt-proxy/apt-proxy-v2.conf. In most environments you'll be fine with the defaults.

The main things you might consider changing are the port number the server listens upon, 9999 by default, and the location upon the host where the .deb files will be cached. These can be changed by the following entries in the configuration file:

;; Server port to listen on
port = 9999

;; Cache directory for apt-proxy
cache_dir = /var/cache/apt-proxy

(The cached files are stored in the same "pool structure" as they would be on Debian's mirrors, so saving them to /var/cache/apt/archives, which might seem sensible, won't do what you might expect.)

If you do choose to make some changes you'll need to restart the server for them to take effect:

root@itchy:~# /etc/init.d/apt-proxy restart
Stopping apt-proxy [wait 1].
Starting apt-proxy.

Now that you've set up the proxy, the next thing you must do is update your clients to actually use it. For each machine on your LAN you need to update the sources.list file, which apt-get uses to determine the download sources.

In my case the server I installed apt-proxy upon was called itchy (and each machine can find the IP address for that host), so I'll change each machine's /etc/apt/sources.list file from this:

#
#  /etc/apt/sources.list
#

#
# Unstable
#
deb     http://ftp.uk.debian.org/debian sid main contrib non-free
deb-src http://ftp.uk.debian.org/debian sid main contrib non-free

To this:

#
#  /etc/apt/sources.list
#

#
# Unstable, via apt-proxy running on itchy.
#
deb     http://itchy.my.flat:9999/debian sid main contrib non-free
deb-src http://itchy.my.flat:9999/debian sid main contrib non-free
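With more than a couple of clients the same edit can be scripted. The following is only a sketch: use_proxy is a made-up helper name, and it assumes every entry you want redirected uses a plain http:// URL for the one mirror you name.

```shell
#!/bin/sh
# Hypothetical helper: point every entry for one mirror at the proxy instead.
# Arguments: sources file, mirror host to replace, proxy host:port.
use_proxy() {
    file=$1
    mirror=$2
    proxy=$3
    # Keep a backup copy, then rewrite only the host part of matching URLs.
    cp "$file" "$file.orig" &&
    sed "s|http://$mirror/|http://$proxy/|" "$file.orig" > "$file"
}

# Example (run as root on each client):
#   use_proxy /etc/apt/sources.list ftp.uk.debian.org itchy.my.flat:9999
```

The original file is left alongside as sources.list.orig, so the change is easy to undo if the proxy is ever retired.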

Once this is done running "apt-get update" on an updated machine looks like this:

root@desktop:~# apt-get update
Get: 1 http://itchy sid Release.gpg [189B]
Hit http://itchy sid Release
Ign http://itchy sid/main Packages/DiffIndex
Ign http://itchy sid/contrib Packages/DiffIndex
Ign http://itchy sid/non-free Packages/DiffIndex
Ign http://itchy sid/main Sources/DiffIndex
Ign http://itchy sid/contrib Sources/DiffIndex
Ign http://itchy sid/non-free Sources/DiffIndex
Hit http://itchy sid/main Packages
Hit http://itchy sid/contrib Packages
Hit http://itchy sid/non-free Packages
Hit http://itchy sid/main Sources
Hit http://itchy sid/contrib Sources
Hit http://itchy sid/non-free Sources
Fetched 189B in 3s (56B/s)
Reading package lists... Done

Here we see that we connected to itchy instead of ftp.uk.debian.org, and once we run "apt-get update" upon a machine we'll see the cached files appear on itchy.

Remember that the .deb files are cached to /var/cache/apt-proxy by default. Looking in that directory we can see:

root@itchy:~# ls /var/cache/apt-proxy/debian/pool/main/
a  d  g  j  liba  libe  libh  libm  libp  libt  libw  m  p  s  v  y
b  e  h  k  libc  libf  libi  libn  libr  libu  libx  n  q  t  w  z
c  f  i  l  libd  libg  libl  libo  libs  libv  liby  o  r  u  x

For example in the a/ directory we have:

root@itchy:~# ls /var/cache/apt-proxy/debian/pool/main/a/
aalib        alsa-lib  alsa-tools  apache2    apmd  apt-proxy  arts
alsa-driver  alsa-oss  alsa-utils  apachetop  apt   aptitude   autoconf

We can see the total space currently in use with the du command, with appropriate arguments:

root@itchy:~# du  --total --human-readable /var/cache/apt-proxy/ | grep total
762M    total
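A slightly tidier way to get the same figure, as a sketch: GNU du can print just one summary line itself, which avoids filtering its per-directory output through grep. The function name cache_usage is invented here, and the default path is simply the cache_dir shown earlier.

```shell
# Sketch: report a single total for the cache directory.
# Pass another directory as the first argument to override the default.
cache_usage() {
    du --summarize --human-readable "${1:-/var/cache/apt-proxy}"
}

# Example:
#   cache_usage
#   cache_usage /var/cache/apt-proxy/debian
```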

That represents a bandwidth saving of roughly 1.5GB! (Most of the packages in the cache would otherwise have been downloaded three times, so two of every three downloads are avoided. The figure is not exact, since the package lists upon the hosts do differ somewhat.)
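As a back-of-the-envelope check, the saving can be estimated from the cache size and the number of hosts. This is only a sketch: the function name is made up, and it assumes every host would otherwise have downloaded every cached package, so it gives an upper bound rather than a measurement.

```shell
# Rough estimate: with N hosts sharing one cache, each package is fetched
# once instead of N times, so at most (N - 1) x cache size is saved.
bandwidth_saved_mb() {
    cache_mb=$1
    hosts=$2
    echo $(( cache_mb * (hosts - 1) ))
}

bandwidth_saved_mb 762 3   # prints 1524
```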

The apt-proxy installation can also be used to cache the downloaded packages used by debootstrap and pbuilder if you use either of those tools. See /usr/share/doc/apt-proxy/README.gz for details.

 

 


Posted by linox_be (84.194.xx.xx) on Fri 16 Jun 2006 at 14:15
I'm also using a caching/proxy system, but it works for much more than the apt system: a transparent proxy. It's just a Squid proxy, running as a transparent proxy.

Why transparent? Just to be sure you never need to change any config in any program using http.

A special config? No, just change the file size limit to (for example) 400MB!

Something other than apt, then ... ? Yep! Besides the very fast apt-get downloads at several megabytes/s on an ISDN connection, you have the Microsoft Windows updates, for example ... nice to see your Windows Update downloading at 3 megabytes/s!!!

mini-HOWTO: Transparent Proxy with Linux and Squid


Fred
Linox.BE

Posted by Steve (62.30.xx.xx) on Fri 16 Jun 2006 at 14:39
That's certainly a good solution, and we've covered setting up a transparent proxy here before.

Still there are advantages to using apt-proxy such as the automatic creation of a pool structure which can come in handy for all kinds of things.

I guess pick whichever solution works best for you :)

Steve

Posted by Hercynium (64.69.xx.xx) on Fri 16 Jun 2006 at 16:10
Better yet - do both!

Transparent squid-cache combined with the apt-proxy should provide the best of both worlds, without the need for a humongous cache size or really large TTL settings.

Posted by undefined (192.31.xx.xx) on Sat 17 Jun 2006 at 06:39
i use apt-proxy to provide quick access to packages when installing debian or ubuntu in virtual servers (linux-vserver). so i find it most beneficial to preload apt-proxy with the latest dvd release (obtained and continually seeded through bittorrent).

hopefully the following process will help somebody with the apt-proxy-import in sarge as i found it poorly documented when i used it several months ago.

# setup cd image on loopback
losetup -f ubuntu-5.10-dvd-amd64.iso

# mount loopback
mount -t iso9660 /dev/loop0 /mnt/

# add internet repository to apt-proxy configuration
cat <<EOF >>/etc/apt-proxy/apt-proxy-v2.conf

[ubuntu-breezy]
backends =
http://us.archive.ubuntu.com/ubuntu
http://archive.ubuntu.com/ubuntu

[ubuntu-breezy-security]
backends =
http://security.ubuntu.com/ubuntu
EOF

# must download Packages.gz before Packages if Packages doesn't exist in internet repository
wget -O/dev/null http://apt-proxy:9999/ubuntu-breezy/dists/breezy/{main,restricted}/binary-{amd64,i386}/{Packages{.gz,.bz2,},Release}

# restart apt-proxy to register newly downloaded Packages
/etc/init.d/apt-proxy restart

# import all packages from cd image
apt-proxy-import -v -r -i /mnt/pool/ 2>&1 | multitee 0-1,2 2>apt-proxy-import.log

# list packages not imported (ignoring installation-only udebs)
grep -A1 'Not found, trying to guess' apt-proxy-import.log | grep -v '\(Not found, trying to guess\|^--$\)' | grep -v '\.udeb'

# unmount loopback
umount /mnt/

# remove loopback
losetup -d /dev/loop0

cd /var/cache/apt-proxy/ubuntu-breezy/

# find any packages on filesystem but not in a Packages file
find pool/ -iname "*.deb" | while read FILE; do grep -q "${FILE}" dists/breezy/*/binary-*/Packages || echo "not found: ${FILE}"; done

# find any packages in a Packages file but not on filesystem
grep -h '^Filename: ' dists/breezy/*/binary-*/Packages | cut -f2 -d' ' | while read FILE; do test -f "${FILE}" || echo "not found: ${FILE}"; done

exit

Posted by Anonymous (200.222.xx.xx) on Sat 17 Jun 2006 at 20:04
Great information, thanks.

But I need something different:

I need to update multiple machines with no bandwidth at all.

I mean I have several machines not connected to the Internet. My only access point to the Internet is at a cybercafe. I have a USB pendrive (250 megs) that I use to carry data I got from the net.

I think this scenario is very common.

Thank you again.

Posted by Steve (62.30.xx.xx) on Sat 17 Jun 2006 at 20:07
Look at the apt-zip package. From the description:

 These scripts simplify the process of using dselect and apt on a
 non-networked Debian box, using removable media like ZIP floppies.
 One generates a `fetch' script (supporting backends such as wget and
 lftp, in a modular, extensible way) to be run on a host with better
 connectivity, check space constraints of your removable media, and
 then install the package on your Debian box.
 .
 Note on current version: space-checking is not done and spanning
 multiple disks is not yet supported.
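If apt-zip is not to hand, much the same idea can be sketched with apt-get's own --print-uris option, which lists what would be downloaded without fetching anything. Treat this as a rough outline rather than a tested procedure: extract_uris and the /media/usb paths are invented for illustration, and it assumes the offline machine's package lists are reasonably current.

```shell
# Sketch: turn "apt-get --print-uris" output into a plain list of URLs.
# Each package line starts with the URL in single quotes, followed by the
# file name, size and checksum; only the quoted URL is kept here.
extract_uris() {
    grep "^'" | cut -d "'" -f 2
}

# On the offline machine (nothing is downloaded by --print-uris):
#   apt-get -qq --print-uris dist-upgrade | extract_uris > /media/usb/uris.txt
# On the networked machine:
#   wget --input-file=/media/usb/uris.txt --directory-prefix=/media/usb/debs
# Back on the offline machine:
#   cp /media/usb/debs/*.deb /var/cache/apt/archives/ && apt-get dist-upgrade
```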

Steve

Posted by GoodTimes (69.17.xx.xx) on Sat 17 Jun 2006 at 21:11
ok, i'm not following something here

if you have a sources.list file with a number of repositories, how do you specify those? will just connecting to apt-proxy give you the equivalent of the sources.list file from the machine running apt-proxy?

how does the machine running apt-proxy interact with it? should its sources.list file also change?

hmmm, the apt-proxy package IS a little sparse...



aaron

Posted by Steve (62.30.xx.xx) on Sat 17 Jun 2006 at 23:43
apt-proxy is designed for use with Debian mirrors, so although it is mostly a general purpose proxy server it does know which mirror to use.

I guess in the interests of completeness I should have described this when discussing the configuration file. Basically you'd update the configuration file to contain the "usual" Debian mirror.

For example in my case I have this:

;; Backend servers, in order of preference
backends =
        http://ftp.us.debian.org/debian
        http://ftp.de.debian.org/debian
        http://ftp2.de.debian.org/debian
        ftp://ftp.uk.debian.org/debian

This means that when I do "apt-get update/upgrade" the proxy connects to ftp.us.debian.org initially, and if that fails then it uses ftp.de.debian.org, and so on.

The machine that is running apt-proxy doesn't need its sources lists to be changed as such - although I do set all my machines to use "deb http://itchy:9999 ..." so that that machine fills/fetches from the cache too. The sources.list on the apt-proxy machine has no bearing on which mirrors are contacted...
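To make the mapping concrete for clients that use more than one repository: judging by the examples in this thread, each bracketed section in apt-proxy-v2.conf carries its own backends list and is reached under a URL path of the same name. A sketch follows, in which the [debian] section name and the choice of mirrors are assumptions for illustration rather than a copy of any real configuration:

```
;; /etc/apt-proxy/apt-proxy-v2.conf on the proxy host (illustrative)
[debian]
backends =
        http://ftp.us.debian.org/debian

[ubuntu-breezy]
backends =
        http://archive.ubuntu.com/ubuntu
```

Each client would then list one line per section, with the section name as the first path component after the port:

```
# /etc/apt/sources.list on a client (illustrative)
deb http://itchy.my.flat:9999/debian sid main contrib non-free
deb http://itchy.my.flat:9999/ubuntu-breezy breezy main restricted
```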

Steve

Posted by Anonymous (198.54.xx.xx) on Sun 18 Jun 2006 at 18:04
apt-proxy in sarge is woefully buggy and unreliable. I've run it for several years, and on any large update have found myself having to restart it by hand. Yes, I did patch it.

When Ubuntu dapper came out, I finally switched to apt-cacher. While apt-cacher isn't perfect, it certainly works a darn sight better :-)

Posted by Anonymous (81.255.xx.xx) on Wed 2 Aug 2006 at 15:04
I agree with you.

apt-proxy is working fine when you have to upgrade 2-3 computers, but not on a large installation (3-300 computers).

So, I also switched to apt-cacher 3 months ago, and I have never restarted the daemon since then.

Posted by Anonymous (203.23.xx.xx) on Mon 18 Sep 2006 at 04:03
So. I have one machine fully updated and upgraded. I then go to the second machine and want to stop the huge downloads so I hear about apt-cacher. I install it on the first machine.

When my second machine starts up - guess what ? It's downloading all the files again ! Why ? Is it because installing apt-cacher should be done *before* updating and upgrading the first machine !?!? Is it because its repository is empty and it doesn't have the common sense to create a repository from the /var/cache/apt/archives directory ??

I got really annoyed and transferred the /var/cache/apt/archives/ files manually.

Does it work with aptitude ? Probably.

The way this tutorial says the sources.list file should be set up also conflicts with another 'howto' on the apt-cacher program. Which information is best ?!?!

Ahhh I just love it when all of my options are not completely polished ... but I have options !!
