Updating multiple machines on low bandwidth
Posted by Steve on Fri 16 Jun 2006 at 11:40
It is common to want to update multiple machines running Debian GNU/Linux whilst minimizing the bandwidth used for downloading packages and updates. There are several solutions to this problem, and here we'll look at one of them: apt-proxy.
In my home setup I have three machines, all running Debian's unstable distribution, sid. It is wasteful to have each of these machines download the latest packages from the network, especially considering that each host contains an almost identical list of installed packages.
One of the simplest solutions is to set up a caching proxy server which each host will use to fetch packages. This ensures that a package is downloaded from the network only the first time it is requested; when the next two machines request the same package it is fetched from the cache, using no external bandwidth at all!
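The fetch-once, serve-from-cache behaviour can be sketched in a few lines of shell. This only illustrates the idea and is not how apt-proxy itself is implemented; cached_fetch and fetch_upstream are hypothetical names, and the sketch assumes wget is installed.

```shell
# fetch_upstream is a hypothetical stand-in for the real download step.
# It saves the URL given as $1 into the file named by $2.
fetch_upstream() {
    wget --quiet --output-document="$2" "$1"
}

# Serve a file from the cache if it is already there; otherwise download
# it once, keep the copy, and serve that.
cached_fetch() {
    url=$1
    cache_dir=$2
    # Store the file under the cache directory using the URL minus its scheme.
    target="$cache_dir/${url#*://}"
    if [ ! -f "$target" ]; then
        mkdir -p "$(dirname "$target")"
        # Remove any partial file if the download fails.
        fetch_upstream "$url" "$target" || { rm -f "$target"; return 1; }
    fi
    cat "$target"
}
```

Calling cached_fetch three times with the same URL triggers only one download; the second and third calls are answered from the cache directory.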
There are several proxies included in the Debian distribution; the one I like best is the apt-proxy package.
Installing the package upon a single host is very straightforward:
root@itchy:~# apt-get install apt-proxy
Once installed you can configure the software by editing the file /etc/apt-proxy/apt-proxy-v2.conf. In most environments you'll be fine with the defaults.
The main things you might consider changing are the port number the server listens upon, 9999 by default, and the location upon the host where the .deb files will be cached. These can be changed by the following entries in the configuration file:
;; Server port to listen on
port = 9999
;; Cache directory for apt-proxy
cache_dir = /var/cache/apt-proxy
(The cached files are stored in the same "pool structure" as they would be on Debian's mirrors, so choosing to save them to /var/cache/apt/archives, which might seem sensible, won't do what you might expect.)
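If you want to check which values are currently in effect without opening the file, a small helper along these lines may do. conf_get is a hypothetical name, not something shipped with apt-proxy, and the sketch assumes simple one-line "key = value" entries and returns only the first match.

```shell
# Print the value of a "key = value" entry from an apt-proxy style config
# file. Comment lines (starting with ';' or '#') never match because the
# key must be the first thing on the line.
conf_get() {
    key=$1
    file=$2
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}
```

For example, "conf_get port /etc/apt-proxy/apt-proxy-v2.conf" should print the port number the server listens on.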
If you do choose to make some changes you'll need to restart the server for them to take effect:
root@itchy:~# /etc/init.d/apt-proxy restart
Stopping apt-proxy [wait 1].
Starting apt-proxy.
Now that you've set up the proxy, the next thing you must do is update your clients to actually use it. For each machine on your LAN you need to update the sources.list file which apt-get uses to determine the download sources.
In my case the server I installed apt-proxy upon was called itchy (and each machine can find the IP address for that host), so I'll change each machine's /etc/apt/sources.list file from this:
#
# /etc/apt/sources.list
#

#
# Unstable
#
deb http://ftp.uk.debian.org/debian sid main contrib non-free
deb-src http://ftp.uk.debian.org/debian sid main contrib non-free
To this:
#
# /etc/apt/sources.list
#

#
# Unstable, via apt-proxy running on itchy.
#
deb http://itchy.my.flat:9999/debian sid main contrib non-free
deb-src http://itchy.my.flat:9999/debian sid main contrib non-free
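With more than a couple of clients, making this change by hand gets tedious. A minimal sketch of automating it with sed follows; point_at_proxy is a hypothetical helper, and it prints the rewritten file to standard output rather than editing in place so you can review the result first.

```shell
# Rewrite every occurrence of one mirror URL in a sources.list-style file
# to the proxy URL, printing the result to standard output.
# Note: the mirror URL is used as a sed pattern, so its dots match any
# character; that is harmless for ordinary mirror names.
point_at_proxy() {
    mirror=$1   # e.g. http://ftp.uk.debian.org/debian
    proxy=$2    # e.g. http://itchy.my.flat:9999/debian
    file=$3
    sed "s|${mirror}|${proxy}|g" "$file"
}
```

You might run it as "point_at_proxy http://ftp.uk.debian.org/debian http://itchy.my.flat:9999/debian /etc/apt/sources.list > sources.list.new", inspect sources.list.new, and only then copy it into place.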
Once this is done, running "apt-get update" on an updated machine looks like this:
root@desktop:~# apt-get update
Get: 1 http://itchy sid Release.gpg [189B]
Hit http://itchy sid Release
Ign http://itchy sid/main Packages/DiffIndex
Ign http://itchy sid/contrib Packages/DiffIndex
Ign http://itchy sid/non-free Packages/DiffIndex
Ign http://itchy sid/main Sources/DiffIndex
Ign http://itchy sid/contrib Sources/DiffIndex
Ign http://itchy sid/non-free Sources/DiffIndex
Hit http://itchy sid/main Packages
Hit http://itchy sid/contrib Packages
Hit http://itchy sid/non-free Packages
Hit http://itchy sid/main Sources
Hit http://itchy sid/contrib Sources
Hit http://itchy sid/non-free Sources
Fetched 189B in 3s (56B/s)
Reading package lists... Done
Here we see that we connected to itchy instead of ftp.uk.debian.org, and once we run "apt-get update" upon a machine we'll see the cached files appear on itchy.
Remember that the .deb files are cached to /var/cache/apt-proxy by default. Looking in that directory we can see:
root@itchy:~# ls /var/cache/apt-proxy/debian/pool/main/
a d g j liba libe libh libm libp libt libw m p s v y
b e h k libc libf libi libn libr libu libx n q t w z
c f i l libd libg libl libo libs libv liby o r u x
For example in the a/ directory we have:
root@itchy:~# ls /var/cache/apt-proxy/debian/pool/main/a/
aalib alsa-lib alsa-tools apache2 apmd apt-proxy arts
alsa-driver alsa-oss alsa-utils apachetop apt aptitude autoconf
We can see the total space currently in use with the du command, with appropriate arguments:
root@itchy:~# du --total --human-readable /var/cache/apt-proxy/ | grep total
762M total
That represents a bandwidth saving of roughly 1.5GB! (Most of the packages in the cache would have been downloaded three times were the cache not in place, so two of those three 762MB downloads are avoided. The saving is not quite 100% of that, since the package lists upon the hosts do differ somewhat.)
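As a back-of-the-envelope check, the upper bound on the saving is the cache size multiplied by the number of machines that did not have to download from the network. The sketch below assumes every cached package would otherwise have been fetched once per machine, which overstates things slightly; saving_mb is a hypothetical helper name.

```shell
# Estimate the bandwidth saved, in MB, given the cache size in MB and the
# number of machines sharing the cache. Only the first machine's request
# goes out to the network; the rest are served locally.
saving_mb() {
    cache_mb=$1
    machines=$2
    echo $(( cache_mb * (machines - 1) ))
}

saving_mb 762 3   # prints 1524
```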
The apt-proxy installation can also be used to cache the downloaded packages used by debootstrap and pbuilder if you use either of those tools. See /usr/share/doc/apt-proxy/README.gz for details.
[ Send Message | View Steve's Scratchpad | View Weblogs ]
That's certainly a good solution, and we've covered setting up a transparent proxy here before.
Still, there are advantages to using apt-proxy, such as the automatic creation of a pool structure, which can come in handy for all kinds of things.
I guess pick whichever solution works best for you :)
[ Parent | Reply to this comment ]
Transparent squid-cache combined with the apt-proxy should provide the best of both worlds, without the need for a humongous cache size or really large TTL settings.
[ Parent | Reply to this comment ]
Hopefully the following process will help somebody with apt-proxy-import in sarge, as I found it poorly documented when I used it several months ago.
# setup cd image on loopback
losetup -f ubuntu-5.10-dvd-amd64.iso
# mount loopback
mount -t iso9660 /dev/loop0 /mnt/
# add internet repository to apt-proxy configuration
cat <<EOF >>/etc/apt-proxy/apt-proxy-v2.conf
[ubuntu-breezy]
backends =
        http://us.archive.ubuntu.com/ubuntu
        http://archive.ubuntu.com/ubuntu
[ubuntu-breezy-security]
backends =
        http://security.ubuntu.com/ubuntu
EOF
# must download Packages.gz before Packages if Packages doesn't exist in internet repository
wget -O/dev/null http://apt-proxy:9999/ubuntu-breezy/dists/breezy/{main,restricted}/binary-{amd64,i386}/{Packages{.gz,.bz2,},Release}
# restart apt-proxy to register newly downloaded Packages
/etc/init.d/apt-proxy restart
# import all packages from cd image
apt-proxy-import -v -r -i /mnt/pool/ 2>&1 | multitee 0-1,2 2>apt-proxy-import.log
# list packages not imported (ignoring installation-only udebs)
grep -A1 'Not found, trying to guess' apt-proxy-import.log | grep -v '\(Not found, trying to guess\|^--$\)' | grep -v '\.udeb'
# unmount loopback
umount /mnt/
# remove loopback
losetup -d /dev/loop0
cd /var/cache/apt-proxy/ubuntu-breezy/
# find any packages on filesystem but not in a Packages file
find pool/ -iname "*.deb" | while read FILE; do grep -qF "${FILE}" dists/breezy/*/binary-*/Packages || echo "not found: ${FILE}"; done
# find any packages in a Packages file but not on filesystem
grep -h '^Filename: ' dists/breezy/*/binary-*/Packages | cut -f2 -d' ' | while read FILE; do test -f "${FILE}" || echo "not found: ${FILE}"; done
exit
[ Parent | Reply to this comment ]
But I need something different:
I need to update multiple machines with no bandwidth at all.
I mean I have several machines not connected to the Internet. My only access point to the Internet is at a cybercafe. I have a USB pendrive (250 megs) which I use to carry data I got from the net.
I think this scenario is very common.
Thank you again.
[ Parent | Reply to this comment ]
Look at the apt-zip package. From the description:
These scripts simplify the process of using dselect and apt on a non-networked Debian box, using removable media like ZIP floppies. One generates a `fetch' script (supporting backends such as wget and lftp, in a modular, extensible way) to be run on a host with better connectivity, check space constraints of your removable media, and then install the package on your Debian box.
Note on current version: space-checking is not done and spanning multiple disks is not yet supported.
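To give a feel for the "fetch script" idea, here is a rough sketch that does not use apt-zip at all but relies on apt-get's --print-uris option, which lists what would be downloaded instead of downloading it. uris_only is a hypothetical helper, and the sketch assumes the usual output format in which each URL appears in single quotes at the start of a line.

```shell
# Read "apt-get --print-uris" output on standard input and print just the
# URLs, one per line. Lines that do not start with a quoted URL are skipped.
uris_only() {
    grep "^'" | cut -d "'" -f 2
}

# Possible use (not run here):
#   on the offline box:   apt-get -qq --print-uris dist-upgrade | uris_only > urls.txt
#   at the cybercafe:     wget --input-file=urls.txt
#   back home:            copy the .deb files into /var/cache/apt/archives/
#                         and run apt-get dist-upgrade again
```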
[ Parent | Reply to this comment ]
If you have a sources.list file with a number of repositories, how do you specify those? Will just connecting to apt-proxy give you the equivalent of the sources.list file from the machine running apt-proxy?
How does the machine running apt-proxy interact with it? Should its sources.list file also change?
Hmmm, the apt-proxy package IS a little sparse...
aaron
[ Parent | Reply to this comment ]
apt-proxy is designed for use with Debian mirrors, so although it is mostly a general purpose proxy server it does know which mirror to use.
I guess in the interests of completeness I should have described this when discussing the configuration file. Basically, you'd update the configuration file to contain the "usual" Debian mirror.
For example in my case I have this:
;; Backend servers, in order of preference
backends =
        http://ftp.us.debian.org/debian
        http://ftp.de.debian.org/debian
        http://ftp2.de.debian.org/debian
        ftp://ftp.uk.debian.org/debian
This means that when I do "apt-get update/upgrade" the proxy connects to ftp.us.debian.org initially, and if that fails then it uses ftp.de.debian.org, and so on.
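The "in order of preference" behaviour amounts to trying each backend in turn until one responds. A minimal sketch of that loop follows; it is an illustration only, not apt-proxy's actual code, and both probe and first_working_backend are hypothetical names. The probe assumes wget is installed.

```shell
# probe is a hypothetical stand-in for "can this backend serve requests?".
# Here it asks wget to check the URL without downloading anything.
probe() {
    wget --quiet --spider --timeout=10 "$1"
}

# Print the first backend for which probe succeeds.
# Returns non-zero if none of the backends responds.
first_working_backend() {
    for backend in "$@"; do
        if probe "$backend"; then
            echo "$backend"
            return 0
        fi
    done
    return 1
}
```

Passing the four backends from the configuration above, in the same order, would print ftp.us.debian.org's URL when that mirror is reachable and fall through to the next one when it is not.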
The machine that is running apt-proxy doesn't need its sources.list to be changed as such, although I do set all my machines to use "deb http://itchy:9999 ..." so that that machine fills/fetches from the cache too. The sources.list on the apt-proxy machine has no bearing on the mirrors which are contacted...
[ Parent | Reply to this comment ]
When Ubuntu dapper came out, I finally switched to apt-cacher. While apt-cacher isn't perfect, it certainly works a darn sight better :-)
[ Parent | Reply to this comment ]
apt-proxy works fine when you have to upgrade 2-3 computers, but not on a large installation (3-300 computers).
So, I also switched to apt-cacher 3 months ago, and I have never restarted the daemon since then.
[ Parent | Reply to this comment ]
When my second machine starts up, guess what? It's downloading all the files again! Why? Is it because apt-cacher should be installed *before* updating and upgrading the first machine!? Is it because its repository is empty and it doesn't have the common sense to create a repository from the /var/cache/apt/archives directory??
I got really annoyed and transferred the /var/cache/apt/archives/ files manually.
Does it work with aptitude ? Probably.
There is also conflicting information given in this tutorial with the way the sources.list file should be setup to another 'howto' on the apt-cacher program. Which information is best ?!?!
Ahhh I just love it when all of my options are not completely polished ... but I have options !!
[ Parent | Reply to this comment ]
Why transparent? Just to be sure you never need to change any config in any program using http.
A special config? No, just change the file size limit to (for example) 400MB!
Something other than apt, then? Yep! Besides the very fast apt-get downloads at several megabytes per second on an ISDN connection, you also get the Microsoft Windows updates, for example ... nice to see your Windows Update downloading at 3 megabytes per second!!!
mini-HOWTO: Transparent Proxy with Linux and Squid
Fred
Linox.BE
[ Parent | Reply to this comment ]