Avoiding slow package updates with package diffs

Posted by Steve on Thu 14 Sep 2006 at 08:56

If you're using the unstable or testing distribution of Debian GNU/Linux you will almost certainly have noticed that apt-get now uses daily diffs when it updates its package lists. In many common situations this is more bandwidth-efficient, but it isn't always appropriate.

apt-get is a standard command which is used by many Debian users to manage package installation and upgrades. (There are also other package managers, such as synaptic or aptitude.)

Typically apt-get works in a two stage process:

"apt-get update"

The update sub-command instructs apt-get to connect to each of the configured package sources and download a list of all the available packages to the local system. These lists contain the names, descriptions, version information and dependency information of all the packages available for installation.

Using these lists apt-get can be instructed to install a package from a remote source, upgrade all packages, search for Debian packages, and conduct other operations.

"apt-get upgrade"

The upgrade command is familiar to most users: it instructs apt-get to install any package upgrades which are available from the remote site(s) you're using.

This command will not install packages which aren't currently installed on your system. If an upgrade would require that, it will instead inform you that some packages have been kept back.

(There are also other sub-commands available to apt-get; see the manpage for details by running "man apt-get".)
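Put together, the two stages can be sketched as a tiny shell function. This is only an illustration: the function name two_stage_upgrade and the APT override variable are made up for this example, and the override exists purely so the sequence can be previewed without root privileges.

```shell
#!/bin/sh
# two_stage_upgrade: run the two stages described above, in order.
# APT is a hypothetical override for the command to run; it defaults to
# apt-get. Setting APT="echo apt-get" prints the commands instead of
# running them.
two_stage_upgrade() {
    apt=${APT:-apt-get}
    # Stage 1: refresh the package lists.
    # Stage 2: install pending upgrades, but only if stage 1 succeeded.
    $apt update && $apt upgrade
}
```

Calling it with APT="echo apt-get" set simply prints "apt-get update" followed by "apt-get upgrade"; as root, with APT unset, it runs both for real.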

Until recently each time you ran "apt-get update" you would download each complete package list from the sources you've got configured - regardless of the fact that very little might have changed since the previous time you did so.

As an example here is part of the output from my desktop system when I run this command:

subliminal@messages:~# apt-get update
..
..
Get: 5  http://ftp.uk.debian.org sid/main Packages [4354kB]
Get: 6  http://ftp.uk.debian.org sid/contrib Packages [62.1kB]
Get: 7  http://ftp.uk.debian.org sid/non-free Packages [87.3kB]
Get: 8  http://ftp.uk.debian.org sid/main Sources [1233kB]
Get: 9  http://ftp.uk.debian.org sid/contrib Sources [21.2kB]
Get: 10 http://ftp.uk.debian.org sid/non-free Sources [28.5kB]
Fetched 5792kB in 34s (168kB/s)                                       

Here we can see that I've downloaded several files in the process, and the total size was 5792kB - almost 6MB.

In an attempt to make package updates more dial-up friendly, and easier for other users on a low-bandwidth link, this system of package list downloads was recently augmented by the addition of "package diffs". Rather than downloading the complete package lists upon each update, only the differences to the package lists are fetched.

A package update could now look something like this:

notice@me:~# apt-get update
...
...
Get: 27 2006-09-10-1306.48.pdiff [293B]                                        
Get: 28 2006-09-10-1306.48.pdiff [301B]                                        
Get: 29 2006-09-10-1306.48.pdiff [13.2kB]                                      
Get: 30 2006-09-11-1318.15.pdiff [60.1kB]                                      
Get: 31 2006-09-11-1318.15.pdiff [60.1kB]                                      
Get: 32 2006-09-11-1318.15.pdiff [436B]                                        
Get: 33 2006-09-11-1318.15.pdiff [436B]                                        
Get: 34 2006-09-11-1318.15.pdiff [257B]                                        
Get: 35 2006-09-11-1318.15.pdiff [257B]                                        
Get: 36 2006-09-12-1306.11.pdiff [136B]                                        
Get: 37 2006-09-10-1306.48.pdiff [271B]                                        
Get: 38 2006-09-11-1318.15.pdiff [60.1kB]        
Get: 39 2006-09-11-1318.15.pdiff [11.0kB]         
Get: 40 2006-09-11-1318.15.pdiff [11.0kB]         
Get: 41 2006-09-11-1318.15.pdiff [230B]                                
Get: 42 2006-09-11-1318.15.pdiff [230B]            
Get: 43 2006-09-11-1318.15.pdiff [436B]          
Get: 44 2006-09-11-1318.15.pdiff [257B]       
Get: 45 2006-09-11-1318.15.pdiff [11.0kB]     
Get: 46 2006-09-12-1306.11.pdiff [41.9kB]    
Get: 47 2006-09-12-1306.11.pdiff [41.9kB]            
Get: 48 2006-09-12-1306.11.pdiff [439B]                               
Get: 49 2006-09-12-1306.11.pdiff [439B]                               
Get: 50 2006-09-11-1318.15.pdiff [230B]           
Get: 51 2006-09-12-1306.11.pdiff [41.9kB]    
Get: 52 2006-09-12-1306.11.pdiff [9377B]      
Get: 53 2006-09-12-1306.11.pdiff [9377B]                 
Get: 54 2006-09-12-1306.11.pdiff [439B]          
Get: 55 2006-09-12-1306.11.pdiff [9377B]
Fetched 396kB in 38s (10.4kB/s)

This time we see that the total download was only 396kB - a significant saving! However the downside is apparent: instead of downloading several files we've downloaded over fifty, and even though we've downloaded far less data the update took longer (38s rather than 34s).
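To put rough numbers on that trade-off, the figures from the two "Fetched" lines can be compared directly. The helper below is just a sketch for this article (the name compare_fetches is made up); it takes the size and duration of the full download followed by those of the diff-based one.

```shell
#!/bin/sh
# compare_fetches FULL_KB FULL_SECS DIFF_KB DIFF_SECS
# Prints how much data the diff-based run saved and how much longer it took.
# (Hypothetical helper; the inputs are read off the "Fetched" lines by hand.)
compare_fetches() {
    awk -v fk="$1" -v fs="$2" -v dk="$3" -v ds="$4" 'BEGIN {
        printf "data saved: %.0f%%\n", (1 - dk / fk) * 100
        printf "extra time: %ds\n", ds - fs
    }'
}

# Figures from the two runs shown above.
compare_fetches 5792 34 396 38
# prints:
#   data saved: 93%
#   extra time: 4s
```

So on this particular run the diffs cut the download by roughly 93%, yet the update still finished four seconds later.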

If you don't update your system frequently, more diffs pile up between runs, so it may make more sense for you to revert to the old "inefficient" system. Thankfully this is a simple process.

Rather than running "apt-get update" you can run:

apt-get update -o Acquire::PDiffs=false

You can even choose to make this behaviour the default by editing the file /etc/apt/apt.conf (creating it if it is not present). Simply add the following line:

Acquire::PDiffs "false";
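If you would rather not touch /etc/apt/apt.conf itself, apt also reads configuration fragments from the /etc/apt/apt.conf.d/ directory. Here is a minimal sketch of that approach; the function name disable_pdiffs and the file name 99no-pdiffs are arbitrary choices made for this example, and the directory is a parameter so you can try it against a scratch directory first.

```shell
#!/bin/sh
# disable_pdiffs [DIR]
# Writes a one-line apt configuration fragment that turns pdiffs off.
# DIR defaults to /etc/apt/apt.conf.d; the file name 99no-pdiffs is an
# arbitrary name picked for this example.
disable_pdiffs() {
    dir=${1:-/etc/apt/apt.conf.d}
    printf 'Acquire::PDiffs "false";\n' > "$dir/99no-pdiffs"
}
```

Run as root with no argument it writes into the real configuration directory. Afterwards "apt-config dump | grep -i pdiffs" should list the setting if apt has picked it up.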

 

 


Posted by todsah (80.113.xx.xx) on Thu 14 Sep 2006 at 11:04
On a (somewhat) related note: if you want to limit the bandwidth used by apt-get when it upgrades, you can install 'trickle'. Trickle is a user-space tool (no weird kernel patches required, no firewalling stuff) that allows you to limit the speed of incoming and outgoing transfers.
[root@jib]~# apt-get install trickle
Setting up trickle (1.07-4) ...
[root@jib]~# trickle -d 50 apt-get upgrade
Handy when you need to do a big upgrade but don't want to saturate your company's network pipe :)

Posted by simonw (84.45.xx.xx) on Thu 14 Sep 2006 at 17:01
One can run 'apt-get update' automatically, with or without downloading or installing packages, so there is no need for most people to run it only sporadically.

The reason to do diffs is to reduce server load. Whilst the smaller updates may take as long (or longer), presumably the bandwidth cost to people providing Debian mirrors is substantially less.

Similarly, trickling the data might help your bandwidth, but presumably it means more concurrent sessions on the Debian mirrors, and so increases the total resources required?

The major win at work was tweaking the Squid cache settings so that it caches files of several megabytes in size. There's nothing like downloading 21MB of updates in 3 seconds to make you realise that smarter caching really is the answer.

If scalability is the issue driving all this, then a proper peer-to-peer caching network, which gets faster as more people request the content, is clearly the answer.

Posted by Anonymous (87.165.xx.xx) on Mon 21 May 2007 at 17:10
Actually, if you update once a month or once every two months, the server load is much higher with pdiffs than without them, since without them you won't have to download 4 pdiffs per day.

It certainly saves load for people who auto-update once a day, which most server setups do to keep an up-to-date package database. But for me this produces a big overhead, with my laptop downloading for an hour ^^

Posted by peterhoeg (193.163.xx.xx) on Fri 15 Sep 2006 at 13:14
Steve, is this only in the apt-get from etch, or from which version are the diff updates available?

Posted by Steve (62.30.xx.xx) on Fri 15 Sep 2006 at 13:20
I believe etch/sid, yes.

According to the changelog it was version 0.6.44:

apt (0.6.44) unstable; urgency=low
...
...
  * apt pdiff support from experimental merged
...
...
 -- Michael Vogt [email]  Mon,  8 May 2006 22:28:53 +0200

Steve

Posted by peterhoeg (193.163.xx.xx) on Fri 15 Sep 2006 at 15:08
Excellent, thanks Steve.

If anybody else should be interested, version 0.6.44 is available from backports.org: http://www.backports.org/debian/pool/main/a/apt/

Posted by Charlie (203.54.xx.xx) on Thu 12 Oct 2006 at 08:00
Excellent explanation, and very helpful.

Thank you.

Posted by Anonymous (84.57.xx.xx) on Tue 27 Feb 2007 at 14:50
Updating using diffs is exceptionally slow on old (slow) computers. I don't think this is necessary; in other words, I think it's sort of a bug. Better to use the old method on any old computer until this gets better.

g k

Posted by Anonymous (70.105.xx.xx) on Tue 10 Apr 2007 at 00:39
Debian will work on a 386 with 4MB of RAM.
I have it on a 90MHz Pentium laptop with Windowmaker in 16MB of RAM.
When I try to get a package installed,
it always loads the entire database for apt,
causing much swap activity and bringing a normally very useful machine
to a virtual standstill.

If somebody doesn't correct this, then when I get around to it
I may just rewrite apt to load only names, not descriptions,
which would lower the RAM requirements significantly.
That in turn might allow an old machine to upgrade normally
without thrashing the hard disc with needless swapping.

Why not only show descriptions when asked for?
Like when the show description keys are pressed.
Loading everything is simple, but unneeded.

Just an idea..
Thanks...

Posted by Anonymous (24.19.xx.xx) on Thu 28 Feb 2013 at 23:22
Cheers mate. Useful
