Splitting updatedb into daily and weekly
Posted by mcortese on Thu 20 Apr 2006 at 09:41
We all appreciate the locate command when we are in such a hurry that we cannot afford a full, inelegant find. What we like a little less, though, is the updatedb script eating up all our disk bandwidth at each boot, summoned by anacron.
Of course, this is only the case if you are running a "desktop" machine: since you turn it on when you need to do some work, you long for a way to shorten the period of reduced usability that updatedb imposes.
Conversely, if you run a server that never goes down, and you have scheduled your updatedb runs for late at night, then this article is not for you.
Two speeds
In any normal installation, there are directories that change more often than others. This reflects the traditional split between programs and data.
The /usr directory has a static nature: the files in it are not meant to be changed by normal users, and even root does not update its contents very often. On some installations, /usr is even mounted read-only or served by a remote host via NFS. A common scheme is to access /usr in read/write mode only when doing a software upgrade (e.g. via apt-get).
The /home and /var directories, on the other hand, contain data that changes continuously because of user and system activity.
So, it would be a good idea to have two databases for locate: one updated daily with the contents of the dynamic (and often small) directories, the other updated weekly with the contents of the static (and usually big) directories like /usr.
For the quick-changing database, I chose to keep the standard location /var/cache/locate/locatedb. For the rarely-modified one, a good choice could be /var/cache/locate/locatedb.usr.
Two cron scripts
The first thing to do is to duplicate the cron script that updates the locate database, so that one copy runs daily and the other runs weekly:
# cp /etc/cron.daily/find /etc/cron.weekly/find
The daily script must be modified to ignore the /usr path. Edit /etc/cron.daily/find and add the following lines just after the part that sources the configuration file, but before the call to updatedb:
### Skip big discs rarely updated:
PRUNEPATHS="$PRUNEPATHS /usr"
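If the daily script is edited more than once, or /usr is already listed in the configuration file, the plain append can leave duplicate entries in PRUNEPATHS. The following is only a sketch of a guarded append: the helper name add_prune_path is hypothetical, and the starting value of PRUNEPATHS is an example, not necessarily what your configuration file defines.

```shell
#!/bin/sh
# Hypothetical helper (not part of findutils): append a directory to
# PRUNEPATHS only if it is not already listed as a whole word.
add_prune_path() {
    case " $PRUNEPATHS " in
        *" $1 "*) ;;   # already present, nothing to do
        *) PRUNEPATHS="${PRUNEPATHS:+$PRUNEPATHS }$1" ;;
    esac
}

# Example value only; the real one comes from the sourced configuration file.
PRUNEPATHS="/tmp /usr/tmp /var/tmp"

add_prune_path /usr
add_prune_path /usr   # second call is a no-op
export PRUNEPATHS     # make sure updatedb sees the variable

echo "$PRUNEPATHS"    # prints: /tmp /usr/tmp /var/tmp /usr
```

Note that /usr/tmp in the example does not count as a match for /usr, because the comparison is done on whole, space-delimited entries.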
The weekly script needs to be changed as well. Edit /etc/cron.weekly/find at the line that invokes updatedb and modify it so that it reads:
### Search only /usr, since the rest is done daily:
ARGS="--output=/var/cache/locate/locatedb.usr --localpaths=/usr"
cd / && nice -n ${NICE:-10} updatedb $ARGS 2>/dev/null
One command for two databases
The final step is to tell locate to fetch its data from two files, not just one. This is done by listing the two filenames, separated by a colon, in the LOCATE_PATH environment variable (it must be exported so that locate can see it):
$ export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr
I suggest you make this setting the default for every user by adding the following lines to /etc/bash.bashrc:
### Locate the daily and weekly databases, if not defined yet:
if [ -z "$LOCATE_PATH" ]; then
  export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr
fi
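Until the weekly job has run for the first time, the second database does not exist yet, and locate may complain about the missing file. A possible variant, given here only as a sketch, lists just the databases that are actually readable. The helper name build_locate_path is hypothetical; the two paths are the ones chosen in this article.

```shell
#!/bin/sh
# Hypothetical helper: print the readable files among the arguments,
# joined with colons. The body runs in a subshell, so the variables
# "result" and "db" do not leak into the calling shell.
build_locate_path() (
    result=""
    for db in "$@"; do
        if [ -r "$db" ]; then
            result="${result:+$result:}$db"
        fi
    done
    printf '%s\n' "$result"
)

# Set LOCATE_PATH only if it is not defined yet and at least one
# database was found.
if [ -z "${LOCATE_PATH:-}" ]; then
    dbs=$(build_locate_path /var/cache/locate/locatedb \
                            /var/cache/locate/locatedb.usr)
    if [ -n "$dbs" ]; then
        export LOCATE_PATH="$dbs"
    fi
fi
```

The drawback of this variant is that a shell started before the first weekly run will not pick up the second database until it is restarted.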
To test your setup, manually run the daily and weekly scripts, and then try to run locate with a filename present both inside and outside /usr:
# /etc/cron.weekly/find
# /etc/cron.daily/find
# export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr
# locate dmesg
You should find both the dmesg log file in /var and the man page in /usr.
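Once the setup has been running for a while, it can be useful to confirm that both cron jobs still refresh their databases. The sketch below relies only on the modification time of the two files; the helper names is_stale and check_db and the age limits (1 and 7 days) are my own assumptions, chosen to match the daily and weekly schedule.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file exists and was last modified
# more than $2 whole days ago (this is what find's "-mtime +N" tests).
is_stale() {
    [ -n "$(find "$1" -prune -mtime +"$2" 2>/dev/null)" ]
}

# Hypothetical helper: print a one-line status for a database file.
check_db() {
    if [ ! -e "$1" ]; then
        echo "$1: missing"
    elif is_stale "$1" "$2"; then
        echo "$1: not updated for more than $2 day(s)"
    else
        echo "$1: up to date"
    fi
}

check_db /var/cache/locate/locatedb 1
check_db /var/cache/locate/locatedb.usr 7
```

To query one database at a time, GNU locate also accepts the -d option, for example locate -d /var/cache/locate/locatedb.usr dmesg.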
Not being familiar with Ubuntu, I cannot give you detailed instructions. I can only suggest:
- in the daily version, you should define the shell variable PRUNEPATHS to the directories you want to exclude before calling updatedb;
- in the weekly version, you should make sure that the invocation of updatedb gets the arguments --output and --localpaths as stated in my article.
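To find out which cron scripts actually invoke updatedb on a system whose file names differ from mine, something along these lines may help. It is only a sketch: the helper name scripts_calling_updatedb is hypothetical, and the pattern simply looks for "updatedb" on a line before any "#", which is a rough heuristic rather than a real shell parser.

```shell
#!/bin/sh
# Hypothetical helper: list the files directly inside directory $1 that
# mention updatedb on a line before any comment character.
scripts_calling_updatedb() {
    grep -l '^[^#]*updatedb' "$1"/* 2>/dev/null
}

scripts_calling_updatedb /etc/cron.daily  || echo "no match in /etc/cron.daily"
scripts_calling_updatedb /etc/cron.weekly || echo "no match in /etc/cron.weekly"
```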
By the way, a trailing .dpkg-new indicates that during the last upgrade a new version of that file was available, but the system detected that you had manually changed the old file, so the new one was not installed with its regular name in order not to overwrite your changes.
After sourcing the file in /etc/profile it worked perfectly for me. The daily run now lasts only 35 seconds and the weekly one 2:15 minutes. I think I put this on the findutils wishlist...
I installed mlocate as a replacement. The package control file advertises that "instead of re-reading all the contents of all directories each time the database is updated, mlocate keeps timestamp information in its database and can know if the contents of a directory changed without reading them again. This makes updates much faster and less demanding on the hard drive. This feature is only found in mlocate".
I am thus wondering whether it is still worth splitting updatedb into daily and weekly runs with mlocate. After a few tests, it seems that the database update is indeed very fast.
There's no file called "find" in cron.daily, but there are slocate, find.noslocate and find.noslocate.dpkg-new...
These 3 files seem to run updatedb, but I'm not sure.
Can you explain what needs to be done on Ubuntu?
Thank you.