Splitting updatedb into daily and weekly

Posted by mcortese on Thu 20 Apr 2006 at 09:41

We all appreciate the locate command when we are in such a hurry that we cannot afford a full, inelegant find. What we like a little less, though, is the updatedb script eating up all our disk bandwidth at each boot, summoned by anacron.

Of course, this is only the case on a "desktop" machine: you turn it on when you need to do some work, so you long for a way to shorten the period of reduced usability imposed by updatedb.

Conversely, if you run a server that never goes down, and you have successfully scheduled your updatedb tasks late at night, then this article is not for you.

Two speeds

In any normal installation, some directories change more often than others. This reflects the traditional split between programs and data.

The /usr directory has a static nature: the files in it are not meant to be changed by normal users, and even root does not update its contents very often. On some installations, /usr is even mounted read-only or served by a remote host via NFS. A common scheme is to access /usr in read/write mode only when doing a software upgrade (e.g. via apt-get).

The /home and /var directories, on the other hand, contain data that changes continuously because of user and system activity.

So, it would be a good idea to have two databases for locate: one updated daily with the contents of the dynamic (and often small) directories, the other updated weekly with the contents of the static (and usually big) directories like /usr.

For the quick-changing database, I chose to keep the standard location /var/cache/locate/locatedb. For the rarely-modified one, a good choice could be /var/cache/locate/locatedb.usr.

Two cron scripts

The first thing to do is to duplicate the cron script that updates the locate database, so that one copy is run daily and the other is run weekly:

# cp /etc/cron.daily/find /etc/cron.weekly/find

The daily script must be modified to ignore the /usr path. So edit /etc/cron.daily/find, adding the following lines just after the part that sources the configuration file, but before the call to updatedb:

### Skip big discs rarely updated:
PRUNEPATHS="$PRUNEPATHS /usr"
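
In case /etc/updatedb.conf already lists /usr, the append can be guarded so the directory is not added twice. The following is only a sketch: add_prunepath is a hypothetical helper name, and the initial PRUNEPATHS value is a stand-in for whatever your configuration file really provides.

```shell
# Hypothetical helper: append a directory to the space-separated
# PRUNEPATHS list unless it is already there.
add_prunepath() {
    case " ${PRUNEPATHS:-} " in
        *" $1 "*) ;;                                      # already listed, nothing to do
        *) PRUNEPATHS="${PRUNEPATHS:+$PRUNEPATHS }$1" ;;  # append with a separating space
    esac
}

# Stand-in for the value normally set by /etc/updatedb.conf (assumed).
PRUNEPATHS="/tmp /usr/tmp /var/tmp"

add_prunepath /usr
add_prunepath /usr    # second call changes nothing
export PRUNEPATHS
echo "$PRUNEPATHS"
```

Note that the match is on whole entries, so an existing /usr/tmp does not count as /usr.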

The weekly script needs to be changed as well. Edit /etc/cron.weekly/find at the line that invokes updatedb and modify it so that it reads:

  ### Search only /usr, since the rest is done daily:
  ARGS="--output=/var/cache/locate/locatedb.usr --localpaths=/usr"
  cd / && nice -n ${NICE:-10} updatedb $ARGS 2>/dev/null
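
If you want to preview the weekly command before letting cron run it, a small wrapper can print it instead of executing it. This is a sketch under assumptions: weekly_updatedb, USR_DB and DRY_RUN are names made up for illustration, not part of the stock Debian script.

```shell
# Hypothetical wrapper around the weekly updatedb call.
# With DRY_RUN=1 it only prints the command line instead of running it.
USR_DB=/var/cache/locate/locatedb.usr

weekly_updatedb() {
    # Same options as in the weekly script: separate output file, scan only /usr.
    set -- --output="$USR_DB" --localpaths=/usr
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "updatedb $*"
    else
        ( cd / && nice -n "${NICE:-10}" updatedb "$@" 2>/dev/null )
    fi
}

DRY_RUN=1 weekly_updatedb
```

Calling weekly_updatedb without DRY_RUN set runs the real update, which needs write access to /var/cache/locate.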

One command for two databases

The final step is to tell locate that it has to fetch its data from two files, not just one. This is done by listing the two filenames, separated by a colon, in the LOCATE_PATH environment variable:

$ export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr

I suggest you make this setting the default for every user by adding the following lines to /etc/bash.bashrc:

### Locate the daily and weekly databases, if not defined yet:
if [ -z "$LOCATE_PATH" ]; then
  export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr
fi
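
Until the weekly job has run for the first time, the second database does not exist. A more defensive variant could list only the databases that are actually readable. This is a minimal sketch, not a tested drop-in: build_locate_path is a hypothetical helper name, and the variables result, db and found are ordinary globals in plain sh.

```shell
# Hypothetical helper: print a colon-separated list of the
# readable database files among its arguments.
build_locate_path() {
    result=""
    for db in "$@"; do
        [ -r "$db" ] || continue             # skip databases that do not exist yet
        result="${result:+$result:}$db"      # join entries with a colon
    done
    printf '%s\n' "$result"
}

if [ -z "${LOCATE_PATH:-}" ]; then
    found=$(build_locate_path /var/cache/locate/locatedb /var/cache/locate/locatedb.usr)
    # Leave LOCATE_PATH untouched when neither file is readable.
    if [ -n "$found" ]; then
        LOCATE_PATH=$found
        export LOCATE_PATH
    fi
fi
```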

To test your setup, manually run the daily and weekly scripts, and then try to run locate with a filename present both inside and outside /usr:

# /etc/cron.weekly/find
# /etc/cron.daily/find
# export LOCATE_PATH=/var/cache/locate/locatedb:/var/cache/locate/locatedb.usr
# locate dmesg

You should find both the dmesg log file in /var and the man page in /usr.
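
Later on, you may also want to confirm that both cron jobs keep refreshing their databases. The sketch below checks the modification time of each file with find. The helper name db_is_fresh and the limits of 2 and 8 days are assumptions chosen for illustration.

```shell
# Hypothetical helper: succeed if file $1 exists and was
# modified less than $2 days ago.
db_is_fresh() {
    [ -n "$(find "$1" -prune -mtime "-$2" 2>/dev/null)" ]
}

# Daily database: expect an update within the last 2 days;
# weekly database: within the last 8 days (assumed limits).
for entry in "/var/cache/locate/locatedb 2" "/var/cache/locate/locatedb.usr 8"; do
    set -- $entry                   # split into path ($1) and limit in days ($2)
    if db_is_fresh "$1" "$2"; then
        echo "$1: updated within the last $2 days"
    else
        echo "$1: missing or older than $2 days"
    fi
done
```

To see which database a given hit comes from, you can also query one file at a time with the -d option of locate, for example locate -d /var/cache/locate/locatedb.usr dmesg.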


This article can be found online at the Debian Administration website, along with associated comments.

This article is copyright 2006 mcortese - please ask for permission to republish or translate.