Speed up compiling software with distcc

Posted by Steve on Mon 21 Mar 2005 at 19:44

Many people use their Debian machines for compiling software, and waiting for large compilation jobs to finish can be very frustrating. For example, compiling GCC on my home machine takes over ten hours. By sharing the compilation jobs amongst a number of machines, however, the compilation can be finished in much less time.

distcc is the Debian package of the software found at the distcc website, and it allows you to easily distribute your compilation jobs over a number of machines.

In contrast to many distributed systems, the requirements are minimal:

As you can see, there are none of the onerous requirements common with distributed work. Specifically:

Installing the software on a Debian machine is as simple as running apt-get:

apt-get install distcc

When the package is installed, debconf will ask you two questions:

The answers you give will be stored away in the configuration file /etc/default/distcc.
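The file is made up of ordinary shell variable assignments read by the init script. A sketch of what it might contain with the choices made later in this article (start at boot, allow the local network and the localhost) is below; the variable names STARTDISTCC and ALLOWEDNETS are assumed from Debian's distcc package, so check your own copy, as names can differ between releases.

```shell
#!/bin/sh
# /etc/default/distcc (sketch): shell assignments read by the init script.
# Variable names are assumed from Debian's distcc package; check your copy.
STARTDISTCC="true"                      # start distccd at boot time
ALLOWEDNETS="192.168.1.0/24 127.0.0.1"  # clients allowed to connect
```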

Install the package on all the other machines you wish to use as your compilation farm, and make sure that each one is allowed to connect to the others.

In my case I wish to allow all my hosts to connect to each other, so I allow the server to start at boot time and set up the allowed hosts as follows:

192.168.1.0/24 127.0.0.1

The first entry allows any machine with an IP address of the form 192.168.1.x to connect; the second allows connections from the localhost.
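To make the /24 notation concrete: only the first 24 bits (the first three numbers) of a client's address have to match 192.168.1.0. The sketch below performs that same address/mask comparison in plain sh. The helpers ip_to_int and in_subnet are hypothetical illustrations, not part of distcc; distccd does its own checking internally.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the address/mask match behind entries
# such as 192.168.1.0/24; distccd performs its own check, not this one.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_subnet ADDR NET[/BITS]: succeed if ADDR lies inside NET.
# A bare address (no /BITS) is treated as /32, i.e. an exact match.
in_subnet() {
  net=${2%/*}
  case $2 in
    */*) bits=${2#*/} ;;
    *)   bits=32 ;;
  esac
  # Keep the top BITS bits of a 32-bit value, clear the rest.
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, in_subnet 192.168.1.90 192.168.1.0/24 succeeds, while a client at 192.168.2.5 would not be matched by that entry.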

Once the package has been installed on two or more machines, you're ready to actually use it. This is as simple as setting your compiler to be distcc, and specifying the hosts that are to be used for the compilation job.

There are several ways you can specify the hosts which should be used to perform the compilation:

Assuming that you have two machines, lappy and mystery, and wish to compile a job, you could run it using distcc as follows:

skx@lappy:~/tmp$ export DISTCC_HOSTS="lappy mystery"
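skx@lappy:~/tmp$ make -j4 CC=distcc

This specifies the two machines you wish to use for the compilation, and runs make telling it to use distcc as the compiler. The -j4 option lets make run up to four compile jobs at once; without a -j option make runs one job at a time, leaving distcc nothing to spread across the machines.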
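How many parallel jobs to ask make for is a judgement call. A commonly quoted rule of thumb, used here purely as an assumption, is about two jobs per host so that each machine always has work queued. The hypothetical helper distcc_jobs below applies that rule to the entries in DISTCC_HOSTS; it simply counts them and ignores any per-host job limits written into the list.

```shell
#!/bin/sh
# Hypothetical helper: suggest a make -j value from DISTCC_HOSTS,
# using the (assumed) rule of thumb of two jobs per listed host.
distcc_jobs() {
  set -- $DISTCC_HOSTS          # split the host list on whitespace
  hosts=$#
  [ "$hosts" -gt 0 ] || hosts=1 # no hosts listed: treat as one machine
  echo $((hosts * 2))
}
```

With DISTCC_HOSTS="lappy mystery" this prints 4, so the build could be started with make -j"$(distcc_jobs)" CC=distcc.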

Once you do this you should find that the jobs are spread fairly evenly across the two machines.
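Exporting DISTCC_HOSTS has to be repeated in every new shell. When that variable is unset, distcc reads its host list from $DISTCC_DIR/hosts (by default ~/.distcc/hosts), and failing that from a system-wide hosts file (/etc/distcc/hosts on Debian). The sketch below writes the per-user file; write_distcc_hosts is a hypothetical helper name, not a distcc command.

```shell
#!/bin/sh
# Hypothetical helper: save a host list where distcc looks for it when
# DISTCC_HOSTS is unset ($DISTCC_DIR/hosts, default ~/.distcc/hosts).
write_distcc_hosts() {
  dir=${DISTCC_DIR:-$HOME/.distcc}
  mkdir -p "$dir" || return 1
  printf '%s\n' "$*" > "$dir/hosts"   # all hosts on one space-separated line
}
```

After running write_distcc_hosts lappy mystery, a plain make -j4 CC=distcc would use both machines without DISTCC_HOSTS being set.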

Details of each compilation can be seen in the logfile /var/log/distccd.log. Here's an excerpt from the logfile as the server starts, showing it listening for connections along with details of the host:

distccd[31929] (dcc_setup_daemon_path) daemon's PATH is /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
distccd[31929] (dcc_listen_by_addr) listening on :::3632
distccd[31929] (dcc_standalone_server) 1 CPU online on this server
distccd[31929] (dcc_standalone_server) allowing up to 3 active jobs
distccd[31930] (dcc_log_daemon_started) preforking daemon started (2.18.3 i386-pc-linux-gnu, built Mar 12 2005 02:23:29)

The next snippet shows how a job looks in the log; this one was sent from the machine 192.168.1.90:

distccd[31931] (dcc_check_client) connection from ::ffff:192.168.1.90:32903
distccd[31931] compile from lsystem.c to lsystem.o
distccd[31931] (dcc_r_file_timed) 84949 bytes received in 0.006228s, rate 13320kB/s
distccd[31931] (dcc_collect_child) cc times: user 0.129980s, system 0.006998s, 2076 minflt, 1 majflt
distccd[31931] cc lsystem.c on localhost completed ok
distccd[31931] job complete
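With several clients sharing a server, it can be useful to see who is sending work. The hypothetical helper distcc_clients below (an illustration, not a distcc tool) counts the "connection from" lines per client address in log text like the excerpt above. It assumes the log format shown there, including the ::ffff: prefix that appears for IPv4 clients connecting to the IPv6 socket, and only handles IPv4 addresses.

```shell
#!/bin/sh
# Hypothetical helper: count connections per client address in distccd log
# text on standard input. Lines look like:
#   ... connection from ::ffff:192.168.1.90:32903
distcc_clients() {
  # Keep only the IPv4 address, dropping any ::ffff: prefix and the port,
  # then count how often each address appears.
  sed -n 's/.*connection from [f:]*\([0-9.]*\):[0-9]*$/\1/p' | sort | uniq -c
}
```

Running distcc_clients < /var/log/distccd.log prints one line per client address, preceded by its connection count.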

So, to summarise, once you've installed the distcc program on each machine in your farm you must:

To speed up compilations further, you might wish to look at the ccache package, which caches compilation results. This allows you to avoid recompiling files which haven't changed, and it can be used in conjunction with distcc too.
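One way to combine the two, assuming a ccache version that supports the CCACHE_PREFIX setting, is to make ccache the compiler and have it hand cache misses on to distcc. A minimal sketch of the environment, reusing the example machines from above:

```shell
#!/bin/sh
# Sketch: ccache answers from its cache where it can; on a miss it runs
# "distcc gcc ..." because of CCACHE_PREFIX (assumed supported by your ccache).
export CCACHE_PREFIX=distcc
export DISTCC_HOSTS="lappy mystery"   # the example machines used above
```

The build is then started with make -j4 CC="ccache gcc", so unchanged files come from the cache and the rest are compiled across the farm.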

(Update: Speeding up recompilation with ccache has also been covered now)


This article can be found online at the Debian Administration website at the following bookmarkable URL (along with associated comments):

This article is copyright 2005 Steve - please ask for permission to republish or translate.