Speed up compiling software with distcc
Posted by Steve on Mon 21 Mar 2005 at 19:44
Many people use their Debian machines for compiling software, and waiting for large compilation jobs to finish can be very frustrating. For example, compiling GCC on my home machine takes over ten hours. But by sharing the compilation jobs amongst a number of machines, compilation can finish in much less time.
In contrast to many distributed systems the requirements are minimal:
- Install GCC and distcc on all the machines you wish to use for the "compile farm".
As you can see, there are none of the onerous requirements common with distributed work. Specifically:
- You don't need to have the same libraries and development packages on all systems.
- You don't need to have a single filesystem which all machines can access.
- The clocks on all the machines don't need to be in sync (although synchronising clocks is easy enough to do).
Installing the software upon a Debian machine is as simple as using the apt-get command:
apt-get install distcc
When the package is installed you'll be asked two questions by the debconf process:
- Should distcc be started on boot?
- Answer yes if you think you will use the package often, or no if not. (No is the default.)
- Which machines should be allowed to connect to the distcc server?
- Answer with the machines you wish to allow, and the localhost.
The answers you give will be stored away in the configuration file /etc/default/distcc.
Install the package on all the other machines you wish to use as your compilation farm - and make sure that each one is allowed to connect to the others.
In my case I wish to allow all my hosts to connect to each other, so I allow the server to start at boot time and set up the allowed hosts so that any machine with an IP address of 192.168.1.x can connect, as can the localhost.
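For illustration, settings along those lines in /etc/default/distcc might look something like the fragment below. The variable names STARTDISTCC and ALLOWEDNETS are what I'd expect the Debian package to use, but treat them as assumptions and check the file on your own system before copying anything:

```shell
# /etc/default/distcc (fragment)
# Assumed variable names; compare with the file the package installed for you.

# Start the distccd daemon at boot time.
STARTDISTCC="true"

# Hosts and networks allowed to connect: the localhost plus 192.168.1.x.
ALLOWEDNETS="127.0.0.1 192.168.1.0/24"
```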
Once the package has been installed on two or more machines you're ready to actually use it. This is as simple as setting your "compiler" to be the distcc process, and specifying the hosts that are to be used in the compilation job.
There are several ways you can specify the hosts which should be used to perform the compilation:
- Via the environment variable "DISTCC_HOSTS"
- Via a per-user configuration file ~/.distcc/hosts
Assuming that you have two machines, lappy and mystery, and wish to compile a job, you could run it using distcc as follows:
skx@lappy:~/tmp$ export DISTCC_HOSTS="lappy mystery"
skx@lappy:~/tmp$ make CC=distcc
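If you'd rather not set the environment variable in every shell, the per-user file can be created along these lines. This is only a sketch: the host names lappy and mystery are just the example machines used in this article, and the file is assumed to hold the same space-separated list that DISTCC_HOSTS would:

```shell
# Create the per-user distcc directory if it does not exist yet.
mkdir -p "$HOME/.distcc"

# Write the host list: a space-separated list of machine names.
echo "lappy mystery" > "$HOME/.distcc/hosts"

# Show what was written.
cat "$HOME/.distcc/hosts"
```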
This specifies the two machines you wish to use for the compilation, and runs make telling it to use distcc as the compiler.
Once you do this you should find that the jobs are spread fairly evenly across the two machines.
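One thing worth keeping in mind is that make runs only one job at a time unless told otherwise, so there is little to spread around without its -j option. Here's a minimal sketch, assuming one CPU per host and a rule of thumb of roughly two jobs per CPU (both are assumptions, so adjust them for your own machines):

```shell
export DISTCC_HOSTS="lappy mystery"

# Count the hosts in the list (the unquoted expansion is intentional,
# so that the shell splits the list into words).
set -- $DISTCC_HOSTS
HOST_COUNT=$#

# Assumption: one CPU per host, two jobs per CPU.
JOBS=$((HOST_COUNT * 2))
echo "Would run: make -j$JOBS CC=distcc"

# Only attempt the build if distcc is installed and a Makefile is present.
if command -v distcc >/dev/null 2>&1 && [ -f Makefile ]; then
    make -j"$JOBS" CC=distcc
fi
```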
The results of the compilation can be seen in the logfile /var/log/distccd.log. Here's an excerpt from the logfile as the server starts, showing it listening and details of the host:
distccd (dcc_setup_daemon_path) daemon's PATH is /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
distccd (dcc_listen_by_addr) listening on :::3632
distccd (dcc_standalone_server) 1 CPU online on this server
distccd (dcc_standalone_server) allowing up to 3 active jobs
distccd (dcc_log_daemon_started) preforking daemon started (2.18.3 i386-pc-linux-gnu, built Mar 12 2005 02:23:29)
Now you can see how a job looks. This snippet shows a job sent from the machine 192.168.1.90:
distccd (dcc_check_client) connection from ::ffff:192.168.1.90:32903
distccd compile from lsystem.c to lsystem.o
distccd (dcc_r_file_timed) 84949 bytes received in 0.006228s, rate 13320kB/s
distccd (dcc_collect_child) cc times: user 0.129980s, system 0.006998s, 2076 minflt, 1 majflt
distccd cc lsystem.c on localhost completed ok
distccd job complete
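If you want a quick feel for how much work a given machine has taken on, you can count the "completed ok" lines in its log. This is just a rough sketch: the helper name count_ok_jobs is made up for this example, and it assumes the log lines look like the ones shown here:

```shell
# Count the jobs a distccd log file reports as successfully completed.
# (count_ok_jobs is a hypothetical helper, not part of distcc.)
count_ok_jobs() {
    grep -c 'completed ok' "$1"
}

LOG=/var/log/distccd.log

# Only report if the log exists and is readable on this machine.
if [ -r "$LOG" ]; then
    echo "Jobs completed ok: $(count_ok_jobs "$LOG")"
fi
```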
So, to summarise, once you've installed the distcc program on each machine in your farm you must:
- Tell the local instance which machines it can contact to compile files remotely.
- Make sure you use "distcc" instead of "gcc" as the compiler command in your Makefile / build system.
To further speed up compilations you might wish to look at the ccache package - which allows you to cache compilation results. This allows you to avoid recompiling files which haven't changed - it can be used in conjunction with distcc too.
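As a rough sketch of how the two might be combined, ccache can be asked to hand anything it hasn't cached to distcc via its CCACHE_PREFIX environment variable. The host names are the example machines from earlier, and using gcc as the underlying compiler is an assumption about your build:

```shell
# Ask ccache to pass cache misses to distcc for the actual compilation.
export CCACHE_PREFIX="distcc"
export DISTCC_HOSTS="lappy mystery"

# Only attempt the build if both tools are installed and a Makefile is present.
if command -v ccache >/dev/null 2>&1 \
   && command -v distcc >/dev/null 2>&1 \
   && [ -f Makefile ]; then
    make CC="ccache gcc"
fi
```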
(Update: Speeding up recompilation with ccache has also been covered now)