Speed up compiling software with distcc

Posted by Steve on Mon 21 Mar 2005 at 19:44

Many people use their Debian machines for compiling software, and waiting for large compilation jobs to finish can be very frustrating. For example, compiling GCC on my home machine takes over ten hours. By sharing the compilation jobs amongst a number of machines, compilation can be finished in much less time.

distcc is the Debian package of the software found at the distcc website, and it allows you to easily distribute your compilation jobs over a number of machines.

In contrast to many distributed systems, the requirements are minimal:

  • Install GCC and distcc on all the machines you wish to use for the "compile farm".

As you can see, none of the onerous requirements common to distributed work apply. Specifically:

  • You don't need to have the same libraries and development packages on all systems.
  • You don't need to have a single filesystem which all machines can access.
  • The clocks on all the machines don't need to be in sync (although synchronising clocks is easy enough to do).

Installing the software upon a Debian machine is as simple as using the apt-get command:

apt-get install distcc

When the package is installed, the debconf process will ask you two questions:

  • Should distcc be started on boot?
    • Answer yes if you think you will use the package often, or no if not. (No is the default.)
  • Which machines should be allowed to connect to the distcc server?
    • Answer with the machines you wish to allow, plus the localhost.

The answers you give will be stored away in the configuration file /etc/default/distcc.

Install the package on all the other machines you wish to use in your compilation farm, and make sure that each one is allowed to connect to the others.

In my case I wish to allow all my hosts to connect to each other, so I allow the server to start at boot time and set up the hosts as follows:

192.168.1.0/24 127.0.0.1

The first entry says that any machine with an IP address of 192.168.1.x can connect; the second allows the localhost.
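
With those answers, the relevant part of /etc/default/distcc should end up looking roughly like the sketch below. The variable names are the ones used in the Debian defaults file; the comments are mine, and the exact layout of your file may differ.

```shell
# /etc/default/distcc (sketch; this file is sourced by the distcc init script)

# Start the distcc daemon at boot.
STARTDISTCC="true"

# Networks and hosts allowed to connect, separated by spaces.
# Networks are given in CIDR notation, single hosts as plain IP addresses.
ALLOWEDNETS="192.168.1.0/24 127.0.0.1"
```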

Once the package has been installed on two or more machines you're ready to actually use it. This is as simple as setting your "compiler" to be the distcc process, and specifying the hosts that are to be used in the compilation job.

There are several ways you can specify the hosts which should be used to perform the compilation:

  • Via the environment variable "DISTCC_HOSTS"
  • Via a per-user configuration file, ~/.distcc/hosts
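
If you prefer the per-user file, something like the following would create it. This is only a sketch: write_distcc_hosts is a helper name I made up for illustration, and it assumes the file holds the same space-separated host list you would otherwise put in DISTCC_HOSTS.

```shell
# write_distcc_hosts DIR HOST...
# Writes the given hosts, space-separated on one line, to DIR/hosts.
# (Hypothetical helper, not part of distcc.)
write_distcc_hosts() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    # "$*" joins the remaining arguments with single spaces.
    echo "$*" > "$dir/hosts"
}

# Typical use, with the two example machines:
#   write_distcc_hosts "$HOME/.distcc" lappy mystery
```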

Assuming that you have two machines, lappy and mystery, and wish to compile a job, you could run distcc as follows:

skx@lappy:~/tmp$ export DISTCC_HOSTS="lappy mystery"
skx@lappy:~/tmp$ make CC=distcc

This specifies the two machines you wish to use for the compilation, and runs make telling it to use distcc as the compiler.

Once you do this you should find that the jobs are spread fairly evenly across the two machines.
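
Note that make only runs one job at a time unless told otherwise, so in practice you will probably want to pass -j as well. Here is a minimal sketch for picking a value. It assumes one CPU per host and a rule of thumb of two jobs per CPU; both are my assumptions rather than anything distcc enforces, and distcc_make_cmd is a made-up helper name.

```shell
# distcc_make_cmd
# Prints a make invocation sized for the hosts listed in DISTCC_HOSTS.
# Assumes one CPU per host and two jobs per CPU (a rule of thumb only).
distcc_make_cmd() {
    # Word-split the host list on whitespace; $# is then the host count.
    set -- $DISTCC_HOSTS
    echo "make -j$(( $# * 2 )) CC=distcc"
}

DISTCC_HOSTS="lappy mystery"
distcc_make_cmd    # prints: make -j4 CC=distcc
```

It is worth experimenting with the number, since the best value depends on how fast each machine is.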

The results of the compilation can be seen in the logfile /var/log/distccd.log. Here's an excerpt from the logfile as the server starts, showing it listening and details of the host:

distccd[31929] (dcc_setup_daemon_path) daemon's PATH is /usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
distccd[31929] (dcc_listen_by_addr) listening on :::3632
distccd[31929] (dcc_standalone_server) 1 CPU online on this server
distccd[31929] (dcc_standalone_server) allowing up to 3 active jobs
distccd[31930] (dcc_log_daemon_started) preforking daemon started (2.18.3 i386-pc-linux-gnu, built Mar 12 2005 02:23:29)

Now you can see how a job looks. This snippet shows a job sent from the machine 192.168.1.90:

distccd[31931] (dcc_check_client) connection from ::ffff:192.168.1.90:32903
distccd[31931] compile from lsystem.c to lsystem.o
distccd[31931] (dcc_r_file_timed) 84949 bytes received in 0.006228s, rate 13320kB/s
distccd[31931] (dcc_collect_child) cc times: user 0.129980s, system 0.006998s, 2076 minflt, 1 majflt
distccd[31931] cc lsystem.c on localhost completed ok
distccd[31931] job complete
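
If you want a quick summary of what a server has been doing, the log is easy to pick apart with standard tools. The two helpers below are a sketch with made-up names; they assume the log lines look exactly like the excerpts above, so adjust the patterns if your log carries extra prefixes such as timestamps.

```shell
# list_compiled_sources LOGFILE
# Prints the source file named in each "compile from X to Y" line.
list_compiled_sources() {
    awk '/ compile from / { print $4 }' "$1"
}

# count_completed_jobs LOGFILE
# Prints how many "job complete" lines the log contains.
count_completed_jobs() {
    grep -c 'job complete' "$1"
}

# Typical use:
#   list_compiled_sources /var/log/distccd.log
#   count_completed_jobs /var/log/distccd.log
```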

So, to summarise, once you've installed the distcc program on each machine in your farm you must:

  • Tell the local instance which machines it can contact to compile files upon remotely.
  • Make sure you use "distcc" instead of "gcc" as the compiler command in your Makefile / build system.

To further speed up compilations you might wish to look at the ccache package - which allows you to cache compilation results. This allows you to avoid recompiling files which haven't changed - it can be used in conjunction with distcc too.
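
One way to combine the two, as a sketch: ccache can be told to run the real compiler through another command via its CCACHE_PREFIX environment variable, so that only cache misses are sent out to the farm. The choice of gcc and of -j4 in the comment are assumptions for illustration.

```shell
# Ask ccache to run the real compiler through distcc on a cache miss.
export CCACHE_PREFIX="distcc"

# Then build with ccache in front of the compiler, for example:
#   make -j4 CC="ccache gcc"
```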

(Update: Speeding up recompilation with ccache has also been covered now)

 

 


Posted by Anonymous (68.165.xx.xx) on Fri 25 Mar 2005 at 03:44
Is there a way to set DISTCC_HOSTS to something like 192.168.1.0/24, so that it will farm out compile jobs to every pc in your LAN? For example, I use DHCP to assign IP addresses, so I don't know from one day to the next what hostname will have which IP address. I could find out via nmblookup or something, but I'd like to have something general that I can set and forget.

Posted by Steve (82.41.xx.xx) on Fri 25 Mar 2005 at 10:20

It doesn't look like it.

The best thing to do would be to fill in all the hosts in the configuration file instead - if a host doesn't respond then it won't be included in the build.

(So there'll be a small overhead with having a dead host in the list, but it won't be huge).
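
If you would rather skip dead hosts up front, a rough sketch is to probe each candidate on the distcc port (3632, as seen in the daemon log) and keep only the ones that answer. The helper names are made up, and it assumes a netcat that supports the -z and -w options.

```shell
# probe HOST
# Succeeds if something accepts TCP connections on the distcc port.
# Assumes an nc binary with -z (scan only) and -w (timeout in seconds).
probe() {
    nc -z -w 1 "$1" 3632 2>/dev/null
}

# build_distcc_hosts HOST...
# Prints the subset of the given hosts that pass probe, space-separated.
build_distcc_hosts() {
    hosts=""
    for h in "$@"; do
        if probe "$h"; then
            hosts="$hosts $h"
        fi
    done
    # Drop the leading space left by the first append.
    echo "${hosts# }"
}

# Typical use:
#   export DISTCC_HOSTS="$(build_distcc_hosts lappy mystery)"
```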

Steve
-- Steve.org.uk

Posted by Anonymous (68.166.xx.xx) on Sun 27 Mar 2005 at 21:33
Thanks, Steve!

Posted by Anonymous (84.165.xx.xx) on Thu 11 Aug 2005 at 17:41
How much bandwidth does it need?
Do all the "clients" need the same CPU? Can I mix amd-athlon with intel-p3, for example?

thx ;-)

Posted by Steve (82.41.xx.xx) on Thu 11 Aug 2005 at 19:02

If you're not doing heavy optimizations you should be ok mixing archs in my experience.

As for bandwidth; usage should be minimal - you'll only be sending and receiving "source files" and "object files".

Neither of these is likely to be large, but I guess it is a case of test it and see ..

Steve
-- Steve.org.uk

Posted by deadcat (67.166.xx.xx) on Thu 18 Jan 2007 at 06:17
root@hades:/home/deadcat/tmp/drivers/orinoco# make CC=distcc
make -C /usr/src/linux-2.6.17.7 M=/home/deadcat/tmp/drivers/orinoco KERNELRELEASE=2.6.17.7+hades+20060811++hannah-grsec modules
make[1]: Entering directory `/usr/src/linux-2.6.17.7'
CC [M] /home/deadcat/tmp/drivers/orinoco/prism_usb.o
distcc[2700] (dcc_writex) ERROR: failed to write: Connection refused
distcc[2700] (dcc_writex) ERROR: failed to write: Broken pipe
distcc[2700] Warning: failed to distribute /home/deadcat/tmp/drivers/orinoco/prism_usb.c to medusa, running locally instead
/home/deadcat/tmp/drivers/orinoco/prism_usb.c:464: error: unknown field 'owner' specified in initializer
/home/deadcat/tmp/drivers/orinoco/prism_usb.c:464: warning: initialization from incompatible pointer type
distcc[2700] ERROR: compile /home/deadcat/tmp/drivers/orinoco/prism_usb.c on localhost failed
make[2]: *** [/home/deadcat/tmp/drivers/orinoco/prism_usb.o] Error 1
make[1]: *** [_module_/home/deadcat/tmp/drivers/orinoco] Error 2
make[1]: Leaving directory `/usr/src/linux-2.6.17.7'
make: *** [modules] Error 2
root@hades:/home/deadcat/tmp/drivers/orinoco#


Please tell me this is NOT because of grsecurity again!!! (=
Ignore the orinoco compile errors; it hasn't worked in a while. It's the distcc error I am confused by.

thanks

Posted by Steve (80.68.xx.xx) on Thu 18 Jan 2007 at 09:19

The interesting part of your comment was this line

distcc[2700] (dcc_writex) ERROR: failed to write: Connection refused

I'd suggest you check that the process is listening on the hosts that you're trying to distribute to, and that you can connect to them with something like telnet.

Steve

Posted by deadcat (67.166.xx.xx) on Thu 18 Jan 2007 at 16:05
I can't even telnet to that port, which is strange...

Here's my distcc file:
# Defaults for distcc initscript
# sourced by /etc/init.d/distcc

#
# should distcc be started on boot?
#
# STARTDISTCC="true"

STARTDISTCC="true"

#
# Which networks/hosts should be allowed to connect to the daemon?
# You can list multiple hosts/networks separated by spaces.
# Networks have to be in CIDR notation, f.e. 192.168.1.0/24
# Hosts are represented by a single IP Adress
#
# ALLOWEDNETS="127.0.0.1"

ALLOWEDNETS="127.0.0.1 10.10.10.0/24 10.10.32.0/24"
LISTENER="127.0.0.1 10.10.10.0/24 10.10.32.0/24"


firewall is off

Posted by Steve (80.68.xx.xx) on Thu 18 Jan 2007 at 16:16

If you run:

/etc/init.d/distcc restart

Does that help? If not take a look at /var/log/distccd.log to see if it logs anything.

I'm suspicious of the "LISTENER" setting that you have, since according to the documentation it only allows a single address to be used. I'd suggest leaving LISTENER blank to make it listen on all interfaces (relying upon ALLOWEDNETS to keep it secure) and restarting it to see if that helps.
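
In other words, something like the following in /etc/default/distcc. This is a sketch based on the file you posted, with only the LISTENER line changed; the comments are mine.

```shell
# /etc/default/distcc (sketch of the suggested change)
STARTDISTCC="true"
ALLOWEDNETS="127.0.0.1 10.10.10.0/24 10.10.32.0/24"

# Left empty so the daemon listens on all interfaces;
# ALLOWEDNETS still restricts which clients may connect.
LISTENER=""

# Afterwards, restart the daemon:
#   /etc/init.d/distcc restart
```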

Steve

Posted by deadcat (67.166.xx.xx) on Thu 18 Jan 2007 at 16:44
Thanks Steve. Setting LISTENER to blank works. I guess dpkg-reconfigure distcc was confusing me about the listener part, because I had distcc working before distcc was in Debian but rarely used it. (=
