Weblog entry #43 for dkg
DHCP itself is one of the single points of failure in the network layout. i'd really like to make this DHCP server redundant (so that i can take that host down for service if needed and leave the rest of the network intact). However, reading dhcpd.conf(5) makes me pretty worried that the failover stuff is not well-tested or widely deployed.
I've read Paul Heinlein's Failover with ISC DHCP, which makes it look not unreasonable, but i was wondering if people have other preferred mechanisms for providing DHCP redundancy. Do you have failover DHCP set up for any LAN that you manage? If so, what do you use? Are there any gotchas to watch out for?
I'm also concerned about the security implications. On a network that's not using IPSEC, i don't see any mechanism for the two DHCP servers to properly mutually authenticate. Is it really just by IP address? Could someone spoofing the IP address of one host corrupt the state of the other DHCP server? (i'm less concerned about them keeping network traffic private, since most of what they communicate is likely to go out in the clear on the wire anyway). Am i missing some clever authentication technique?
From a security point of view, i understand that there are more severe security problems with DHCP itself, of course (the protocol requires that the client trust the (unauthenticated) server), but that doesn't seem like a good reason to introduce an opportunity to compromise any given server directly.
Your thoughts on DHCP redundancy?
Comments on this Entry
DHCP Failover solves this problem: you configure the same scope on both servers and they exchange lease information. The advantage is that this is transparent for the client, which can keep the same IP address. The disadvantage is the protocol and configuration itself. It depends on having the same time on both servers, so make sure NTP is working. The failover config needs to be exactly the same on both sides, or the association will fail. The protocol changes frequently, so both servers need to be on the same version. Most important, it is a horribly complex protocol to debug - especially if you don't deal with it every day. There are a lot of details you need to be aware of.
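To illustrate the point above about keeping both sides in agreement, here is a minimal sketch that generates the failover stanza for each server from the same function, so the shared values cannot drift apart. The peer name "dhcp-failover", port 647, the timing values and the example addresses are assumptions for illustration only. Check dhcpd.conf(5) for your version before using any of them.

```shell
#!/bin/sh
# Sketch: print a dhcpd.conf failover stanza for one side of the pair.
# The peer name, port and timing values below are illustrative
# assumptions, not values taken from the discussion.
failover_stanza() {
    role=$1        # "primary" or "secondary"
    my_addr=$2     # address this server uses for failover traffic
    peer_addr=$3   # address of the other server

    case $role in
        primary|secondary) ;;
        *) echo "role must be primary or secondary" >&2; return 1 ;;
    esac

    echo 'failover peer "dhcp-failover" {'
    echo "    $role;"
    echo "    address $my_addr;"
    echo "    port 647;"
    echo "    peer address $peer_addr;"
    echo "    peer port 647;"
    echo "    max-response-delay 60;"
    echo "    max-unacked-updates 10;"
    echo "    load balance max seconds 3;"
    if [ "$role" = primary ]; then
        # mclt and split are only set on the primary
        echo "    mclt 3600;"
        echo "    split 128;"
    fi
    echo "}"
}
```

For example, `failover_stanza primary 192.0.2.1 192.0.2.2` on one host and `failover_stanza secondary 192.0.2.2 192.0.2.1` on the other give two stanzas that differ only in role, addresses and the primary-only statements. Each pool that should be shared still needs its own `failover peer "dhcp-failover";` line.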
A rather different approach I would investigate is using drbd to share the leases file between the two machines and using heartbeat to create an active/passive setup. It is probably just as much work as setting up dhcp failover, but it is a lot easier to troubleshoot afterwards. Also, you can use a single configuration file, which means you don't have to configure all static leases twice. I never tried building such a setup myself, but I know some appliance vendors use a similar setup with great success.
Which setup is the best depends on your needs but personally i would prefer the third option because your DHCP will keep functioning just as it does now. DHCP failover is too complex and quite fragile.
I think your concerns about security are valid. The best setup would be to have the servers in a different network segment than your clients, with a firewall in between. That way the firewall can block all traffic to the failover ports and can also detect and drop spoofed packets. All you'll have to do for DHCP is configure a DHCP relay agent on that firewall to forward DHCP requests to the two servers. A simple firewall costs just a couple hundred dollars, or you could build your own of course.
If you want to secure DHCP further, check out the layer 2 security features that exist in many managed switches. They can filter out quite a lot of bad traffic (rogue DHCP servers, spoofing, ...).
It seems like another way to resolve the security concerns between the two servers would be to leave them on the LAN as is, throw a spare NIC in each machine, connect via a crossover cable, and configure a physically isolated LAN segment just between the two. Then you could presumably force failover to listen only on that link (if dhcpd isn't capable of being configured that way, you could limit like this with iptables).
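A minimal sketch of the iptables idea above, assuming the dedicated link is `eth1`, the peer's address on that link is `10.99.0.2`, and failover runs over TCP port 647 (all three are hypothetical values). The function only prints the rules so they can be reviewed before anything is applied.

```shell
#!/bin/sh
# Sketch: print iptables rules that accept DHCP failover traffic only
# when it arrives on the dedicated crossover interface from the peer,
# and drop it everywhere else. Interface, port and peer address are
# assumptions supplied by the caller.
failover_fw_rules() {
    iface=$1   # dedicated crossover NIC, e.g. eth1
    port=$2    # failover TCP port configured in dhcpd.conf
    peer=$3    # peer's address on the crossover link

    # Allow the peer on the private link first ...
    echo "iptables -A INPUT -i $iface -s $peer -p tcp --dport $port -j ACCEPT"
    # ... then drop anything else aimed at the failover port.
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
}
```

Running `failover_fw_rules eth1 647 10.99.0.2` shows the two rules. Once they look right they can be run as root on each server, with the peer address swapped.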
As for the LAN-wide layer2 filtering, unfortunately the physical cabling and budget constraints i'm dealing with don't allow me to have the sort of managed switches that we'd need to do that. It would be nice, but it's unrealistic at this point. i'd also be unhappy about having to rely on the non-free firmware most fancy switches use to do such finicky operations. I feel OK expecting black boxes to do store-and-forward of ethernet frames and to maintain per-port MAC tables, but i start feeling nervous when i find myself relying on code that i can't maintain/audit/repair myself for more-advanced functionality. Is anyone making managed switches that run a free software distribution at a reachable price range? or does so much of the switching get done in firmware now that this is not possible?
I've been meaning to experiment with drbd and heartbeat for a while now anyway, so maybe this is the opportunity to try that out. Thanks for the suggestion!
Keepalived[1] is an open source vrrp (virtual router redundancy protocol)
daemon for linux that "just works TM". The configuration can go from very
simple to very complex.
For this case all you need to do is failover between a few hosts running dhcpd.
Here is the basic design using keepalived.
You have 2 nodes both with dhcpd and keepalived to service requests. We'll call
them dhcp1 and dhcp2. We'll have keepalived manage a virtual ip (vip) named dhcp.
On each node keepalived is tracking a dummy interface we'll name "dummy0". If
you down dummy0 on either node it will transition to a fault state. If you do
this on the master, the backup will soon transition to master. The new master
sends a gratuitous ARP to the switch to take over the vip. The failover generally
happens in < 1 second because of how frequently the heartbeat is sent. Also, you
specify the interface the heartbeat is sent out on. If node <--> node
communication is lost over that interface (like a switch outage or something
crazy) you will likely see a state transition.
On each node you have a set of scripts under /etc/vrrp.d that are iterated over
during each state change with something like run-parts. They each take a single
argument of either master, backup, or fault and do the right thing. One of
those scripts touches /var/run/{MASTER,BACKUP,FAULT} depending on the state.
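The state-marker script described above could look roughly like the following sketch. The function name, the optional directory argument (added so it can be tried outside /var/run) and the choice to clear the other two markers on each transition are assumptions, not taken from the linked scripts.

```shell
#!/bin/sh
# Sketch of a state-marker hook: given master, backup or fault, leave
# exactly one of MASTER, BACKUP or FAULT in the marker directory.
set_state_marker() {
    state=$1
    dir=${2:-/var/run}   # second argument is only for testing

    case $state in
        master) marker=MASTER ;;
        backup) marker=BACKUP ;;
        fault)  marker=FAULT ;;
        *) echo "usage: set_state_marker {master|backup|fault} [dir]" >&2
           return 1 ;;
    esac

    # Remove stale markers so only the current state is visible.
    rm -f "$dir/MASTER" "$dir/BACKUP" "$dir/FAULT"
    : > "$dir/$marker"
}
```

A script under /etc/vrrp.d would then just call `set_state_marker "$1"`, and other tools can check the state with `test -f /var/run/MASTER`.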
Set up ssh triggers[2] between the two using root keys restricted so that they
only let you copy the dhcpd.conf. Here is an example:
######## /root/.ssh/authorized_keys
command="rsync --server -vlHogDtprz --delete --delete-after --ignore-errors /etc/dhcpd.conf",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-dss AAAAB3NzaC1kc3MAAACBAK/fPV9YvwOqOxNui+dZCoFMhH313D++2Mm1cR/flGCh0TEe20Vdv8hnFU06khY9ndo9MtPe0LyDw65lLAv1yu4Gjr7vsl6Yt4OC6u1oq3TG/YdFrMoTpy5yp4bQpfP9tueMj18kPq4RpHVJ2kqeZytwV4p3nnXzwRWs8BPWEuhbAAAAFQD95EKNg5d9sXKZkMPREJ+CTs8UPQAAAIBixMwLitdWz4viUFHkZ8DZIxSQwO5C5jfTSYizDBCCTmmxJbb/RyKc3Veaar88H7+gBaECvuOI90kZWNzCrjCSpQgLg5AiXvphPbzZdKHmpjzn5pF4908YZESGY0iTpCOTfo3PisUBb9zNh7j/HDI7sS33g7ZwJWnTPOLJCYz3FwAAAIEAgcfMYrt61DUSf175em4WC8MTo002EHewPmfCcFeT8/n9A3hGhmFFY8L4tFXKMeeBASN+LhJJXKcXOiMG+kJkk6ysASrq7psFoJKEGPccaqgZywHDt3icgbJKQjJicujDPr6SdNTdg2m0t+os8YQMlOYRPlQC2gtkbZWU+iV/GQQ=
With that it is ok to have them passwordless because no matter what command
someone tries to run, rsync --server is run instead.
So for the final part you can do this 2 different ways. Both of them assume
the commands / cron are set up on the master.
A) The sexy way with a bit more setup which is fully event driven
B) The less sexy but perfectly acceptable way
================ OPTION A ================
Set up iwatch[3] to watch /etc/dhcpd.conf with some inotify(7) goodness. In the
config file, tell it to run the rsync command whenever the file changes if and
only if it is master. You can have it log to syslog exactly what command it
runs and what is happening.
Have the command it runs be something along the lines of:
runifmaster rsync -aq -e'ssh -i ~/.ssh/id_dsa-dhcpd-sync-trigger' /etc/dhcpd.conf root@dhcp2:/etc/dhcpd.conf
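The `runifmaster` wrapper is not spelled out in the comment. A minimal sketch of what such a helper might do, assuming it only checks for the /var/run/MASTER marker and otherwise does nothing, is below. The `MASTER_MARKER` override is a hypothetical addition so the function can be tried without touching /var/run.

```shell
#!/bin/sh
# Sketch of a runifmaster-style wrapper: run the given command only if
# this node currently holds the master marker; otherwise exit quietly.
runifmaster() {
    marker=${MASTER_MARKER:-/var/run/MASTER}

    # Not master: succeed silently so cron and iwatch do not complain.
    [ -f "$marker" ] || return 0

    # Master: run the command exactly as given, preserving its arguments.
    "$@"
}
```

On a backup node the wrapped rsync is simply skipped. On the master it runs as if the wrapper were not there, and its exit status is passed through.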
You might also have the ssh trigger command be a script that appends a date
hash to the existing dhcpd.conf or something and keeps the last 5.
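The backup idea above could be sketched as follows. The `.bak-YYYYmmdd-HHMMSS` suffix and the function name are assumptions. The script sorts by file name, which works because the timestamp format sorts chronologically, and it does not try to handle paths containing newlines.

```shell
#!/bin/sh
# Sketch: copy the current config to a timestamped backup, then keep
# only the newest $keep backups (default 5).
backup_and_prune() {
    conf=$1
    keep=${2:-5}

    # Nothing to back up yet (e.g. first sync to a fresh node).
    [ -f "$conf" ] || return 0

    stamp=$(date +%Y%m%d-%H%M%S)
    cp -p "$conf" "$conf.bak-$stamp"

    # Newest first by name; everything after the first $keep is removed.
    ls -1 "$conf".bak-* 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
        rm -f -- "$old"
    done
}
```

The forced command in authorized_keys could then be a small script that calls `backup_and_prune /etc/dhcpd.conf` before handing over to `rsync --server`.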
================ OPTION B ================
Run a cronjob every x minutes that does:
test -f /var/run/MASTER && rsync -aq -e'ssh -i ~/.ssh/id_dsa-dhcpd-sync-trigger' /etc/dhcpd.conf root@dhcp2:/etc/dhcpd.conf
I re-implemented the ideas from a previous employer in new keepalived scripts
and open sourced them. You can get it all from:
http://www.digitalprognosis.com/opensource/scripts/keepalived/
I use it for active failover support for these types of servers:
- dhcp
- dns
- openldap
- jabber
It is layer 3 failover in the current setup and does no sort of session
sharing. Keep this in mind if you use it in other things.
Since this blog doesn't allow poster info here is a shameless plug:
My name is Jeff Schroeder and my website is http://www.digitalprognosis.com
Get my email off that website if you have any specific questions. This is
surprisingly robust once you understand how it all works. Oh and I just did your job :)
[1] sudo apt-get install keepalived # http://www.keepalived.org/
[2] http://blog.ganneff.de/blog/2007/12/29/ssh-triggers.html
[3] sudo apt-get install iwatch # http://iwatch.sf.net
I don't see in your description above (or in your linked scripts) how the actual lease database itself gets propagated during/before a transition, though. It looks to me like the alternate dhcp server will indeed take over from the old master if the master goes away; but without the old master's lease information, i think it would be likely to try to re-assign addresses that have already been allocated, or to force clients with outstanding leases to re-IP, causing a disruption to any outstanding network connections.
Are you transferring the lease information (usually /var/lib/dhcp3/dhcpd.leases in debian) somehow and i've overlooked it? I'd be interested in seeing how you handle that part of the failover.
I'm definitely in favor of making the DHCP server redundant in some way, but the DHCP failover is messy. For maintenance one could have two servers and relatively long leases, so that few are likely to expire during upgrades. Or use one of the more advanced strategies discussed above.
DHCP by default ping addresses before allocation - so it shouldn't reallocate addresses in use.
Ah, good. But it still seems like it might force all client machines to switch IP addresses when their lease time is up, no? If the client asks for a lease renewal, and the new DHCPD doesn't have a record for that lease, wouldn't it NAK the renewal, and ultimately hand out a new address?
If the client asks for a lease renewal, and the new DHCPD doesn't have a record for that lease, wouldn't it NAK the renewal, and ultimately hand out a new address?
No, so long as the address requested by the client is within the scope defined on the server and not assigned to another machine, the dhcp server will assign the address that was requested by the client.
The problem with ping is that most Windows machines won't respond to it. For other operating systems, it depends on the firewall config.
In any case, it's not Debian's fault...
You could also set the lease to 1 day and then run a cronjob every hour or something.