Posted by dldirector on Mon 21 Jan 2008 at 11:07
I am responsible for a production web server that is critical to our clients and is the bread and butter of our company. We have collocated the server for reliable power, A/C and Internet connectivity, as well as cost-effective high bandwidth. Here, we describe how to maintain a redundant server by configuring an identical standby machine.
For real peace of mind, we lease two identical server boxes from our collocation provider. With a "private rack" option, the two machines can be configured with Internet addresses from the same subnet, so that one can easily take over for the other. In addition, the two machines are connected via a private local network, which is handy for mirroring.
For this takeover to be useful, the standby machine needs to be a relatively current copy of the production machine. It turns out that this is a fairly simple three-step process, but step 2 is not obvious. Here is the process that we recently developed and tested.
Our configuration is two identical machines: A, the production server, and B, the standby server. Each machine has two identical 160 GB disks, with no RAID. Both machines run Debian Etch. Machine A Disk 1 holds the live production system, where content is constantly updated. Machine B Disk 1 receives the clone of it, as described below. Machine B Disk 2 holds a separate Debian Etch installation, which is what Machine B normally runs and which we treat as a maintenance system. The default boot configuration for Machine B is to boot from Disk 2.
1. A shell script uses rsync to copy the root partition (in our case the only partition) of the production Machine A Disk 1 to Machine B Disk 1. The script runs from a crontab entry on Machine B twice daily; I think of it as a content pull. The private local network is used for this update.
It is important to use the --hard-links and --one-file-system switches to rsync, so that hard links are maintained and there is no confusion caused by /proc and /dev. With the transition to udev on Debian systems like Etch, the /dev directory of a running system is virtual and dynamic. What we want is a copy of the "static" /dev directory as it exists on the disk, not as it is seen in the running system. Step 2 takes care of this.
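The pull script itself is not reproduced here, but a minimal sketch of the idea might look like the following. The host name machine-a, the mount point /mnt/other, and the script path and run times in the crontab comment are all placeholders, and --archive and --numeric-ids are assumptions added to the two switches discussed above. It also assumes that root on Machine B can ssh to Machine A without a password. By default the sketch only prints the command; set RUN=1 to execute it.

```shell
#!/bin/sh
# Sketch of the content pull from Machine A, run as root on Machine B.
# Hypothetical crontab line for "twice daily" (path and times are made up):
#   15 3,15 * * * RUN=1 /usr/local/sbin/pull-root.sh

# Execute the command when RUN=1, otherwise just print it (dry run).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# pull_root HOST DEST: copy HOST's root filesystem into the directory DEST.
pull_root() {
    # --archive          keep permissions, owners, times, symlinks, devices
    # --hard-links       keep hard-linked files linked on the copy
    # --one-file-system  stay on the root filesystem, so /proc, /sys and the
    #                    udev-managed /dev arrive only as empty directories
    # --numeric-ids      copy uid/gid numbers as they are (assumption)
    run rsync --archive --hard-links --one-file-system --numeric-ids \
        "root@$1:/" "$2/"
}

# machine-a and /mnt/other are placeholders; override via SRC_HOST and DEST.
pull_root "${SRC_HOST:-machine-a}" "${DEST:-/mnt/other}"
```

Printing first makes it easy to check the source and destination before anything is overwritten on Disk 1.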
2. On a live running system, there is a directory called /dev/.static/dev, which appears to be the static /dev directory as it exists on the disk. So all we have to do for this step is rsync from /dev/.static/dev on Machine A Disk 1 to /dev on Machine B Disk 1 (for example /mnt/other/dev, if Disk 1 is mounted at /mnt/other).
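As a rough sketch, step 2 could be a second rsync call along these lines. As before, machine-a and /mnt/other are placeholder names, --archive and --numeric-ids are assumptions, and passwordless ssh for root from Machine B to Machine A is assumed. The trailing slash on the source makes rsync copy the contents of /dev/.static/dev rather than the directory itself.

```shell
#!/bin/sh
# Sketch of step 2: copy the on-disk /dev of Machine A to the clone.

# Execute the command when RUN=1, otherwise just print it (dry run).
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# pull_static_dev HOST DEST: copy HOST's /dev/.static/dev into DEST/dev.
pull_static_dev() {
    # --archive includes device nodes, which is the whole point for /dev.
    run rsync --archive --hard-links --numeric-ids \
        "root@$1:/dev/.static/dev/" "$2/dev/"
}

# machine-a and /mnt/other are placeholders; override via SRC_HOST and DEST.
pull_static_dev "${SRC_HOST:-machine-a}" "${DEST:-/mnt/other}"
```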
3. Finally, a little housekeeping. I changed the file /boot/grub/menu.lst on Machine B Disk 2 (note: that is the maintenance system, not the cloned system) to add a new entry labeled "standby", or something similar, with the appropriate information to boot Disk 1. This implies, of course, that the default grub configuration on this machine boots Disk 2, which holds the maintenance version of the OS.
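To give an idea of what such an entry contains, here is a small sketch that prints a GRUB (legacy) stanza for menu.lst. Every value passed to it is an assumption for illustration: that the BIOS presents Disk 1 as (hd0), that its root partition is /dev/sda1, and that the /vmlinuz and /initrd.img symlinks exist on the clone. Check these against your own machine before using them.

```shell
#!/bin/sh
# Sketch: print a "standby" stanza for /boot/grub/menu.lst on Machine B Disk 2.

# standby_stanza GRUB_DEV ROOT_DEV KERNEL INITRD
standby_stanza() {
    printf 'title standby (clone of Machine A on Disk 1)\n'
    printf 'root %s\n' "$1"
    printf 'kernel %s root=%s ro\n' "$3" "$2"
    printf 'initrd %s\n' "$4"
}

# All four values are placeholders, not taken from the real machines.
standby_stanza '(hd0,0)' /dev/sda1 /vmlinuz /initrd.img
```

On Debian, the stanza probably belongs outside the section of menu.lst that update-grub regenerates (the part between the AUTOMAGIC KERNELS LIST marker comments), so that a kernel upgrade on the maintenance system does not remove it.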
In addition, before booting the "standby" version, I like to change the hostname on Machine B Disk 1 to the name that I use for Machine B, so that I don't get confused when rebuilding or repairing the downed production server; the shell prompt shows the hostname. Apache responds to the site's domain names, not the machine's local hostname, so as a web server things look the same. I also edit /etc/network/interfaces on Machine B Disk 1 so that Machine B keeps its own local network address, which I want to follow the hostname. The outside Internet IP address remains the one cloned from Machine A.
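Because each pull restores Machine A's versions of these files on the clone, the changes have to be reapplied after the last pull and before booting Disk 1. One way to make that quick is a small helper like the sketch below. The function name, the host name machine-b, the mount point /mnt/other and the prepared file /root/standby/interfaces are all hypothetical. The prepared interfaces file is assumed to combine Machine A's Internet address with Machine B's local network address.

```shell
#!/bin/sh
# Sketch: give the clone Machine B's hostname and local network settings.

# personalize_clone CLONE_ROOT HOSTNAME INTERFACES_FILE
personalize_clone() {
    # Debian reads the hostname from /etc/hostname at boot.
    printf '%s\n' "$2" > "$1/etc/hostname" || return 1
    # Replace the cloned network configuration with the prepared one.
    cp "$3" "$1/etc/network/interfaces"
}

# Example call, with Machine B Disk 1 mounted at /mnt/other (placeholders):
#   personalize_clone /mnt/other machine-b /root/standby/interfaces
```

If /etc/hosts on the clone mentions Machine A's hostname, it may need the same treatment.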
If there is a failure of Machine A, I reboot Machine B selecting the grub entry for Disk 1, and Machine B takes over with current or nearly current content.
This could also be done as a "hot standby" using Heartbeat and a remote monitoring computer. But for now, I will stick with a reboot performed by a human.
This article can be found online at the Debian Administration website at the following bookmarkable URL (along with associated comments):
This article is copyright 2008 dldirector - please ask for permission to republish or translate.