Cloning a Debian Etch system for redundancy

Posted by dldirector on Mon 21 Jan 2008 at 11:07

I am responsible for a production web server that is critical to our clients and is the bread and butter of our company. We have collocated the server for reliable power, A/C and Internet connectivity, as well as cost-effective high bandwidth. This article describes how we keep an identical standby machine configured as a redundant copy of that server.

For real peace of mind, we lease two identical servers from our collocation provider. With a "private rack" option, the two machines can be configured with Internet addresses from the same subnet, so that one can easily take over for the other. In addition, the two machines are connected via a private local network, which is handy for mirroring.

For this takeover to be useful, the standby machine needs to be a relatively current copy of the production machine. It turns out that this is a fairly simple three-step process, although step 2 is not obvious. Here is the process that we have recently developed and tested.

Our configuration is two identical machines: A, the production server, and B, the standby server. Each machine has two identical 160 GB disks, with no RAID. Both machines run Debian Etch. Machine A Disk 1 is the live production system, where content is constantly updated. Machine B Disk 2 holds a separate Debian Etch installation, which is usually running and which I consider a maintenance mode. By default, Machine B boots from Disk 2.

1. A shell script was written that uses rsync to copy the root partition (in our case the only partition) of the production Machine A Disk 1 to Machine B Disk 1. It is run from a crontab entry on Machine B; I think of it as a content pull, and it runs twice daily. The local network is used for this update.
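
A minimal sketch of how the twice-daily pull could be wired up. The script path /usr/local/sbin/pull-clone.sh, the run times and the lock directory are all assumptions, not details from our setup. The mkdir-based lock simply keeps a slow run from overlapping the next one.

```shell
#!/bin/sh
# Hypothetical crontab line on Machine B (03:00 and 15:00 every day):
#   0 3,15 * * * /usr/local/sbin/pull-clone.sh

# Lock directory; the path is an assumption, override via the environment.
LOCKDIR="${LOCKDIR:-/var/lock/pull-clone.lock}"

# mkdir is atomic: it succeeds for exactly one caller at a time.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# Run the given command only if no other run holds the lock.
run_locked() {
    if acquire_lock; then
        "$@"
        status=$?
        release_lock
        return $status
    else
        echo "previous sync still running, skipping" >&2
        return 1
    fi
}

# Example: run_locked rsync $OPTS "$SERVER:/" /mnt/sysimage
```

Wrapping each rsync call in run_locked means a sync that takes longer than expected causes the next cron run to skip rather than pile up.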

It is important to pass the --hard-links and --one-file-system switches to rsync, so that hard links are preserved and /proc and /dev cause no confusion. With the transition to "udev" on Debian systems like Etch, the /dev directory is now virtual and dynamic. What we want is a copy of the "static" /dev directory as it exists on the disk, not as it is seen in the running system. This is solved by step 2.

2. On a live running system, there is a directory called /dev/.static/dev which appears to be the static /dev directory as it appears on the disk image. So all we have to do for this step is rsync from Machine A Disk 1 /dev/.static/dev to Machine B Disk 1 /dev ( perhaps at /mnt/other/dev ).
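
Since this directory is a detail of how udev is packaged rather than something guaranteed to exist, a sync script can check for it first. The following is only a sketch; the function name static_dev_source is made up for illustration.

```shell
#!/bin/sh
# Print the directory to use as the rsync source for /dev.
# $1 is the root of the system to inspect ("/" on a live machine).
static_dev_source() {
    root="${1%/}"
    if [ -d "$root/dev/.static/dev" ]; then
        # On Etch this appears to be the on-disk /dev, as described above.
        echo "$root/dev/.static/dev/"
    else
        echo "no /dev/.static/dev under ${root:-/}" >&2
        return 1
    fi
}

# Example, run on Machine A:
#   src=$(static_dev_source /) && ls "$src"
```

If the function fails, the /dev step can be skipped with a warning instead of letting rsync abort on a missing source.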

3. Finally a little housekeeping. I changed the file /boot/grub/menu.lst on Machine B Disk 2 ( NOTE: that is the maintenance mode system, not the cloned system ) to have a new entry labeled "standby" or something similar with the appropriate information to boot Disk 1. Implied of course is that the default grub configuration on this machine is boot Disk 2 which has the maintenance version of the OS.
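
As an illustration, the extra entry could be appended with a small helper like the one below. This is a sketch under stated assumptions: the GRUB device (hd0,0), the root device /dev/sda2 and the kernel version 2.6.18-5-686 are placeholders and must be replaced with the values that match Disk 1 on your machine. The kernel and initrd paths assume a separate /boot partition.

```shell
#!/bin/sh
# Append a "Standby" stanza to a GRUB legacy menu.lst, once.
# All device names and the kernel version below are assumptions.
add_standby_entry() {
    menu="$1"
    # Do nothing if an entry with this title is already present.
    if grep -q '^title[[:space:]]*Standby' "$menu" 2>/dev/null; then
        return 0
    fi
    cat >> "$menu" <<'EOF'

title   Standby (clone of Machine A on Disk 1)
root    (hd0,0)
kernel  /vmlinuz-2.6.18-5-686 root=/dev/sda2 ro
initrd  /initrd.img-2.6.18-5-686
EOF
}

# Example, run from the maintenance system on Machine B Disk 2:
#   add_standby_entry /boot/grub/menu.lst
```

Appending at the end of the file keeps the stanza outside the block that Debian's update-grub regenerates, so it should survive kernel upgrades on the maintenance system.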

In addition, before booting the "standby" version, I like to change the hostname in Machine B Disk 1 to keep the name that I use for Machine B, so that I don't get confused when rebuilding or repairing the downed production server. The shell prompt shows the hostname. The names the server responds to in Apache are domain names not the localhost name, so as a web server, things look the same. I also edit Machine B Disk 1 /etc/network/interfaces so that Machine B keeps the same local network address which I want to follow the hostname. The outside Internet IP address will remain cloned from Machine A.
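
The hostname change can be scripted against the mounted clone. This sketch assumes the clone is mounted at /mnt/sysimage and uses the made-up hostname machine-b; the function name is hypothetical as well.

```shell
#!/bin/sh
# Write Machine B's own hostname into the cloned image.
# $1 = mount point of the clone, $2 = hostname to keep.
set_image_hostname() {
    image="${1%/}"
    name="$2"
    [ -d "$image/etc" ] || { echo "no etc under $image" >&2; return 1; }
    echo "$name" > "$image/etc/hostname"
}

# Example: set_image_hostname /mnt/sysimage machine-b
```

Keep in mind that every later rsync run copies Machine A's versions of these files again, so the change has to be reapplied after the last sync. Alternatively, rsync's --exclude option (for example --exclude=/etc/hostname) can leave the standby's copies alone.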

If there is a failure of Machine A, I reboot Machine B selecting the grub entry for Disk 1, and Machine B takes over with current or nearly current content.

This could also be done as a "hot standby" using Heartbeat and a remote monitoring computer. For now, though, I will stick with a reboot performed by a human.



Posted by Anonymous (67.152.xx.xx) on Mon 21 Jan 2008 at 16:38
Your Step 2:
>So all we have to do for this step is rsync from Machine A Disk 1
>/dev/.static/dev to Machine B Disk 1 /dev ( perhaps at /mnt/other/dev ).

could be a lot clearer. Does the --one-file-system flag do this? Or how would I go about making the /dev/ and /proc/ directories function properly when rsyncing? Currently we just exclude /proc/, /dev/ and /tmp/, but if there is a better way I wish I knew it...

Posted by dldirector (69.17.xx.xx) on Mon 21 Jan 2008 at 17:01
Sorry for the lack of clarity. I have a clarification and a correction. In the article I said that my system is on only one partition. In fact, there is a separate boot partition. Here is a sample shell script that can be used to copy the root partition, the boot partition, and the /dev items.
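#!/bin/sh
# Pull a copy of the production server onto the standby disk.
# Run on Machine B with the target partitions already mounted.

# Replace the bracketed placeholder with your production host.
SERVER=root@[production.server.com]

# --devices and --specials are needed so that the device nodes in the
# static /dev are actually copied; rsync skips them otherwise.
# $OPTS is expanded unquoted below on purpose, so it splits into options.
OPTS="  --recursive --times --perms --owner --group \
        --links --hard-links --devices --specials \
        --one-file-system --delete \
        --stats --rsh=/usr/bin/ssh"

# copy system image to partition mounted on /mnt/sysimage
# (--one-file-system keeps rsync out of /proc, /dev and /boot)
FROMDIR=/
TODIR=/mnt/sysimage
rsync $OPTS "$SERVER:$FROMDIR" "$TODIR"

# copy boot image to partition mounted on /mnt/bootimage
FROMDIR=/boot/
TODIR=/mnt/bootimage
rsync $OPTS "$SERVER:$FROMDIR" "$TODIR"

# copy the disk image version of /dev
FROMDIR=/dev/.static/dev/
TODIR=/mnt/sysimage/dev
rsync $OPTS "$SERVER:$FROMDIR" "$TODIR"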

Hope this helps. Thanks for the comment.


Posted by Anonymous (217.216.xx.xx) on Tue 22 Jan 2008 at 03:03
Hi, dldirector, I have a couple of questions:

- Why do you have just two partitions (/boot and /)? Is it for any particular reason?

- Why did you decide to share, for instance, /var/log? Or are the logs 'travelling' to another machine?

Thanks in advance.

Posted by dldirector (69.17.xx.xx) on Thu 24 Jan 2008 at 15:46
I find simpler is better, and if I don't have a good reason for wanting more than one or two partitions, I just use one (or two). If I am lazy during a Debian install, I sometimes let it partition the disk and end up with several.

I share or copy /var/log and everything else for that matter, because in this case I am trying to make a backup server, with everything that existed on the original server.


Posted by Anonymous (118.90.xx.xx) on Mon 18 Aug 2008 at 13:55
From http://packages.debian.org/changelogs/pool/main/u/udev/udev_0.125-5/changelog#versionversion0.124-1 :

udev (0.124-1) unstable; urgency=low
...
* Removed the /dev/.static/dev/ hack. It was cool, but its complexity
is not justified anymore. (Closes: #444337, #481559)
...

Thus you can skip the advice about rsyncing /dev/.static/dev

Posted by rak (164.73.xx.xx) on Mon 21 Jan 2008 at 21:23
Did you evaluate heartbeat or something similar?
I'm about to start a similar project for my employer and would like some information on the topic. The procedure you describe looks simple, though my original idea was to rely on that kind of software.

Posted by dldirector (69.17.xx.xx) on Mon 21 Jan 2008 at 22:07
I have installed the Debian package heartbeat in order to get the program send_arp. Send_arp solves a problem that I have experienced before when having one machine take over an IP address from another. If this portable IP address is not the primary IP, it is not used on outgoing packets, so ARP tables don't get updated right away and packets can be lost as they continue to be routed to the network card that previously "owned" the IP address. Send_arp can be used to force immediate ARP updates. As far as using heartbeat, I want to maintain manual control of the switchover for now. As the system matures, and I become more comfortable that the new setup works as desired, I may indeed consider using heartbeat to automate the failover.
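
I won't reproduce send_arp's argument list from memory here. As an illustration of the same idea with a different tool, this sketch uses arping from iputils, whose -U switch sends unsolicited (gratuitous) ARP. The interface name and address are made up, and the function only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Announce an address that has just been taken over by this machine.
# $1 = interface, $2 = IP address; both values in the example are made up.
# With DRY_RUN=1 (the default here) the command is only printed.
announce_ip() {
    iface="$1"
    ip="$2"
    set -- arping -U -c 3 -I "$iface" "$ip"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Example: DRY_RUN=0 announce_ip eth0 192.0.2.10
```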

Posted by botox (91.17.xx.xx) on Tue 22 Jan 2008 at 07:25
If you just want to send the arp packets you should give fake a chance.

http://packages.debian.org/etch/fake

Posted by rak (190.64.xx.xx) on Tue 22 Jan 2008 at 16:51
Another option is to give the backup host's interface the same MAC address, using ifconfig or a setting in the /etc/network/interfaces file.
For example:

# ifconfig eth0 down
# ifconfig eth0 hw ether 00:80:48:BA:d1:20
# ifconfig eth0 up
# ifconfig eth0 |grep HWaddr


Taken from http://linuxhelp.blogspot.com/2005/09/how-to-change-mac-address-of-your.html

Cheers

Posted by botox (91.89.xx.xx) on Tue 22 Jan 2008 at 17:00
Could be a problem if the switch is remembering which MAC was on a specific port. As far as I know the most reliable procedure for sensitive environments is gratuitous arp.

Posted by rak (200.40.xx.xx) on Tue 22 Jan 2008 at 18:05
Yup, didn't consider the switch memory, that could make some trouble.

Posted by stoffell (81.165.xx.xx) on Tue 22 Jan 2008 at 21:14
Great you shared how you solved your issue on this!

Just want to add some extra stuff to consider.

Using drbd is also an option, can be combined with heartbeat, etc..

I have also used OpenVZ for something like you posted. (taking a nightly dump from a virtual machine, and copy the dumped file to a different server where it can be restored or just archived)

---
stoffell

Posted by yarikoptic (69.125.xx.xx) on Fri 25 Jan 2008 at 03:27
Well, it is sweet but seems overly simplistic for stable use. Rsyncing on a strict cron schedule, without considering the current I/O load, on a production server that has a running DB and/or tons of open files (i.e. one that is a moving target all the time) sounds like looking for trouble to me. The least I would do (I've not done it yet, so even this plan lacks the details) is to
  • set up the filesystem under LVM,
  • make the cron job wait a bit until I/O decreases to some prespecified level, within a reasonable amount of time,
  • cause a flush of the DBs and the filesystem,
  • create a snapshot of the filesystem and run rsync on that one.
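
A rough sketch of the snapshot part of that plan, to be run on the production machine (so it pushes, rather than pulls as in the article). The volume group vg0, logical volume root, mount point /mnt/snap and the 2G snapshot size are all hypothetical, and the database flush is left out. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: rsync from an LVM snapshot instead of the live filesystem.
# VG, LV, SNAP_MNT and the snapshot size are hypothetical values.
VG="${VG:-vg0}"
LV="${LV:-root}"
SNAP="${LV}-snap"
SNAP_MNT="${SNAP_MNT:-/mnt/snap}"

# Print commands instead of running them unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# $1 = rsync destination, e.g. root@standby:/mnt/sysimage
snapshot_sync() {
    dest="$1"
    # Flush filesystem buffers; a database flush or lock would go here too.
    run sync || return 1
    run lvcreate --snapshot --size 2G --name "$SNAP" "/dev/$VG/$LV" || return 1
    run mount -o ro "/dev/$VG/$SNAP" "$SNAP_MNT" &&
        run rsync --archive --hard-links --one-file-system --delete \
            "$SNAP_MNT/" "$dest"
    status=$?
    # Always clean up, even if the mount or the copy failed.
    run umount "$SNAP_MNT"
    run lvremove -f "/dev/$VG/$SNAP"
    return $status
}

# Example: DRY_RUN=0 snapshot_sync root@standby:/mnt/sysimage
```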


Posted by Anonymous (86.7.xx.xx) on Sun 17 Feb 2008 at 00:42
This confuses me. I cannot understand why you include /proc and /dev. As these are maintained by the running kernel - how can rsyncing them from one machine to another be a good thing?
In my understanding of Linux, if the two machines use the same hardware, those directories will automatically be very similar on both machines. In fact, they would be identical except for things that should be different, e.g. files that contain packet counts, MAC addresses, etc.

Posted by dldirector (64.81.xx.xx) on Sun 17 Feb 2008 at 13:50
I think you misunderstood. /proc and /dev look like mounted file systems. In rsync the --one-file-system switch means DON'T copy other file systems, i.e., don't copy /dev or /proc. In fact, figuring out the correct way to handle /dev was the biggest part of this puzzle, and finding the solution was what motivated me to document it in an article. Hope this helps.
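
To see which directories the switch would leave alone on a given machine, you can compare device numbers, since a directory on a different filesystem reports a different one. This is a sketch that assumes GNU stat; the function names are made up.

```shell
#!/bin/sh
# Return success if two paths live on the same filesystem.
# Uses GNU stat; %d prints the device number of the file's filesystem.
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Print the immediate subdirectories of $1 that sit on another filesystem,
# i.e. the ones a copy started at $1 with --one-file-system will not enter.
list_foreign_dirs() {
    root="${1%/}"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        same_filesystem "$root/" "$dir" || echo "$dir"
    done
}

# Example: list_foreign_dirs /
```

On a typical Etch system the output for / would be expected to include /proc/ and /dev/, plus /boot/ when it is a separate partition.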
