Cloning a Debian Etch system for redundancy
Posted by dldirector on Mon 21 Jan 2008 at 11:07
For real peace of mind, we lease two identical servers from our colocation provider. With the "private rack" option, the two machines can be configured with Internet addresses from the same subnet, so that one can easily take over for the other. In addition, the two machines are connected via a private local network, which is handy for mirroring.
For this takeover to be useful, the standby machine needs to be a relatively current copy of the production machine. It turns out that this is a fairly simple three-step process, but step 2 is not obvious. Here is the process that we have recently developed and tested.
Our configuration is two identical machines: A, the production server, and B, the standby server. Each machine has two identical 160 GB disks, no RAID. Both machines are running Debian Etch. Machine A Disk 1 is the live production system, where content is constantly updated. Machine B Disk 2 holds another copy of Debian Etch, which is usually running and is considered a maintenance mode system. The default boot configuration for Machine B is to boot from Disk 2.
1. A shell script uses rsync to copy the root partition (in our case the only partition) of the production Machine A Disk 1 to Machine B Disk 1. It runs from a crontab entry on Machine B twice daily; I think of it as a content pull. The local network is used for this update.
It is important to use the --hard-links and --one-file-system switches to rsync so that hard links are maintained and there is no confusion caused by /proc and /dev. With the transition to "udev" on Debian systems like Etch, the /dev directory is now virtual and dynamic. What we want is a copy of the "static" /dev directory as it exists on the disk, not as it is seen in the running system. Step 2 solves this.
2. On a live running system, there is a directory called /dev/.static/dev which appears to be the static /dev directory as it exists on the disk image. So all we have to do for this step is rsync from /dev/.static/dev on Machine A Disk 1 to /dev on Machine B Disk 1 (which, with Disk 1 mounted at /mnt/other for example, is /mnt/other/dev).
3. Finally, a little housekeeping. I changed the file /boot/grub/menu.lst on Machine B Disk 2 (NOTE: that is the maintenance mode system, not the cloned system) to add a new entry labeled "standby" or something similar, with the appropriate information to boot Disk 1. This assumes, of course, that the default grub configuration on this machine boots Disk 2, which has the maintenance version of the OS.
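As an illustration of what such a "standby" entry might look like, here is a small sketch that prints a GRUB (legacy) menu.lst stanza. The device names (hd0,0) and /dev/sda1 and the kernel version 2.6.18-6-686 are assumptions made up for the example, not values from this setup; check them against what is actually installed on the clone. If /boot lives on its own partition, the kernel and initrd paths would lose the /boot prefix.

```shell
#!/bin/sh
# Print a GRUB legacy menu.lst stanza for booting the cloned system.
# All three arguments are illustrative; adjust them to the real disk layout.
standby_stanza() {
    grub_root=$1    # partition as GRUB names it, e.g. "(hd0,0)"
    linux_root=$2   # the same partition as the kernel names it, e.g. /dev/sda1
    kver=$3         # kernel version installed on the clone
    cat <<EOF
title           standby (clone of Machine A on Disk 1)
root            $grub_root
kernel          /boot/vmlinuz-$kver root=$linux_root ro
initrd          /boot/initrd.img-$kver
EOF
}

# Hypothetical values; this only prints the stanza to the terminal.
standby_stanza "(hd0,0)" /dev/sda1 2.6.18-6-686
```

The output can be reviewed and then appended by hand to /boot/grub/menu.lst on the maintenance system, leaving the default entry unchanged.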
In addition, before booting the "standby" version, I like to change the hostname on Machine B Disk 1 back to the name that I use for Machine B, so that I don't get confused when rebuilding or repairing the downed production server. The shell prompt shows the hostname. The names the server responds to in Apache are domain names, not the local hostname, so as a web server, things look the same. I also edit /etc/network/interfaces on Machine B Disk 1 so that Machine B keeps its own local network address, which I want to follow the hostname. The outside Internet IP address remains the one cloned from Machine A.
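The hostname and local address changes can be scripted so they are reapplied after each sync. The following is only a sketch under assumptions: the mount point, the hostname "machine-b" and the two private addresses are hypothetical, and it only touches /etc/hostname and the matching "address" line in /etc/network/interfaces (other files that mention the hostname, such as /etc/hosts, are left alone).

```shell
#!/bin/sh
# Give the mounted clone Machine B's hostname and local network address.
# Arguments (all illustrative):
#   $1  where Machine B Disk 1 is mounted, e.g. /mnt/sysimage
#   $2  hostname Machine B should keep
#   $3  Machine A's private address as found in the cloned interfaces file
#   $4  Machine B's private address that should replace it
set_clone_identity() {
    clone_root=$1
    new_hostname=$2
    a_ip=$3
    b_ip=$4
    [ -d "$clone_root/etc" ] || { echo "no etc under $clone_root" >&2; return 1; }

    # Debian sets the hostname from /etc/hostname at boot.
    printf '%s\n' "$new_hostname" > "$clone_root/etc/hostname"

    ifile=$clone_root/etc/network/interfaces
    if [ -f "$ifile" ]; then
        # Escape the dots so the address is matched literally.
        a_re=$(printf '%s' "$a_ip" | sed 's/\./\\./g')
        # Rewrite only the "address" line carrying A's private address;
        # the public address cloned from Machine A is left untouched.
        sed "s/^\([[:space:]]*address[[:space:]][[:space:]]*\)$a_re\$/\1$b_ip/" \
            "$ifile" > "$ifile.new" && mv "$ifile.new" "$ifile"
    fi
}

# Example call (hypothetical values):
#   set_clone_identity /mnt/sysimage machine-b 192.168.1.1 192.168.1.2
```

Running this at the end of the sync script means the clone is always ready to boot with Machine B's identity, at the cost of rewriting the two files on every run.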
If there is a failure of Machine A, I reboot Machine B selecting the grub entry for Disk 1, and Machine B takes over with current or nearly current content.
This could also be done as a "hot standby" using Heartbeat and a remote monitoring computer. For now, though, I will stick with a reboot performed by a human.
>So all we have to do for this step is rsync from Machine A Disk 1
>/dev/.static/dev to Machine B Disk 1 /dev ( perhaps at /mnt/other/dev ).
could be a lot clearer. Does the --one-file-system flag do this? Or how would I go about making the /dev/ and /proc/ directories function properly when rsyncing? Currently we just exclude /proc/, /dev/ and /tmp/, but if there is a better way I wish I knew it.
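# the production server to pull from (placeholder host name)
SERVER=root@[production.server.com]
# --devices and --specials are included so that the device nodes under
# /dev/.static/dev are copied; without them rsync skips non-regular files
OPTS=" --recursive --times --perms --owner --group \
--links --hard-links --devices --specials \
--one-file-system --delete \
--stats --rsh=/usr/bin/ssh"
# note: $OPTS is deliberately left unquoted below so that it splits
# into separate options
# copy system image to partition mounted on /mnt/sysimage
FROMDIR=/
TODIR=/mnt/sysimage
rsync $OPTS $SERVER:$FROMDIR $TODIR
# copy boot image to partition mounted on /mnt/bootimage
FROMDIR=/boot/
TODIR=/mnt/bootimage
rsync $OPTS $SERVER:$FROMDIR $TODIR
# copy the disk image version of /dev
FROMDIR=/dev/.static/dev/
TODIR=/mnt/sysimage/dev
rsync $OPTS $SERVER:$FROMDIR $TODIR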
Hope this helps. Thanks for the comment.
- Why do you have just two partitions (/boot and /)? Is there a particular reason?
- Why did you decide to share, for instance, /var/log? Or are the logs 'travelling' to another machine?
Thanks in advance.
I share or copy /var/log and everything else for that matter, because in this case I am trying to make a backup server, with everything that existed on the original server.
I'm about to begin something similar for my employers and would like some more information on the topic. The procedure you describe looks fairly simple, though my previous idea was to use this software instead.
http://packages.debian.org/etch/fake
For example
# ifconfig eth0 down
# ifconfig eth0 hw ether 00:80:48:BA:d1:20
# ifconfig eth0 up
# ifconfig eth0 |grep HWaddr
Taken from http://linuxhelp.blogspot.com/2005/09/how-to-change-mac-address-of-your.html
Cheers
Just want to add some extra stuff to consider.
Using DRBD is also an option; it can be combined with Heartbeat, etc.
I have also used OpenVZ for something like what you posted: taking a nightly dump of a virtual machine and copying the dump file to a different server, where it can be restored or just archived.
---
stoffell
- set up the filesystem under LVM,
- make the cron job wait, within a reasonable time limit, until I/O decreases to some prespecified level,
- force a flush of the databases and the filesystem,
- create a snapshot of the filesystem and run rsync against that snapshot.
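A rough sketch of the snapshot part of that list might look like the following. It is written to run on the production machine and push to the standby, which differs from the pull used in the article. The volume group vg0, logical volume root, snapshot size, mount point and destination are all hypothetical; waiting for low I/O and flushing databases are application-specific and not shown. By default it only prints the commands (DRY_RUN=1) instead of running them.

```shell
#!/bin/sh
# Dry-run wrapper: with DRY_RUN=1 (the default) commands are printed, not run.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# snapshot_sync VG LV DEST
# Take an LVM snapshot of /dev/VG/LV, rsync its contents to DEST, clean up.
snapshot_sync() {
    vg=$1
    lv=$2
    dest=$3
    snap="${lv}-snap"
    mnt=/mnt/snap            # assumed to exist already

    run sync                 # flush filesystem buffers before the snapshot
    run lvcreate --snapshot --size 2G --name "$snap" "/dev/$vg/$lv" || return 1
    if ! run mount -o ro "/dev/$vg/$snap" "$mnt"; then
        run lvremove -f "/dev/$vg/$snap"
        return 1
    fi
    run rsync --archive --hard-links --one-file-system --delete "$mnt/" "$dest"
    status=$?
    # Always release the snapshot, even if rsync failed.
    run umount "$mnt"
    run lvremove -f "/dev/$vg/$snap"
    return $status
}

# Hypothetical names; prints the command sequence in dry-run mode.
snapshot_sync vg0 root root@standby:/mnt/sysimage/
```

The point of the snapshot is that rsync reads a frozen view of the filesystem, so files that change during the copy do not end up half-updated on the standby.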
In my understanding of Linux, if the two machines use the same hardware, those directories will automatically be very similar on both machines. In fact, they would be identical except for things that should be different, e.g. files that contain packet counts, MAC addresses, etc.