Automated distributed backups for laptops
Posted by mvanbaak on Tue 14 Feb 2006 at 09:19
This document will describe the setup I made for automating the backup tasks for all laptops here in the house. My servers use the same backup server and infrastructure, but right now they don't have the checks and scripts because they are online 24/7 and my backup server is triggering the backup process. This is however not true at all for the laptops.
Laptops can be at different places, powered down, suspended, put to sleep etc. So I needed a different approach for them.
Besides those things, all laptops get their IP address from DHCP, so triggering the backup from my server was not an option, since the clients' IPs are unknown.
The idea I had was that each laptop should trigger the backup process itself. This has to happen automagically, because if it doesn't, users forget to back up. (I know I forget.)
What better place to handle this than the boot process?
The laptops should check if the backup server is there, and that it is actually our backup server and not a server with the same IP but on a different network.
The server should have some form of message system to tell the laptops backups cannot be made if there's something wrong. (Such as full disks, services not running, backup space not mounted etc)
Required Software
Server
Clients:
- rdiff-backup
- ssh client
- netcat
- ping
- arp
- bash
If you happen to run Debian (which is what I use for every system except firewalls, which run OpenBSD), all the software can be installed with apt-get.
Debian sarge has all the software available at no charge.
Server Setup
The clients will use the root account to log in. To make this as secure as possible, you need to tell ssh that root may only start the "rdiff-backup --server" program.
This can be done in the .ssh/authorized_keys file. Simply prepend the ssh-[d|r]sa line for the clients with: command="rdiff-backup --server" (the value must be in double quotes).
As an example:
command="rdiff-backup --server" ssh-rsa AAAAB3N..... username@hostname
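The restriction described above can be sketched as a small shell helper that builds the authorized_keys entry from a client's public key. This is only an illustration: the function name `restricted_key_line` is hypothetical, and the three `no-*` options are optional extra hardening that sshd accepts in authorized_keys, not something the setup requires.

```shell
#!/bin/bash
# Hypothetical helper: print an authorized_keys entry that only permits
# "rdiff-backup --server" for the given public key.
# $1 = the client's public key line, e.g. the contents of its id_rsa.pub
restricted_key_line() {
    local pubkey="$1"
    # The command value must be in double quotes. Options are separated
    # by commas, and a single space separates them from the key itself.
    printf 'command="rdiff-backup --server",no-port-forwarding,no-X11-forwarding,no-agent-forwarding %s\n' "$pubkey"
}

# Example use on the server (laptop1.pub is a placeholder file name):
#   restricted_key_line "$(cat laptop1.pub)" >> /root/.ssh/authorized_keys
```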
Create a directory that will hold the backups, and create a dir for every system that should be backed up.
The following step is optional. I use it to tell my laptops if the server is in good shape and ready to receive the backups. You can skip this if you like.
DataQ
DataQ is a small message and data queueing server written in Python, featuring a very simple text-based protocol which makes it easy to implement clients for it.
It features FILO and FIFO queues and various queue restrictions. The basic idea behind DataQ is to make it easy for multiple clients at various locations to report to a single target which, in turn, can be queried from one or more sources.
For more information see its website.
Installation is simple:
- Untar the download.
- Copy the configuration file config/dataq.xml.example to /etc/dataq.xml
- Copy src/dataq.py to /usr/local/sbin
- Modify /etc/dataq.xml. You should have the following line in it though:
- <queue overflow="pop" size="1" name="backup">
Now start DataQ:
dataq -c /etc/dataq.xml
Once it has been started you can store a message in the queue. To allow backups:
echo 'PUSH username:password@backup allowed' | nc server_ip 50000
To disallow backups:
echo 'PUSH username:password@backup disallowed' | nc server_ip 50000
My backup server runs several scripts every minute to check disk space, mount points, connectivity, system load and some more, and updates the backup queue information as needed. Backups are allowed only if everything is in good shape.
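A minimal sketch of what one such check could look like follows. It is not the author's actual script: the function names, the /backup path and the 90 percent limit are assumptions for illustration. Only the PUSH line and port 50000 come from the commands shown above.

```shell
#!/bin/bash
# Sketch of a server-side health check. All names, paths and thresholds
# are assumptions for illustration, not the author's scripts.

# Print the usage percentage (without the % sign) of the filesystem
# that holds the path given as $1.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Decide the queue message.
# $1 = current disk usage in percent, $2 = maximum allowed usage in percent
backup_status() {
    if [ "$1" -lt "$2" ]; then
        echo allowed
    else
        echo disallowed
    fi
}

# Store the status in the "backup" queue.
# $1 = status, $2 = DataQ host, $3 = username:password
push_status() {
    echo "PUSH $3@backup $1" | nc "$2" 50000
}

# Example run from cron every minute (values are placeholders):
#   status=disallowed
#   if mountpoint -q /backup; then
#       status=$(backup_status "$(disk_usage_percent /backup)" 90)
#   fi
#   push_status "$status" 127.0.0.1 username:password
```

Because the queue has size 1 with overflow="pop", each push replaces the previous message, so the queue always holds the most recent verdict.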
Client Setup
The client needs some more work. Lucky for you, I did all the work already ;)
I made a script for the clients that does some checking and then runs the backups.
It will prompt the user to press 'Q' to abort. If there is no input in 10 seconds, the backup will continue.
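The prompt can be sketched with bash's `read` builtin and its timeout option. The function name `user_aborted` is hypothetical and this is not lifted from the author's script; it only illustrates the behaviour described above.

```shell
#!/bin/bash
# Sketch of the abort prompt; needs bash for "read -t" and "read -n".
# Returns 0 if the user pressed Q (abort), 1 otherwise (continue).
# $1 = timeout in seconds (the article uses 10)
user_aborted() {
    local timeout="$1" key=""
    echo "Backup starts in ${timeout} seconds. Press 'Q' to abort." >&2
    # -t: give up after the timeout, -n 1: return after a single key,
    # -s: do not echo the key. On timeout or end of input $key stays empty.
    read -r -s -t "$timeout" -n 1 key || true
    case "$key" in
        q|Q) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example use:
#   if user_aborted 10; then
#       echo "Backup aborted by user."
#       exit 0
#   fi
```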
The script will then issue one ping to the IP address of the backup server. Ferry Boender told me this is sometimes needed because the ARP table has no entry for the server if the client has not connected to it before. (Since this script is run at boot, and most probably the backup server is not your DHCP server or router, no ARP entry will be there.)
It will then check the ARP table if the hardware address (MAC) for the IP is what we expect. If the MAC address matches, it will consult DataQ if we are allowed to make backups. If the server agrees, rdiff-backup will start sending the changes.
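The ping and ARP check could look roughly like the sketch below. The function names are hypothetical and the addresses in the usage comment are placeholders. It assumes the column layout printed by the Linux net-tools `arp -n` command. The DataQ lookup is left out, since only the PUSH command appears in this article.

```shell
#!/bin/bash
# Sketch of the server identity check. Function names are hypothetical.

# Read "arp -n" style output on stdin and print the hardware address
# listed for the IP address given as $1 (prints nothing if no entry).
mac_from_arp_table() {
    awk -v ip="$1" '$1 == ip && $2 == "ether" { print tolower($3); exit }'
}

# Return 0 if the ARP entry for IP $1 matches the expected MAC $2.
is_our_server() {
    local ip="$1" expected found
    expected=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    # One ping, so the kernel resolves the address and fills the ARP table.
    ping -c 1 "$ip" >/dev/null 2>&1
    found=$(arp -n "$ip" | mac_from_arp_table "$ip")
    [ -n "$found" ] && [ "$found" = "$expected" ]
}

# Example use (placeholder values):
#   if ! is_our_server 192.168.1.2 00:11:22:aa:bb:cc; then
#       echo "Backup server not found on this network, skipping backup."
#       exit 0
#   fi
```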
Once all changes are sent, the script will clean up history older than 30 days.
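As a rough sketch, the backup and cleanup steps map onto two rdiff-backup calls in its classic command-line syntax. The `run` wrapper, the DRY_RUN switch and the example host and paths are assumptions added for illustration; they are not taken from the author's script.

```shell
#!/bin/bash
# Sketch of the backup and cleanup steps. Host name, directories and the
# DRY_RUN switch are assumptions for illustration.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = local directory
# $2 = remote target, e.g. root@server::/backup/laptop1/home
backup_and_clean() {
    local src="$1" dest="$2"
    # Send the changes since the last run to the server.
    run rdiff-backup "$src" "$dest" || return 1
    # Drop increments older than 30 days. --force is needed when more
    # than one increment would be removed at once.
    run rdiff-backup --remove-older-than 30D --force "$dest"
}

# Example use (placeholder values):
#   DRY_RUN=1 backup_and_clean /home root@server::/backup/laptop1/home
```

Running it with DRY_RUN=1 first prints the two commands without touching anything, which is a cheap way to check the paths before the first real backup.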
Since the client triggers all this, you can even restore a machine that was dead for years.
I could post script snippets here, but you can get the whole example backup script here, which should be easier.
If you also want the boot setup, download the whole package here. (Local copies of these files are also available.)
I know there is no documentation in there, which is why I wrote this article.
You should open the backup-to-server.sh script with your favourite editor and change the variables at the start of the file.
If you did set up DataQ, set use_data='YES' (it's off by default).
One last tip: Back up the server's backup space to DVD or CD at regular intervals. The server runs on hard disks too, and they WILL fail on you sometime!
Credits and Thanks
- Ferry Boender for
  - DataQ.
  - Tips on 'read' and 'ping'.
- Leonieke Aalders for
  - Reading this article and giving me positive feedback.
- Nancy van Baak for
  - Listening to my techy talk.
  - Waking me up on Sunday morning so I could do this during the weekend.
Personally, I really like BackupPC for its non-intrusiveness, OS-independence and disk-space-saving pooling feature, and since it 'pings' computers before trying to do a backup, it works very well for laptops too. (And no, I'm not in any way connected to that project, just a very happy user.)
--
Debian GNU/Linux on an IBM Thinkpad T43p
But a default install of Debian will have:
ssh client, ping, arp, bash
So that leaves rdiff-backup and netcat.
Netcat is only needed when you are going to use DataQ.
This way I can bring it down to 1.
BackupPC looks nice. Thnx.
it just WORKS when configured correctly.
Nice setup, but I have two comments to add.
First, I wouldn't use the boot process to trigger the backup but anacron, which is much like cron but handles the fact that the machine may not be on at all times of the day: anacron runs its jobs if they have not been run for more than a day.
Second, instead of searching for a MAC/IP pair to know where to back up to, I would do some kind of ssh ping (like ssh server hostname, and check that it returns the hostname of the server). That would guarantee the server's identity and save me from changing all the laptop scripts when I change the server's NIC.
I wouldn't use a root login on the server either. I just don't see the need for this.
Anyway, the use of DataQ is interesting. Thanks for the article!
The "ssh ping" is not a bad idea, but still it got me several times when someone had the same idea about naming their network. That's why I implemented the MAC/IP thing.
The root account is needed if you care about file permissions and ACL settings.
Only the root user will be allowed to set those on the backup server.
Using the options one can set in the authorized_keys file it can be locked down to only run rdiff-backup.
command="rdiff-backup --server" ssh-rsa AAAAB3N..... username@hostname
instead of
rdiff-backup --server ssh-rsa AAAAB3N..... username@hostname
Also, one could enable the no-port-forwarding, no-X11-forwarding, and no-agent-forwarding options, just to be safe :)
You meant "My laptops", right?
rjc
Monitoring server, asterisk voip server, firewall machine, and mythtv server.
But they are not using the "check ip of server and dataq" script parts.
There is an sh script running on my file/backup server that triggers the rdiff-backups on them, because they are always on and always at the same IP address.