Automated distributed backups for laptops
Posted by mvanbaak on Tue 14 Feb 2006 at 09:19
This document will describe the setup I made for automating the backup tasks for all laptops here in the house. My servers use the same backup server and infrastructure, but right now they don't have the checks and scripts because they are online 24/7 and my backup server is triggering the backup process. This is however not true at all for the laptops.
Laptops can be at different places, powered down, suspended, put to sleep etc. So I needed a different approach for them.
Besides those things, all laptops get their IP address from DHCP, so triggering the backup from my server was not an option, since the clients' IPs are unknown.
The idea I had was that each laptop should trigger the backup process itself. This has to be something that happens automagically, because if it doesn't, users forget to back up. (I know I forget.)
What better place to handle this than the boot process?
The laptops should check if the backup server is there, and that it is actually our backup server and not a server with the same IP but on a different network.
The server should have some form of message system to tell the laptops backups cannot be made if there's something wrong. (Such as full disks, services not running, backup space not mounted etc)
- ssh client
If you happen to run Debian (which is what I use for every system except firewalls, which run OpenBSD), all the software can be installed with apt-get
Debian sarge has all the software available at no charge.
The clients will use the root account to log in. To make this as secure as possible, you need to tell ssh that root may only start the "rdiff-backup --server" program.
This can be done in the .ssh/authorized_keys file on the server. Simply prepend the ssh-[d|r]sa line for each client with: command="rdiff-backup --server" (the value has to be in double quotes).
As an example:
command="rdiff-backup --server" ssh-rsa AAAAB3N..... username@hostname
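A minimal sketch of how such a line could be generated from a client's public key file. The helper name restrict_key is hypothetical. The extra no-pty and no-*-forwarding entries are standard sshd authorized_keys options that I add here as an assumption about further hardening; they are not part of the setup described above and can be left out.

```shell
#!/bin/sh
# restrict_key: hypothetical helper, not part of the original setup.
# Reads public key lines on stdin and prints each one prefixed with a
# forced command, ready to append to root's authorized_keys on the server.
restrict_key() {
    # sshd expects the command value in double quotes.
    opts='command="rdiff-backup --server",no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding'
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r key || [ -n "$key" ]; do
        # Skip empty lines so blank entries never end up in the file.
        if [ -n "$key" ]; then
            printf '%s %s\n' "$opts" "$key"
        fi
    done
    return 0
}

# Usage (on the server, as root; the file name is an example):
#   restrict_key < laptop_id_rsa.pub >> /root/.ssh/authorized_keys
```

With this in place, any login with that key runs rdiff-backup in server mode and nothing else, whatever command the client asks for.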
Create a directory that will hold the backups, and create a dir for every system that should be backed up.
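As a rough sketch, the directory layout could be created like this. The helper name, the /backups path and the host names are examples I made up; use whatever fits your server.

```shell
#!/bin/sh
# make_backup_dirs: hypothetical helper. Creates the base directory and
# one subdirectory per client, accessible by the owner (root) only.
make_backup_dirs() {
    base=$1
    shift
    mkdir -p "$base" || return 1
    for host in "$@"; do
        mkdir -p "$base/$host" || return 1
        chmod 700 "$base/$host" || return 1
    done
}

# Usage (path and host names are examples only):
#   make_backup_dirs /backups laptop1 laptop2
```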
The following step is optional. I use it to tell my laptops if the server is in good shape and ready to receive the backups. You can skip this if you like.
DataQ
DataQ is a small message and data queueing server written in Python, featuring a very simple text-based protocol which makes it very easy to implement clients for it.
It features FILO and FIFO queues and various queue restrictions. The basic idea behind DataQ is to make it easy for multiple clients at various locations to report to a single target which, in turn, can be queried from a single or multiple sources.
For more information see its website
Installation is simple:
- Untar the download.
- Copy the configuration file config/dataq.xml.example to /etc/dataq.xml
- Copy src/dataq.py to /usr/local/sbin
- Modify /etc/dataq.xml. You should have the following line in it though:
  <queue overflow="pop" size="1" name="backup">
Now start DataQ:
dataq -c /etc/dataq.xml
Once it has been started you can store a message in the queue. To allow backups:
echo 'PUSH username:password@backup allowed' | nc server_ip 50000
To disallow backups:
echo 'PUSH username:password@backup disallowed' | nc server_ip 50000
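The two commands above can be wrapped in a small function so that only the two known states ever reach the queue. This is a sketch; the function names are hypothetical, and the only protocol detail it relies on is the PUSH line shown above.

```shell
#!/bin/sh
# dataq_push_msg: hypothetical helper that only formats the PUSH line
# in the form used above: PUSH user:password@queue message
dataq_push_msg() {
    printf 'PUSH %s:%s@%s %s\n' "$1" "$2" "$3" "$4"
}

# set_backup_state: sends "allowed" or "disallowed" to the backup queue.
# Arguments: server IP, username, password, state. Port 50000 as above.
set_backup_state() {
    case $4 in
        allowed|disallowed) ;;
        *) echo "state must be 'allowed' or 'disallowed'" >&2; return 2 ;;
    esac
    dataq_push_msg "$2" "$3" backup "$4" | nc "$1" 50000
}

# Usage:
#   set_backup_state server_ip username password allowed
```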
My backup server runs several scripts every minute to check disk space, mount points, connectivity, system load and some more, and updates the backup queue information as needed. Backups are allowed only if everything is in good shape.
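My own check scripts are not shown here, but one of them could look roughly like the sketch below. The function names, the /backups path and the 90 percent limit are assumptions for illustration. It only looks at disk usage and whether the backup directory exists; a real mount point check is left out.

```shell
#!/bin/sh
# All names and thresholds below are assumptions for illustration.

# disk_usage_percent: prints the used percentage (digits only) of the
# filesystem holding the given path, taken from POSIX "df -P" output.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# decide_state: prints "allowed" when the backup directory exists and
# usage is below the limit, "disallowed" otherwise.
# Arguments: directory, usage in percent, limit in percent.
decide_state() {
    dir=$1
    usage=$2
    limit=$3
    if [ -d "$dir" ] && [ "$usage" -lt "$limit" ]; then
        echo allowed
    else
        echo disallowed
    fi
}

# Usage from cron, every minute (example values):
#   state=$(decide_state /backups "$(disk_usage_percent /backups)" 90)
#   echo "PUSH username:password@backup $state" | nc server_ip 50000
```

Because the queue has size 1 with overflow set to pop, each new message simply replaces the previous state.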
The client needs some more work. Lucky for you, I did all the work already ;)
I made a script for the clients that does some checking and then runs the backup.
It will prompt the user to press 'Q' to abort. If there is no input in 10 seconds, the backup will continue.
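A prompt like this can be built with the read builtin. The sketch below assumes bash, because the -t (timeout) and -n (number of characters) options of read are not part of plain POSIX sh. The function name ask_abort is made up for this example.

```shell
#!/usr/bin/env bash
# ask_abort: hypothetical helper. Returns 0 if the user pressed q or Q
# within the timeout, 1 otherwise (timeout, other key, or no input).
# "read -t" and "read -n" are bash features, not POSIX sh.
ask_abort() {
    local timeout=${1:-10} key=""
    printf "Backup starts in %s seconds, press 'Q' to abort: " "$timeout" >&2
    # A timeout or end of input makes read fail; treat that as "no key".
    read -r -t "$timeout" -n 1 key || true
    printf '\n' >&2
    case $key in
        q|Q) return 0 ;;
        *)   return 1 ;;
    esac
}

# Usage:
#   if ask_abort 10; then echo "Backup aborted by user."; exit 0; fi
```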
The script will then issue 1 ping to the IP address of the backup server. Ferry Boender told me this is sometimes needed for clients because the ARP table has no entry for the server if the client has not connected to it before. (Since this script is run at boot, and most probably the backup server is not your DHCP server or router, no ARP entry will be there.)
It will then check the ARP table to see whether the hardware address (MAC) for the IP is what we expect. If the MAC address matches, it will ask DataQ whether we are allowed to make backups. If the server agrees, rdiff-backup will start sending the changes.
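The ping and ARP check can be sketched as follows. The helper names are hypothetical and the parsing assumes the column layout printed by "arp -n" on Linux (address, hardware type, hardware address); the real script may do this differently. The DataQ query is left out here.

```shell
#!/bin/sh
# Helper names are hypothetical; the original script may differ.

# mac_for_ip: reads "arp -n" style output on stdin and prints the
# hardware address listed for the given IP, in lower case.
mac_for_ip() {
    awk -v ip="$1" '$1 == ip { print tolower($3); exit }'
}

# is_our_server: pings the server once to fill the ARP table, then
# compares the MAC address found there with the expected one.
# Arguments: server IP, expected MAC address.
is_our_server() {
    ip=$1
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # A failed ping is not fatal by itself; the ARP lookup decides.
    ping -c 1 "$ip" >/dev/null 2>&1 || true
    found=$(arp -n "$ip" 2>/dev/null | mac_for_ip "$ip")
    [ -n "$found" ] && [ "$found" = "$expected" ]
}

# Usage (IP and MAC are examples only):
#   is_our_server 192.168.1.10 00:0c:29:ab:cd:ef || exit 0
```

Comparing in lower case avoids a false mismatch when the expected address was written with capital letters.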
Once all changes are sent, the script will clean history older than 30 days.
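In rdiff-backup terms, sending the changes and cleaning up comes down to two commands. The sketch below shows them under assumed names: the helper functions, the server address and the /backups path are examples, not the values from my script.

```shell
#!/bin/sh
# Sketch only: server name, paths and helper names are assumptions.

# backup_dest: builds the remote destination in rdiff-backup's
# user@host::path form, with one directory per client.
# Arguments: server, base directory on the server, client name.
backup_dest() {
    printf 'root@%s::%s/%s\n' "$1" "$2" "$3"
}

# run_backup: sends the changes, then removes history older than 30 days.
# Arguments: server, base directory on the server, local directory.
run_backup() {
    dest=$(backup_dest "$1" "$2" "$(hostname)")
    rdiff-backup "$3" "$dest" || return 1
    # --force is needed when more than one old increment is removed at once.
    rdiff-backup --force --remove-older-than 30D "$dest"
}

# Usage (example values):
#   run_backup 192.168.1.10 /backups /home
```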
Since the client triggers all this, you can even restore a machine that was dead for years.
I could post script snippets here, but you can get the whole example backup script here, so that should be easier.
If you also want the boot setup, download the whole package here. (Local copies of these files are also available.)
I know there is no documentation in there, which is why I wrote this webarticle.
You should open the backup-to-server.sh script with your favourite editor and change the variables at the start of the file.
If you did set up DataQ, set use_data='YES' (it's off by default).
One last tip: Back up the server's backup space to DVD or CD at regular intervals. The server runs on harddisks too, and they WILL fail on you sometime!
Credits and Thanks
- Ferry Boender for
- Tips on 'read' and 'ping'.
- Leonieke Aalders for
- Reading this article and giving me positive feedback.
- Nancy van Baak for
- Listening to my techy talk.
- Waking me up on Sunday morning so I could do this during the weekend.