Automated distributed backups for laptops

Posted by mvanbaak on Tue 14 Feb 2006 at 09:19


This document describes the setup I made for automating backups of all the laptops here in the house. My servers use the same backup server and infrastructure, but they don't need these checks and scripts: they are online 24/7, so my backup server triggers the backup process itself. That is not true at all for the laptops.

Laptops can be at different places, powered down, suspended, put to sleep etc. So I needed a different approach for them.

On top of that, all laptops get their IP address from DHCP, so triggering the backup from my server was not an option, since the clients' IPs are unknown.

The idea I had was that each laptop should trigger the backup process itself. This has to happen automagically, because otherwise users forget to back up. (I know I forget.)

What better place to handle this than the boot process?

The laptops should check if the backup server is there, and that it is actually our backup server and not a server with the same IP but on a different network.

The server should have some form of message system to tell the laptops that backups cannot be made when something is wrong (such as full disks, services not running, backup space not mounted, etc.).

Required Software

Server

Clients:

  • rdiff-backup
  • ssh client
  • netcat
  • ping
  • arp
  • bash

If you happen to run Debian (which is what I use for every system except firewalls, which run OpenBSD), all the software can be installed with apt-get.

Debian sarge has all the software available at no charge.

Server Setup

The clients will use the root account to log in. To make this as secure as possible, you need to tell ssh that root may only start the "rdiff-backup --server" program.

This can be done in the .ssh/authorized_keys file. Simply prepend the ssh-rsa or ssh-dss line for each client with: command="rdiff-backup --server" (the double quotes are required).

As an example:

command="rdiff-backup --server" ssh-rsa AAAAB3N..... username@hostname
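
The entry can be tightened further with the standard sshd key options that switch off forwarding and terminal allocation. The following is only a sketch: restrict_key is a hypothetical helper name, and the key file path in the usage comment is an assumption.

```shell
#!/bin/bash
# Sketch: build a locked-down authorized_keys entry from a client's public key.
# restrict_key is a hypothetical helper, not part of ssh or rdiff-backup.
restrict_key() {
    local pubkey="$1"
    # Standard sshd key options: forced command, no forwarding, no terminal.
    local opts='command="rdiff-backup --server",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
    printf '%s %s\n' "$opts" "$pubkey"
}

# Usage, as root on the backup server (the key file path is an assumption):
# restrict_key "$(cat /tmp/laptop1_id_rsa.pub)" >> /root/.ssh/authorized_keys
```

With this in place, whatever command the client asks for, sshd runs rdiff-backup --server for that key.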

Create a directory that will hold the backups, and create a dir for every system that should be backed up.
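
A minimal sketch of that layout, assuming the backups live under /backup and the clients are called laptop1 and laptop2 (both names are placeholders, and make_backup_dirs is a hypothetical helper):

```shell
#!/bin/bash
# Sketch: one directory per client under a common backup root.
# make_backup_dirs ROOT HOST... creates ROOT/HOST for every host given.
make_backup_dirs() {
    local root="$1" host
    shift
    mkdir -p "$root"
    # Only root needs to read the backups.
    chmod 700 "$root"
    for host in "$@"; do
        mkdir -p "$root/$host"
    done
}

# Usage (placeholder names):
# make_backup_dirs /backup laptop1 laptop2
```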

The following step is optional. I use it to tell my laptops if the server is in good shape and ready to receive the backups. You can skip this if you like.

DataQ

DataQ is a small message and data queueing server written in Python, featuring a very simple text-based protocol which makes it very easy to implement clients for it.

It features FILO and FIFO queues and various queue restrictions. The basic idea behind DataQ is to make it easy for multiple clients at various locations to report to a single target which, in turn, can be queried from one or more sources.

For more information see its website

Installation is simple:

  • Untar the download.
  • Copy the configuration file config/dataq.xml.example to /etc/dataq.xml
  • Copy src/dataq.py to /usr/local/sbin
  • Modify /etc/dataq.xml to suit your setup. You should have the following line in it, though:
    • <queue overflow="pop" size="1" name="backup">

Now start DataQ:

dataq -c /etc/dataq.xml

Once it has been started you can store a message in the queue. To allow backups:


echo 'PUSH username:password@backup allowed' | nc server_ip 50000

To disallow backups:

echo 'PUSH username:password@backup disallowed' | nc server_ip 50000

My backup server runs several scripts every minute to check disk space, mount points, connectivity, system load and more, and updates the backup queue as needed. Backups are allowed only if everything is in good shape.
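
Those scripts are not part of this article, but one such check could look roughly like the sketch below. It only tests that the backup space is mounted and has enough free room, then pushes the result with the same PUSH line shown above. The mount point, the threshold, the DataQ address and all function names are assumptions.

```shell
#!/bin/bash
# Sketch of a per-minute health check that updates the DataQ "backup" queue.
# All values and function names below are assumptions, not my real scripts.
BACKUP_DIR=/backup       # assumed mount point of the backup space
MIN_FREE_KB=1048576      # assumed threshold: 1 GiB free
DATAQ_HOST=127.0.0.1     # assumed: DataQ runs on the backup server itself
DATAQ_PORT=50000

# Print the free space in KiB of the filesystem holding directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print "allowed" if $1 is a mount point with at least $2 KiB free,
# "disallowed" otherwise.
backup_state() {
    local dir="$1" min_kb="$2" free
    if ! mountpoint -q "$dir"; then
        echo disallowed
        return
    fi
    free=$(free_kb "$dir")
    if [ -n "$free" ] && [ "$free" -ge "$min_kb" ]; then
        echo allowed
    else
        echo disallowed
    fi
}

# Store the state in the queue, using the PUSH line from this article.
publish_state() {
    echo "PUSH username:password@backup $1" | nc "$DATAQ_HOST" "$DATAQ_PORT"
}

# From cron, every minute:
# publish_state "$(backup_state "$BACKUP_DIR" "$MIN_FREE_KB")"
```

Because the queue has size 1 with overflow set to pop, each push replaces the previous state.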

Client Setup

The client needs some more work. Lucky for you, I did all the work already ;)

I made a script for the clients that does some checking and then runs the backups.

It will prompt the user to press 'Q' to abort. If there is no input in 10 seconds, the backup will continue.
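
A prompt like this can be built on the timeout option of bash's read builtin. This is a sketch rather than the exact code from my script; confirm_backup is a hypothetical name.

```shell
#!/bin/bash
# Sketch: give the user a few seconds to abort the backup with 'Q'.
# Returns 0 to continue with the backup, 1 to abort.
confirm_backup() {
    local timeout="${1:-10}" key=""
    echo "Backup starts in ${timeout} seconds. Press 'Q' to abort."
    # -t: give up after $timeout seconds, -n 1: stop after one character,
    # -s: do not echo the key, -r: do not treat backslashes specially.
    read -r -s -n 1 -t "$timeout" key
    case "$key" in
        q|Q) return 1 ;;
        *)   return 0 ;;
    esac
}

# Usage:
# confirm_backup 10 || exit 0
```

On a timeout read fails and leaves key empty, so the backup continues, which is the behaviour described above.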

The script will then send one ping to the IP address of the backup server. Ferry Boender told me this is sometimes needed because the ARP table has no entry for the server if the client has not talked to it before. (Since this script runs at boot, and the backup server is most probably not your DHCP server or router, no ARP entry will be there.)

It will then check the ARP table to see whether the hardware address (MAC) for the IP is what we expect. If the MAC address matches, it will ask DataQ whether we are allowed to make backups. If the server agrees, rdiff-backup will start sending the changes.
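
The ping and MAC check can be sketched as follows. It assumes the table layout printed by the Linux net-tools arp -n command (address in the first column, hardware address in the third); the IP, the MAC and the function names are placeholders. The DataQ query is left out because the command for reading the queue is not shown in this article.

```shell
#!/bin/bash
# Sketch: verify that the backup server's IP belongs to the expected MAC.
# SERVER_IP and SERVER_MAC are placeholders; fill in your own values.
SERVER_IP=192.168.1.10
SERVER_MAC=00:11:22:33:44:55

# Read `arp -n` style output on stdin and print the lowercased MAC
# recorded for IP $1. Prints nothing for missing or incomplete entries.
mac_from_arp() {
    awk -v ip="$1" '
        $1 == ip && length($3) == 17 && $3 ~ /^[0-9A-Fa-f:]+$/ {
            print tolower($3)
            exit
        }'
}

# Return 0 only if the server answers ARP with the expected MAC.
server_is_ours() {
    local seen expected
    # One ping, so the kernel has a reason to fill in the ARP entry.
    ping -c 1 -w 2 "$SERVER_IP" > /dev/null 2>&1
    seen=$(arp -n | mac_from_arp "$SERVER_IP")
    expected=$(echo "$SERVER_MAC" | tr 'A-F' 'a-f')
    [ -n "$seen" ] && [ "$seen" = "$expected" ]
}

# Usage:
# server_is_ours || { echo "Backup server not found, skipping."; exit 0; }
```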

Once all changes are sent, the script will clean up history older than 30 days.
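
In terms of rdiff-backup commands, the backup and cleanup steps come down to something like this sketch. It uses the classic rdiff-backup command line (user@host::/path for the remote side, --remove-older-than for pruning). The server address, the backup root and the function names are placeholders.

```shell
#!/bin/bash
# Sketch: send the changes, then prune increments older than 30 days.
# SERVER and DEST_ROOT are placeholders; fill in your own values.
SERVER=root@192.168.1.10
DEST_ROOT=/backup

# Print the remote rdiff-backup location for client $1 and directory $2.
remote_target() {
    printf '%s::%s/%s%s\n' "$SERVER" "$DEST_ROOT" "$1" "$2"
}

# Back up directory $1 of this machine, then remove old increments.
run_backup() {
    local src="$1" target
    target=$(remote_target "$(hostname)" "$src")
    # --force lets rdiff-backup delete more than one old increment at once.
    rdiff-backup "$src" "$target" &&
        rdiff-backup --force --remove-older-than 30D "$target"
}

# Usage:
# run_backup /home
```

The cleanup only runs when the backup itself succeeded, so a failed run never throws away history.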

Since the client triggers all this, you can even restore a machine that was dead for years.

I could post script snippets here, but you can get the whole example backup script here, so that should be easier.
If you also want the boot setup, download the whole package here. (Local copies of these files are also available.)

I know there is no documentation in there, which is why I wrote this article.

You should open the backup-to-server.sh script with your favourite editor and change the variables at the start of the file.

If you did set up DataQ, set use_data='YES' (it's off by default).

One last tip: Back up the server's backup space to DVD or CD at regular intervals. The server runs on hard disks too, and they WILL fail on you sometime!

Credits and Thanks

Ferry Boender for
  • DataQ.
  • Tips on 'read' and 'ping'.
Leonieke Aalders for
  • Reading this article and giving me positive feedback.
Nancy van Baak for
  • Listening to my techy talk.
  • Waking me up on Sunday morning so I could do this during the weekend.

 

 


Posted by spiney (85.124.xx.xx) on Tue 14 Feb 2006 at 10:20
Nice article, the only thing I don't like about the setup is the rather big list of requirements on the clients.

Personally, I really like BackupPC for its non-intrusiveness, OS-independence and disk-space saving pooling feature, and since it 'pings' computers before trying to do a backup, it works very well for laptops too. (And no, I'm not in any way connected to that project, just a very happy user.)
--
Debian GNU/Linux on an IBM Thinkpad T43p


Posted by mvanbaak (80.126.xx.xx) on Tue 14 Feb 2006 at 17:51
The requirements list looks rather big.
But a default install of debian will have:
ssh client, ping, arp, bash

So that leaves rdiff-backup and netcat.
Netcat is only needed when you are going to use DataQ.

This way I can bring it down to 1.

BackupPC looks nice. Thnx.


Posted by Anonymous (203.20.xx.xx) on Tue 14 Feb 2006 at 23:57
I use BackupPC on SEVERAL networks of between 2 and 50 PCs

it just WORKS when configured correctly.


Posted by Anonymous (199.223.xx.xx) on Wed 15 Feb 2006 at 10:37
I couldn't agree more. I've been using BackupPC for years at work and home. It's a great package! Everyone, please check it out.


Posted by niol (143.196.xx.xx) on Tue 14 Feb 2006 at 10:32

Nice setup, but I have two comments to add.

First, I wouldn't use the boot process to trigger the backup but anacron, which is much like cron but handles the fact that the machine may not be on at all times of the day: anacron runs its jobs if they have not been run for more than a day.

Second, instead of searching for a MAC/IP pair to know where to back up to, I would do some kind of ssh ping (like ssh server hostname, and check that it returns the hostname of the server). That would guarantee the server's identity and save me from changing all the laptop scripts when I change the server's NIC.

I wouldn't use a root login on the server either. I just don't see the need for this.

Anyway, the use of DataQ is interesting. Thanks for the article!


Posted by mvanbaak (213.154.xx.xx) on Tue 14 Feb 2006 at 12:22
You could use anacron indeed, but it will kill the ability to abort the backup process.

The "ssh ping" is not a bad idea, but still it got me several times when someone had the same idea about naming their network. That's why I implemented the MAC/IP thing.

The root account is needed if you care about file permissions and ACL settings.
Only the root user is allowed to set those on the backup server.
Using the options one can set in the authorized_keys file, it can be locked down to only run rdiff-backup.


Posted by isilmendil (193.171.xx.xx) on Tue 14 Feb 2006 at 11:22
Maybe it's just me, but I think the example for the .ssh/authorized_keys file could make it a little bit clearer, that the command needs to be inside a 'command="blah"' option:
command="rdiff-backup --server" ssh-rsa AAAAB3N..... username@hostname
instead of
rdiff-backup --server ssh-rsa AAAAB3N..... username@hostname
Also, one could enable the no-port-forwarding, no-X11-forwarding, and no-agent-forwarding options, just to be safe :)


Posted by oxtan (195.86.xx.xx) on Tue 14 Feb 2006 at 12:35
what you need is bacula (www.bacula.org)


Posted by rjc (87.74.xx.xx) on Tue 14 Feb 2006 at 20:18
"My servers use the same backup server and infrastructure, ..."

You meant "My laptops", right?

rjc


Posted by mvanbaak (80.126.xx.xx) on Tue 14 Feb 2006 at 20:23
No, I meant my internal servers here at home.
Monitoring server, asterisk voip server, firewall machine, and mythtv server.
But they are not using the "check ip of server and dataq" script parts.
There is an sh script running on my file/backup server triggering the rdiff-backups on them, cause they are always on and always on the same ip address.
