Unattended, Encrypted, Incremental Network Backups: Part 1

Posted by Kellen on Wed 10 Aug 2005 at 11:46

This article describes how to build a centralised backup system with strong encryption. Incremental backups are used to minimize the bandwidth and time required.

1. Introduction

This method for backups has progressed out of my desire to create a centralized backup system for users who do not necessarily trust me and whom I (as a sysadmin) may not completely trust. This system is just as easily used between a group of peers, each backing up to each other, but requires each peer to do some systems maintenance which can be problematic in the real world.

In this part, we'll cover the client needs, software and configuration. In part 2, we'll cover the server software and configuration.

For those too impatient to read the entire document, there is a quick summary at the end.

This document is available in docbook format at http://projects.cretin.net/backup/duplicity-backups-part1.xml

2. Backup Client Requirements

The backup client wants to be able to back up a local machine over the network to a potentially untrusted server (untrusted both in terms of its sysadmins and its users).

Since the backups are made over the network, communications with the backup server should be encrypted. Network transmission is also expensive, so it is desirable to send as little data across the wire as possible, and thus incremental backups should be used. Since the backup server is untrusted, all backups should be encrypted to prevent spying by sysadmins and users alike. Finally, backups should be scriptable and should be able to run unattended.

  • Backups should occur over a network
  • Communications with the backup server should be encrypted
  • Data should be backed up incrementally
  • Backed up data should be encrypted
  • Backups should be unattended

3. Backup Client Software

Duplicity is a backup utility which provides incremental encrypted network backups using the rsync algorithm, scp/ftp/rsync as a transfer mechanism, and support for gpg.

Incremental backups can be achieved in several ways. The most straightforward is to use find to locate files changed since the last backup. Just looking for updated files has the disadvantage of retransmitting an entire (possibly large) file when only a small part of it has changed.
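As a rough illustration of the find approach, here is a minimal sketch. It assumes a "stamp" file whose modification time records the previous backup; the function name changed_since and the paths in the usage note are hypothetical and not part of duplicity.

```shell
#!/bin/sh
# changed_since STAMP DIR
# Print every regular file under DIR that was modified after STAMP was.
changed_since() {
    stamp=$1
    dir=$2
    # -newer compares each file's modification time with that of the stamp file
    find "$dir" -type f -newer "$stamp" -print
}
```

A backup run could call something like changed_since /var/backups/last-run /home, archive the files listed, and then touch the stamp file. Each file listed is sent whole, which is exactly the drawback described above.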

Another approach is to use rsync or another utility which uses the rsync (or similar) algorithm. Since rsync only sends the differences between files, this greatly reduces the size of the incremental backup files, which cuts down on disk and network usage for both the client and the server. Duplicity takes this approach, using the rsync algorithm (through librsync) rather than the rsync program itself.

We'll use duplicity with gpg and scp. Duplicity natively supports four of our five requirements:

  • Backups should occur over a network (via scp)
  • Communications with the backup server should be encrypted (also via scp)
  • Data should be backed up incrementally (via the rsync algorithm)
  • Backed up data should be encrypted (via gpg)

Figure 1. Overview of a duplicity backup

To get duplicity, run:

~# apt-get install duplicity

The OpenSSH package (ssh) provides scp and the GNU privacy guard package (gnupg) provides gpg. These are installed by default on most Debian systems, but if you need them, run:

~# apt-get install ssh gnupg

Note

Duplicity is not under active development. The most recent version at the time of writing is 0.4.1, released August 9, 2003. The author of duplicity states that "duplicity is not stable yet," but he has turned his efforts to the similar, but unencrypted, rdiff-backup. I have personally contacted the Debian package maintainer for duplicity, who said that he and others were using it in a production environment.

4. Encryption Matters

One of the advantages of duplicity is that it provides integrated encryption of backup files. We'll need two gpg keys for our backups: an "encryption key" and a "signature key." The encryption key is used to protect the data in the backup files from snooping on the backup server, while the signature key is used to ensure the integrity of the backup files.

Duplicity's --encrypt-key option allows a user to specify either a symmetric or public key with which to encrypt the backup archives. Duplicity's --sign-key option specifies either a symmetric or public key with which to sign the backup archives. If encryption is turned on, and the --sign-key option is omitted, the --encrypt-key key is also used to sign the archives.

Note that the private key for the signature key (or the encryption key if --sign-key is omitted) must be available to duplicity when it runs. Duplicity also requires the passphrase for the signing key be either entered manually or stored in an environment variable. If our encryption key and signature key are the same, then a compromise of the server means a compromise of the backed up data as well. We'll therefore use separate encryption and signature keys.

4.1. Keys Need Homes

Since the encryption key we're about to generate is going to be used to protect important data (passwords, email, documents) the private key should itself be well protected. Ideally, the keys should be stored on a secure drive that you keep on your person at all times, or failing that, a laptop that is well protected and not usually connected to a network.

4.2. Generating GPG Keys

First, we'll generate the signature key, owned by root, with the passphrase "signtest".

~# gpg --gen-key
gpg (GnuPG) 1.2.5; Copyright (C) 2004 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.
Please select what kind of key you want:
   (1) DSA and ElGamal (default)
   (2) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
DSA keypair will have 1024 bits.
About to generate a new ELG-E keypair.
              minimum keysize is  768 bits
              default keysize is 1024 bits
    highest suggested keysize is 2048 bits
What keysize do you want? (1024) 1024
Requested keysize is 1024 bits       
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 1y
Key expires at Tue Mar  7 16:45:18 2006 PST
Is this correct (y/n)? y
                        
You need a User-ID to identify your key; the software constructs the user id
from Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"
Real name: signtest
Email address: signtest@example.com
Comment: signtest                 
You selected this USER-ID:
    "signtest (signtest) <signtest@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.    
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
++++++++++++++++++++++++++++++.+++++++++++++++++++++++++++++++++++++++++++++++++
+.++++++++++..+++++.+++++.++++++++++.++++++++++++++++++++>++++++++++.......>..++
+++..+++++
+++++++++++++++++++++++++++++++++++.++++++++++.+++++.+++++++++++d+++++++++++++++
+++++++++++++++++++.+++++++++++++++.++++++++++++++++++++f.....>+++++.d....f.....
..>+++++.....+++++^^^
public and secret key created and signed.
key marked as ultimately trusted.
pub  1024D/B036117C 2005-03-08 signtest (signtest) <signtest@example.com>
     Key fingerprint = F278 70A7 656A 7692 4453  6F3D 7A5C 98A1 B036 117C
sub  1024g/5D2059A1 2005-03-08 [expires: 2006-03-08]

For convenience, we'll create a local user to own the encryption key. Do not do this for a production environment.

~# adduser
Enter a username to add: backuptest
Adding user `backuptest'...
Adding new group `backuptest' (1001).
Adding new user `backuptest' (1001) with group `backuptest'.
Creating home directory `/home/backuptest'.
Copying files from `/etc/skel'
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
Changing the user information for backuptest
Enter the new value, or press ENTER for the default
        Full Name []: 
        Room Number []: 
        Work Phone []: 
        Home Phone []: 
        Other []: 
Is the information correct? [y/N] y

Next, we'll generate the encryption key, owned by our backuptest user, with the passphrase "backuptest".

~# su - backuptest
backuptest@debian:~$ gpg --gen-key
gpg (GnuPG) 1.2.5; Copyright (C) 2004 Free Software Foundation, Inc.
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions. See the file COPYING for details.
gpg: /home/backuptest/.gnupg: directory created
gpg: new configuration file `/home/backuptest/.gnupg/gpg.conf' created
gpg: WARNING: options in `/home/backuptest/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/home/backuptest/.gnupg/secring.gpg' created
gpg: keyring `/home/backuptest/.gnupg/pubring.gpg' created
Please select what kind of key you want:
   (1) DSA and ElGamal (default)
   (2) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
DSA keypair will have 1024 bits.
About to generate a new ELG-E keypair.
              minimum keysize is  768 bits
              default keysize is 1024 bits
    highest suggested keysize is 2048 bits
What keysize do you want? (1024) 1024
Requested keysize is 1024 bits       
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 1y
Key expires at Tue Mar  7 16:46:18 2006 PST
Is this correct (y/n)? y
                        
You need a User-ID to identify your key; the software constructs the user id
from Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"
Real name: backuptest
Email address: backuptest@example.com
Comment: backuptest                 
You selected this USER-ID:
    "backuptest (backuptest) <backuptest@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.    
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
+++++.+++++++++++++++++++++++++++++++++++++++++++++.+++++..++++++++++++++++++++.
++++++++++++++++++++.+++++.++++++++++++++++++++.++++++++++>+++++.+++++>+++++....
.................................................+++++
+++++++++++++++.+++++.+++++.+++++++++++++++++++++++++.....++++++++++...+++++++++
++++++.+++++++++++++++.++++++++++++++++++++++++++++++.++++++++++>.+++++.........
......+++++^^^^^^^^^^^^^^^^^
gpg: /home/backuptest/.gnupg/trustdb.gpg: trustdb created
public and secret key created and signed.
key marked as ultimately trusted.
pub  1024D/AFC6DCD1 2005-03-08 backuptest (backuptest) <backuptest@example.com>
     Key fingerprint = D025 7E7A 1BF2 5CA9 82DD  EC54 2555 D7B6 AFC6 DCD1
sub  1024g/FEB03CBA 2005-03-08 [expires: 2006-03-08]

We'll export the public key for backuptest to a file:

backuptest:~$ gpg --armor --export backuptest > key.out

Alternatively, we could mail this key to a user on another domain:

backuptest:~$ gpg --armor --export backuptest | mail root@example.com

As root, we import the key sent to us by backuptest:

~# gpg --import /home/backuptest/key.out 
gpg: key AFC6DCD1: public key "backuptest (backuptest) <backuptest@example.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

And list our available keys:

~# gpg --list-keys
/root/.gnupg/pubring.gpg
------------------------
pub  1024D/B036117C 2005-03-08 signtest (signtest) <signtest@example.com>
sub  1024g/5D2059A1 2005-03-08 [expires: 2006-03-08]
pub  1024D/AFC6DCD1 2005-03-08 backuptest (backuptest) <backuptest@example.com>
sub  1024g/FEB03CBA 2005-03-08 [expires: 2006-03-08]

Finally, we must sign the key sent to us by backuptest (after checking the fingerprint, if backuptest is another human being):

~# gpg --sign-key backuptest
pub  1024D/AFC6DCD1  created: 2005-03-08 expires: 2006-03-08 trust: -/-
sub  1024g/FEB03CBA  created: 2005-03-08 expires: 2006-03-08
(1). backuptest (backuptest) <backuptest@example.com>
pub  1024D/AFC6DCD1  created: 2005-03-08 expires: 2006-03-08 trust: -/-
 Primary key fingerprint: D025 7E7A 1BF2 5CA9 82DD  EC54 2555 D7B6 AFC6 DCD1
     backuptest (backuptest) <backuptest@example.com>
This key is due to expire on 2006-03-08.
Do you want your signature to expire at the same time? (Y/n) y
How carefully have you verified the key you are about to sign actually belongs
to the person named above?  If you don't know what to answer, enter "0".
   (0) I will not answer. (default)
   (1) I have not checked at all.
   (2) I have done casual checking.
   (3) I have done very careful checking.
Your selection? (enter '?' for more information): 3
Are you really sure that you want to sign this key
with your key: "signtest (signtest) <signtest@example.com>" (B036117C)
I have checked this key very carefully.
Really sign? yes
                
You need a passphrase to unlock the secret key for
user: "signtest (signtest) <signtest@example.com>"
1024-bit DSA key, ID B036117C, created 2005-03-08
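Before signing, the fingerprint shown by gpg --fingerprint backuptest should match the one the key's owner gives you over a separate channel. The comparison is easy to get wrong by eye because of spacing and letter case, so here is a small sketch of a helper; the function names are made up for this example and are not part of gpg.

```shell
#!/bin/sh
# normalize_fpr FINGERPRINT
# Strip whitespace and upper-case the hex digits so two copies compare equal.
normalize_fpr() {
    printf '%s' "$1" | tr -d ' \t\n' | tr 'a-f' 'A-F'
}

# same_fingerprint A B
# Succeeds (exit status 0) only when both fingerprints are identical.
same_fingerprint() {
    [ "$(normalize_fpr "$1")" = "$(normalize_fpr "$2")" ]
}
```

For example, same_fingerprint "D025 7E7A 1BF2 5CA9 82DD  EC54 2555 D7B6 AFC6 DCD1" "$claimed" && echo match, where $claimed holds the value the owner read out to you.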

5. A Network Home

Our backup system needs a place on the network to back up to. The configuration of the backup server will be covered in more detail in part 2, but here is the minimal server configuration needed (which could be used by a lazy and trusting friend).

First create a user account for the backup client:

remote:~# adduser
Enter a username to add: abackupuser
Adding user `abackupuser'...
Adding new group `abackupuser' (1001).
Adding new user `abackupuser' (1001) with group `abackupuser'.
Creating home directory `/home/abackupuser'.
Copying files from `/etc/skel'
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully
Changing the user information for abackupuser
Enter the new value, or press ENTER for the default
        Full Name []: 
        Room Number []: 
        Work Phone []: 
        Home Phone []: 
        Other []: 
Is the information correct? [y/N] y

Next, create a location for the backup files to live:

remote:~# su - abackupuser
abackupuser@remote:~$ mkdir backup

6. Backup Script

Now that our prerequisites are in place, we need a backup script.

Figure 2. The backup script

#!/bin/bash
export PASSPHRASE=signtest
duplicity --encrypt-key "AFC6DCD1" --sign-key "B036117C" \
--exclude /proc --exclude /mnt --exclude /tmp \
/ scp://abackupuser@remote/backup

The duplicity command in the script breaks down as follows:

export PASSPHRASE=signtest

Set the PASSPHRASE environment variable to the passphrase for the signature key.

--encrypt-key "AFC6DCD1"

Set the id of the encryption key to which backup archives should be encrypted.

--sign-key "B036117C"

Set the id of the signature key with which backup archives should be signed.

--exclude /proc --exclude /mnt --exclude /tmp

Don't back up the virtual /proc filesystem, any temporarily mounted drives, or temporary files.

/

Back up starting at the root directory, i.e. back up everything. You could use the path of any directory you'd like to back up here.

scp://abackupuser@remote/backup

The location to which backup files should be sent, as follows:

scp

Specify that duplicity should use scp to transfer files.

abackupuser

The username with which to connect to the remote system.

remote

The remote system to which to send backup files.

/backup

The relative path (from the user's home) to which to save backup files.

Note that the ids for --encrypt-key and --sign-key can be obtained from gpg --list-keys (see section 4.2, Generating GPG Keys).

Also note that the user@host/relativepath format is different than the normal scp format of user@host:/path.
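The script in Figure 2 keeps the passphrase in the script itself. One possible variation, sketched here under assumptions, reads it from a file that only root can read and offers a dry-run mode for checking the command line. The file location /root/.backup-passphrase, the DRY_RUN variable, the --run argument and the function names are all inventions of this example; only the duplicity options already shown above are used.

```shell
#!/bin/sh
# Key ids and target are the ones used in this article.
ENCRYPT_KEY="AFC6DCD1"
SIGN_KEY="B036117C"
TARGET="scp://abackupuser@remote/backup"
# Assumed location of the passphrase file; it should be mode 600, owned by root.
PASSFILE="${PASSFILE:-/root/.backup-passphrase}"

# load_passphrase FILE
# Export PASSPHRASE from FILE instead of hard-coding it in the script.
load_passphrase() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    PASSPHRASE=$(cat "$1")
    export PASSPHRASE
}

# run_backup
# With DRY_RUN set to a non-empty value, print the command instead of running it.
run_backup() {
    ${DRY_RUN:+echo} duplicity --encrypt-key "$ENCRYPT_KEY" --sign-key "$SIGN_KEY" \
        --exclude /proc --exclude /mnt --exclude /tmp \
        / "$TARGET"
}

# Only act when asked to, so the file can also be sourced for testing.
if [ "${1:-}" = "--run" ]; then
    load_passphrase "$PASSFILE" && run_backup
fi
```

Running DRY_RUN=1 ./backup.sh --run prints the duplicity command without contacting the server, which is a cheap way to check the key ids and the target URL.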

6.1. Running the Backup Script

Since this is the first time we've backed anything up, this will be a full backup; duplicity automatically detects this.

~# ./backup.sh 
Password: 
No signatures found, switching to full backup.
Password: 
tmpkHY9NY                                                               100% 1509KB   1.5MB/s   00:00    
Password: 
tmpfcOHU-                                                               100%  528     0.5KB/s   00:00    
Password: 
tmpKSC17j                                                               100%   78KB  78.3KB/s   00:00    
Password: 
--------------[ Backup Statistics ]--------------
StartTime 1110490849.31 (Thu Mar 10 13:40:49 2005)
EndTime 1110490851.11 (Thu Mar 10 13:40:51 2005)
ElapsedTime 1.80 (1.80 seconds)
SourceFiles 90
SourceFileSize 3364455 (3.21 MB)
NewFiles 90
NewFileSize 3364455 (3.21 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 90
RawDeltaSize 3360224 (3.20 MB)
TotalDestinationSizeChange 1546182 (1.47 MB)
Errors 0
-------------------------------------------------

This is backing up a very small directory, for example purposes. The user is prompted for the password for abackupuser@remote five separate times! This is unacceptable for an unattended backup, so we'll have to resolve this later.

We can see the files generated on remote:

abackupuser@remote:~/backup$ ls -1
duplicity-full-signatures.2005-03-10T13:40:45-07:00.sigtar.gpg
duplicity-full.2005-03-10T13:40:45-07:00.manifest.gpg
duplicity-full.2005-03-10T13:40:45-07:00.vol1.difftar.gpg

A duplicity backup has 3 types of files:

  • "difftar" files; tar and gz compressed files which contain the actual backed up data.

  • a "sigtar" file; a tar and gz compressed file containing the signatures for each file backed up.

  • a "manifest" file which essentially contains a listing of the starting and ending files for each difftar archive, plus SHA-1 hashes of each difftar archive.

With full backups, there will be a good number of difftar files.
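When looking at a crowded backup directory it can help to sort the files by type. The sketch below classifies a file purely by its name, based on the three names in the listing above; other duplicity versions may name files differently, and the function name is hypothetical.

```shell
#!/bin/sh
# duplicity_file_type NAME
# Guess the kind of duplicity file from its suffix.
duplicity_file_type() {
    case $1 in
        *.difftar.gpg)  echo difftar ;;
        *.sigtar.gpg)   echo sigtar ;;
        *.manifest.gpg) echo manifest ;;
        *)              echo unknown ;;
    esac
}
```

On the backup server, a loop such as for f in ~/backup/*; do duplicity_file_type "$f"; done | sort | uniq -c would then give a count per type.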

So, we've got a working remote backup, but we'll have to fix it so that the user doesn't have to enter a password each time to upload a file.

7. Making it Unattended

To make our backup work unattended (i.e. via cron), we'll need to set up a method to automatically log in to our remote host. The standard method to do this is to use ssh keys. There is an excellent discussion of key management on the IBM website.

7.1. Generating a SSH Key

For now, we'll create a normal key with a passphrase. This kind of setup will still require us to enter a passphrase (for the key, not the ssh connection), but it'll simplify our next steps towards an unattended backup.

As root on our backup client, run ssh-keygen to create our ssh key:

~# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
65:73:77:bd:83:5c:4a:6c:17:88:e0:43:1b:0c:ab:d4 root@local

Your key fingerprint will be different.

Next we need to send our public key to our backup server:

~# scp ~/.ssh/id_dsa.pub abackupuser@remote:/home/abackupuser

On remote we add this key to the authorized keys of abackupuser:

abackupuser@remote:~$ cat id_dsa.pub >> ~/.ssh/authorized_keys
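The cat command above assumes that ~/.ssh already exists on remote, and sshd in its default configuration ignores an authorized_keys file whose permissions are too open. A slightly more defensive sketch follows; the function name install_pubkey and its optional second argument are assumptions made for this example.

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE [HOME_DIR]
# Append a public key to authorized_keys, creating ~/.ssh with strict
# permissions if needed and skipping keys that are already present.
install_pubkey() {
    pub=$1
    home=${2:-$HOME}
    auth="$home/.ssh/authorized_keys"
    mkdir -p "$home/.ssh" || return 1
    chmod 700 "$home/.ssh" || return 1
    touch "$auth" || return 1
    chmod 600 "$auth" || return 1
    # -F: fixed strings, -x: whole-line match, -f: read the pattern from the key file
    if ! grep -qxF -f "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
}
```

On remote, abackupuser would run install_pubkey id_dsa.pub; running it a second time leaves the file unchanged.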

Now if we try to log in as abackupuser@remote, we get a prompt for our passphrase instead of for our password:

~# ssh abackupuser@192.168.0.17
Enter passphrase for key '/root/.ssh/id_dsa':

This is fine, but we still have a passphrase that will have to be entered at backup time.

7.2. SSH Key Caching

In order to script our backups while using an ssh key with a passphrase, we need an application that keeps the unlocked key available over a long period. We'll use keychain, a tool which itself uses ssh-agent, which does the actual caching.

Install keychain:

~# apt-get install keychain

To set up local root's bash profile to run keychain on login, add these lines to ~/.bash_profile:

keychain --clear id_dsa
. ~/.keychain/$HOSTNAME-sh

This will have keychain first clear the existing keys (in the case of a normal root compromise, the attacker can't access the remote systems), then attempt to load the id_dsa key and finally source the appropriate output from ssh-agent.

The next time root logs in, she will be prompted to enter the passphrase for the ssh key, and any subsequent process running as root will not need to use a password to log into remote, as when backup.sh is run:

user@local:~$ su -
Password:
KeyChain 2.5.1; http://www.gentoo.org/proj/en/keychain/
Copyright 2002-2004 Gentoo Foundation; Distributed under the GPL
 * Found existing ssh-agent (11726)
 * ssh-agent: All identities removed.
 * Adding 1 ssh key(s)...
Enter passphrase for /root/.ssh/id_dsa:
Identity added: /root/.ssh/id_dsa (/root/.ssh/id_dsa)
~# ./backup.sh
No signatures found, switching to full backup.
tmpyGNA5U                                     100% 5110KB   2.5MB/s   00:02
tmp5MPT6X                                     100% 5123KB   1.3MB/s   00:04
tmpQli3JE                                     100% 5121KB   5.0MB/s   00:01
tmp2T9fis                                     100% 5120KB   5.0MB/s   00:01
tmpiYM1J7                                     100% 5126KB   2.5MB/s   00:02
tmpMiQ4Lh                                     100% 1330KB   1.3MB/s   00:00
tmpGsz2q4                                     100%  879     0.9KB/s   00:00
tmp_VvUet                                     100%  129KB 129.5KB/s   00:00
--------------[ Backup Statistics ]--------------
StartTime 1116749899.09 (Sun May 22 01:18:19 2005)
EndTime 1116749925.78 (Sun May 22 01:18:45 2005)
ElapsedTime 26.68 (26.68 seconds)
SourceFiles 6
SourceFileSize 27933523 (26.6 MB)
NewFiles 6
NewFileSize 27933523 (26.6 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 6
RawDeltaSize 10670280 (10.2 MB)
TotalDestinationSizeChange 27576857 (26.3 MB)
Errors 0
-------------------------------------------------

The backup script can now be run from cron, since no passwords need to be entered during the backup process.

Please note that using keychain is not a silver bullet to prevent an attacker from using your ssh keys. It may be possible for a determined attacker to compromise keys from system memory; security thus must also be added on the backup server side to prevent tampering, especially to the authorized_keys file.

7.3. Running the Backup via cron

Our client setup is now ready for the backup script to be run via cron. If our script is in /root/backup.sh, and we wanted to run the backup every night at 10:00pm, we'd edit the crontab:

~# crontab -e

and add:

0 22 * * * /root/backup.sh
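One thing to watch for: cron does not read ~/.bash_profile, so a job started by cron does not automatically inherit the ssh-agent variables that keychain set up at login. A possible workaround, sketched here, is to have backup.sh source the keychain file itself before calling duplicity. The function name load_agent_env is hypothetical, and the file name follows the ~/.keychain/$HOSTNAME-sh pattern used earlier, which may differ on your system.

```shell
#!/bin/sh
# load_agent_env FILE
# Source the environment file written by keychain and make sure it
# actually provided a socket for ssh-agent.
load_agent_env() {
    envfile=$1
    [ -r "$envfile" ] || { echo "no agent environment at $envfile" >&2; return 1; }
    . "$envfile"
    [ -n "${SSH_AUTH_SOCK:-}" ] || { echo "SSH_AUTH_SOCK is not set" >&2; return 1; }
}
```

Near the top of backup.sh this could be used as load_agent_env "$HOME/.keychain/${HOSTNAME:-$(uname -n)}-sh" || exit 1, so that a missing agent produces a clear error in cron's mail rather than a hung password prompt.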

8. Verify Script

The first thing to do after running a backup is of course to verify that it worked and for that we need a verify script.

Figure 3. The verify script

#!/bin/bash
export PASSPHRASE=signtest
duplicity --encrypt-key "AFC6DCD1" --sign-key "B036117C" \
--verbosity 4 --verify \
--exclude /proc --exclude /mnt --exclude /tmp \
scp://abackupuser@remote/backup /

The verify script is very similar to the backup script, but we've reversed the order of the source and the destination (the / and scp:// sections), and added two options:

--verbosity 4

When verbosity for duplicity is 4 or higher, we'll see a message for each file that's changed since the backup.

--verify

Verifies the backup rather than restoring or backing up.

If this verification script is run immediately after a full backup, very few files will have changed, perhaps only a few log files.

~# ./verify.sh
Error initializing file /dev/log
Error initializing file /dev/printer
Difference found: File dev/ptmx has mtime Sat Mar  5 20:09:11 2005, expected Sat Mar  5 19:40:43 2005
Difference found: File dev/pts/0 has mtime Sat Mar  5 20:09:18 2005, expected Sat Mar  5 19:41:01 2005
Difference found: File var/log/auth.log has mtime Sat Mar  5 20:09:01 2005, expected Sat Mar  5 19:39:01 2005
Difference found: File var/log/exim/mainlog has mtime Sat Mar  5 20:08:01 2005, expected Sat Mar  5 19:38:01 2005
Difference found: File var/log/messages has mtime Sat Mar  5 20:10:58 2005, expected Sat Mar  5 19:39:52 2005
Difference found: File var/log/syslog has mtime Sat Mar  5 20:09:01 2005, expected Sat Mar  5 19:39:52 2005
Verify complete: 31097 files compared, 6 differences found.
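For an unattended setup it may be handy to reduce that output to a single number. The sketch below pulls the difference count out of the final summary line; it assumes the wording shown in the sample output above, which may vary between duplicity versions, and the function name is made up for this example.

```shell
#!/bin/sh
# verify_differences
# Read duplicity's verify output on stdin and print the number of
# differences reported on the "Verify complete" line.
verify_differences() {
    sed -n 's/^Verify complete: [0-9][0-9]* files compared, \([0-9][0-9]*\) differences* found\.$/\1/p'
}
```

Something like n=$(./verify.sh 2>&1 | verify_differences) then gives a value a wrapper script can compare against a threshold before mailing a warning.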

9. Restore Script

Restoring traditional incremental backups can be painful if one has to apply each incremental backup by hand. Duplicity automatically applies the incrementals to the full backup without any extra intervention.

If the disk we would like to restore to is mounted on /mnt, a restore script for a full backup might look like this:

Figure 4. The restore script

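#!/bin/bash
# Stop at the first error so a failed restore is not followed by the moves below
set -e
export PASSPHRASE=signtest
# Restore the backup into a new directory on the target disk
duplicity --encrypt-key "AFC6DCD1" --sign-key "B036117C" \
scp://abackupuser@remote/backup /mnt/restore
# Move the restored tree up into the root of the target disk
cd /mnt
mv restore/* .
rmdir restore
# Recreate the directories that were excluded from the backup
mkdir mnt
mkdir proc
mkdir tmp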

Duplicity will not allow restoration to an existing directory (e.g. /mnt), so pointing it to /mnt/restore makes us do a little extra work in moving the restored data around, but it's relatively simple. We also must recreate our excluded system directories.
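Note that under bash's default globbing, restore/* does not match names beginning with a dot, so any hidden files at the top level of the backup would be left behind in restore and the rmdir would fail. A sketch of a move that includes them is shown below; move_all is a hypothetical helper, and -mindepth and -maxdepth are GNU find options, which Debian provides.

```shell
#!/bin/sh
# move_all SRC DST
# Move every top-level entry of SRC, hidden ones included, into DST.
move_all() {
    src=$1
    dst=$2
    # depth 1 only: move the top-level entries, not each file beneath them
    find "$src" -mindepth 1 -maxdepth 1 -exec mv {} "$dst"/ \;
}
```

In the restore script, move_all /mnt/restore /mnt could stand in for the cd and mv lines, followed by rmdir /mnt/restore.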

9.1. Running the Restore Script

Next, we run the restore script, which gives no feedback at the default verbosity.

~# mount /dev/hdc1 /mnt
~# ./restore.sh

We can check to see what the restore produced:

~# ls -al /mnt/restore
total 92
drwxr-xr-x  18 root root   4096 Mar  5 20:24 .
drwxr-xr-x   3 root root   4096 Mar  5 20:15 ..
drwxr-xr-x   2 root root   4096 Feb 20 07:42 bin
drwxr-xr-x   3 root root   4096 Feb 20 07:46 boot
drwxr-xr-x   2 root root   4096 Jan 17 07:30 cdrom
drwxr-xr-x  11 root root  24576 Mar  5 05:50 dev
drwxr-xr-x  48 root root   4096 Mar  5 19:39 etc
drwxr-xr-x   2 root root   4096 Jan 17 07:30 floppy
drwxrwsr-x   3 root staff  4096 Jan 17 07:41 home
drwxr-xr-x   2 root root   4096 Jan 17 07:30 initrd
lrwxrwxrwx   1 root root     28 Mar  5 20:16 initrd.img -> boot/initrd.img-2.4.27-1-386
lrwxrwxrwx   1 root root     27 Mar  5 20:16 initrd.img.old -> /boot/initrd.img-2.4.18-386
drwxr-xr-x   8 root root   4096 Feb 20 07:42 lib
drwx------   2 root root   4096 Jan 17 07:25 lost+found
drwxr-xr-x   2 root root   4096 Jan 17 07:30 opt
drwxr-xr-x   3 root root   4096 Jan 17 15:10 root
drwxr-xr-x   2 root root   4096 Feb 20 07:42 sbin
drwxr-xr-x   2 root root   4096 Dec 26 18:40 sys
drwxr-xr-x  11 root root   4096 Jan 17 09:25 usr
drwxr-xr-x  14 root root   4096 Feb 20 17:48 var
lrwxrwxrwx   1 root root     25 Mar  5 20:19 vmlinuz -> boot/vmlinuz-2.4.27-1-386
lrwxrwxrwx   1 root root     23 Mar  5 20:19 vmlinuz.old -> boot/vmlinuz-2.4.18-386

As a final step to a restore, we also have to make the system bootable. Assuming we're using grub, and assuming we've restored to hdc1:

~# /sbin/grub-install /dev/hdc1

Next, test our new system.

~# shutdown -h now

Once the machine is off, switch out hda for hdc (put the disk we've just restored onto ide0), and turn the power back on. If all went well, you should have a fully working system from your backup.

10. Summary

This is a quick summary for the forgetful or the impatient.

  • "client" is the backup client, i.e. the computer whose data we are backing up.
  • "server" is the backup server, i.e. the computer to which we are backing up.
  • "laptop" is the client administrator's secured location for their GPG keys.
  1. on client, install duplicity: apt-get install duplicity

  2. on client, generate signature key: gpg --gen-key

  3. on laptop, generate encryption key: gpg --gen-key

  4. on laptop, send public encryption key to root on client

  5. on client, import encryption key: gpg --import backuptest.asc

  6. on client, sign encryption public key: gpg --sign-key backuptest

  7. on client, generate ssh key: ssh-keygen -t dsa

  8. on client, send public ssh key to backup user on server (or to server administrator)

  9. on server, add ssh key to authorized_keys: cat id_dsa.pub >> ~/.ssh/authorized_keys

  10. on client, install keychain: apt-get install keychain

  11. on client, add to .bash_profile:

    keychain --clear id_dsa
    . ~/.keychain/$HOSTNAME-sh
    
  12. on client, set up cron to regularly run the backup script: crontab -e and add:

    0 22 * * * /root/backup.sh
    

11. Conclusion

We've created a nice system for clients: unattended backups that occur over the network, are incremental with only the differences sent, and are encrypted to prevent snooping.

There are still some areas of concern, specifically that if a client machine is compromised an attacker could gain access to the backup server and destroy existing backups, or that another backup server user might destroy existing backups. The defenses against these potential problems are server-side, so we'll take a look at these next time.

12. Next Time

In Part 2, we'll discuss the issues in running a multiuser backup system for untrusted users, the software we can use to protect our server from our users, our users from each other, and our users' backups against successful client machine compromises.

 

 


Posted by Anonymous (203.122.xx.xx) on Wed 10 Aug 2005 at 12:50
Excellent article. I have been planning on doing an automated incremental encrypted backup for a while. It is good to see somebody shed some light on how they do it.



One common thing to watch out for during back up is copying the database across cleanly. There's an article on O'Reilly about a way to do it without downtime for mysql. (essentially: turning off replication on a database box that is replicating the database from the main box, backing up the no-longer-replicating database from that box onto a backup (third) box, and then restarting the database replication again).



PJ

Posted by Anonymous (80.177.xx.xx) on Wed 10 Aug 2005 at 13:40
"Restoring traditional incremental backups can be painful if one has to apply each incremental backup by hand. Duplicity automatically applies the incrementals to the full backup without any extra intervention."

Restoring incrementals can be painful, but useful if you want to restore back to a specific date, rather than the latest backup. Backups are about recovering from user screwups which may go unnoticed for days, as much as recovering from a fried hard drive. So I'd be interested to know: do you just end up with a single snapshot style backup, or can you restore to any one of the incrementals?

--
mark

Posted by Kellen (132.239.xx.xx) on Wed 10 Aug 2005 at 18:08
You end up with sets of date-specific duplicity archives. You can restore to a particular date by invoking --restore-time.

Posted by Anonymous (66.179.xx.xx) on Wed 10 Aug 2005 at 14:37
Does duplicity come standard on Knoppix? or some other LiveCD? I think that'd be the best way to restore: boot up a LiveCD, plug in the USB stick with the keys on it, format the new drive, and do the restore.

Posted by Kellen (132.239.xx.xx) on Wed 10 Aug 2005 at 18:24
No, it's not in Knoppix, nor do I know of any other liveCD it is on, but it's rather small and you could fit it on your USB stick.

Posted by Piem (81.178.xx.xx) on Thu 11 Aug 2005 at 03:52
Thanks for the nice article.

There is something I don't get though: you make sure to explain the difference between signing and encryption keys, but it seems the encryption private key is required to increment, verify and restore backups. According to this article, it is safe to store the encryption private key on the client, since whoever has access to it would have access to the files we want to back up.

But above it reads:
4. on laptop, send public encryption key to root on client

Have I missed something?

[ Parent | Reply to this comment ]

Posted by Kellen (132.239.xx.xx) on Tue 16 Aug 2005 at 18:53
You are completely correct and I must sheepishly admit that I did not test a verify or more than a single backup with both keys (the hazard of writing a forward-looking doc). In fact, duplicity will not even work in this scenario for an unattended incremental backup (due to a bug) since it expects a single passphrase to be passed via the PASSPHRASE environment variable but must use two distinct passphrases to do the backup. Thus, as you suggest, you must use a single key and this single key and its passphrase must be available on the client at backup-time. I will be making changes to the doc to make sure this is all corrected. sigh =(

You are also correct that it is not much of a security problem. My intention in using different keys was to keep data that lives on the server only briefly, and is then deleted, from being compromised via the backups. For most people this isn't such a big deal.
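
A minimal sketch of the single-key, unattended setup described above: one key both encrypts and signs, and its passphrase is handed to duplicity through the `PASSPHRASE` environment variable. Keeping the passphrase in a root-only file, the key ID `ABCD1234`, the function name and the `DRY_RUN` switch are all assumptions made for the example.

```shell
#!/bin/bash
# Sketch: unattended backup with a single GnuPG key.

# backup_unattended KEY_ID PASSPHRASE_FILE SOURCE URL   (hypothetical helper)
backup_unattended() {
    local key_id="$1" pass_file="$2" src="$3" url="$4"
    # The passphrase file should be readable by the backup user only (chmod 600).
    if [ ! -r "$pass_file" ]; then
        echo "cannot read passphrase file: $pass_file" >&2
        return 1
    fi
    local cmd=(duplicity --encrypt-key "$key_id" --sign-key "$key_id" "$src" "$url")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Never print the passphrase, only the command line.
        printf '%s\n' "${cmd[*]}"
    else
        # PASSPHRASE is set for this one command only, not for the whole shell.
        PASSPHRASE="$(cat "$pass_file")" "${cmd[@]}"
    fi
}
```

Setting the variable on the command itself means there is nothing to unset afterwards, which is handy in a cron job.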

[ Parent | Reply to this comment ]

Posted by Kellen (132.239.xx.xx) on Tue 16 Aug 2005 at 18:57
Also, in case it wasn't clear: you want the secret key on your laptop so that if the client HDD crashes, you still have a copy of the key you need to decrypt the backed up data.

[ Parent | Reply to this comment ]

Posted by pgquiles (81.202.xx.xx) on Thu 11 Aug 2005 at 20:57
Why not simply use BackupPC? It's easy to set up and manage, actively developed, and is able to back up Windows machines (even Mac OS X machines if you enable Samba).

[ Parent | Reply to this comment ]

Posted by Anonymous (203.122.xx.xx) on Fri 12 Aug 2005 at 05:52
One of the requirements was that the backup be encrypted too (not just transmitted securely).

I think this is an important feature missing in most backup solutions. It could no doubt be patched into BackupPC and similar tools pretty easily, but it is not there yet AFAIK.

The store-encrypted feature means that you can have the data backed up in many separate places with relatively low security, because seized data cannot be read by someone evil.
If you are worried about black helicopters and such, this is a good solution.

PJ

[ Parent | Reply to this comment ]

Posted by Arthur (194.109.xx.xx) on Fri 12 Aug 2005 at 02:54
hdup is pretty slick, too.

[ Parent | Reply to this comment ]

Posted by rajiv (66.92.xx.xx) on Fri 12 Aug 2005 at 22:13
Take a look at http://www.fluffy.co.uk/boxbackup/ and also the comparison between boxbackup and duplicity: http://www.fluffy.co.uk/boxbackup/comparison.html

[ Parent | Reply to this comment ]

Posted by Anonymous (203.122.xx.xx) on Sat 13 Aug 2005 at 05:57
I was unaware of boxbackup.

Yup, boxbackup seems to have all the features in place already. The only thing I am unsure about is the licence - it seems to be a BSD old-style licence, where the credit notice is mandatory (not discretionary) during use.

I am unsure of its GPL-compatibility, which is important for many Debian users.


PJ

[ Parent | Reply to this comment ]

Posted by Anonymous (151.202.xx.xx) on Sat 13 Aug 2005 at 15:13
A useful link; these folks have been thinking along similar lines: https://wiki.boum.org/TechStdOut/EncryptedBackupsForParanoiacs

Also, a really easy backup solution that you can use for this setup is backupninja (packaged in Debian). It has duplicity capability and is really easy to use; the home page is here

[ Parent | Reply to this comment ]

Posted by Eirik (129.177.xx.xx) on Tue 16 Aug 2005 at 16:20
I'm at a loss to explain why so many of these programs use asymmetric cryptography rather than symmetric keys. Just about all of these systems store the passphrase along with the secret keys used for signing; the odds that someone is able to steal only the secret key are therefore rather slim. Trust in the archive signatures isn't increased significantly by using asymmetric keys.

Using asymmetric keys for encrypting the data *may* make sense; the actual encryption key is only available at the time of encryption (a random symmetric session key, subsequently encrypted with the asymmetric cipher). However, as stated earlier, if the source system is compromised, the data is already available to the attacker. The only gain here is that if the system (say a laptop) is stolen, access to previous backups isn't also given to the attacker (but a better protection would probably be to increase the physical security of the system, as well as encrypting the filesystem).

When one keeps in mind that the security of asymmetric ciphers is significantly less certain than that of symmetric ciphers, and that symmetric ciphers are significantly faster, it would appear to make more sense to choose a strong, long, easily remembered passphrase and use that for encryption.

While the weak "two-factor" security inherent in the other secret-key/secret-passphrase system disappears, the backups would be significantly easier to recover -- loss of the secret key no longer means unreadable backups -- only remembering the passphrase is needed. And a passphrase is very easy to back up in hard copy and put in a safety deposit box, etc. -- this is possible with a gpg key, but not very practical (even if you have a scanner and OCR software you'd be forced to upload the secret key to the system with the scanner; making sure it's not written to disk would be a pain).

It would appear a lot of people fail to realize that the only problem asymmetric ciphers solve is that of key distribution; in this case that isn't a problem (only the owner needs to be able to decrypt the files).

But the article is interesting nonetheless.
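
To make the passphrase-only idea concrete, here is a small sketch using GnuPG's symmetric mode on a tar stream. It is not how duplicity is invoked in the article; the function name, the choice of AES256 and the `DRY_RUN` switch are assumptions for illustration, and gpg will prompt for the passphrase interactively.

```shell
#!/bin/bash
# Sketch: passphrase-only (symmetric) encryption of an archive with GnuPG.

# encrypt_archive SRC_DIR OUT_FILE   (hypothetical helper name)
encrypt_archive() {
    local src_dir="$1" out_file="$2"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Show the pipeline instead of running it.
        printf 'tar -czf - %s | gpg --symmetric --cipher-algo AES256 --output %s\n' \
            "$src_dir" "$out_file"
    else
        # No key pair is involved: only the passphrase is needed to decrypt later.
        tar -czf - "$src_dir" | gpg --symmetric --cipher-algo AES256 --output "$out_file"
    fi
}
```

Recovery then needs nothing but the passphrase: `gpg --decrypt home.tar.gz.gpg | tar -xzf -`.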

[ Parent | Reply to this comment ]

Posted by Anonymous (193.28.xx.xx) on Mon 29 Nov 2010 at 11:58
I second your opinion about the unnecessary and even harmful use of asymmetric crypto for backup storage. ALL of the current asymmetric ciphers are vulnerable to quantum cryptanalysis, i.e. all the data in your encrypted backups could be easily recovered by an interested third party with a quantum computer.

Suitable quantum computers are still not available, but they could appear in the near future. And if you store long-term sensitive data in your backups, then sooner or later it will be deciphered by an interested party who once had access to it.

Traditional symmetric ciphers are much more robust against quantum cryptanalysis, so you can feel safe about your backups even when quantum computers become available. Symmetric crypto gives you a larger security margin.

--
Alexey

[ Parent | Reply to this comment ]

Posted by Anonymous (62.245.xx.xx) on Mon 15 Aug 2005 at 13:08
I prefer the Dar archiver to find, tar and gzip. Try it: http://dar.sourceforge.net/

Georgik

[ Parent | Reply to this comment ]

Posted by Anonymous (66.159.xx.xx) on Mon 15 Aug 2005 at 19:39
Encrypted backups are dangerous and should be used sparingly. The problem with compressed or encrypted files is that a single bit error can corrupt all the data. By contrast a single bit error in a tar or cpio archive will typically corrupt a single file, and can be corrected. Encrypted backups also depend entirely on the decryption key. Not only must the key be secure from theft, but the key must also be secure from deletion, fire, and similar disasters.

Encrypted backups could be useful if the target system is untrusted. However, in that case there is always the risk that the backups could be deleted, rendering them useless.

Making data backups is a precaution against disaster. Encrypting backups increases the risk of a major disaster. A failed hard drive turns from a minor inconvenience into a major disaster when the decryption key can't be found or is corrupt. Using encryption should be considered carefully. In most cases it will make the backups less secure due to loss of availability.

[ Parent | Reply to this comment ]

Posted by Anonymous (139.11.xx.xx) on Thu 8 Jun 2006 at 10:10
IMHO this is not such a big problem.

Nowadays, "single bit errors" don't happen on hard disks. If your hard disk breaks, everything will be unreadable. If you have a RAID system that uses SMART-capable hard disks, a 16-bit CRC is applied on the wire and the chance of a bit error slipping by is almost zero - at least compared to the chance of a hard disk breaking, SMART failing, and the RAID system disabling the disk before corrupted data gets into userspace.

Non-encrypted backups require that the backup server is at least(!) as secure as the main server. It must not be at the same location, or the redundancy would not be great enough, but it must be cared for just as much, meaning duplicate work (two servers, two datacenters, twice the amount of maintenance and monthly datacenter payments).

Encrypted backups can be stored in multiple locations where the only requirement is that the data can be stored reliably. You do not have to care for the server yourself; it can be run by somebody whom you trust to maintain the machine, but would not otherwise trust with your data. This saves a whole lot of work if you consider "backup-sharing" configurations where e.g. three admins each provide backup space for the other two.

Of course, this is not ideal for high profile professional systems where a minute of downtime translates into a thousand (or a hundred thousand) Euros. However, in these scenarios people will employ "hot spares" everywhere anyway and have enough redundant systems available so that one more backup server to maintain will not make much of a difference.

[ Parent | Reply to this comment ]

Posted by Anonymous (82.231.xx.xx) on Tue 10 Apr 2012 at 10:22
When a corruption occurs in a DAR (Disk Archive) archive, only the corresponding file is lost, even if the archive is compressed and/or encrypted.

[ Parent | Reply to this comment ]

Posted by Anonymous (83.208.xx.xx) on Mon 15 Aug 2005 at 23:14
I use this script for backups with duplicity and it works very well for me.
#!/bin/bash

export PASSPHRASE="secretpass"

USER="backup_server1"
GLOB="--no-print-statistics --full"
PROT="scp"
TIME="--remove-older-than 1M"
HOME="/home/backup"

mysqldump --password=secretpass db_name1 > ${HOME}/tmp/mysql/db_name1/db_name1-mysql.sql
mysqldump --password=secretpass db_name2 > ${HOME}/tmp/mysql/db_name2/db_name2-mysql.sql

for SERV in "server1" "server2" "server3"
do
        echo "### Backup to '${SERV}'..."

        echo "*** cgi:"
        duplicity ${GLOB} /var/www/localhost/cgi-bin ${PROT}://${USER}@${SERV}/data/cgi

        echo "*** email:"
        duplicity ${GLOB} /var/vpopmail/domains ${PROT}://${USER}@${SERV}/data/email

        echo "*** etc:"
        duplicity ${GLOB} /etc ${PROT}://${USER}@${SERV}/data/etc

        echo "*** home:"
        duplicity ${GLOB} --exclude ${HOME} --exclude /home/p2p --exclude /home/ftp /home ${PROT}://${USER}@${SERV}/data/home

        echo "*** htdocs:"
        duplicity ${GLOB} /var/www/localhost/htdocs ${PROT}://${USER}@${SERV}/data/htdocs

        echo "*** mysql:"
        duplicity ${GLOB} ${HOME}/tmp/mysql ${PROT}://${USER}@${SERV}/data/mysql

        echo "*** root:"
        duplicity ${GLOB} --include /root/bin --exclude '**' /root ${PROT}://${USER}@${SERV}/data/root


        echo "### Clean backup on '${SERV}'..."

        for ADDR in "cgi" "email" "etc" "home" "htdocs" "mysql" "root"
        do
                echo "*** ${ADDR}:"
                duplicity ${TIME} ${PROT}://${USER}@${SERV}/data/${ADDR}
        done
done

rm -f ${HOME}/tmp/mysql/db_name1/*
rm -f ${HOME}/tmp/mysql/db_name2/*

unset PASSPHRASE

[ Parent | Reply to this comment ]

Posted by Anonymous (72.92.xx.xx) on Wed 23 Nov 2011 at 03:56
When you generate the sign and encryption keys above, what user are you generating them for?

[ Parent | Reply to this comment ]

Posted by Anonymous (24.16.xx.xx) on Tue 16 Aug 2005 at 06:22
Thanks for the great article - but ...

My big question is - what restore choices do you get with this system? I took positive notice of the earlier comment noting that point-in-time restores are possible. But if you're backing up an entire filesystem, will you have the ability to restore a single file - or must you restore the entire filesystem, then retrieve your single file?
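
For what it's worth, duplicity has a `--file-to-restore` option that restores one path instead of the whole archive, provided your version includes it. The sketch below is illustrative only: the path is given relative to the root of the directory that was backed up, and the function name, URL and `DRY_RUN` switch are assumptions.

```shell
#!/bin/bash
# Sketch: restore a single file from a duplicity archive.

# restore_file RELATIVE_PATH URL DEST   (hypothetical helper name)
restore_file() {
    local rel_path="$1" url="$2" dest="$3"
    # rel_path is relative to the backed-up directory, with no leading slash.
    local cmd=(duplicity --file-to-restore "$rel_path" "$url" "$dest")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For a backup of `/home`, something like `restore_file alice/notes.txt scp://backup_server1@server1/data/home /tmp/notes.txt` would fetch just that file.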

[ Parent | Reply to this comment ]

Posted by Anonymous (83.99.xx.xx) on Sun 30 Oct 2005 at 21:31
Have you tried to restore a whole system?

What's happening with the hard links?
If they are not preserved, the result will use more disk space than the original, and probably have some other issues too.

George

[ Parent | Reply to this comment ]

Posted by Anonymous (62.165.xx.xx) on Wed 2 Nov 2005 at 09:15
From the man page of duplicity:
"Currently duplicity supports deleted files, full unix permissions, directories, symbolic links, fifos, etc., but not hard links."
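
Since hard links are not preserved, it may be worth finding out how many you have before relying on a full-system restore. A small sketch using plain `find` (the function name is made up for the example):

```shell
#!/bin/bash
# Sketch: list regular files that have more than one hard link.

# list_hardlinks DIR   (hypothetical helper name)
list_hardlinks() {
    local dir="$1"
    # -links +1 matches files whose link count is greater than one.
    find "$dir" -type f -links +1
}
```

Running `list_hardlinks /home > hardlinks.txt` before the backup gives you a list to recreate the links by hand after a restore.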

By the way, what about part 2?

Thanks for this article ...

[ Parent | Reply to this comment ]

Posted by Anonymous (64.108.xx.xx) on Sun 6 Nov 2005 at 23:27
Bravo! One of the best tutorials I've seen in a long time.

But duplicity hasn't released a stable version and seems to have stalled since August 2003.

I'd love to see a similarly well-written tutorial for secure remote backups using one of these:

1. dirvish - selected by OSL in October 2005 to back up mozilla, the linux kernel, etc.

2. rlbackup - used by phy.bnl.gov on Debian Sarge but there's no Debian package yet! See http://www.phy.bnl.gov/computing/rlbackup/

3. rsnapshot - sadly, maintainer looking for someone to take over as of Oct 2005

4. dar - popular for backing up to CD or DVD, so a tutorial using remote incremental backups would be great. There are wrapper projects such as kdar that provide a nice GUI.

[ Parent | Reply to this comment ]

Posted by Anonymous (66.73.xx.xx) on Wed 9 Nov 2005 at 07:09
A new maintainer took over rsnapshot in November 2005 with help from the original maintainer.

Based on my limited testing, it looks like rsnapshot 1.2.1 is rock-solid. Highly recommended if you are looking for a decent backup solution based on rsync and do not need built-in support for encryption.

I really like the security features of duplicity, but it doesn't look like any work has been done on duplicity in over 2 years -- is the project abandoned? If the project is bug-free, then they should release it as 1.0 instead of keeping it at version 0.4.1.

[ Parent | Reply to this comment ]

Posted by Anonymous (24.52.xx.xx) on Sun 11 Dec 2005 at 10:16
Duplicity is still the best option for those of us who have to do remote backups to a system we don't have root on. It retains the most file parameters because the snapshots are tarred, and it does not require the remote server to have any extra software installed, such as rdiff. I have yet to find any other solution that operates this way.

Really wishing duplicity was still supported.... but I do use it with some success as is.

[ Parent | Reply to this comment ]

Posted by Anonymous (82.85.xx.xx) on Tue 23 May 2006 at 08:25
I'm using Debian stable, duplicity 0.4.1 and gnupg 1.4.1.
When I run the backup script for the first time, everything is OK.
But when I run the backup script a second time, duplicity crashes with IOError: GnuPG exited non-zero, with code 2.
Is it a duplicity bug?

[ Parent | Reply to this comment ]

Posted by Anonymous (139.11.xx.xx) on Tue 13 Jun 2006 at 08:31
Hi,

I really like your article. I implemented duplicity on my servers and it works fine so far. However, I'm somewhat at a loss as to how to delete old backups. With rdiff-backup it's easy to remove old increments. With duplicity the only thing I can seemingly do is to start over every couple of weeks (or so) with a full backup, otherwise the backup server's hard disk(s) will overflow at some point.

I noticed backupninja has a capability to remove old backups. Do you know how this is implemented?
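
One way to prune, taken from the script posted earlier in this thread, is duplicity's `--remove-older-than` option. The sketch below only wraps that invocation; the function name, the URL and the `DRY_RUN` switch are assumptions, and newer duplicity releases may spell or gate this differently, so check the man page for your version. Whether backupninja does the same thing internally is not something this sketch claims.

```shell
#!/bin/bash
# Sketch: delete backup sets older than a given age.

# prune_old AGE URL   (hypothetical helper name; AGE such as 1M for one month)
prune_old() {
    local age="$1" url="$2"
    local cmd=(duplicity --remove-older-than "$age" "$url")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Combined with a periodic full backup, `prune_old 1M scp://backup_server1@server1/data/home` keeps roughly a month of history on the backup server.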

Thanks!

Jens

[ Parent | Reply to this comment ]

Posted by Anonymous (76.0.xx.xx) on Wed 25 Apr 2007 at 20:28
This is a very good How-To Article for Duplicity.
There are also some good examples in the comments too!!
Has this article been updated recently?
Has Part 2 of this article come out yet?

[ Parent | Reply to this comment ]

Posted by Anonymous (82.43.xx.xx) on Wed 16 May 2007 at 08:23
Hi,

I have been using this setup for a while now and it works great, thanks! But since I upgraded to etch it is taking almost 3 times as long to complete the backup. The number and size of the files backed up haven't changed very much. Does anyone know what could be causing the slowdown and how I can avoid it?

Thanks

[ Parent | Reply to this comment ]

Posted by Anonymous (146.232.xx.xx) on Mon 15 Oct 2007 at 09:45
As a result of this article I have used duplicity now for several months to do automated backups of my work-pc to a server. Since a few days ago the backup keeps failing with the error message: "Running 'sftp <user>@server failed' ..."

I can do a scp-transfer so the ssh-key-authentication is working.

At the moment I cannot back up or do anything with the backups.

There is no indication of what can cause the problem.

Maybe it is time to look for something else to do the job without this kind of problem.

Johann Spies

[ Parent | Reply to this comment ]

Posted by Anonymous (216.211.xx.xx) on Tue 27 Nov 2007 at 00:32
I had the same issue. In globals.py I changed:
# network timeout value
timeout = 30

to:
# network timeout value
timeout = 90


and all is well again

[ Parent | Reply to this comment ]

Posted by Anonymous (61.246.xx.xx) on Sat 24 May 2008 at 10:48
globals.py is inside /usr/lib/python(ver)/site-packages/

This worked for me too.

[ Parent | Reply to this comment ]

Posted by Anonymous (72.160.xx.xx) on Wed 25 Jun 2008 at 17:08
If the above doesn't work, add --ssh-options="-C" and/or enable short filenames.
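
Put together, the two workarounds might be passed like this. It is a sketch for the duplicity versions discussed in this thread: `--short-filenames` is assumed to be the option meant by "short filenames", and the function name, URL and `DRY_RUN` switch are invented for the example.

```shell
#!/bin/bash
# Sketch: backup with ssh compression and short remote filenames.

# backup_with_workarounds SOURCE URL   (hypothetical helper name)
backup_with_workarounds() {
    local src="$1" url="$2"
    # -C asks ssh to compress the connection; short filenames keep remote names brief.
    local cmd=(duplicity --ssh-options="-C" --short-filenames "$src" "$url")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Try one option at a time first, so you know which of the two actually fixed the sftp failure.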

[ Parent | Reply to this comment ]

Posted by Anonymous (62.198.xx.xx) on Mon 16 Dec 2013 at 18:42
When will we see part 2?

[ Parent | Reply to this comment ]
