Duplicity is a piece of software that can perform encrypted backups to remote storage over the network. It uses the rsync algorithm to implement incremental backups, thus minimising the amount of data that needs to be transferred over the network and stored remotely. The GNU Privacy Guard is used to provide strong encryption, making it safe to keep your backups in one of the many public cloud storage solutions.
In this post, I will demonstrate basic usage of Duplicity by showing how I use it to back up the home directory on my workstation to Situla, Redpill Linpro’s S3-compatible object storage solution.
First and foremost, you’ll need to install Duplicity on the host you want to back up. Duplicity is fortunately packaged in most Linux distributions, so this should just be a matter of running `apt-get install duplicity` or `dnf install duplicity`.
You’ll also need an account on Situla (or another S3-compatible object storage service), in particular the access key and the secret key. That said, Duplicity supports a large number of alternative storage backends (a full list is shown on its homepage), so adapting the examples below to your favourite storage service is probably as easy as pie.
Finally, you’ll need to generate an encryption passphrase. This should be something long and random, e.g., the output from `pwgen 128 1`. You must ensure you keep a copy of the encryption passphrase in a safe location well away from the system you’re backing up; otherwise your backups will be worthless the day you actually need them for disaster recovery.
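If `pwgen` isn’t available on your system, `openssl` (which almost always is) can serve the same purpose. A minimal sketch — the 96-byte size is an arbitrary choice that yields a 128-character passphrase:

```shell
# Generate a long random passphrase using openssl as an alternative to pwgen.
# 96 random bytes base64-encode to exactly 128 characters (no padding needed).
PASSPHRASE=$(openssl rand -base64 96 | tr -d '\n')

# Print the length as a sanity check; remember to store a copy of the
# passphrase itself somewhere safe, *off* the machine being backed up.
echo "${#PASSPHRASE}"
```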
The easiest way to automate the backups is to create a short shell script that invokes Duplicity from a nightly cron job or systemd timer. In my case, the script looks something like this:
```shell
#! /bin/sh -e

export AWS_ACCESS_KEY_ID="The Situla/S3 access key goes here"
export AWS_SECRET_ACCESS_KEY="The Situla/S3 secret key goes here"
export PASSPHRASE="The encryption passphrase goes here"

duplicity --full-if-older-than 1M \
    --s3-unencrypted-connection \
    /home/tore \
    s3://situla.bitbit.net/tore-duplicity/workstation

duplicity remove-older-than 6M \
    --s3-unencrypted-connection \
    --force \
    s3://situla.bitbit.net/tore-duplicity/workstation
```
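If you prefer a systemd timer over cron, a pair of units along these lines could drive the script nightly. This is a sketch: the unit names and the `/usr/local/sbin/backup.sh` path are examples, not something mandated by Duplicity.

```ini
# /etc/systemd/system/duplicity-backup.service (example name)
[Unit]
Description=Nightly Duplicity backup

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/backup.sh
```

```ini
# /etc/systemd/system/duplicity-backup.timer (example name)
[Unit]
Description=Run the Duplicity backup every night

[Timer]
OnCalendar=*-*-* 01:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable it with `systemctl enable --now duplicity-backup.timer`.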
The first invocation of `duplicity` in the script will perform a backup of my home directory `/home/tore` and store it in the `workstation` subdirectory of the `tore-duplicity` storage bucket in Situla. By default it performs an incremental backup (i.e., only backing up files that are new or changed since the last backup run), but it will automatically switch to a full backup if the last full backup is older than one month.
The second invocation will remove backups that are over six months old. This prevents my storage usage on Situla from growing without bounds.
Note that since the backup files have already been encrypted by the GNU Privacy Guard, there is not much point in spending CPU cycles on encrypting them a second time as they are transferred to Situla. I’m therefore requesting an unencrypted connection in order to gain a small performance increase.
When the backup is complete, you’ll get an informative status summary with some statistics, as shown below. Redirecting this to a log with `logger` or to e-mail with `sendmail` is probably a useful thing to do.
```
--------------[ Backup Statistics ]--------------
StartTime 1480500123.35 (Wed Nov 30 11:02:03 2016)
EndTime 1480500362.49 (Wed Nov 30 11:06:02 2016)
ElapsedTime 239.13 (3 minutes 59.13 seconds)
SourceFiles 340965
SourceFileSize 37567051681 (35.0 GB)
NewFiles 26
NewFileSize 1717225 (1.64 MB)
DeletedFiles 7
ChangedFiles 24
ChangedFileSize 150673527 (144 MB)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 57
RawDeltaSize 64155497 (61.2 MB)
TotalDestinationSizeChange 29162549 (27.8 MB)
Errors 0
-------------------------------------------------
```
If you’d like to take a full backup of the entire system instead of just a single directory, that is easily accomplished by replacing `/home/tore` with `/` in the first `duplicity` invocation. However, it is likely that weird-behaving files in the special file systems `/dev`, `/proc`, and `/sys` will cause problems. To avoid that, you can add `--exclude /dev --exclude /proc --exclude /sys` to the `duplicity` command line.
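Putting those pieces together, a full-system invocation could look like the following sketch. The command is only assembled and printed here so it can be reviewed before being run for real (as root); the bucket path mirrors the earlier examples.

```shell
# Sketch: build a full-system backup command with the special file systems
# excluded. The command is printed rather than executed, so you can inspect
# it before running it on a real system.
EXCLUDES="--exclude /dev --exclude /proc --exclude /sys"
CMD="duplicity --full-if-older-than 1M --s3-unencrypted-connection \
$EXCLUDES / s3://situla.bitbit.net/tore-duplicity/workstation"
echo "$CMD"
```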
Duplicity offers some handy commands that can be used to verify that everything is all right with the backup. Note that you’ll need to export the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `PASSPHRASE` environment variables first, exactly like in the backup script itself.
The `collection-status` command shows the overall status of the backups that have been made, i.e., when the last full backup was taken and how many incremental backups have been made since. Example below:
```
$ duplicity --s3-unencrypted-connection collection-status \
    s3://situla.bitbit.net/tore-duplicity/workstation
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Tue Nov 29 16:15:57 2016
Collection Status
-----------------
Connecting with backend: BackendWrapper
Archive dir: /home/tore/.cache/duplicity/9a6c69571a8f0d8c72abcc7b6d4c7d7c

Found 0 secondary backup chains.

Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Tue Nov 29 16:15:57 2016
Chain end time: Wed Nov 30 11:02:01 2016
Number of contained backup sets: 5
Total number of contained volumes: 154
 Type of backup set:                            Time:      Num volumes:
                Full         Tue Nov 29 16:15:57 2016               148
         Incremental         Wed Nov 30 01:00:01 2016                 3
         Incremental         Wed Nov 30 10:14:33 2016                 1
         Incremental         Wed Nov 30 10:53:10 2016                 1
         Incremental         Wed Nov 30 11:02:01 2016                 1
-------------------------
No orphaned or incomplete backup sets found.
```
The `list-current-files` command will by default show a list of all the files that exist in the most recent backup, along with their timestamps:
```
$ duplicity --s3-unencrypted-connection list-current-files \
    s3://situla.bitbit.net/tore-duplicity/workstation \
    | grep bash_history
Wed Nov 30 11:02:04 2016 .bash_history
```
The `list-current-files` command accepts an optional `--time` parameter which can be used to specify an older backup than the most recent one. This parameter can be given as an absolute timestamp (e.g., `2016-11-29`) or as an offset (e.g., `1D`) that specifies how old the requested backup should be. At the time of writing, it is the 30th of November, so the two commands below are equivalent:
```
$ duplicity --s3-unencrypted-connection list-current-files \
    --time 2016-11-29 \
    s3://situla.bitbit.net/tore-duplicity/workstation \
    | grep bash_history
Tue Nov 29 16:15:59 2016 .bash_history

$ duplicity --s3-unencrypted-connection list-current-files \
    --time 1D s3://situla.bitbit.net/tore-duplicity/workstation \
    | grep bash_history
Tue Nov 29 16:15:59 2016 .bash_history
```
Being able to restore files is clearly the single most important thing about keeping backups in the first place. Duplicity makes this very easy: it is just a matter of giving the remote backup storage location as the first command line argument and a local file system path (where the restored files will go) as the second. Some examples:
To restore the most recent backup to `/home/tore-restored`:

```
duplicity --s3-unencrypted-connection \
    s3://situla.bitbit.net/tore-duplicity/workstation \
    /home/tore-restored
```

To restore the backup as it was one day ago:

```
duplicity --s3-unencrypted-connection --time 1D \
    s3://situla.bitbit.net/tore-duplicity/workstation \
    /home/tore-restored
```

To restore only the single file `.bash_history` from the most recent backup:

```
duplicity --s3-unencrypted-connection --file-to-restore .bash_history \
    s3://situla.bitbit.net/tore-duplicity/workstation \
    .bash_history-restored
```
The `--file-to-restore` parameter also accepts directories, and can of course be combined with the `--time` parameter in order to restore from older backups.
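For instance, restoring a whole directory as it existed two days ago could be sketched like this. The `Documents` directory name and the destination path are hypothetical examples, and the command is printed for review rather than executed:

```shell
# Sketch: combine --file-to-restore with --time to pull an older copy of a
# directory out of the backups. "Documents" is a hypothetical example; run
# the printed command manually once the URL and paths match your setup.
CMD="duplicity --s3-unencrypted-connection --time 2D \
--file-to-restore Documents \
s3://situla.bitbit.net/tore-duplicity/workstation \
/home/tore/Documents-restored"
echo "$CMD"
```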
Duplicity does of course have many more features than those discussed in this post, and all of them are documented in its manual. That said, this post should contain everything you need to get started. Good luck, and remember: keep your encryption key in a safe place away from the system you’re backing up – one day, you’ll need it!