Practical Rsync

This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.

Rsync is a versatile tool for copying files or whole directory trees, locally or over the network to another computer. rsync -a copies files and carries over most attributes, including ownership and permissions. Running Rsync over the network is straightforward, with SSH as the default transport (Rsync can also run in daemon mode).

Rsync over network as root

Often I need to do an rsync -a between hosts on directories I don’t own. I do have SSH access and full sudo access on both hosts. I can’t SSH in directly as the root user on either of them, as direct root logins are (rightly) frowned upon.

Let’s say I’m logged into some soon-to-be-in-production new mail server and want to rsync old-mailserver:/var/spool/mail/ over to /var/spool/mail. Here is a way to do it:

sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
  $LOGNAME@old-mailserver:/var/spool/mail/ /var/spool/mail/

Or, more generically:

SRCHOST=old-mailserver
SRCDIR=/var/spool/mail
DSTDIR=/var/spool/mail
sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
    ${LOGNAME}@${SRCHOST}:${SRCDIR}/ ${DSTDIR}/

Of course, it can be reversed:

DSTHOST=new-mailserver
SRCDIR=/var/spool/mail
DSTDIR=/var/spool/mail
sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
    ${SRCDIR}/ ${LOGNAME}@${DSTHOST}:${DSTDIR}/

Explanation

sudo -E passes your personal environment variables on to the subprocess. This ensures that the new Rsync process gets access to your ssh-agent, if you have one running. However, the process runs as root, so your username is not carried over.

The -a (archive) option recurses into directories and preserves symlinks, permissions, timestamps, ownership and device files. Be aware that hard links, ACLs and extended attributes are not preserved! Use -aHAX to get all of those, but be careful: tracking hard links is memory-intensive. Also, while -a will overwrite files that already exist on the destination, it will not delete any extra files there. Add --delete for this.
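The difference between plain -a and -a --delete can be tried out safely on throwaway directories. The following is a minimal sketch, assuming rsync is installed locally; the directory and file names are made up for the demo, and the --dry-run preview is an extra precaution not mentioned above.

```shell
# Throwaway demo directories; nothing outside $demo is touched.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "keep" > "$demo/src/keep.txt"
echo "stale" > "$demo/dst/stale.txt"

# Plain -a copies keep.txt but leaves the extra file on the destination alone.
rsync -a "$demo/src/" "$demo/dst/"
ls "$demo/dst"

# --dry-run with --itemize-changes previews what --delete would remove.
rsync -a --delete --dry-run --itemize-changes "$demo/src/" "$demo/dst/"

# The real run removes the extra file from the destination.
rsync -a --delete "$demo/src/" "$demo/dst/"
ls "$demo/dst"
```

Previewing with --dry-run first is worth the extra step whenever --delete is pointed at a directory that matters.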

Rsync has plenty of other useful options. If the network connection is unstable and the files are big, throw in --partial. Throw in --verbose and --progress if you prefer that. Worried that the rsync command may be clogging the bandwidth? Use --bwlimit! RTFM for more information!

Rsync will spawn SSH, which in turn will spawn an Rsync server on the remote host. --rsync-path tells Rsync which rsync binary to run on the remote host, but it’s actually possible to put a whole command here, so sudo /usr/bin/rsync will execute rsync through sudo instead of simply rsync, allowing Rsync to read all those private mailboxes.

Since the username is lost during the initial sudo, we need to tell Rsync what username to use when ssh’ing to the remote server. That would typically be stored in the LOGNAME or USER environment variables. If you have another username on the remote host, replace $LOGNAME with your remote username.
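If the remote username differs from the local one, the fallback can be made explicit. This is only a sketch: REMOTE_USER is a hypothetical variable introduced for illustration, and the command is printed for review rather than executed.

```shell
# REMOTE_USER is a hypothetical override; when unset, fall back to LOGNAME, then USER.
remote_user=${REMOTE_USER:-${LOGNAME:-$USER}}

SRCHOST=old-mailserver
SRCDIR=/var/spool/mail
DSTDIR=/var/spool/mail

# The variables are expanded by your own shell, before sudo switches to root.
src="${remote_user}@${SRCHOST}:${SRCDIR}/"

# Print the command instead of running it; drop the echo once it looks right.
echo sudo -E rsync -a --rsync-path=\"sudo /usr/bin/rsync\" "$src" "${DSTDIR}/"
```

Running it with, for example, REMOTE_USER=admin set in the environment would print the same command with admin@old-mailserver as the source.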

Remember that Rsync distinguishes between source paths ending with a slash and source paths not ending with a slash. rsync -a foo bar will create a directory bar/foo, while rsync -a foo/ bar will copy the contents of foo directly into bar.
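The trailing-slash rule is easy to check on a scratch directory. A small sketch, assuming rsync is installed locally; the directory names with_slash and without_slash are made up for the demo.

```shell
# Throwaway demo directories; nothing outside $demo is touched.
demo=$(mktemp -d)
mkdir -p "$demo/foo" "$demo/with_slash" "$demo/without_slash"
echo "hello" > "$demo/foo/a.txt"

# No trailing slash: the directory foo itself is created inside the destination.
rsync -a "$demo/foo" "$demo/without_slash"

# Trailing slash: only the contents of foo land in the destination.
rsync -a "$demo/foo/" "$demo/with_slash"

# List the resulting files to compare the two layouts.
find "$demo/without_slash" "$demo/with_slash" -type f
```

The first copy ends up as without_slash/foo/a.txt, the second as with_slash/a.txt.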

Cheap snapshot backups with Rsync

Assume you have a large pile of files (binaries such as photos; you’d use Git for backing up text files, wouldn’t you?) that aren’t modified frequently. You would like to take frequent backups and keep the snapshots. A simple Rsync to a constant backup directory won’t do: if a file has been accidentally or maliciously edited or truncated in the source directory, the valuable backup will be overwritten the next time the backup runs. A simple rsync -a to a new directory every day is not ideal either, since it’s very wasteful to store the same content over and over again (unless your file system or storage solution provides automatic deduplication).

There is a cheap and easy way to solve this problem: use Rsync with the --link-dest argument. For every file that is unchanged since the previous snapshot, this option creates a hard link instead of a new copy, effectively deduplicating the content. I will just provide a quick and working example, covering the backup of $HOME/photos/ at a remote host photo-album.example.com to /var/backups/photo-album on the local host. For more details on rsync --link-dest, RTFM or visit your favorite search engine.

First backup:

sudo mkdir /var/backups/photo-album
sudo chown "$LOGNAME" /var/backups/photo-album
cd /var/backups/photo-album
# Timestamped snapshot name, e.g. photos-2024-12-18T0300
this_backup=photos-$(date +%FT%H%M)
mkdir "$this_backup"
rsync -a photo-album.example.com:photos/ "$this_backup/"

Subsequent backups (assuming backup is done only once a day):

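cd /var/backups/photo-album
# Snapshot names embed an ISO timestamp, so sorting by name is chronological.
# (ls -t is unreliable here: rsync -a copies the source directory's mtime onto each snapshot.)
prev_backup=$(ls -1d photos-* | tail -n1)
this_backup=photos-$(date +%FT%H%M)
mkdir "$this_backup"
# --link-dest needs an absolute path; a relative one is resolved against the destination directory.
rsync -a --link-dest="$PWD/$prev_backup" photo-album.example.com:photos/ "$this_backup/"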

Verify that the deduplication logic works:

du -sh $prev_backup
du -sh $this_backup
du -sh .

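You should get roughly the same numbers from all three commands, because du counts hard-linked files only once per invocation. The numbers are practically identical if nothing was modified between the two backups.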
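A more direct check is to compare inodes: two paths that are hard links to the same file share one inode. The sketch below assumes rsync is installed locally and uses made-up directory and file names on a scratch directory instead of the real backup.

```shell
# Throwaway demo directories; nothing outside $demo is touched.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/backups/snap1" "$demo/backups/snap2"
echo "pretend this is a photo" > "$demo/src/img001.jpg"

# First snapshot is a plain copy; the second links unchanged files against it.
rsync -a "$demo/src/" "$demo/backups/snap1/"
rsync -a --link-dest="$demo/backups/snap1" "$demo/src/" "$demo/backups/snap2/"

# ls -i prints the inode number first; identical numbers mean one copy on disk.
ls -i "$demo/backups/snap1/img001.jpg" "$demo/backups/snap2/img001.jpg"

# -ef is true when both paths refer to the same file (same device and inode).
if [ "$demo/backups/snap1/img001.jpg" -ef "$demo/backups/snap2/img001.jpg" ]; then
    echo "hard-linked"
fi
```

On the real backup, the same ls -i comparison works for any file that exists in both $prev_backup and $this_backup.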

And now as a generic example

Config:

BACKUP_DIR=/var/backups/photo-album
SOURCE_HOST=photo-album.example.com
SOURCE_BASEDIR=""
DIRNAME=photos
SOURCE_DIR=${SOURCE_BASEDIR}${DIRNAME}

First backup:

sudo mkdir "$BACKUP_DIR"
sudo chown "$LOGNAME" "$BACKUP_DIR"
cd "$BACKUP_DIR"
# Timestamped snapshot name, e.g. photos-2024-12-18T0300
this_backup=${DIRNAME}-$(date +%FT%H%M)
mkdir "$this_backup"
rsync -a "${SOURCE_HOST}:${SOURCE_DIR}/" "$this_backup/"

Subsequent backups (assuming backup is done only once a day):

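cd "$BACKUP_DIR"
# Snapshot names embed an ISO timestamp, so sorting by name is chronological.
# (ls -t is unreliable here: rsync -a copies the source directory's mtime onto each snapshot.)
prev_backup=$(ls -1d "${DIRNAME}"-* | tail -n1)
this_backup=${DIRNAME}-$(date +%FT%H%M)
mkdir "$this_backup"
# --link-dest needs an absolute path; BACKUP_DIR is absolute in the config above.
rsync -a --link-dest="${BACKUP_DIR}/${prev_backup}" "${SOURCE_HOST}:${SOURCE_DIR}/" "${this_backup}/"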
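The first and subsequent runs can also be folded into one helper that uses --link-dest only when an earlier snapshot exists. This is a sketch under assumptions, not a finished backup tool: snapshot_backup and its arguments are hypothetical names, the backup directory must be given as an absolute path, and snapshot names are assumed to contain no whitespace.

```shell
# snapshot_backup SRC BACKUP_DIR NAME [STAMP]
#   SRC        rsync source, e.g. photo-album.example.com:photos/
#   BACKUP_DIR absolute path of the directory holding all snapshots
#   NAME       snapshot name prefix, e.g. photos
#   STAMP      optional timestamp override (defaults to the current time)
snapshot_backup() {
    src=$1
    backup_dir=$2
    name=$3
    stamp=${4:-$(date +%FT%H%M)}

    mkdir -p "$backup_dir" || return 1

    # Newest existing snapshot, if any; names sort chronologically by timestamp.
    prev=$(ls -1d "$backup_dir/$name"-* 2>/dev/null | tail -n1)
    this="$backup_dir/$name-$stamp"
    mkdir "$this" || return 1

    if [ -n "$prev" ]; then
        # Unchanged files become hard links into the previous snapshot.
        rsync -a --link-dest="$prev" "$src" "$this/"
    else
        # No earlier snapshot: plain full copy.
        rsync -a "$src" "$this/"
    fi
}
```

With the config above, a call would look like snapshot_backup "${SOURCE_HOST}:${SOURCE_DIR}/" "$BACKUP_DIR" "$DIRNAME". Because mkdir fails on an existing directory, a second run within the same minute returns an error instead of writing into the earlier snapshot.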

Tobias Brox

Senior Systems Consultant at Redpill Linpro

Tobias started working as a developer when he finished his degree at The University of Tromsø. He joined Redpill Linpro as a system administrator a decade ago, and has embraced working with our customers and maintaining and improving our internal tools.
