This post originally appeared in our sysadvent series and has been moved here following the discontinuation of the sysadvent microsite.
Rsync is a versatile tool for copying files or whole directory trees
locally or over the network to another computer. rsync -a will copy
files and preserve most attributes, including ownership and permissions.
Rsync over the network is trivial, with SSH as the default network
protocol (Rsync can also run in daemon mode).
Rsync over network as root
Often I need to do an rsync -a between hosts on directories I’m not
the owner of. I do have SSH access and full sudo access on both
hosts, but I can’t SSH directly in as the root user on either of them,
as direct root logins are (rightly) frowned upon.
Let’s say I’m logged into some soon-to-be-in-production new mail
server and want to rsync old-mailserver:/var/spool/mail/ over to
/var/spool/mail. Here is a way to do it:
sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
$LOGNAME@old-mailserver:/var/spool/mail/ /var/spool/mail/
or, more generically:
SRCHOST=old-mailserver
SRCDIR=/var/spool/mail
DSTDIR=/var/spool/mail
sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
${LOGNAME}@${SRCHOST}:${SRCDIR}/ ${DSTDIR}/
Of course, it can be reversed:
DSTHOST=new-mailserver
SRCDIR=/var/spool/mail
DSTDIR=/var/spool/mail
sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
${SRCDIR}/ ${LOGNAME}@${DSTHOST}:${DSTDIR}/
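If you do this often, the pull variant can be wrapped in a small shell function. This is only a sketch: sudo_rsync_pull, REMOTE_USER and DRY_RUN are hypothetical names made up for illustration, and DRY_RUN=1 just prints the command that would run instead of running it.

```shell
#!/bin/sh
# Minimal sketch: the "rsync as root on both ends" pull wrapped in a function.
# sudo_rsync_pull, REMOTE_USER and DRY_RUN are hypothetical names, not rsync features.
# Usage: sudo_rsync_pull SRCHOST SRCDIR DSTDIR
sudo_rsync_pull() {
    local srchost="$1" srcdir="$2" dstdir="$3"
    # Expanded by your own shell before sudo runs, so it still holds your name.
    local remote_user="${REMOTE_USER:-$LOGNAME}"
    # ${dir%/}/ makes sure each path ends in a slash, so the contents of
    # SRCDIR end up directly in DSTDIR.
    set -- sudo -E rsync -a --rsync-path="sudo /usr/bin/rsync" \
        "${remote_user}@${srchost}:${srcdir%/}/" "${dstdir%/}/"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"    # show the arguments space-joined instead of running
    else
        "$@"
    fi
}

# Same transfer as the first example:
#   sudo_rsync_pull old-mailserver /var/spool/mail /var/spool/mail
```

Building the command with set -- keeps "sudo /usr/bin/rsync" as a single argument to --rsync-path, exactly like the quoted version on the command line.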
Explanation
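sudo -E will pass your personal environment variables on to the
subprocess. This ensures that the new Rsync process gets access to
your ssh-agent (via SSH_AUTH_SOCK), if you have one running. However,
your username is not passed on: SSH running as root will by default
try to log in as root on the remote host.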
The -a (archive) option recurses into directories and syncs timestamps,
permissions and ownership (preserving ownership requires the receiving
Rsync to run as root, which is one reason for the sudo above), but be
aware that hard links, extended attributes and ACLs will not be
preserved! Use -HAXa to get all of those, but be careful: preserving
hard links is a memory-intensive task. Also, while the -a option will
overwrite files that already exist on the destination, it will not
delete any extra files on the destination. Add --delete for this.
Rsync has plenty of other useful options. If the network connection
is unstable and the files are big, throw in --partial. Throw in
--verbose and --progress if you prefer that. Worried that the rsync
command may be clogging the bandwidth? Use --bwlimit! RTFM for more
information!
Rsync will spawn SSH, which in turn will spawn an Rsync server on the
remote host. --rsync-path tells Rsync where to find rsync on the
remote host, but it’s actually possible to put commands here, so
sudo /usr/bin/rsync will execute sudo rsync instead of simply rsync,
allowing the remote Rsync to read all those private mailboxes. This
assumes sudo on the remote host does not ask for a password (for
example through a NOPASSWD rule), as there is no terminal to type it
into.
Since the username is lost during the initial sudo, we need to tell
Rsync which username to use when ssh’ing to the remote server. That
is typically stored in the LOGNAME or USER environment variable, and
since $LOGNAME is expanded by your own shell before sudo runs, it
still holds your name. If you have a different username on the remote
host, replace $LOGNAME with your remote username.
Remember that Rsync treats a source path ending with a slash
differently from one without. rsync -a foo bar will create a
directory bar/foo, while rsync -a foo/ bar will copy the contents of
foo (dotfiles included) directly into bar.
Cheap snapshot backups with Rsync
Assume you have a large pile of files (binaries, e.g. photos; you’d
use Git for backing up text files, wouldn’t you?) that aren’t modified
frequently. You would like to take frequent backups, and you want to
keep the snapshots. A simple Rsync to a constant backup directory
won’t do: if a file has been accidentally or maliciously edited or
truncated in the source directory, the valuable backup will be
overwritten the next time the backup runs. A simple rsync -a to a new
directory every day isn’t ideal either; it’s very wasteful to store
the same content over and over again (unless your file system or
storage solution provides automatic deduplication).
--link-dest
There is a cheap and easy way to solve this problem: use Rsync with
the --link-dest argument. For every file that is unchanged compared to
the directory given to --link-dest (typically the previous snapshot),
Rsync creates a hard link instead of a new copy, effectively
deduplicating the content. I will just provide a quick, working
example covering the backup of $HOME/photos/ on a remote host
photo-album.example.com to /var/backups/photo-album on the local host.
For more details on rsync --link-dest, RTFM or visit your favorite
search engine.
First backup:
sudo mkdir /var/backups/photo-album
sudo chown $LOGNAME /var/backups/photo-album
cd /var/backups/photo-album
this_backup=photos-$(date +%FT%H%M)/
mkdir $this_backup
rsync -a photo-album.example.com:photos/ $this_backup/
Subsequent backups (assuming backup is done only once a day):
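cd /var/backups/photo-album
# newest snapshot by name (the ISO timestamps sort chronologically)
prev_backup=$(ls -1d photos-* | tail -n1)
this_backup=photos-$(date +%FT%H%M)/
mkdir $this_backup
# --link-dest needs an absolute path; a relative one is resolved against the destination
rsync -a photo-album.example.com:photos/ --link-dest=/var/backups/photo-album/$prev_backup/ $this_backup/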
Verify that the deduplication logic works:
du -sh $prev_backup
du -sh $this_backup
du -sh .
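You should get roughly the same number from all three commands: each snapshot looks like a full copy on its own, but du counts hard-linked files only once, so the total for the whole backup directory barely grows. If nothing was modified, the only extra space used is that of the directories themselves.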
And now as a generic example
Config:
BACKUP_DIR=/var/backups/photo-album
SOURCE_HOST=photo-album.example.com
SOURCE_BASEDIR=""
DIRNAME=photos
SOURCE_DIR=${SOURCE_BASEDIR}${DIRNAME}
First backup:
sudo mkdir $BACKUP_DIR
sudo chown $LOGNAME $BACKUP_DIR
cd $BACKUP_DIR
this_backup=${DIRNAME}-$(date +%FT%H%M)/
mkdir $this_backup
rsync -a ${SOURCE_HOST}:${SOURCE_DIR}/ $this_backup/
Subsequent backups (assuming backup is done only once a day):
cd $BACKUP_DIR
# newest snapshot by name (the ISO timestamps sort chronologically)
prev_backup=$(ls -1d ${DIRNAME}-* | tail -n1)
this_backup=${DIRNAME}-$(date +%FT%H%M)/
mkdir $this_backup
rsync -a ${SOURCE_HOST}:${SOURCE_DIR}/ --link-dest=${BACKUP_DIR}/${prev_backup}/ ${this_backup}/
Comparison of different compression tools
Working with various compressed files on a daily basis, I found I didn’t actually know how the different tools performed compared to each other. I know that different compression tools suit different types of data, but I wanted to compare them using a large, generic file.
The setup
The file I chose was a 4194304000-byte (4000 MiB, about 4.2 GB) Ubuntu installation disk image.
The machine tasked with the bit-mashing was an Ubuntu machine with an AMD Ryzen 9 5900X 12-Core ...