If you have a server on the internet, you need to think about how to do your backups. In my case, I have a NAS box at home that is perfect for this.

As I do not want to poke a hole into my NAT firewall, it should be a pull-based backup with minimal software on the machine that is backed up. It turns out I do not need more than rsync (and ssh) on both the server and the NAS box.

Let’s assume that the box to be backed up is reachable at mybox.example.com and your NAS server is called mynas.

Preparation

The backup machine mynas needs to access the machine to be backed up (mybox). If you want to map file ownership and permissions correctly on mynas, you need to run the backup process locally as root, and if you want to back up system files from mybox, you need to do the backup as root there as well.
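As a minimal sketch of why root matters: rsync's -a archive option (used for the backups below) is shorthand for -rlptgoD, and the owner (-o) and group (-g) parts only take effect when the receiving rsync runs as root. You can see this with a local test copy (/tmp/owner-test is just a hypothetical scratch path):

# -a is shorthand for -rlptgoD: recurse, preserve symlinks,
# permissions, times, group, owner, and devices/specials. Owner and
# group are only applied when the receiving rsync runs as root.
root@mynas(~)# rsync -a /etc/ /tmp/owner-test/
root@mynas(~)# ls -ln /tmp/owner-test/ | head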

The local RAID drive is mounted at /data, and we create a folder /data/backup for the backups. We start by generating a private/public key pair to be used for the backups:

root@mynas(~)# mkdir -p /data/backup/identity
root@mynas(~)# ssh-keygen -f /data/backup/identity/id_rsa
[press enter when asked for a passphrase]

Afterwards, you should have two files in the identity folder: id_rsa contains your private key, and id_rsa.pub your public key.

Now copy the id_rsa.pub file to mybox and append it to the root user's authorized_keys file. As you normally cannot log in directly as root, let’s assume you have a regular account myuser on the server. Here is an example of how you can do it:

root@mynas(~)# scp /data/backup/identity/id_rsa.pub myuser@mybox.example.com:.
root@mynas(~)# ssh myuser@mybox.example.com
myuser@mybox.example.com's password: 
myuser@mybox(~)$ su -
Password:
root@mybox(~)# mkdir -p .ssh
root@mybox(~)# cat /home/myuser/id_rsa.pub >> ~/.ssh/authorized_keys
root@mybox(~)# chmod 700 .ssh
root@mybox(~)# chmod 600 .ssh/authorized_keys

Next you should check that you can now log in to mybox from the backup machine using the new key:

root@mynas(~)# ssh -i /data/backup/identity/id_rsa root@mybox.example.com

Backup

The magic option to use in rsync is called --link-dest, which de-duplicates files in a new backup that have not changed since the previous one: instead of storing a second copy, rsync creates a hard link to save disk space.
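Once you have two backups (see below), you can verify the de-duplication by comparing inode numbers; the timestamped paths here are hypothetical:

# If the file did not change between the two runs, both entries show
# the same inode number (first column) and a link count of 2, i.e.
# the file is stored only once on disk.
root@mynas(~)# ls -li /data/backup/mybox/20240101-0300/etc/hosts \
                      /data/backup/mybox/20240102-0300/etc/hosts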

There are a number of directories that you do not want to back up. For example, on Linux the /proc, /sys and /dev file-systems should not be backed up. It also makes little sense to back up the /tmp or /var/tmp directories.
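On a Linux box, the exclude file might therefore look like this (a sketch, to be adjusted to the system at hand):

/proc
/sys
/dev
/run
/tmp
/var/tmp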

In my case mybox is an OpenBSD box and I only have a small set of directories I exclude. They are all listed in a file called exclude.mybox in my backup directory:

root@mynas(~)# cat /data/backup/exclude.mybox
/tmp
/var/tmp
/var/www/tmp

Now it is time for the first backup, which should be issued manually. We create a directory mybox, and rsync puts the backup into a folder named with the current date:

root@mynas(~)# mkdir /data/backup/mybox
root@mynas(~)# rsync -az --exclude-from=/data/backup/exclude.mybox -e "ssh -i /data/backup/identity/id_rsa" mybox.example.com:/ /data/backup/mybox/$(date +"%Y%m%d-%H%M")

The magic date command date +"%Y%m%d-%H%M" is used to create a “YYYYMMDD-hhmm” timestamp for the backup, e.g. 20240601-0300 for a backup started at 03:00 on 1 June 2024.

This will take a while. Grab a tea …

Every new backup should reference the previous backup with the --link-dest option. This makes sure that we only store files that have been modified and otherwise use hard links.

Here is the script that I am now running in a cron job. The file name is /data/backup/backup-mybox.sh.

#!/bin/sh

BASE_DIR=/data/backup/mybox

if [ ! -d "$BASE_DIR" ]; then
    echo "Base directory ($BASE_DIR) not available."
    exit 1
fi

# The new backup goes into a fresh timestamped directory; the most
# recent existing directory becomes the --link-dest reference.
NEW_DIR=$BASE_DIR/$(date +"%Y%m%d-%H%M")
LAST_DIR=$BASE_DIR/$(ls "$BASE_DIR" | sort | tail -1)

echo "NEW DIR:  $NEW_DIR"
echo "LAST DIR: $LAST_DIR"

time rsync -az \
    --exclude-from=/data/backup/exclude.mybox \
    --link-dest="$LAST_DIR" \
    -e "ssh -i /data/backup/identity/id_rsa" \
    mybox.example.com:/ "$NEW_DIR"

du -sh "$LAST_DIR" "$NEW_DIR"

The du on the last line will show you that the new backup only uses extra space for modified files: du counts hard-linked files only once per invocation, so the size reported for the new backup reflects just the files that changed.

Expire

There is a second job that cleans up old backups. My retention policy is:

  • Keep the last 7 backups
  • Keep a backup for every month for the last year
  • Keep a backup for every year
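As a concrete illustration (dates hypothetical): with nightly backups since 1 January 2023 and a cleanup run on 1 June 2024, this keeps the seven backups from 26 May through 1 June 2024, the first backup of each month from July 2023 through June 2024, and the first backup of 2023; the first backup of 2024 is already covered by the monthly rule.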

The following small script deletes all directories that do not match this description. It needs to be called with the name of the host that is backed up (I actually have a number of hosts that I back up this way).

#!/bin/sh

# Usage: deletedirs.sh <hostname>
# Expires old backups in /data/backup/<hostname> according to the
# retention policy described above.

set -e

mkdir -p "/data/backup/work/$1"
cd "/data/backup/work/$1"

# Get all available dirs
ls "/data/backup/$1" > all-dirs.txt

# Get the last 7 backups
tail -7 all-dirs.txt > last-7-days.txt

# Get the first backup of each year
rm -f first-of-years.txt
for year in $(cut -c1-4 all-dirs.txt | sort -u); do
    filename=$(grep "^$year" all-dirs.txt | head -1)
    echo "$filename" >> first-of-years.txt
done

# Get the first backup of each month
rm -f first-of-months.txt
for month in $(cut -c1-6 all-dirs.txt | sort -u); do
    filename=$(grep "^$month" all-dirs.txt | head -1)
    echo "$filename" >> first-of-months.txt
done

# Keep only the last 12 months
tail -12 first-of-months.txt > last-12-months.txt

# Join it all together, and unique
cat last-7-days.txt first-of-years.txt last-12-months.txt | sort -u > keep.txt

# Delete everything that is not in the keep list
for fn in $(grep -F -x -v -f keep.txt all-dirs.txt); do
    if [ -n "$fn" ]; then
        echo "Deleting /data/backup/$1/$fn"
        rm -rf "/data/backup/$1/$fn"
    fi
done
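The script leaves its working files in /data/backup/work/<host>, so after a run you can review what was retained; for a first dry run it may also be worth commenting out the rm -rf line and just reading the output:

root@mynas(~)# cat /data/backup/work/mybox/keep.txt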

Running it from cron

I have set up a cron job for root that triggers the backup and expiry scripts every night.

0 3 * * * /data/backup/backup-mybox.sh
0 5 * * * /data/backup/deletedirs.sh mybox
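For completeness, these entries go into root's crontab on mynas, e.g. via:

root@mynas(~)# crontab -e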