If you have a server on the internet, you need to think about how to do your backups. In my case, I have a NAS box at home that is perfect for this. As I do not want to poke a hole into my NAT firewall, it should be a pull-based backup with minimal software on the machine that is backed up. It turns out, I do not need more than rsync on both the server and the NAS box.
Let’s assume that the box to be backed up is reachable at mybox.example.com and your NAS server is called mynas.
Preparation
The backup machine mynas needs to access the machine to be backed up (mybox). If you want to map file ownership and permissions correctly on mynas, then you need to run the backup process locally as root, and if you want to back up system files from mybox, then you need to do the backup as root there as well.
The local RAID hard-drive is mounted as /data and we create a folder /data/backup for the backups. We start by generating a private/public key pair to be used for backups:
root@mynas(~)# mkdir -p /data/backup/identity
root@mynas(~)# ssh-keygen -f /data/backup/identity/id_rsa
[press enter when asked for a passphrase]
Afterwards, you should have two files in the identity folder: id_rsa contains your private key, and id_rsa.pub your public key.
Now copy the id_rsa.pub file to mybox and append it to authorized_keys for the root user. As you normally cannot directly log in as root, let’s assume you have a user myuser on the server. Here is an example of how you can do it:
root@mynas(~)# scp /data/backup/identity/id_rsa.pub myuser@mybox.example.com:.
root@mynas(~)# ssh myuser@mybox.example.com
myuser@mybox.example.com's password:
myuser@mybox(~)$ su -
Password:
root@mybox(~)# mkdir -p .ssh
root@mybox(~)# cat /home/myuser/id_rsa.pub >> ~/.ssh/authorized_keys
root@mybox(~)# chmod 700 .ssh
root@mybox(~)# chmod 600 .ssh/authorized_keys
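Optionally, you can harden this a bit by restricting where the key may be used from. Prefix the key line in authorized_keys with a from= option; the IP address below is just a placeholder for the address mynas connects from:
from="198.51.100.7" ssh-rsa AAAA... root@mynas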
Next you should check that you can log in to mybox from the backup machine using the key:
root@mynas(~)# ssh -i /data/backup/identity/id_rsa root@mybox.example.com
If the login fails, check the PermitRootLogin setting in /etc/ssh/sshd_config on mybox; it must at least allow key-based root logins (prohibit-password).
Backup
The magic option to use in rsync is called --link-dest, which is used to de-duplicate files in a new backup that have not changed since the previous backup. Instead of storing another copy, rsync will use a hard-link to save disk storage space.
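If you want to see the effect in isolation, here is a minimal sketch using made-up paths under /tmp; after the second rsync run, ls -li shows the same inode number for both copies of the unchanged file:
mkdir -p /tmp/demo/src
echo hello > /tmp/demo/src/file
rsync -a /tmp/demo/src/ /tmp/demo/old/
rsync -a --link-dest=/tmp/demo/old /tmp/demo/src/ /tmp/demo/new/
ls -li /tmp/demo/old/file /tmp/demo/new/file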
There are a number of directories that you do not want to back up. For example, on Linux, the /proc, /sys and /dev file-systems should not be backed up. Also, it makes little sense to back up the /tmp or /var/tmp directories.
In my case mybox is an OpenBSD box and I only have a small set of directories I exclude. They are all listed in a file called exclude.mybox in my backup directory:
root@mynas(~)# cat /data/backup/exclude.mybox
/tmp
/var/tmp
/var/www/tmp
Now it is time for the first backup. This should be issued manually. We create a directory mybox, and rsync creates a folder inside it named after the current date:
root@mynas(~)# mkdir /data/backup/mybox
root@mynas(~)# rsync -az --exclude-from=/data/backup/exclude.mybox -e "ssh -i /data/backup/identity/id_rsa" mybox.example.com:/ /data/backup/mybox/$(date +"%Y%m%d-%H%M")
The magic date command date +"%Y%m%d-%H%M" is used to create a “YYYYMMDD-hhmm” timestamp for the backup.
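For example (the exact value obviously depends on when you run it):
root@mynas(~)# date +"%Y%m%d-%H%M"
20240101-0300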
This will take a while. Grab a tea …
Every new backup should reference the previous backup with the --link-dest option. This makes sure that we only store files that have been modified and otherwise use hard-links.
Here is the script that I am now running in a cron job. The file name is /data/backup/backup-mybox.sh.
#!/bin/sh

BASE_DIR=/data/backup/mybox

if [ ! -d "$BASE_DIR" ]; then
    echo "Base directory ($BASE_DIR) not available."
    exit 1
fi

# Name the new backup after the current time; the most recent
# existing backup becomes the --link-dest reference.
NEW_DIR=$BASE_DIR/$(date +"%Y%m%d-%H%M")
LAST_DIR=$BASE_DIR/$(ls "$BASE_DIR" | sort | tail -1)

echo "NEW DIR: $NEW_DIR"
echo "LAST DIR: $LAST_DIR"

time rsync -az --exclude-from=/data/backup/exclude.mybox --link-dest="$LAST_DIR" -e "ssh -i /data/backup/identity/id_rsa" mybox.example.com:/ "$NEW_DIR"

du -sh "$LAST_DIR" "$NEW_DIR"
The du at the last line will show you that the new backup only uses extra space for modified files.
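If you want to convince yourself that the hard-linking works, compare the inode numbers of a file that rarely changes across two backups (ls -li prints the inode in the first column, and hard-linked copies share it):
root@mynas(~)# ls -li /data/backup/mybox/*/etc/hosts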
Expire
There is a second job that cleans up old backups. My retention policy is:
- Keep the last 7 backups
- Keep a backup for every month for the last year
- Keep a backup for every year
The following small script deletes all directories that do not match this policy. It needs to be called with the name of the host that is backed up (I actually have a number of hosts that I back up this way). The file name is /data/backup/deletedirs.sh.
#!/bin/sh

set -e

# Work files go into a per-host scratch directory.
mkdir -p /data/backup/work/$1
cd /data/backup/work/$1

# Get all available dirs, sorted by name (i.e. by date)
ls /data/backup/$1 > all-dirs.txt

# Keep the last 7 backups
tail -7 all-dirs.txt > last-7-days.txt

# Keep the first backup of every year
rm -f first-of-years.txt
for year in $(cut -c1-4 all-dirs.txt | sort -u); do
    grep "^$year" all-dirs.txt | head -1 >> first-of-years.txt
done

# Keep the first backup of every month ...
rm -f first-of-months.txt
for month in $(cut -c1-6 all-dirs.txt | sort -u); do
    grep "^$month" all-dirs.txt | head -1 >> first-of-months.txt
done

# ... but only for the last 12 months
tail -12 first-of-months.txt > last-12-months.txt

# Join it all together, and unique
cat last-7-days.txt first-of-years.txt last-12-months.txt | sort -u > keep.txt

# Delete everything that is not on the keep list
for fn in $(grep -F -x -v -f keep.txt all-dirs.txt); do
    if [ -n "$fn" ]; then
        echo "Deleting /data/backup/$1/$fn"
        rm -rf "/data/backup/$1/$fn"
    fi
done
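Before wiring the script into cron, you may want to do a dry run: temporarily comment out the rm -rf line and call the script by hand to see which directories it would delete:
root@mynas(~)# sh /data/backup/deletedirs.sh mybox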
Running it from cron
I have set up a cron job for root that triggers the backup and the expire script every night.
0 3 * * * /data/backup/backup-mybox.sh
0 5 * * * /data/backup/deletedirs.sh mybox
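Cron will mail the output of these scripts to root. If you prefer a log file, you can redirect the output instead; the log path below is just an example:
0 3 * * * /data/backup/backup-mybox.sh >> /var/log/backup-mybox.log 2>&1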