Network backup

My home (and office!) network backup system.

Update 20100331: check jdmz.net and revise the procedure!

Principles of operation

The NAS runs an rsnapshot server, activated under cron, that takes snapshots of three classes of backup target: servers (i.e. hosts that are permanently on-line), Linux laptops and Windows laptops (the latter two types of client are not supported yet). The non-server classes share one characteristic: their presence on the network cannot be taken for granted, so the snapshot server checks whether they are available for a backup and acts accordingly.
This document explains how to set up an almost complete "set it and forget it" backup system on low-cost NASs such as the Synology DS207+ and the Buffalo TeraStation Live. Since rsnapshot is available on several NAS platforms, deployment on other *NIX-based NAS platforms should be possible with little or no modification.

Push or pull?

Dealing with backup targets that are always on-line is rsnapshot's everyday task, so a single server backup configuration file defines all the hosts and file system parts to be backed up, exactly as in any standard rsnapshot server setup. Laptop backup targets, instead, are present on the network (and therefore available to be backed up) at unpredictable intervals. It is almost impossible to predict when such a target will be connected to the network where the NAS sits, and relying on the user to start a backup would defeat both full automation and the guarantee of a "regular" backup schedule. There are at least two possible ways to initiate a backup automatically:

push (i.e. client-activated)
The laptop connects and checks whether a backup for today has already been done: if not, it starts sending data to the NAS. Can be hooked as a post-up in /etc/network/interfaces.
pull (i.e. server-activated)
The NAS checks at periodic intervals whether the laptop is present. If it is, the NAS checks whether a backup for today has already been done and, if not, starts getting data from the laptop. Can be hooked as a cron job in the NAS crontab.
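
Either way, something has to remember whether today's backup has already happened. This page does not say how that is done; one simple possibility is a per-host stamp file holding the date of the last successful run. The sketch below assumes that approach, and the STAMP_DIR location and both function names are made up for illustration.

```shell
# Hypothetical helpers: remember the date of the last backup per host.
# STAMP_DIR is an assumed location, not something rsnapshot itself uses.
STAMP_DIR="${STAMP_DIR:-/opt/var/run/backup-stamps}"

# Succeeds (exit status 0) if a backup of host $1 was already recorded today.
backup_done_today() {
    stamp="$STAMP_DIR/$1.stamp"
    [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$(date +%Y-%m-%d)" ]
}

# Records that host $1 has been backed up today.
mark_backup_done() {
    mkdir -p "$STAMP_DIR" && date +%Y-%m-%d > "$STAMP_DIR/$1.stamp"
}

# Typical use, in either the push or the pull variant:
#   if ! backup_done_today mus; then
#       <run the backup> && mark_backup_done mus
#   fi
```

The stamp is written only after a successful run, so an interrupted backup is simply retried at the next opportunity.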

The push solution looks easier to implement, with some advantages (e.g. a pre-down hook to stop politely in case of premature exit from the network...).
Anyway, a single rsnapshot installation on the NAS, acting as a pull backup server, seems more viable, in particular in an environment where there are several hosts to back up: this means in fact a single point of maintenance... decisions, decisions!

NAS-centric pull backups

On the rsnapshot mailing list there is an interesting thread on how to configure rsnapshot, at crontab level, to pull backups when a host is found present.
The principle is to have a separate rsnapshot.conf for each machine to be snapshotted, and a shell script, executed under cron, that checks whether the host is present and then starts rsnapshot hourly (or daily, as it suits).
In practice, suggestions from that thread were used to set up an sh wrapper script, which finds the right rsnapshot.conf for a given backup target, checks i) whether that target is online and ii) whether another rsnapshot process is already running for it, and finally starts the snapshot process.

Software

Both the DS207+ and the TeraStation already have an rsync server on board. Just enable it from the administration panel.
A portscan from the client (e.g. with nmap) will show port 873/tcp open, and a call to rsync (i.e. rsync priscilla.prea.net::) will list some shares (in my case two: NetBackup and Network Backup Share).

Install rsnapshot

DS207+
easily installed using ipkg
TeraStation
easily installed using ipkg
See DS207+ and TeraStation pages on this site to install ipkg bootstrap on both machines.

Server configuration

Since we have to deal with more than a single /etc/rsnapshot.conf, a few small modifications to the basic "by-the-book" rsnapshot configuration are needed. By and large, these are the steps:

  1. Create a backup data repository (e.g. on my DS207+ it is /volume1/backup/).
  2. (Optional, potential security hole). Make the backup repository available as a read-only shared folder (either using samba or NFS).
  3. Create an /opt/etc/rsnapshot/ folder, where all the configuration files will be stored.
  4. Create a rsnapshot configuration file for "server" (i.e. "always up") backup targets. This file will take care of more than one backup target, in the classic rsnapshot.conf fashion.
  5. If you feel comfortable using exclusion files, create a suitable one in the same configuration folder.
  6. For each "moving target" (i.e. laptop-like "not always present" backup target) create a rsnapshot.conf.<hostname> configuration file. Each one of these files will deal with one and only one "moving" backup target.
  7. If you feel comfortable using exclusion files, create in the same configuration folder an exclusion file for each "moving target".
  8. Install and configure rsnapshot.host.sh.
  9. Install rsnapshot and sshd on backup targets.
  10. Install ssh keys on backup targets.
  11. Define cron jobs.

A note for database servers such as postgres, MySQL, Firebird and the like: if on the host to be backed up there are some DBMS servers running, remember that a database dump script is needed, since backing up "live" DBMS files is useless (and most of the time, doesn't work at all!).

For each box to be snapshotted, set up a separate config file, pidfile and backup area, then a script for cron, that uses ping (or some equivalent such as fping or busybox...) to test whether the needed host is up, and starts rsnapshot accordingly.

The Gory Details

Commands and pathnames reported here refer to an installation on a Synology DS207+. On other NASs the commands should be the same, but pathnames will surely differ (e.g. on Buffalo TeraStations, RAID volumes are under /mnt, i.e. /mnt/array1). Modify path names according to your NAS (or server) filesystem.

Create a backup data repository

Synology: mkdir /volume1/backup/
TeraStation: mkdir /mnt/array1/backup/
Alternatively, the backup data repository can be created as a shared folder using the NAS web-based (or whatever it could be) administration panel.

(Optional) Make the backup repository available as a read-only shared folder

Use your NAS administrative tools to define the backup repository folder as a samba read-only share. Making the share read-only is mandatory, since users should only read the data, with no possibility of altering it.
If you use NFS, the same can be achieved by setting the exported folder read-only in /etc/exports.
Security note: this would expose all the snapshotted files, e.g. user home directory contents. A fix should be devised to allow visibility only to the owner or to the superuser.

Create a directory to hold configuration files

    mkdir /opt/etc/rsnapshot

Create the configuration file for "always on" targets

Here is a sample /opt/etc/rsnapshot/rsnapshot.conf to snapshot "static" (i.e. "always up") hosts. This particular example is conceived for a TeraStation Live NAS used as rsnapshot server. The configuration shown here derives from an example found on the CentOS Wiki. In particular, this is the configuration I use to snapshot two servers (a production one and a development one) at my workplace. Remember that rsnapshot wants the fields in its configuration files separated by tabs, not spaces, so check the separators if you copy and paste from this page.

    config_version 1.2
    snapshot_root /mnt/array2/backup/
    cmd_cp /opt/bin/cp
    cmd_rm /opt/bin/rm
    cmd_rsync /opt/bin/rsync
    cmd_ssh /usr/bin/ssh
    cmd_logger /usr/bin/logger
    cmd_du /usr/bin/du
    interval hourly 4
    interval daily 7
    interval weekly 4
    interval monthly 3
    link_dest 1
    verbose 2
    loglevel 4
    logfile /opt/var/log/rsnapshot/servers.log
    exclude_file /opt/etc/rsnapshot/servers.exclude
    rsync_long_args --delete --numeric-ids --delete-excluded
    lockfile /opt/var/run/rsnapshot.servers.pid
    backup root@server1:/ server1/
    backup_script /opt/etc/rsnapshot/server1.dump_databases.sh server1_databases/
    backup root@server2:/ server2/
    backup_script /opt/etc/rsnapshot/server2.dump_databases.sh server2_databases/
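
The backup_script lines above call dump scripts that are not shown on this page. rsnapshot runs each backup_script in a temporary working directory and stores whatever files the script leaves there under the given destination folder, so a dump script only has to write its output into the current directory. Here is a minimal sketch of what such a script could contain. Everything in it is an assumption: the MySQL example, the DB_HOST and DUMP_CMD variables, the output file name, and the idea that the database credentials are already configured on the target (e.g. in root's ~/.my.cnf).

```shell
#!/bin/sh
# Hypothetical body for something like server1.dump_databases.sh.
# DB_HOST and DUMP_CMD are illustrative defaults; adapt them to your DBMS
# (e.g. pg_dumpall for PostgreSQL).
DB_HOST="${DB_HOST:-root@server1}"
DUMP_CMD="${DUMP_CMD:-ssh $DB_HOST mysqldump --all-databases}"

# dump_databases [OUTFILE]: write the dump into the current directory.
# A partial file is removed on failure, so a broken dump is never stored.
dump_databases() {
    out="${1:-mysql_all.sql}"
    if $DUMP_CMD > "$out"; then
        chmod 600 "$out"
    else
        rm -f "$out"
        return 1
    fi
}

# As the last line of a real backup_script you would call:
#   dump_databases mysql_all.sql
```

Returning a non-zero status on failure matters, since that is how rsnapshot learns that the dump went wrong.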

Create a common exclusion file for "always on" targets

Sample /opt/etc/rsnapshot/servers.exclude. Again, this is precisely the set I use at work.

    + /boot
    + /etc
    + /home
    + /opt
    + /root
    + /usr
    + /usr/java
    + /usr/local
    - /usr/*
    - /var/cache
    + /var
    + /srv
    - /*

Create the configuration file for "moving" targets

Sample /opt/etc/rsnapshot/rsnapshot.conf.mus. This is for one of my Debian laptops (hostname mus). As in the example above, I stripped all the comments present in the file for compactness' sake.
A separate rsnapshot.conf.<hostname> file is needed for each target, to achieve process separation. This means that each configuration file specifies its own backup storage area and its own pidfile, as well as the target-specific filesystem paths to be included in the snapshotting process.

    # rsnapshot.conf.mus - rsnapshot configuration file for mus.prea.net
    config_version 1.2
    snapshot_root /volume1/backup/mus/
    #no_create_root 1
    cmd_cp /opt/bin/cp
    cmd_rm /opt/bin/rm
    cmd_rsync /opt/bin/rsync
    cmd_ssh /opt/bin/ssh
    #cmd_logger /usr/bin/logger
    cmd_du /opt/bin/du
    interval hourly 4
    interval daily 7
    interval weekly 4
    #interval monthly 3
    link_dest 1
    verbose 4
    loglevel 3
    logfile /opt/var/log/rsnapshot.mus
    exclude_file /opt/etc/rsnapshot/exclude.mus
    rsync_short_args -a
    rsync_long_args --delete --numeric-ids --delete-excluded
    ssh_args -c blowfish
    #du_args -csh
    #one_fs 0
    lockfile /opt/var/run/rsnapshot.pid.mus
    backup root@mus:/ mus/
    #backup_script /opt/etc/rsnapshot/dump.databases.mus.sh mus_databases/

Some notes: look at the path defined as snapshot_root. For each target it must be a directory under the backup repository folder, created in advance, by hand. Furthermore, a separate logfile, exclude_file and, most important, lockfile are defined for each target, simply by adding the hostname as a suffix to the standard names used in normal rsnapshot configuration files.
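
Since the per-host files differ only in the hostname, they can be generated from a single template rather than edited by hand. This is just a sketch of the idea: the @HOST@ placeholder, the template file name and the make_host_conf function are my own conventions for illustration, not anything rsnapshot provides.

```shell
# make_host_conf TEMPLATE HOST
# Prints a per-host configuration on stdout, replacing every occurrence of
# the (assumed) placeholder @HOST@ in TEMPLATE with HOST. sed leaves the
# tabs that rsnapshot needs between fields untouched.
make_host_conf() {
    case "$2" in
        ''|*[!A-Za-z0-9.-]*)
            # refuse empty names or characters that could confuse sed
            echo "bad hostname: $2" >&2
            return 1
            ;;
    esac
    sed "s/@HOST@/$2/g" "$1"
}

# Example (file names are hypothetical):
#   make_host_conf /opt/etc/rsnapshot/rsnapshot.conf.template sorex \
#       > /opt/etc/rsnapshot/rsnapshot.conf.sorex
#   mkdir -p /volume1/backup/sorex
```

The template would contain lines such as snapshot_root /volume1/backup/@HOST@/ and lockfile /opt/var/run/rsnapshot.pid.@HOST@, with real tabs between the fields.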

Create the related exclusion file

Sample /opt/etc/rsnapshot/exclude.mus, which goes together with the configuration file shown above.

    + /boot
    + /data
    + /etc
    + /home
    + /opt
    + /root
    + /usr
    + /usr/java
    + /usr/local
    - /usr/*
    - /var/cache
    + /var
    + /srv
    - /*

rsnapshot.host.sh

Since rsnapshot has to be called with different (maybe ten...) configurations, writing something rather long either on the command line or in the crontab is boring and error-prone. Moreover, there is also some housekeeping to do, that is, to check whether the target host is online and whether a rsnapshot process is already running.
All this is taken care of by a shell script, largely based on one proposed by Bob Hutchinson. Here it is... check it often, since I'm still fiddling with it, and of course, suggestions are always welcome.

    #!/bin/sh
    #
    # /opt/local/bin/rsnapshot.host.sh
    # launches rsnapshot with the correct configuration file
    # (c) Damiano G. Preatoni <prea at prea dot net>
    #
    # usage: rsnapshot.host.sh <hostname> {hourly|daily|weekly|monthly|du}

    HOST=$1
    INTERVAL=$2
    CONF="/opt/etc/rsnapshot/rsnapshot.conf.$HOST"

    case "$INTERVAL" in
    hourly)
        # count ping reply lines (they contain "time="):
        # 1 if the host is alive, 0 if it is down
        testit=`ping -c 1 "$HOST" 2>/dev/null | grep -c 'time='`

        if [ "$testit" -eq 0 ]; then
            # this host is not reachable, bail out
            echo "$HOST is unreachable. Exiting."
            exit 1
        fi

        if [ -f "/opt/var/run/rsnapshot.pid.$HOST" ]; then
            # there is an instance already running, bail out as well
            echo "rsnapshot already running on $HOST. Exiting."
            exit 2
        fi

        # if the script gets here, rsnapshot can run
        rsnapshot -c "$CONF" hourly
        ;;
    daily|weekly|monthly|du)
        # these only rotate existing snapshots (or report disk usage),
        # so the target does not need to be online
        rsnapshot -c "$CONF" "$INTERVAL"
        ;;
    *)
        echo "usage: $0 <hostname> {hourly|daily|weekly|monthly|du}"
        exit 64
        ;;
    esac
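
One weakness of the lockfile test in the script: if the NAS reboots or rsnapshot is killed in the middle of a run, the lockfile may stay behind, and every later hourly run will then bail out with exit status 2. Assuming the lockfile contains the PID of the rsnapshot process that created it (worth verifying against your rsnapshot version), a staleness check could look like the sketch below. lock_is_stale is a hypothetical helper, not part of the original script.

```shell
# lock_is_stale LOCKFILE
# Succeeds (exit status 0) if LOCKFILE exists but no live process matches
# the PID stored in it. Assumes the file holds a single PID and that the
# script runs as root (kill -0 cannot probe other users' processes otherwise).
lock_is_stale() {
    [ -f "$1" ] || return 1          # no lockfile at all: nothing is stale
    pid=$(cat "$1" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 0 ;;     # empty or non-numeric content: stale
    esac
    ! kill -0 "$pid" 2>/dev/null     # stale when the process is gone
}

# Possible use in the wrapper, before the "already running" test:
#   lock_is_stale "/opt/var/run/rsnapshot.pid.$HOST" && \
#       rm -f "/opt/var/run/rsnapshot.pid.$HOST"
```

Removing a lockfile automatically is a judgment call; logging the stale lock and leaving the removal to a human is the more cautious option.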

Install rsnapshot and sshd on backup targets.

On Debian systems, just do aptitude install rsnapshot openssh-server (the ssh daemon is packaged as openssh-server; there is no package called sshd).
TODO: add the procedure to install client software on Windows systems. Trials ongoing, stay tuned!

Install ssh keys on backup targets.

To work properly, rsnapshot needs a passwordless ssh root login on the target boxes. (OK, I know that passwordless ssh root access is dangerous: be aware that this is a temporary solution. There are ways to allow ssh access to an unprivileged user (say, a user named backup) and to grant it su privileges... coming soon!)

Server-side: generate ssh keys

On the rsnapshot server, create a ssh keypair using ssh-keygen, if you haven't already done so.

    # ssh-keygen -t dsa
    Generating public/private dsa key pair.
    Enter file in which to save the key (/root/.ssh/id_dsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /root/.ssh/id_dsa.
    Your public key has been saved in /root/.ssh/id_dsa.pub.
    The key fingerprint is:
    (etc, etc, etc...)

Then copy the public key to the remote host; this can be done over ssh itself:

    # cat ~/.ssh/id_dsa.pub | ssh root@remote_host "cat >> ~/.ssh/authorized_keys2"

You may also wish to turn off password logins via ssh on each backup target, but that is up to you. Should you decide to do so, edit /etc/ssh/sshd_config on the target host and make sure PasswordAuthentication and PermitEmptyPasswords are both set to no. Also, I'm not a security expert, but you should tighten permissions on your ~/.ssh directories and files to something like 0700, e.g. # chmod 700 .ssh; chmod 400 .ssh/authorized_keys2.
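
Note that the key-copying one-liner shown earlier appends the key every time it is run, so repeated runs leave duplicate entries. A slightly more careful variant, to be run on the backup target once the public key file has been copied there, might look like this. It is only a sketch: install_pubkey is a hypothetical helper, and it keeps the authorized_keys2 file name and the permissions used on this page.

```shell
# install_pubkey PUBKEY_FILE [SSH_DIR]
# Appends the key in PUBKEY_FILE to SSH_DIR/authorized_keys2 only if it is
# not already there, then applies the permissions suggested in the text.
install_pubkey() {
    key_file=$1
    ssh_dir=${2:-$HOME/.ssh}
    auth="$ssh_dir/authorized_keys2"

    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth" && chmod 600 "$auth" || return 1

    # -F fixed string, -x whole-line match, -f read the pattern from a file:
    # append only when the exact key line is missing
    grep -qxF -f "$key_file" "$auth" || cat "$key_file" >> "$auth"

    chmod 400 "$auth"
}

# Example, on the target, after copying the NAS public key to /tmp:
#   install_pubkey /tmp/id_dsa.pub /root/.ssh
```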

Some notes on passwordless ssh key generation and usage can be found here (jdmz.net)

Define cron jobs.

Check your cron documentation. On Synology, you may need to edit /etc/crontab by hand and then restart cron. On TeraStations, crontab -e works as expected.
Anyway, here's a sample crontab, scheduling backups for two laptops, namely mus and sorex:

    5 * * * * root sh /opt/local/bin/rsnapshot.host.sh mus hourly
    15 * * * * root sh /opt/local/bin/rsnapshot.host.sh sorex hourly

    20 0 * * * root sh /opt/local/bin/rsnapshot.host.sh mus daily
    30 0 * * * root sh /opt/local/bin/rsnapshot.host.sh sorex daily

    35 0 * * 1 root sh /opt/local/bin/rsnapshot.host.sh mus weekly
    45 0 * * 1 root sh /opt/local/bin/rsnapshot.host.sh sorex weekly

    50 0 1 * * root sh /opt/local/bin/rsnapshot.host.sh mus monthly
    0 1 1 * * root sh /opt/local/bin/rsnapshot.host.sh sorex monthly

Well, that should be all... for now.
Backup targets that are always on will be snapshotted at regular intervals, the "normal" rsnapshot way.
Hosts that could be present or not will be checked for presence each hour (with the configuration presented here), and snapshotted if present.

Er... and the restore?

Should you need to restore some file, or an entire host, just point to the shared read-only backup repository, and simply copy what you need. So simple...
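
Because each configuration keeps several rotated snapshots (hourly.0, hourly.1, ... daily.0 and so on under snapshot_root), it can help to see which snapshots actually contain a given file before copying it back. A small sketch follows; list_versions is a hypothetical helper name, and it assumes the standard rsnapshot layout of <snapshot_root>/<interval>.<n>/<destination>/...

```shell
# list_versions SNAPSHOT_ROOT RELATIVE_PATH
# Prints every snapshot copy of RELATIVE_PATH found under SNAPSHOT_ROOT,
# going through the hourly snapshots first, then daily, weekly and monthly.
list_versions() {
    for snap in "$1"/hourly.* "$1"/daily.* "$1"/weekly.* "$1"/monthly.*; do
        # unmatched globs stay literal and simply fail the -e test
        [ -e "$snap/$2" ] && echo "$snap/$2"
    done
    return 0
}

# Example: all saved copies of /etc/fstab from the laptop mus
#   list_versions /volume1/backup/mus mus/etc/fstab
# then copy the one you want back, e.g. with cp -p or rsync -a.
```

With link_dest enabled, unchanged files are hard links to the same data, so several listed copies may well be identical.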