I have seen a lot of backup utilities over the years, both proprietary and open source, but my preferred solution is the open source rsnapshot project, preferably combined with LVM snapshots for a more consistent image. It does the actual work via rsync. rsnapshot is fine for plain backups, but it excels when you want a standby server that can (with manual intervention) take over from your main server in the least amount of time. I recall a company spending days recovering data from a complete SAN failure (both controllers), while our rsnapshot-based backup server was back up and running within 3 minutes of reaching the console. This even allows bare-metal recovery if you are good enough with Linux.
Written on: 2011-02-22
Last updated: 2015-08-26
Distributions: RHEL_6+7 CentOS_6+7
(this doc needs to be a bit cleaned up, the installation steps were originally written just for my own reference)
Rsnapshot is a large Perl script that orchestrates backups using tools like rsync. You might be thinking: rsync isn't that hard, why would I use rsnapshot? Because rsnapshot keeps snapshots of previous backups instead of overwriting them the way a plain rsync does. Imagine this scenario: the primary server loses a lot of information or gets corrupted late in the evening. No one notices, and the sync takes place at night. Next thing you know, you have two servers with the same corrupt information. With rsnapshot, you can almost instantly restore the backup as it was on any day of the past week or so. So if things go awry over the weekend and no one notices, you can restore the backup from 3 days ago.
Rsnapshot's backup strategy is difficult to describe until you see it, but this is what the end result looks like, where daily.0 is the latest backup, the previous day's is daily.1, and so on:
[root@storage snapshots]$ ls
daily.0  daily.1  daily.2  daily.3  daily.4  daily.5  daily.6
[root@storage snapshots]$ cd daily.0/          # Yesterday's backup
[root@storage daily.0]$ ls
chat.test.com  web.test.com  server2.dev.local
[root@storage daily.0]$ cd chat.test.com/
[root@storage chat.test.com]$ ls
bin  boot  dev  etc  home  lib  lib64  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
[root@storage chat.test.com]$ cd ../../
[root@storage snapshots]$ cd daily.6/chat.test.com/    # Last week's backup
[root@storage chat.test.com]$ ls
bin  boot  dev  etc  home  lib  lib64  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
[root@storage chat.test.com]$
As you can see, as far as you're concerned, you just have lots of days (or even hours and days) of full backups of, in this case, the root filesystem. But before you cry about the space usage, realize that the actual space it uses is only marginally higher than a single full backup, because it hard-links identical files across backups.
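If you want to see the hard-linking for yourself, two quick checks (using the hypothetical paths from the listing above) make it visible: an unchanged file shares the same inode across snapshots, and du only counts each hard-linked file once.

# The first column of ls -li is the inode number; for a file that hasn't
# changed between backups it is identical in daily.0 and daily.1.
ls -li daily.0/chat.test.com/etc/hosts daily.1/chat.test.com/etc/hosts

# du -shc counts hard-linked data only once, so the combined total is only
# slightly larger than a single snapshot.
du -shc daily.0 daily.1 daily.2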
So you end up with several advantages: days (or even hours and days) of what look like full backups, disk usage only marginally higher than a single full copy, and restores that are as simple as copying files back into place.
Of course, it's not all roses. One major disadvantage is that backups are not compressed (unless your backup filesystem compresses transparently, like ZFS or Btrfs). It is also poorly suited to very large single files that change only slightly (like virtual machine images): rsnapshot copies and stores the entire file again, as it cannot keep per-file differences between backups. Finally, the place where you keep your backups has to support Unix permissions, possibly ACLs, and so on, so it often does not work well with some NASs or with FAT32/NTFS mount points (if you use a USB drive, format it with ext3/4/xfs).
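If you do go the USB route, a minimal preparation of the disk (assuming it shows up as /dev/sdb1; check with fdisk -l or lsblk first) could look like this, mounting it under the snapshot_root we'll configure later:

# WARNING: this wipes /dev/sdb1 -- double-check the device name first.
mkfs.xfs /dev/sdb1
mkdir -p /opt/snapshots
mount /dev/sdb1 /opt/snapshots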
Rsnapshot by itself is fairly easy to set up, and there are loads of rsnapshot tutorials on the internet. However, most of them end up in a situation where the backup server can log into the primary server as root without a password. That's not ideal, so we'll take the much longer route to prevent it.
Create a new system user on the main server called, say, "backupuser". We'll need to transfer keys later, so you can either set a long random password now and benefit from ssh-copy-id, or copy the key manually without setting a temporary password:
adduser backupuser
passwd backupuser        # <long random password>
mkdir -p /home/backupuser
chown backupuser:root /home/backupuser
chmod 700 /home/backupuser
You can use one of the numerous scripts available online to restrict the commands that can be run when backupuser logs in with a key. For this example, we'll use a script taken from troy.jdmz.net; call it /home/backupuser/validate.sh:
#!/bin/sh
FAIL_MESSAGE="Not allowed"

case "$SSH_ORIGINAL_COMMAND" in
  *\&*) echo "$FAIL_MESSAGE" ;;
  *\;*) echo "$FAIL_MESSAGE" ;;
  *\(*) echo "$FAIL_MESSAGE" ;;
  *\{*) echo "$FAIL_MESSAGE" ;;
  *\<*) echo "$FAIL_MESSAGE" ;;
  *\`*) echo "$FAIL_MESSAGE" ;;
  *\|*) echo "$FAIL_MESSAGE" ;;
  rsync*) $SSH_ORIGINAL_COMMAND ;;
  /usr/local/bin/rsync_wrapper.sh*) $SSH_ORIGINAL_COMMAND ;;
  "sudo /home/backupuser/prepare-backup-start.sh") $SSH_ORIGINAL_COMMAND ;;
  "sudo /home/backupuser/prepare-backup-finish.sh") $SSH_ORIGINAL_COMMAND ;;
  true*) echo true ;;
  *) echo "$FAIL_MESSAGE" ;;
esac
Set up some permissions for that script:
chown backupuser:root /home/backupuser/validate.sh; chmod 550 /home/backupuser/validate.sh
Then type:
visudo
And add this line:
backupuser ALL=NOPASSWD:/usr/bin/rsync
You will also need to comment out:
#Defaults requiretty
If you plan to use LVM snapshots, add:
backupuser ALL=NOPASSWD:/home/backupuser/prepare-backup-start.sh
backupuser ALL=NOPASSWD:/home/backupuser/prepare-backup-finish.sh
Create a file called /usr/local/bin/rsync_wrapper.sh containing:
#!/bin/sh
/usr/bin/sudo /usr/bin/rsync "$@";
Then type:
chown backupuser:root /usr/local/bin/rsync_wrapper.sh
chmod 550 /usr/local/bin/rsync_wrapper.sh
Enable passwordless logins for backupuser. If you set a password for backupuser, you can do it the easy way by typing this on the rsnapshot server:
[on your backup server] ssh-keygen -t rsa
[on your backup server] ssh-copy-id backupuser@your-main-server.domainip.com
Now, on the primary server, edit /home/backupuser/.ssh/authorized_keys and prepend:
from="1.2.3.4",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/home/backupuser/validate.sh"
just before the ssh-rsa AAAAB3Nza... text, replacing 1.2.3.4 with your rsnapshot/backup server's IP.
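Putting it together, the finished entry in /home/backupuser/.ssh/authorized_keys ends up as a single line that looks roughly like this (key shortened, the trailing comment will differ):

from="1.2.3.4",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/home/backupuser/validate.sh" ssh-rsa AAAAB3Nza... root@backupserver

You can then check from the backup server that the forced command is in effect: ssh backupuser@your-main-server.domainip.com true should print "true", while something like ssh backupuser@your-main-server.domainip.com "cat /etc/shadow" should only print "Not allowed".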
If the server supports LVM snapshots (all you need is a few gigabytes of free space in the volume group), go to your main server and create the scripts /home/backupuser/prepare-backup-start.sh and /home/backupuser/prepare-backup-finish.sh as required. An example:
/home/backupuser/prepare-backup-start.sh
#!/bin/bash
set -e
# Change the path of lvcreate to /usr/sbin/lvcreate for RHEL5
/sbin/lvcreate -L 2G --snapshot -n RootSnapshot /dev/VolGroup00/root
/sbin/lvcreate -L 3G --snapshot -n OptSnapshot /dev/VolGroup00/opt
mount -o ro /dev/VolGroup00/RootSnapshot /mnt/snapshot_root
mount -o ro /dev/VolGroup00/OptSnapshot /mnt/snapshot_opt
/home/backupuser/prepare-backup-finish.sh
#!/bin/bash
umount /mnt/snapshot_root
umount /mnt/snapshot_opt
# Change the path of lvremove to /usr/sbin/lvremove for RHEL5
/sbin/lvremove -f /dev/VolGroup00/RootSnapshot
/sbin/lvremove -f /dev/VolGroup00/OptSnapshot
Edit the above according to your LVM setup, create the snapshot mount points (e.g. /mnt/snapshot_root), and run both scripts to make sure they work, then:
chmod 550 /home/backupuser/prepare-backup-*
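A quick manual test of the two scripts could look like this (the volume and mount point names are the ones from the example scripts; adjust them to your setup):

mkdir -p /mnt/snapshot_root /mnt/snapshot_opt
/home/backupuser/prepare-backup-start.sh
lvs                             # RootSnapshot and OptSnapshot should now be listed
mount | grep /mnt/snapshot_     # both snapshot mounts should show up (read-only)
/home/backupuser/prepare-backup-finish.sh
lvs                             # the snapshot volumes should be gone again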
It's much more complicated than the pre-installation:
Install EPEL
yum install rsnapshot
Edit /etc/rsnapshot.conf, and remember that you need TABS between the configuration name and value, so you can't just copy and paste the following. To be clear, if I write a line like:
ssh_args -p 22 -c arcfour
it actually means:
ssh_args<TAB>-p<SPACE>22<SPACE>-c<SPACE>arcfour
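If you're not sure whether a line ended up with real tabs after editing, cat -A prints tabs as ^I and line ends as $, so you can verify what's actually in the file; for the ssh_args line above you'd expect something like:

grep '^ssh_args' /etc/rsnapshot.conf | cat -A
# expected output (tab shown as ^I, end of line as $):
# ssh_args^I-p 22 -c arcfour$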
Here are the recommended configuration changes:
#
# This file requires tabs between elements
#
# Directories require a trailing slash:
#   right: /home/
#   wrong: /home

config_version  1.2

# All snapshots will be stored under this root directory.
snapshot_root   /opt/snapshots/
no_create_root  1

cmd_cp          /bin/cp
cmd_rm          /bin/rm
cmd_rsync       /usr/bin/rsync
cmd_ssh         /usr/bin/ssh
cmd_logger      /bin/logger
cmd_du          /usr/bin/du
#cmd_rsnapshot_diff     /usr/local/bin/rsnapshot-diff

interval        daily   7

# Verbose level, 1 through 5.
# 1     Quiet           Print fatal errors only
# 2     Default         Print errors and warnings only
# 3     Verbose         Show equivalent shell commands being executed
# 4     Extra Verbose   Show extra verbose information
# 5     Debug mode      Everything
verbose         4

# Same as "verbose" above, but controls the amount of data sent to the
# logfile, if one is being used. The default is 3.
#loglevel       3

logfile         /var/log/rsnapshot
lockfile        /var/run/rsnapshot.pid

#exclude        /mnt/snapshot_root/root/apps/

#rsync_short_args       -a
rsync_long_args --sparse --hard-links --delete --numeric-ids --relative --delete-excluded --rsync-path=rsync_wrapper.sh
ssh_args        -p 22 -c arcfour

one_fs          1
I'll quickly go through what the changes mean:
snapshot_root
: Where the backups are kept. If this is on a mount point, I suggest you use a subdirectory of the mount point, so if /opt/ is a separate mount point, keep the backups in, say, /opt/snapshots instead of /opt. By the way, I had bad luck with Samba and even some NASs' NFS implementations; stick to ext3/4/xfs, especially if you expect to do bare-metal recovery.

no_create_root 1
: This means rsnapshot will not create the snapshot_root directory; you have to do that yourself, and if it can't find the directory, it will refuse to back up. Why is this a good idea? Imagine you keep your backups on a USB disk, but the disk isn't mounted or attached. With the default, rsnapshot will happily create the missing mount point and write the backup onto your root filesystem. So in most cases it's better for it to fail than to write the backups to the wrong disk.

cmd_ssh
: Needed for remote backups.

interval daily 7
: This is difficult to explain: rsnapshot does not do any scheduling, so the 'daily' keyword here is just a name, and the 7 is how many of those snapshots to keep. The actual backup schedule is handled by you with cron, so if you set cron to run "rsnapshot daily" hourly, your "daily" actually means hourly.

verbose 4
: I like increasing the verbosity of a manual run, so that you can see what's going on.

logfile
: Where the logs are kept; the logfile has a separate verbosity level (loglevel), which is fine at its default of 3.

rsync_long_args --sparse
: Some apps, like Zimbra 8 / OpenLDAP's MDB backend, or some KVM virtual machine images, are kept as sparse files, which appear to take up a lot of space (31 GB in my case) while only using a fraction of that on disk (2.3 GB). With --sparse, the backup uses the same space on disk as the original, 2.3 GB, and a restored copy doesn't balloon beyond its original size either.

rsync_long_args --hard-links
: Some apps like Cyrus and Zimbra 8 use hard links a lot, and by default rsync does not retain the hard-link information, so you end up with a much, much larger backup than the original (along with problems during restoration). Adding --hard-links fixes this, at the expense of extra memory usage.

rsync_long_args --rsync-path=rsync_wrapper.sh
: This calls the wrapper script we made earlier so that rsync runs with sudo on the main server.

ssh_args -p 22 -c arcfour
: Change 22 to your SSH port. The -c arcfour option selects a less secure but less CPU-intensive cipher, which is useful on LANs; remove -c arcfour when doing WAN backups.

one_fs 1
: Stick to one filesystem when backing up a mount point. I prefer this, as otherwise backing up / would back up unnecessary things like /dev and /proc, or mount points you don't need. The downside is that you have to remember to manually include every mount point in the backup points below.

Then at the end of the config file there should be a list of things that should be backed up. Remove the defaults and add your own lines like:
###############################
### BACKUP POINTS / SCRIPTS ###
###############################

# No LVM snapshots:
backup  backupuser@1.2.3.4:/                    some.server.com/
backup  backupuser@1.2.3.4:/boot                some.server.com/
backup  backupuser@another.server.com:/         another.server.com/
backup  backupuser@another.server.com:/opt      another.server.com/
backup  backupuser@another.server.com:/boot     another.server.com/

# or with LVM snapshots with your prepare-* scripts:
backup  backupuser@1.2.3.4:/mnt/snapshot_root   some.server.com/
backup  backupuser@1.2.3.4:/mnt/snapshot_opt    some.server.com/
backup  backupuser@1.2.3.4:/boot                some.server.com/

# You can customize options per server. So say one of your servers has the SSH port 6824 instead of 22:
backup  backupuser@1.2.3.4:/something/          web.domain.com/ ssh_args=-p 6824

# Or you want to enable rsync stream compression on WAN clients to reduce bandwidth:
backup  backupuser@web.server.com:/             web.server.com/ +rsync_long_args=--compress
You can now run "rsnapshot daily" by hand to see whether the backup works (it will most probably complain about errors in your config; fix them as it suggests). If you use LVM snapshots, manually run the /home/backupuser/prepare-backup-start.sh script on your main server before running "rsnapshot daily", and remember to run prepare-backup-finish.sh after you are done testing.
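rsnapshot also has a built-in syntax check and a test mode, which catch the most common configuration mistakes (missing tabs, missing trailing slashes) before anything is written:

rsnapshot configtest    # prints "Syntax OK" if the config file parses cleanly
rsnapshot -t daily      # test mode: prints the commands it would run, without running them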
If everything works fine, you can use a simple backup script like:
#!/bin/bash
MAIN_SERVER="1.2.3.4"
USER="backupuser"
RSNAPSHOT="/usr/bin/rsnapshot"

# uncomment if you have LVM
#ssh $USER@$MAIN_SERVER "sudo /home/$USER/prepare-backup-start.sh"
$RSNAPSHOT daily
#ssh $USER@$MAIN_SERVER "sudo /home/$USER/prepare-backup-finish.sh"
or you could use a more paranoid one; modify it to integrate with your monitoring system (edit the variables and the fail_action function at the top):
#!/bin/bash
MAIN_SERVER="1.2.3.4"
MAIN_SERVER_PORT="22"
USER="backupuser"
RSNAPSHOT="/usr/bin/rsnapshot"
LOG="/var/log/rsnapshot"
TIMEOUT=4

# Edit this:
fail_action() {
    # Fill this in with how to notify you if the backup fails.
    # Input ($1) : string with a message about what failed
    :   # placeholder so the function body is valid shell; replace with your notification command
}

#---------------
#---------------

# input: host to ping
# output: 0 is success, 1 if not pingable
try_ping () {
    pingcount=$(ping -c 1 -W $TIMEOUT $1 | grep 'received' | awk -F',' '{ print $2 }' | awk '{ print $1 }');
    if [ $pingcount -eq 1 ] ; then
        return 0;
    fi
    return 1;
}

# input: message
# output: nothing (writes message to $LOG with rsnapshot-like date)
write_log() {
    echo `date +"[%d/%b/%Y:%k:%M:%S]"` $1 | tee -a $LOG
}

# input: return_code message_string
# output: if ok, write to log, otherwise write to log, clean up and exit with status 1
check_status() {
    RETURN=$1
    MESSAGE=$2
    if [ $RETURN == 0 ]; then
        write_log "$MESSAGE OK"
    else
        write_log "$MESSAGE FAILED!"
        write_log "Cleaning up & aborting backup"
        fail_action "$MESSAGE failed."
        ssh -p $MAIN_SERVER_PORT $USER@$MAIN_SERVER "sudo /home/$USER/prepare-backup-finish.sh"
        exit 1
    fi
}

try_ping $MAIN_SERVER
check_status $? "ping $MAIN_SERVER"

ssh -p $MAIN_SERVER_PORT $USER@$MAIN_SERVER "sudo /home/$USER/prepare-backup-start.sh"
check_status $? "prepare-backup-start.sh"

$RSNAPSHOT daily
check_status $? "rsnapshot daily"

sleep 2

ssh -p $MAIN_SERVER_PORT $USER@$MAIN_SERVER "sudo /home/$USER/prepare-backup-finish.sh"
check_status $? "prepare-backup-finish.sh"

exit 0
If the server cannot be pinged (for example, ICMP is blocked), use this instead for the try_ping () function (make sure nc is installed):
# input: none (uses $MAIN_SERVER and $MAIN_SERVER_PORT)
# output: 0 is success, 1 if not reachable
try_ping () {
    nc -w $TIMEOUT -z $MAIN_SERVER $MAIN_SERVER_PORT
    RETURN=$?
    if [ $RETURN -eq 0 ] ; then
        return 0;
    fi
    return 1;
}
If you do not have LVM snapshots, remove the prepare-backup-*.sh lines above, and replace the "$RSNAPSHOT daily" line and everything after it with this:
$RSNAPSHOT daily
RETURN=$?

# note that rsnapshot returns 2 as a warning if a file changed
# which will be the case if there are no snapshots
if [[ $RETURN == 0 || $RETURN == 2 ]]; then
    write_log "Backup OK"
    exit 0
else
    write_log "rsnapshot returned $RETURN, backup FAILED"
    check_status 1 "rsnapshot return code $RETURN"
fi
Test it. If it's fine, make a cron job:
45 23 * * * /root/scripts/rsnapshot-backup.sh
Don't just rely on the script telling you when there is a bad return code; find a way to get notified if the script doesn't run at all for some reason. If you have a Nagios-plugin-compatible monitoring system, you can use something like check_newest_file_age on a directory in daily.0 for each host that you expect to change every day (e.g. var/log), so you get notified if the backup looks stale.
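If you don't run such a monitoring system, a rough equivalent with plain find (paths taken from the earlier examples; adjust the age and the notification to whatever you already use -- mail is just a placeholder here) is enough to catch a stale backup:

#!/bin/bash
# Hypothetical staleness check: warn if the newest snapshot of chat.test.com
# contains no file under var/log newer than 2 days, which suggests the
# backups have stopped updating.
SNAP="/opt/snapshots/daily.0/chat.test.com/var/log"

if ! find "$SNAP" -type f -mtime -2 | grep -q . ; then
    echo "WARNING: backup under $SNAP looks stale" | mail -s "rsnapshot stale backup" root
fi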