Let's Talk Incremental Backup
zackf
2008-08-20, 11:56 AM CDT
Here is what's going on. I have a NAS that I want to back up data to. The NAS mounts just like anything else at, let's say, /mnt/backup
I want a solution to perform incremental backups for say my /home directory and my research has led me to rdiff and rsync, BUT - and correct me if I'm wrong here - with those you have to ssh into the other machine which my trusty little NAS doesn't do.
So I am thinking cron job here, but I am not sure how to tell it to only copy files that have changed since the last backup. Or for that matter it doesn't have to be since the last backup; the past 24 hours would be ok, and a few files would get backed up twice, no biggy there.
So for the weekly backup I will set up a job that says 0 2 * * 0 cp -r /home /mnt/backup (the intention is to do a full backup every Sunday at 2am; day-of-week 0 is Sunday in cron, 6 would be Saturday)
For my daily incremental backup 0 2 * * * [fill in the rest]
Are you feeling it? Or am I totally off base.
Zack
stevea
2008-08-20, 01:16 PM CDT
First off, "The NAS mounts just like anything" isn't good enough. Linux files can have many attributes, and AFAIK no remote (network) file system supports all of them. Not even NFSv4.
By file attributes I mean ownership, group, permission bits (rwxrwxrwx), ACLs (access control lists) and SELinux context. Most Fedora users use the basic (own,grp,perm) and the SELinux context, but not ACLs. If the remote system is serving a FAT32 disk you'll lose almost all attributes when you copy a file there and back. NFSv3/v4 serving an ext2/3 and SMB shares serving NTFS make a halfhearted attempt to preserve own,grp,perms and a hacked version of ACLs. NFSv4 is being rev'ed to support SELinux context (Z bits) in the future.
So - the safest way to back up files to any remote system is to use an archive format that internally stores the attributes. Like "tar" with the "--selinux" option (or maybe the "--acls" option if needed). I'll leave it up to you to determine if this is worthwhile for your application. If you are backing up a whole system then I'd strongly recommend it - reconstructing even the perm bits is a nightmare. OTOH for a personal /home backup it may be quite practical to just capture the file data, reconstruct the (own,grp,perm) info and let SELinux add the context.
Most Linux users have never heard of ACLs, and most times we can just let the kernel reconstruct SELinux context on the fly (except for executables). So the biggest deal is to make sure that the share preserves file (ownership,group,perms). As root, copy some files to the share and back and see what's preserved. Make sure you try several owners & groups and a range of perms.
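As a rough sketch of the archive approach, assuming GNU tar (the version Fedora ships): the helper name archive_tree is made up for illustration, and any extra options such as --selinux or --acls are simply passed through, since not every tar build supports them.

```shell
#!/bin/sh
# Hypothetical helper: archive_tree SRC_DIR ARCHIVE [extra tar options...]
# Writes everything under SRC_DIR into one tar file, so attributes live
# inside the archive instead of depending on what the share can store.
archive_tree() {
    src=$1
    archive=$2
    shift 2
    # -C changes into SRC_DIR first, so member names are relative ("./...").
    # -p keeps permission information; "$@" carries e.g. --selinux --acls.
    tar "$@" -cpf "$archive" -C "$src" .
}
```

A call might look like: archive_tree /home /mnt/backup/home.tar --selinux --acls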
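The copy-and-compare test could be scripted along these lines. This is only a sketch: it assumes GNU coreutils (stat -c), the function name check_roundtrip is invented here, and it only looks at owner, group and permission bits, not ACLs or SELinux context.

```shell
#!/bin/sh
# Hypothetical helper: check_roundtrip FILE SHARE_DIR
# Copies FILE onto the share and reports whether owner, group and
# permission bits survived the trip.
check_roundtrip() {
    src=$1
    share=$2
    copy="$share/$(basename "$src").attrtest"
    # -p asks cp to preserve mode, ownership and timestamps where it can.
    cp -p "$src" "$copy" || return 2
    before=$(stat -c '%U:%G %a' "$src")
    after=$(stat -c '%U:%G %a' "$copy")
    rm -f "$copy"
    if [ "$before" = "$after" ]; then
        echo "preserved: $before"
    else
        echo "changed: $before -> $after"
        return 1
    fi
}
```

Run it as root against files with several different owners, groups and modes, e.g. check_roundtrip /home/fred/somefile /mnt/backup (the file path is just a placeholder).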
======
correct me if I'm wrong here - with those you have to ssh into the other machine which my trusty little NAS doesn't do.
Well, for a network transfer you DO have to connect to an rsync server (or ssh) on whatever is reading/writing the files, BUT this doesn't have to be the NAS. Since you have a shared file system mounted, you can run both the rsync service (targeting the share) and the rsync/rdiff client on your own box. For example ...
You'd set up an rsync server config file, perhaps like:
/etc/rsyncd.conf:
motd file = /etc/rsyncd.motd
log file = /var/log/rsyncd.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
use chroot = no
max connections = 10
timeout = 300
# hosts allow = <your list>
[universe]
path = /mnt/backup
comment = all data on MYNAS
uid = root
gid = root
read only = no
list = yes
[bkup]
path = /mnt/backup/bkup
comment = backup data on MYNAS
uid = root
gid = root
read only = no
list = yes
Then switch on the rsyncd service (it used xinetd as an "on demand" service last time I set it up).
Then you can just do commands like:
rsync -av /home/fred/ localhost::bkup/home/fred/
Note that this means when you backup a file you are using an rsync client command to communicate over the "localhost" "lo" interface to an rsync server that is reading/writing files in your mounted share "/mnt/backup". This is a little inefficient but if you like the features of rsync it's a big win. It is writing files to the share, so the attribute problem potentially remains.
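For what it's worth, rsync also accepts two plain local paths, so with the share already mounted the daemon hop may not be strictly needed. A minimal sketch, with mirror_dir as a made-up helper name; the attribute caveats for the share still apply.

```shell
#!/bin/sh
# Hypothetical helper: mirror_dir SRC_DIR DEST_DIR
# Copies only new or changed files from SRC_DIR into DEST_DIR using
# local paths, with no rsync daemon or ssh involved.
mirror_dir() {
    # Normalise to exactly one trailing slash: on the source this means
    # "the contents of SRC_DIR" rather than the directory itself.
    src=${1%/}/
    dest=${2%/}/
    rsync -a "$src" "$dest"
}
```

For example: mirror_dir /home/fred /mnt/backup/bkup/home/fred. Files deleted from the source stay in the destination, since --delete is deliberately left out here.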
====
FWIW many NASes run Linux internally, and the several I've seen use an ext2 (rarely ext3) file system. They universally share the ext2 file system with samba, so the attribute issue is the same as for samba shares of an ext2. Many NAS boxes are hackable and can be made to run rsync and NFS too! I recently bought a DNS-323 NAS just for this purpose. It's got a low-end ARM processor, but you can install a stripped-down Debian Linux release and do whatever you want with it (observing the memory and performance limitations). This is an advanced topic.
=======
Try "yum info safekeep-client bacula-client BackupPC rdiff-backup fwbackups" for some packages to help perform backups. Note that any backup scheme that requires a server/client network model can run the client&server on the same box to your NAS share just like the rsync example above.
The ancient and venerable "amanda" tape backup package still works quite nicely for file oriented backup too.
stevea
2008-08-20, 01:43 PM CDT
BTW - I didn't mean to imply that rsync is a backup program. It is not. It's just a net copy program with some efficiencies.
savage
2008-08-20, 03:20 PM CDT
I use the following to do my backups instead of rsync etc. (I've never got around to learning it). I use simple old tar with append. The script creates daily incremental backups for each month, so you get one .tar file per month that is updated daily, and a new tar file is created each month.
There is one major downside to this: if you have a power outage during the backup, nine times out of ten that month's backup will get shafted.
You can modify the script to use 2 backup files per month, and alternate between them each day, such that worst case scenario, your backup data would be 2 days old, rather than a month.
The script (simple enough):
#!/bin/sh
backuppath="/path/to/where/you/want/your/backup/files"
sourcedir="/location/to/backup"
# e.g. 2008-08_August, so a new archive is started each month
curdate=$(date "+%Y-%m_%B")
cd "$sourcedir" || exit 1
# -u appends only files newer than the copy already in the archive
nice tar --ignore-failed-read --same-owner -s -p -uf "$backuppath/home_$curdate.tar" .
exit 0
Just change backuppath to where you want the backup files stored, and sourcedir to the location you want backed up.
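The two-files-per-month variation mentioned above might be sketched like this. The helper names backup_slot and archive_name and the _a/_b suffix are assumptions for illustration, and "day" here means whole days since the Unix epoch in UTC, which may not line up with local midnight.

```shell
#!/bin/sh
# Print "a" on even-numbered days and "b" on odd-numbered days, counting
# whole days since the Unix epoch (UTC). An explicit timestamp in seconds
# can be passed instead of using the current time.
backup_slot() {
    now=${1:-$(date +%s)}
    if [ $(( now / 86400 % 2 )) -eq 0 ]; then
        echo a
    else
        echo b
    fi
}

# Build the archive name the same way as the script above, plus the slot.
archive_name() {
    echo "home_$(date "+%Y-%m_%B")_$(backup_slot "$@").tar"
}
```

In the script, the tar line would then write to "$backuppath/$(archive_name)", so an interrupted run can only damage the file for that day's slot.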
marcrblevins
2008-08-21, 02:46 AM CDT
There is one major downside to this: if you have a power outage during the backup, nine times out of ten that month's backup will get shafted.
Savage, can you clarify what you mean? Would you lose the September & October backups during a power outage? Are you keeping every day's backup rather than recycling the space?
I use tar in my backup script as well; it backs up to my 3rd hard drive. Mine does full backups on the first of the month & on Sundays, and the rest of the week is incremental. I recycle the space weekly. I should tinker with it to use --selinux.
Is your NAS a big drive? Go with whichever script you like.
My scripts are listed in my sig. ybackup.cron does the backup, and zbackup e-mails you a listing of the backup drive daily.
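A schedule like that (full on Sundays and the 1st, incremental otherwise) could be sketched with GNU tar's --listed-incremental snapshot file. This is not the script from my sig; the function names, the home.snar snapshot file name and the archive naming are all made up for illustration.

```shell
#!/bin/sh
# Decide the level from `date +%u` (1-7, 7 = Sunday) and `date +%d` (01-31).
backup_level() {
    if [ "$1" = "7" ] || [ "$2" = "01" ]; then
        echo full
    else
        echo incremental
    fi
}

# Hypothetical helper: run_backup SRC_DIR DEST_DIR LEVEL
run_backup() {
    src=$1
    dest=$2
    level=$3
    snap="$dest/home.snar"            # state file GNU tar keeps between runs
    stamp=$(date +%Y-%m-%d)
    if [ "$level" = full ]; then
        rm -f "$snap"                 # no snapshot file: tar dumps everything
    fi
    tar --listed-incremental="$snap" -cpf "$dest/home_${level}_$stamp.tar" -C "$src" .
}
```

A nightly cron entry would then call something like run_backup /home /mnt/backup "$(backup_level "$(date +%u)" "$(date +%d)")". Restoring means extracting the last full archive and then each later incremental in order.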
marcrblevins
2008-08-21, 02:48 AM CDT
This is where I got the backup script, which I then tinkered with to suit my Fedora taste.
http://www.faqs.org/docs/securing/
savage
2008-08-21, 07:33 AM CDT
Hi, sorry if it's a bit vague. I've just whacked a screeny up of the backups share so you can see what I mean; it shows the backup files that I get from my script.
At this exact moment in time, the file home_2008-08_August.tar gets incremental backups every 24 hours, but any prior backups are fine, so if there was a power outage during a backup, it would be the August backup that would get corrupted.
Edit: It hasn't happened, but if the backup did get corrupted, once the corrupt file has been deleted, the next time the script runs, a full backup will be done, and then incremented daily from then until the end of the month.
Once a month, the script creates a new full backup (so the next will be home_2008-09_September.tar) that will then be incremented daily for that month until October.
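The "delete the corrupt file and let the next run start over" step from the edit above could be automated roughly like this. It is only a sketch: the function names and the .broken suffix are invented, and listing the archive catches truncation or non-tar data but not every kind of damage.

```shell
#!/bin/sh
# Return 0 if tar can read the archive's whole member list, non-zero otherwise.
archive_ok() {
    tar -tf "$1" >/dev/null 2>&1
}

# Hypothetical helper: move an unreadable archive out of the way so the
# next backup run starts a fresh one instead of appending to a damaged file.
quarantine_if_broken() {
    if [ -e "$1" ] && ! archive_ok "$1"; then
        mv "$1" "$1.broken"
    fi
}
```

Calling quarantine_if_broken on the current month's file just before the tar line would keep the damaged copy around for inspection rather than deleting it outright.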
zackf
2008-08-21, 01:26 PM CDT
I'm thinking this is a lot of good info.
So savage, as I understand it, you basically rotate full backups?
savage
2008-08-21, 01:39 PM CDT
Yeah basically, at the end of this month, the April backup goes bye bye, and a September one will be automatically created, and incremented until October.
I do the rotation manually, as the backups contain /home, and I extract my parents' company info out of that to give them their own backup.