Hard-linking across different backups
Post Hard-linking across different backups 
Hi,

I need to make backups of several hosts. I want to make complete
system backups so that restoring a system to a working state does
not require anything except just restoring the backup. Many of the
machines will have lots of identical files because they have the
same operating system distributions.

I have been thinking of using the "hardlink"[1] utility or something
similar to hard-link identical files across backups from several
hosts. This would give huge savings in the required storage capacity.

Are other people doing this with production systems? What are the
best practices for doing this? Are there any special considerations
or caveats that people have encountered?


I am worried that the automatic hard-linking might result in some
undesired situations, for example one scenario:

1. The host has a configuration file foo.conf and a default
configuration template foo.conf.default which are identical because
there was no need to customize foo.conf.

2. The files are backed up. Because they are identical, including
the time stamps, they are hard-linked together, so they essentially
become one file with two (or several) names.

3. Something happens and the host is restored from backup.

4. Later on someone needs to make changes to foo.conf. As a result
the other file, foo.conf.default, also gets changed, because the
restored backup hard-linked the two names together and they are
now actually the same file.
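
The scenario in steps 1-4 can be reproduced in a few lines. This is only a minimal sketch, assuming the change to foo.conf is made by a tool that rewrites the file in place; the file contents are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    default = os.path.join(d, "foo.conf.default")
    conf = os.path.join(d, "foo.conf")

    # Steps 1-3: the two identical files end up as two names for one inode,
    # as they would after a restore that preserved the backup's hard links.
    with open(default, "w") as f:
        f.write("option = 1\n")
    os.link(default, conf)
    same_inode = os.stat(default).st_ino == os.stat(conf).st_ino

    # Step 4: someone changes foo.conf in place (append, no rename).
    with open(conf, "a") as f:
        f.write("option = 2\n")

    # The "untouched" default file now carries the change as well.
    with open(default) as f:
        default_after_edit = f.read()

print(same_inode)
print(default_after_edit)
```

Whether this bites in practice depends on the tool: programs that write a new file and rename it over the old one leave the other name untouched, while in-place writers behave as shown above.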


If the machines were based on some common ancestor (like a hard
disk image of the base OS) it would be easy to make a dummy backup
directory containing the files from the original image and then
just run the first rsync for each host with --link-dest=/path/to/ancestor.
But this is not the case. Also the backups would diverge over time
when updates are installed on the machines and more space would be
wasted (because the updated versions of identical files would no
longer be linked together).


[1] http://jak-linux.org/projects/hardlink/

Best Regards,
--
Janne Snabb / EPIPE Communications
snabb < at > epipe.com - http://epipe.com/


_______________________________________________
rsnapshot-discuss mailing list
rsnapshot-discuss < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss

Post Hard-linking across different backups 
Hello, Janne,

You wrote on 03.11.11:

> Hi,
>
> I need to make backups of several hosts. I want to make complete
> system backups so that restoring a system to a working state does
> not require anything except just restoring the backup. Many of the
> machines will have lots of identical files because they have the
> same operating system distributions.
>
> I have been thinking of using the "hardlink"[1] utility or something
> similar to hard-link identical files across backups from several
> hosts. This would give huge savings in the required storage capacity.

I use "hardlink" from Dag Wieers,

<http://helmut.hullen.de/filebox/Linux/slackware/ap/hardlink-1.2-i486-1hln.tgz>

for such a job.

I can run it with

hardlink /path/to/mach1/daily.? /path/to/mach2/weekly.?

etc., and it works fine. It's more conservative than some programs like
"fdupes", but that's no real defect.

> Are other people doing this with production systems? What are the
> best practices for doing this? Are there any special considerations
> or caveats that people have encountered?

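One caveat: right now a file has about 20 hard links, all within the
backups of one machine, so if it gets damaged only that machine's
backup suffers. Once files are linked across hosts, one damaged file
affects the backups of every host that shares it. And it does happen
that a file gets damaged ...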


Post Hard-linking across different backups 
Hi,

You won't have problems when one host changes, because rsync will
create a new copy and "break the link" anyway. The real problem would be
if you hard-linked the *original files* with your backups, so don't do
*that*.
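
To illustrate what "breaking the link" means: by default (that is, without --inplace) rsync writes an updated file under a temporary name and renames it over the old one, so the other names keep the old content. The sketch below imitates that with plain Python; `replace_via_rename` and the file names are hypothetical, and this is not rsync's actual code.

```python
import os
import tempfile

def replace_via_rename(path, data):
    """Write data to a temporary file next to path, then rename it over path."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        f.write(data)
    os.replace(tmp, path)  # path now refers to a new inode

with tempfile.TemporaryDirectory() as d:
    host_a = os.path.join(d, "hostA_file")
    host_b = os.path.join(d, "hostB_file")

    # Two backups share one file through a hard link.
    with open(host_a, "w") as f:
        f.write("version 1\n")
    os.link(host_a, host_b)

    # Host A's file changes and is updated the rename way.
    replace_via_rename(host_a, "version 2\n")

    with open(host_a) as f:
        a_content = f.read()
    with open(host_b) as f:
        b_content = f.read()
    still_linked = os.stat(host_a).st_ino == os.stat(host_b).st_ino

print(a_content, b_content, still_linked)
```

After the update, host B's copy still holds the old version and the two names no longer share an inode, which is why a change on one host does not leak into another host's backup.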

We use "hardlink" in our daily post-exec script. It requires lots of
RAM and takes ages to run, but it has saved a lot of space.

However, you will notice that it also hard-links files *within* the
same backup: duplicate configuration files, moved files, etc. So you
*must not* restore *any* hard link as a link (restore every name as a
separate file), which also means nothing on your hosts should rely on
hard links; in the backups they should exist only for this
optimization. We've restored backups and didn't notice any problems,
but it depends on the applications you use.

Also, you should only hard-link files that are really the same (data
and metadata: UID, ACLs, etc.).
For example, the same package may have created the same username with
different UIDs on two hosts, along with a default file belonging to
this user: if you hard-link the two copies, one will have the wrong UID.
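
A rough sketch of such a check, in case it helps: two files are treated as linkable only if owner, group, mode, size, modification time and content all match. `safe_to_link` is a hypothetical helper, not part of the "hardlink" tool, and it does not look at ACLs or extended attributes, which a real check would also need to compare.

```python
import filecmp
import os
import tempfile

def safe_to_link(a, b):
    """True if a and b match in basic metadata and in content."""
    sa, sb = os.stat(a), os.stat(b)
    meta_a = (sa.st_uid, sa.st_gid, sa.st_mode, sa.st_size, sa.st_mtime)
    meta_b = (sb.st_uid, sb.st_gid, sb.st_mode, sb.st_size, sb.st_mtime)
    # shallow=False forces a byte-by-byte comparison of the contents.
    return meta_a == meta_b and filecmp.cmp(a, b, shallow=False)

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "host1_default")
    b = os.path.join(d, "host2_default")
    for path in (a, b):
        with open(path, "w") as f:
            f.write("same content\n")
        os.utime(path, (1_000_000_000, 1_000_000_000))  # identical time stamps

    # Same data, different permissions: must not be linked.
    os.chmod(a, 0o644)
    os.chmod(b, 0o600)
    differs = safe_to_link(a, b)

    # Same data and same metadata: linking would lose nothing.
    os.chmod(b, 0o644)
    matches = safe_to_link(a, b)

print(differs, matches)
```

The demo varies the mode rather than the UID only because changing ownership needs root; the UID case from the example above is rejected by the same comparison.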

I also advise you not to run hardlink and rsync at the same time, because
they both build a list of files that they might change, and they don't
like those files being changed by others while they're running.
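
One way to keep the two jobs apart is a shared lock file that both wrapper scripts take before starting. This is only a sketch under assumptions: it is Unix-only (it uses `flock`), `try_lock` and the lock path are hypothetical, and it is not something rsnapshot or hardlink does for you.

```python
import fcntl
import os
import tempfile

def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    f = open(lock_path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f

lock_path = os.path.join(tempfile.mkdtemp(), "backup.lock")

first = try_lock(lock_path)    # e.g. the rsync run takes the lock
second = try_lock(lock_path)   # e.g. hardlink starts too early and is refused
first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # now the second job may proceed

print(first is not None, second is None, third is not None)
third.close()
```

A job that gets `None` back would simply exit or retry later instead of touching the backup tree.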

Lionel Sausin

