David Cantrell wrote:
Nathan Rosenquist wrote:
Also, not to spread FUD, but I'm almost positive that meta information
about the files (like timestamps, maybe even permissions, etc.) might get
lost if two identical files with different meta information are merged.
They will get lost. EVERYTHING about a file apart from its name is
stored in the inode, so if two files share an inode they have to share
all that other stuff.
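A quick way to see this for yourself is the Perl sketch below. It assumes a POSIX filesystem that supports hard links, and the file names are made up. It creates a file, gives it a second name with link(), changes the permissions and mtime through the second name only, and then reads them back through both names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (removed on exit); the file names are illustrative.
my $dir  = tempdir(CLEANUP => 1);
my $orig = "$dir/original";
my $link = "$dir/hardlink";

# Create one file, then give it a second name with link().
open my $fh, '>', $orig or die "open $orig: $!";
print {$fh} "same bytes\n";
close $fh or die "close $orig: $!";
link $orig, $link or die "link: $!";

# Change permissions and mtime through the second name only.
chmod 0600, $link or die "chmod: $!";
utime 1_000_000_000, 1_000_000_000, $link or die "utime: $!";

# Both names report the same values, because this data lives in the inode.
my @st_orig = stat $orig;
my @st_link = stat $link;
printf "inode: %d vs %d\n",     $st_orig[1], $st_link[1];
printf "mode:  %04o vs %04o\n", $st_orig[2] & 07777, $st_link[2] & 07777;
printf "mtime: %d vs %d\n",     $st_orig[9], $st_link[9];
printf "links: %d\n",           $st_orig[3];
```

Both lines of each pair should match, and the link count should be 2: there is no per-name copy of the metadata to preserve.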
That said, this works surprisingly well in many cases. For instance,
if a bunch of machines have the same distribution and packages
installed, then it's pretty likely that their times and permissions
and whatnot are the same. I have also used this when I needed to
transition services between machines. I'll either cp -al the big data
on the backup server before the first rsnapshot, or use a script after
the rsnapshot to stitch everything back together.
Rather than calling stat() n^m times, I recommend calling it n times and
using join(",", (stat($file))[0, 2..10]) as the key in a hash of
arrayrefs. This assumes you have the memory available for a Very Big
data structure.
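A minimal sketch of that idea follows. It assumes Perl's core File::Find module and a hypothetical helper, `group_by_metadata`. By default it keys on the slice quoted above, and it only prints candidate groups without linking anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# stat() slice suggested above: dev, mode, nlink, uid, gid, rdev, size,
# atime, mtime, ctime (index 1, the inode number, is deliberately left out).
my @DEFAULT_FIELDS = (0, 2 .. 10);

# Hypothetical helper: walk $root once, lstat() each regular file once, and
# bucket paths in a hash of arrayrefs keyed by the chosen stat fields.
# Returns only the buckets that hold more than one inode.
sub group_by_metadata {
    my ($root, $fields) = @_;
    my (%groups, %seen);
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            my @st = lstat $path or return;
            return unless -f _;                    # regular files only, no symlinks
            return if $seen{"$st[0]:$st[1]"}++;    # another name of an inode we already have
            push @{ $groups{ join ',', @st[@$fields] } }, $path;
        },
    }, $root);
    delete $groups{$_} for grep { @{ $groups{$_} } <= 1 } keys %groups;
    return \%groups;
}

# Usage: perl group-by-stat.pl DIR...   (prints one tab-separated group per line)
for my $root (@ARGV) {
    my $groups = group_by_metadata($root, \@DEFAULT_FIELDS);
    print join("\t", @$_), "\n" for values %$groups;
}
```

The slice 0, 2..10 includes nlink, atime and ctime (stat fields 3, 8 and 10), so files that were copied separately will often not match on it. Passing a shorter list such as `[0, 2, 4, 5, 7, 9]` (device, mode, uid, gid, size, mtime) is one way to loosen the key. Anything grouped this way still needs a content check before linking.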
A trick I've seen (and later used) is to run find across the
filesystem, emitting the inode, size, relevant timestamps,
permissions, owner and path of each file into one giant file. Then sort
that file with sort, which handles such a huge file far more
efficiently than anything you're likely to whip up in Perl. _Then_ feed
the sorted file to a Perl script which scans it for potential
matches (different inodes, but otherwise identical information), groups
them together, and runs checksums on them to winnow things down
another level. Lastly, hardlink anything that still looks like a match,
possibly with a byte-for-byte compare beforehand.
It sounds convoluted, but by using the external sort, the comparison
pass doesn't really need much memory, so it's pretty fast (compared to
the single-pass "collect all the information and process" version).
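The scanning pass described above can be sketched in Perl. Several details here are assumptions not in the original: the listing comes from GNU find's `-printf` and is sorted with `LC_ALL=C sort`; MD5 (core Digest::MD5) serves as the checksum; and File::Compare does the byte-for-byte check. `scan_sorted` and `confirm_group` are hypothetical names. The sketch reports groups that look safe to hardlink rather than linking them.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Compare qw(compare);

# Input: one line per file, "dev size mode uid gid mtime inode path", e.g. from
#   find /snapshots -type f -printf '%D %s %m %U %G %T@ %i %p\n' | LC_ALL=C sort
# (GNU find assumed). Byte-order sorting puts lines with an identical metadata
# prefix next to each other. Paths containing newlines are not handled.

sub md5_of {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return;
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

# Hypothetical helper: given [inode, path] pairs that share metadata, keep one
# path per inode, bucket by MD5, then confirm each bucket byte-for-byte.
sub confirm_group {
    my @members = @_;
    my (%by_sum, %seen_ino);
    for my $m (@members) {
        my ($ino, $path) = @$m;
        next if $seen_ino{$ino}++;        # already hard-linked together
        my $sum = md5_of($path);
        next unless defined $sum;         # unreadable: skip it
        push @{ $by_sum{$sum} }, $path;
    }
    my @out;
    for my $paths (values %by_sum) {
        my ($first, @rest) = @$paths;
        my @same = grep { compare($first, $_) == 0 } @rest;
        push @out, [ $first, @same ] if @same;
    }
    return @out;
}

# Hypothetical helper: one pass over the sorted listing, holding only the
# current run of equal-metadata lines in memory.
sub scan_sorted {
    my ($in) = @_;
    my (@found, @run, $run_key);
    my $flush = sub {
        push @found, confirm_group(@run) if @run > 1;
        @run = ();
    };
    while (my $line = <$in>) {
        chomp $line;
        my @f = split / /, $line, 8;
        next unless @f == 8;
        my $key = join ' ', @f[0 .. 5];
        $flush->() if defined $run_key && $key ne $run_key;
        $run_key = $key;
        push @run, [ $f[6], $f[7] ];
    }
    $flush->();
    return @found;
}

# Usage: perl scan-sorted.pl sorted-listing.txt
# Prints each group of byte-identical, not-yet-linked files on one line.
for my $listing (@ARGV) {
    open my $in, '<', $listing or die "open $listing: $!";
    print join("\t", @$_), "\n" for scan_sorted($in);
}
```

Because the device number is part of the key, every reported group lives on one filesystem, which hard links require.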
-scott
_______________________________________________
rsnapshot-discuss mailing list
rsnapshot-discuss < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss
