Greg Freemyer <freemyer-ml < at > NorcrossGroup.com>
wrote the following on Fri, 02 Jan 2004 14:39:18 -0500
The writeup at:
http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/hash.html
in effect says that 'compare-by-hash' is not the perfect choice for
a rdiff-backup type of program. Instead, it says some kind of state
information should be added to the algorithym to ensure accuracy.
I don't know that I am concerned, but I am curious if rdiff-backup
takes any precautions against hash-collisions.
No, rdiff-backup takes no special precautions. As far as I can tell,
the paper makes the point that even cryptographically strong hashes
may be breakable in the future. This is a pretty good objection,
especially since librsync uses md5 instead of the stronger SHA1.
But in rdiff-backup the most an attacker could do would be to produce
a file that had the same signature, causing rdiff-backup to skip over
parts of the file. But an attacker can already do much better than
that: just use utime(), chmod, and such to make it look like your file
hasn't changed. Plus this method doesn't take years of research.
So anyway, rdiff-backup isn't intended to defeat an attacker trying to
make it look like his files haven't changed.
I did find this quote from the article interesting:
BitKeeper keeps state about each file under source control and knows
what changes have been made since the last time each tree was
synchronized. When synchronizing, it sends only the differences
since the last synchronization occurred, in compressed form. In
comparison to rsync, BitKeeper provides similar and sometimes better
bandwidth usage when simply synchronizing two trees without
resorting to compare-by-hash.
Does anyone know how BitKeeper can send only the differences without
using a compare-by-hash algorithm? I don't see how this is possible
unless BitKeeper has abandoned the usual edit w/arbitrary editor and
then checkin scheme.
--
Ben Escoto
