New User Seeking Some Clarification
Alan <alan < at > ufies.org>
wrote the following on Fri, 9 Jan 2004 13:35:37 -0800

> However, I found a bit of a gotcha. Under my old tar/scp setup I was
> dumping my database every night to a .sql file, then bzipping it and
> including that in my nightly tar. When I moved to rdiff-backup I left
> it like that, until I realized that because of the bzip, the compressed
> .sql file was completely different each time, so the entire file was
> transferred as an increment. When I removed the bzip part of the
> process, the base file was larger, but the increments were much smaller
> because they were simply text diffs of new/changed data, not a binary
> diff of an entirely changed file. Something to think about anyway.
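Alan's observation can be reproduced with a small experiment. The sketch below is a naive, non-rolling stand-in for rsync-style block matching (real rsync and rdiff use a rolling weak checksum plus a strong hash to find matching blocks cheaply). It measures how much of a new file can be rebuilt from whole blocks of the old one when a single row is added at the top of a dump, once for plain text and once after bzip2 compression. `matched_fraction`, `sql_dump`, the 64-byte block size, and the dump contents are all hypothetical choices for illustration.

```python
import bz2

BLOCK = 64  # assumed block size; real tools use larger, configurable blocks


def matched_fraction(old: bytes, new: bytes, block: int = BLOCK) -> float:
    """Fraction of `new` covered by whole, block-aligned pieces of `old`.

    Index every aligned block of `old`, then slide over `new` one byte at a
    time, jumping a whole block ahead whenever a copy of an old block is found.
    """
    if not new:
        return 1.0
    blocks = {old[i:i + block] for i in range(0, len(old) - block + 1, block)}
    matched = i = 0
    while i + block <= len(new):
        if new[i:i + block] in blocks:
            matched += block  # this block can be sent as "copy from old"
            i += block
        else:
            i += 1            # no match here: this byte must be sent literally
    return matched / len(new)


def sql_dump(rows: int, extra_first_row: bool = False) -> bytes:
    """Hypothetical stand-in for a nightly text dump of one table."""
    lines = [f"INSERT INTO t VALUES ({i}, 'row {i}');\n" for i in range(rows)]
    if extra_first_row:
        lines.insert(0, "INSERT INTO t VALUES (-1, 'row -1');\n")
    return "".join(lines).encode()


if __name__ == "__main__":
    old = sql_dump(5000)
    new = sql_dump(5000, extra_first_row=True)
    print("plain text:", round(matched_fraction(old, new), 3))
    print("bzip2     :", round(matched_fraction(bz2.compress(old), bz2.compress(new)), 3))
```

In this sketch the plain-text dump should be almost entirely matched, while the two bzip2 outputs share next to nothing, which is the "entirely changed file" effect Alan ran into.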

This is a tough problem. The xdelta program (similar to rdiff)
decompresses files first so it can find the differences more easily.
But that leads to problems of its own, because some files get really
huge when you decompress them...

I think there is a patch to gzip floating around that adds an option
to reset the buffer at certain clever intervals. The end result is
that similar data stays similar after gzipping: one extra byte at the
beginning doesn't result in two totally different gzip archives.
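The general idea can be sketched without patching gzip. This is not the gzip patch itself but a simplified stand-in. It picks cut points from the content itself (a CRC-32 of the preceding 16 bytes, with the window size and mask being assumptions) and compresses each chunk as a separate gzip member, which gunzip accepts as one concatenated file. Because a cut point depends only on nearby bytes, inserting data near the start moves later cut points along with the data instead of shifting them, so most chunks, and therefore most compressed members, stay byte-identical. `content_defined_chunks` and `rsync_friendly_gzip` are hypothetical names.

```python
from __future__ import annotations

import gzip
import zlib

WINDOW = 16   # assumed: number of preceding bytes that decide a cut point
MASK = 0xFFF  # assumed: cut where the low 12 hash bits are zero (~4 KiB average chunk)


def content_defined_chunks(data: bytes) -> list[bytes]:
    """Split data at cut points chosen by the content of the preceding WINDOW bytes."""
    chunks: list[bytes] = []
    start = 0
    for i in range(WINDOW, len(data) + 1):
        # CRC-32 of the window serves as a simple, non-rolling content hash
        if (zlib.crc32(data[i - WINDOW:i]) & MASK) == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk after the last cut point
    return chunks


def rsync_friendly_gzip(data: bytes) -> bytes:
    """Compress each chunk as its own gzip member; mtime=0 keeps headers deterministic."""
    return b"".join(gzip.compress(chunk, mtime=0) for chunk in content_defined_chunks(data))


if __name__ == "__main__":
    # Hypothetical data: lines of hex digits, plus a copy with one line added at the top
    old = b"".join(b"%08x\n" % zlib.crc32(b"%d" % i) for i in range(25000))
    new = b"one extra line\n" + old
    old_chunks = set(content_defined_chunks(old))
    reused = sum(len(c) for c in content_defined_chunks(new) if c in old_chunks)
    print(f"{reused / len(new):.1%} of the new file lies in chunks the old file already has")
    assert gzip.decompress(rsync_friendly_gzip(new)) == new  # still a normal .gz stream
```

Restarting compression for every chunk costs some compression ratio. That is the price any scheme of this kind pays for output that stays stable under small edits.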

Perhaps eventually that patch will become standard, and programs that
compress changing files will use diff-friendly compression.


--
Ben Escoto

Hi Ben,
* Ben Escoto <bescoto < at > stanford.edu> [27. Jan. 2004]:
> Alan <alan < at > ufies.org>
> wrote the following on Fri, 9 Jan 2004 13:35:37 -0800
>> [...]
>> until I realized that because of the bzip, the compressed
>> .sql file was completely different each time, so the entire
>> file was transferred as an increment. When I removed the bzip part of
>> the process, the base file was larger, but the increments were much
>> smaller because they were simply text diffs of new/changed data, not a
>> binary diff of an entirely changed file.
>
> I think there is a patch to gzip floating around that adds an option
> to reset the buffer at certain clever intervals. The end result is
> that similar data stays similar after gzipping: one extra byte at the
> beginning doesn't result in two totally different gzip archives.

This has been in Debian unstable for almost a year:

gzip (1.3.5-4) unstable; urgency=low

* merge patch from Rusty Russell that adds --rsyncable option to gzip.
This modifies the output stream to allow rsync to transfer updated .gz
files much more effectively. The resulting .gz files should be compatible
with the existing gunzip. The plan is that if this works out well for
Debian, the functionality will be included in a future upstream gzip
release. Closes: #116183, #118118, #134741

-- Bdale Garbee <bdale < at > gag.com> Thu, 13 Feb 2003 23:50:23 -0700

But I have not tested whether this really helps reduce the size of
increments of *.gz files.

Gregor
