rdiff-backup changes .gz files
Post rdiff-backup changes .gz files 
Hi,

I do full Linux root partition backups with rdiff-backup
1.2.8 on Debian squeeze. As I have had numerous instances of
silent corruption, I always verify my backups, in this case with

rdiff-backup --compare-full --exclude-sockets --exclude-other-filesystems

This verify gives me several "metadata the same, data changed"
errors that do not make sense. Examples are

/usr/share/doc/libssl-dev/demos/tunala/INSTALL.gz
/usr/share/doc/lrzsz/NEWS.gz
/usr/share/doc/openssl/doc/apps/rsa.pod.gz
/usr/share/doc/xfig/LATEX.AND.XFIG.zh_CN.gz

When I compare manually, these files are indeed different,
but gzip -tv tells me both are fine and they decompress
to bit-identical files.
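
The manual comparison described above can be repeated for any pair of files with a short Python sketch. This is an illustration only and not part of rdiff-backup; the function name compare_gz is made up for this example. It reports separately whether two .gz files are byte-identical and whether they decompress to the same data. Note that the gzip container carries header fields (for example a timestamp) next to the compressed stream, so two archives of the same data need not be byte-identical.

```python
import gzip
import hashlib


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compare_gz(path_a: str, path_b: str) -> dict:
    """Compare two .gz files at the container level and at the payload level.

    Hypothetical helper for illustration; reads both files fully into memory,
    which is fine for small documentation files.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        raw_a, raw_b = fa.read(), fb.read()
    # gzip.decompress raises an exception if an archive is corrupt.
    payload_a = gzip.decompress(raw_a)
    payload_b = gzip.decompress(raw_b)
    return {
        "raw_identical": _sha256(raw_a) == _sha256(raw_b),
        "payload_identical": _sha256(payload_a) == _sha256(payload_b),
    }
```

A result with raw_identical false and payload_identical true corresponds to the situation reported here: the archives differ, the data inside them does not.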

One thing I have noticed is that these files are all pretty old.

What is going on here? As I also use md5sum for integrity checks,
modifying .gz files in a backup is really not a good idea, even
if they decompress the same. In fact, changing anything when backing
up files is not a good idea. Also, with this problem, I have to
check manually for each verification error whether it is truly an
error or an instance of this not-really corruption.

To me this looks like a bug in rdiff-backup.
I have observed this only with .gz files.

Arno
--
Arno Wagner, Dr. sc. techn., Dipl. Inform., CISSP -- Email: arno < at > wagner.name
GnuPG: ID: 1E25338F FP: 0C30 5782 9D93 F785 E79C 0296 797F 6B50 1E25 338F
----
Cuddly UI's are the manifestation of wishful thinking. -- Dylan Evans

If it's in the news, don't worry about it. The very definition of
"news" is "something that hardly ever happens." -- Bruce Schneier

_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users < at > nongnu.org
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Post rdiff-backup changes .gz files 
Hi,

I found a bit more time and also found some .pm files
with the same issue. Looking into the files, though,
I have a suspicion: for some reason these files were
changed by a Debian update, which then restored the
original metadata after installing the new files.
The suspicion comes from the differences being date
format changes only, no corruption at all.

So new question: Is there a way to make rdiff-backup
actually do a checksum comparison for existing files in
order to determine what to back up? I know that would
be significantly slower, but at least on Debian system
partitions, a metadata comparison is obviously not
enough. As a substitute, can I force rdiff-backup
to back up specific files even if the metadata
is the same?
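
To illustrate the kind of check being asked for, here is a minimal Python sketch that runs outside rdiff-backup. It is not an rdiff-backup feature, and the function names file_digest and silently_changed are invented for this example. It assumes that mirror_root is a directory holding a plain copy of the files under source_root, and it lists files whose size and mtime agree on both sides while the contents differ.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def silently_changed(source_root: str, mirror_root: str) -> list:
    """List relative paths whose size and mtime match but whose data differ."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(mirror_root, rel)
            # Only regular files that exist on both sides are compared.
            if os.path.islink(src) or not os.path.isfile(src):
                continue
            if os.path.islink(dst) or not os.path.isfile(dst):
                continue
            s, d = os.stat(src), os.stat(dst)
            same_meta = (s.st_size == d.st_size
                         and int(s.st_mtime) == int(d.st_mtime))
            # Hash only when the metadata claims "unchanged".
            if same_meta and file_digest(src) != file_digest(dst):
                hits.append(rel)
    return sorted(hits)
```

Every file with matching metadata is read in full on both sides, so this is as slow as expected for a checksum comparison. The sketch also does not stop at filesystem boundaries, so it would have to be pointed at a single partition.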

Regards,
Arno




On Sat, Jul 09, 2011 at 10:19:59PM +0200, Arno Wagner wrote:
> I do full Linux root partition backups with rdiff-backup
> 1.2.8 on Debian squeeze. [...]


Post rdiff-backup changes .gz files 
On 07/17/2011 12:35 PM, Arno Wagner wrote:
> Hi,
>
> I found a bit more time and also found some .pm files
> with the same issue. Looking into the files, though,
> I have a suspicion: for some reason these files were
> changed by a Debian update, which then restored the
> original metadata after installing the new files.
> The suspicion comes from the differences being date
> format changes only, no corruption at all.
>
> So new question: Is there a way to make rdiff-backup
> actually do a checksum comparison for existing files in
> order to determine what to back up? I know that would
> be significantly slower, but at least on Debian system
> partitions, a metadata comparison is obviously not
> enough. As a substitute, can I force rdiff-backup
> to back up specific files even if the metadata
> is the same?

No way I know of to do that. I find it surprising that .gz and .pm
files would be re-issued without changing the time stamps, but one
place where files do get changed without altering either the size or
modification time is during prelinking of ELF shared libraries and
binaries. Now, the file size will increase the first time a file is
prelinked, and that change will be noticed by rdiff-backup. But,
subsequent runs of prelink will just alter internal address fields,
so the size does not change. In all cases the modification time is
preserved.
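
The observable effect described above (contents change, size and modification time do not) can be imitated with a small Python sketch. This is not how prelink is implemented; the helper names quick_signature and patch_in_place_keep_mtime are invented for this illustration, and the "signature" is simply a stand-in for any check based only on size and mtime.

```python
import os


def quick_signature(path: str) -> tuple:
    """Size and mtime only: a stand-in for a metadata-based change check."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def patch_in_place_keep_mtime(path: str, offset: int, new_bytes: bytes) -> None:
    """Overwrite bytes inside a file, then restore its original timestamps."""
    st = os.stat(path)
    if offset < 0 or offset + len(new_bytes) > st.st_size:
        raise ValueError("patch would change the file size")
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)  # stays inside the file, so the size is unchanged
    # Put atime and mtime back to what they were before the write.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

After such a patch, quick_signature returns the same tuple as before even though the bytes on disk are different, which is why a size-and-mtime comparison cannot see the change.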

Frankly, I view this as a mixed blessing. If all of the altered
files got backed up every time prelink was run, my incremental
backups would balloon quite seriously, and it matters very little
which prelinked version gets restored. The loader would detect the
out-of-date prelink information and do the same work it would have
to do with a non-prelinked binary, and everything will get back in
sync the next time prelink is run. All that matters is that
rdiff-backup records a checksum that is consistent with the file it
_did_ back up, and that is always the case.

One thing that _does_ have a problem with these files is using rsync
to maintain two copies of an rdiff-backup archive, since rsync would
happily update metadata and increment files but fail to notice that
the mirror file itself had changed. Avoiding use of the "-c" option
(which would be prohibitively expensive on a large archive) gets a
bit messy.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Post rdiff-backup changes .gz files 
On 07/17/2011 12:35 PM, Arno Wagner wrote:
>
> I have a suspicion: for some reason these files were
> changed by a Debian update, which then restored the
> original metadata after installing the new files.
> The suspicion comes from the differences being date
> format changes only, no corruption at all.
>
> So new question: Is there a way to make rdiff-backup
> actually do a checksum comparison for existing files in
> order to determine what to back up?

Better question: apparently Debian issues files with the same name and
timestamps but with different contents? I'd say that shouldn't happen.

Other question: can rdiff-backup be instructed to record and match a
file's ctime (change time, not creation time) in addition to the mtime for
filesystems that do support that?

I just tried chmod-ing a file, which changes ctime but not mtime, and
indeed rdiff-backup skipped the file.
Next, I copied (cp -a, so preserving metadata) the file to .blah, removed
the original, and renamed the .blah to original name.
Rdiff-backup then updated the directory (since it had updated mtime after
file copy and remove) but the file-under-test was still skipped.

Maybe we could check how rsync handles these cases?
Maybe we could also store & check inode number, but that wouldn't catch
in-place modifications followed by metadata resets.
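
The copy-and-rename experiment above can be sketched in Python to show what a richer signature would and would not catch. This is only an illustration on a Unix filesystem, not rdiff-backup code; weak_signature, strong_signature and replace_via_copy are names made up for this example.

```python
import os
import shutil


def weak_signature(path: str) -> tuple:
    """Size and mtime only."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def strong_signature(path: str) -> tuple:
    """Size and mtime plus ctime and inode number."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ctime_ns, st.st_ino)


def replace_via_copy(path: str) -> None:
    """Copy the file to a temporary name, then rename the copy over the original."""
    tmp = path + ".blah"
    # copy2 copies data, mode and timestamps (ownership is not copied).
    shutil.copy2(path, tmp)
    # Rename the copy over the original; the path now refers to a new inode.
    os.replace(tmp, path)
```

After replace_via_copy, weak_signature is unchanged while strong_signature differs, because the path now points at a different inode. As noted above, the inode number alone would not reveal an in-place modification; on Linux the ctime is still updated in that case, since it is set by the kernel and cannot be reset with utime.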


On Sun, 17 Jul 2011, Robert Nichols wrote:
>
> All that matters is that
> rdiff-backup records a checksum that is consistent with the file it
> _did_ back up, and that is always the case.

Given my example above, this may not always be the case. At least for most
unix file systems.


I'd say that changing the contents of a file and resetting the file's
metadata is just asking for trouble. But maybe I'm missing the obvious and
this may happen regularly, e.g. when re-building a Debian package. If
that's the case, then I think this should be looked into. A lot of people
are using rdiff-backup as a backup tool for Debian systems, and if this
wreaks havoc on verify runs, this doesn't look good.


--
Maarten



Post rdiff-backup changes .gz files 
On 07/17/2011 06:08 PM, Maarten Bezemer wrote:
> On Sun, 17 Jul 2011, Robert Nichols wrote:
>
>> All that matters is that
>> rdiff-backup records a checksum that is consistent with the file it
>> _did_ back up, and that is always the case.
>
> Given my example above, this may not always be the case. At least for most unix
> file systems.

You misunderstand me. All I'm saying is that the recorded checksums are
always consistent with the files that are stored in the mirror and
increments. That indeed might not match the source files if there were
hidden changes.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Post rdiff-backup changes .gz files 
Hi,

Arno Wagner <arno < at > wagner.name> wrote:
> I found a bit more time and also found some .pm files
> with the same issue. Looking into the files, though,
> I have a suspicion: for some reason these files were
> changed by a Debian update, which then restored the
> original metadata after installing the new files.
> The suspicion comes from the differences being date
> format changes only, no corruption at all.
>
> So new question: Is there a way to make rdiff-backup
> actually do a checksum comparison for existing files in
> order to determine what to back up? I know that would
> be significantly slower, but at least on Debian system
> partitions, a metadata comparison is obviously not
> enough. As a substitute, can I force rdiff-backup
> to back up specific files even if the metadata
> is the same?

The last time we talked about this was just a few weeks back:
http://lists.nongnu.org/archive/html/rdiff-backup-users/2011-03/msg00016.html
The context was a bit different, but same problem.

Patrick
--
Sent from my phone.


Post rdiff-backup changes .gz files 
>> On Sun, 17 Jul 2011, Robert Nichols wrote:
>>
>>> All that matters is that
>>> rdiff-backup records a checksum that is consistent with the file it
>>> _did_ back up, and that is always the case.
>>
>> Given my example above, this may not always be the case. At least for most
>> unix file systems.
>
> You misunderstand me. All I'm saying is that the recorded checksums are
> always consistent with the files that are stored in the mirror and
> increments. That indeed might not match the source files if there were
> hidden changes.

Well, maybe you're right and I misread your text. However, even though it
is essential to have the internal mirror state in good order, the fact
that there appear to be situations in which one would be unable to do a
restore of the source to a working and consistent state, bothers me more.

In fact, this makes using rdiff-backup an insufficient way of making
backups, which happens to be the primary function of the program :-S

--
Maarten

