Re[4]: Verify times increasing
Post Re[4]: Verify times increasing 
I mis-posted and should have replied to the list instead of just to
Daniel... so here it is:

---
I'm not sure what you're doing with your --verify... [I'm confused, I
think...]

It *sounds* like you want a full CRC-style check of the *current*
files after the backup is complete. (i.e. File X gets updated today with a
delta, and you want to verify that file X is the same both on the
source and destination locations/drives after the "backup" is complete.)

If that's the case, I think you already get it. (It's "built-in" to
rdiff-backup.)

Before I type a lot of drivel, let me know if that's what you
want/intend.

--
If I understand you right, this is a very different animal than a --verify...



-Greg

Greg wrote:
I know Matt corrected this post, but I wanted to address this:

---
If you do a --verify-at-time xyz, where xyz is your oldest backup, it
should verify all files in that backup, so every delta should be
applied. This should verify that all deltas (backups) are good and
functioning.

[In short, for each file for which a successful verify is returned,
it "verifies" that the most current file, all applicable deltas, and
the metadata are good and functioning properly.]

This is what I thought. It is good to have it confirmed. Thanks.

---
However, if files were added after the initial backup, I'd guess that
a verify won't check the deltas for those files, since they don't
exist in the set at time xyz.

So, while a verify to your oldest backup is good, it's not
comprehensive for all files that have deltas+meta-data.

This seems to be a weakness in the rdiff-backup verify mechanism. I
think there is general consensus here on the list that this could be
improved since what many people are looking for is a way to verify
that their backup archive (including all past revisions) is free from
corruption.

---
I'm not aware of one (if I'm wrong, perhaps someone could correct me),
but I'd like a command that, in essence, does a comprehensive
--verify-all-files-in-the-archive. [I'm pretty sure such a thing
doesn't exist; at least I never saw it in the docs.]

This would apply all deltas to *all* files (back to the oldest copy)
and compare the hashes stored at the time of backup to the rebuilt
files. [Note: all the files, not just those in a particular target
date/delta.]

This wouldn't verify that every file would be correct in every delta
version, but it would, I think, get as close as one might come to
that.
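The comprehensive check described above can be sketched as a walk backwards from the current mirror through each reverse delta, comparing a stored hash at every step rather than only at the oldest version. This is a toy model under stated assumptions: the delta format (`copy`/`literal` ops), the function names `apply_delta` and `verify_all_versions`, and the use of SHA-1 are hypothetical illustrations, not rdiff-backup's internals or librsync's real delta format.

```python
import hashlib

# Hypothetical toy delta format (NOT librsync's or rdiff-backup's real format).
# A reverse delta rebuilds the next-older version from the newer one:
#   ("copy", offset, length)  copy bytes from the newer version
#   ("literal", data)         insert literal bytes
def apply_delta(newer: bytes, delta: list) -> bytes:
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            if offset < 0 or length < 0 or offset + length > len(newer):
                raise ValueError("copy op out of range (corrupt delta?)")
            out += newer[offset:offset + length]
        elif op[0] == "literal":
            out += op[1]
        else:
            raise ValueError(f"unknown delta op {op[0]!r}")
    return bytes(out)


def verify_all_versions(mirror: bytes, deltas: list, stored_hashes: list) -> list:
    """Rebuild every version from the current mirror and check each stored hash.

    stored_hashes[0] belongs to the current mirror; deltas[i] turns version i
    into the older version i + 1, whose hash is stored_hashes[i + 1].
    Returns (version_index, hash_ok) for every version, not just the oldest.
    """
    if len(stored_hashes) != len(deltas) + 1:
        raise ValueError("expected exactly one stored hash per version")
    results = []
    current = mirror
    for i, expected in enumerate(stored_hashes):
        if i > 0:
            current = apply_delta(current, deltas[i - 1])
        results.append((i, hashlib.sha1(current).hexdigest() == expected))
    return results
```

Because each intermediate version is checked, a corrupt middle delta is reported even when the oldest reconstructed version still happens to match its stored hash.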

I agree, something like this would be great. However, with the speed
issues I'm having, it may not be practical (i.e. feasible in the time
available) to reconstruct every file this way before comparing it to
a signature hash. I would propose that rdiff-backup store some
additional metadata consisting of signature hashes of the delta files
as they exist on disk after rdiff-backup finishes a backup (similar
to what yafic does: http://www.saddi.com/software/yafic/).
This should make the verification process much faster (yafic takes
less than two hours to verify an rdiff-backup repo that takes over
eight hours to --verify-at-time on my setup). Note that it would not
replace the --verify-at-time functionality, which would still be
necessary to verify the integrity of files as they existed before the
backup. But it would provide a fast way to verify the integrity of an
rdiff-backup repository.
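A repository-wide check in the spirit of the yafic approach can be sketched as a manifest of content hashes for every file in the repository, saved after a backup and re-checked later. This is a minimal sketch, not yafic's code or file format; `hash_file`, `build_manifest` and `check_manifest` are hypothetical names, and SHA-256 is an arbitrary choice. Like yafic, it only detects changes to the stored files; it does not prove that the deltas still apply correctly.

```python
import hashlib
import os


def hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict:
    """Map every file under root (as a path relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = hash_file(full)
    return manifest


def check_manifest(root: str, manifest: dict) -> dict:
    """Compare the files currently on disk against a saved manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(p for p in manifest if p not in current),
        "modified": sorted(
            p for p in manifest if p in current and current[p] != manifest[p]
        ),
        "new": sorted(p for p in current if p not in manifest),
    }
```

Every backup run legitimately adds files to the repository, so the `new` list will be non-empty after each backup; that is the daily noise that has to be filtered out with this kind of tool.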

Then again, doing an intermediate check of the hash and file at each
delta point wouldn't take much longer [or so I think, without having
invested a lot of time pondering it], so if this option/feature
doesn't exist and one were to code it, it might not be much more code
or difficulty...

Ease of implementation may be an argument that favors your proposal.
My proposal adds a completely new layer of integrity checking on top
of the existing rdiff-backup functionality.

~ Daniel



_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users < at > nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki



--
Best regards,
listserv mailto:listserv.traffic < at > sloop.net




Post Re[4]: Verify times increasing 
[Now I'm bottom posting... :)]

---
I'm not aware of one (if I'm wrong, perhaps someone could correct me),
but I'd like a command that, in essence, does a comprehensive
--verify-all-files-in-the-archive. [I'm pretty sure such a thing
doesn't exist; at least I never saw it in the docs.]

This would apply all deltas to *all* files (back to the oldest copy)
and compare the hashes stored at the time of backup to the rebuilt
files. [Note: all the files, not just those in a particular target
date/delta.]

This wouldn't verify that every file would be correct in every delta
version, but it would, I think, get as close as one might come to
that.

I agree, something like this would be great. However, with the speed
issues I'm having, it may not be practical (i.e. feasible in the time
available) to reconstruct every file this way before comparing it to
a signature hash. I would propose that rdiff-backup store some
additional metadata consisting of signature hashes of the delta files
as they exist on disk after rdiff-backup finishes a backup (similar
to what yafic does: http://www.saddi.com/software/yafic/).
This should make the verification process much faster (yafic takes
less than two hours to verify an rdiff-backup repo that takes over
eight hours to --verify-at-time on my setup). Note that it would not
replace the --verify-at-time functionality, which would still be
necessary to verify the integrity of files as they existed before the
backup. But it would provide a fast way to verify the integrity of an
rdiff-backup repository.

Let me address this. Simply checking a hash of the delta isn't
nearly enough. If the metadata on how to apply that delta is gone or
corrupt, you're screwed too.

So, if you're going to calculate and store a hash, you should store a
hash of both the metadata and the delta.
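Greg's point can be sketched as a single digest that covers both the metadata record and the delta, so damage to either part changes the result. The byte-string inputs and the name `increment_digest` are hypothetical; how rdiff-backup actually lays out its metadata is not modelled here. Each part is length-prefixed so that shifting bytes from one part into the other cannot yield the same digest.

```python
import hashlib


def increment_digest(metadata: bytes, delta: bytes) -> str:
    """Hash a (hypothetical) metadata record and its delta as one unit.

    Each part is prefixed with its 8-byte big-endian length, so that
    (b"ab", b"c") and (b"a", b"bc") hash differently.
    """
    h = hashlib.sha256()
    for part in (metadata, delta):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()
```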

Small nit, but thought I should mention it.

[I should note that I have never examined the code, so I'm speaking
from a theoretical point of view, but I've asked about these things
pretty carefully, so I'm fairly sure I'm clear on how things are
handled... I'm happy to be corrected if I'm wrong.]

-Greg




Post Re[4]: Verify times increasing 
On Nov 24, 2009, at 3:44 PM, listserv.traffic < at > sloop.net wrote:

I'm not sure what you're doing with your --verify...

It *sounds* like you want a full CRC-style check of the *current*
files after the backup is complete. (i.e. File X gets updated with a
delta, and you want to verify that file X is the same both on the
source and destination locations/drives.)

Yes, although it's more of an internal consistency check within the
rdiff-backup repository itself. I'm looking for a way to quickly
verify the integrity of my entire rdiff-backup repository.

In my scenario the repository is synced to an external USB drive that
gets rotated each day (i.e. each day I put yesterday's drive in
storage and bring a different drive out of storage to use for the next
backup). I use rsync to transfer my rdiff-backup repository (which
gets updated daily) to the USB drive. Then I run rdiff-backup
--verify-at-time to verify that the files on the USB drive are not
corrupt. But lately this has been taking too long.

Does that make sense?

Yesterday I introduced a utility called yafic into my backup scheme.
Yafic can do a full-repository verification. This works and it's much
faster than rdiff-backup's --verify-at-time, but it's complicated to
set up, and I have to ignore all the changes that happen each day when
rdiff-backup updates the repository. It would be nicer to have this
kind of verification built into rdiff-backup so I wouldn't have to
filter out all the new delta and metadata files. rdiff-backup knows
which files were added/changed/deleted and would not report those
changes the way yafic does. With my proposed enhancement, rdiff-backup
would only report warnings or errors if some part of the repository
became corrupt.
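The filtering Daniel asks for can be sketched as a comparison of two repository manifests (path to digest, before and after the backup runs) that ignores exactly the paths the backup reports having added, changed, or deleted, so only unexplained differences remain. The function name and the assumption that the backup emits such change lists are hypothetical; rdiff-backup is not claimed to expose this information in this form.

```python
def unexpected_changes(old: dict, new: dict,
                       added: set, changed: set, deleted: set) -> dict:
    """Report manifest differences that the backup run(s) do not account for.

    old/new map repository paths to content digests; added/changed/deleted
    are the paths the backup itself (hypothetically) says it touched.
    """
    problems = {"missing": [], "modified": [], "unexpected_new": []}
    for path, digest in old.items():
        if path not in new:
            if path not in deleted:
                problems["missing"].append(path)      # vanished without cause
        elif new[path] != digest and path not in changed:
            problems["modified"].append(path)         # likely corruption
    for path in new:
        if path not in old and path not in added:
            problems["unexpected_new"].append(path)   # appeared without cause
    for paths in problems.values():
        paths.sort()
    return problems
```

With rotating drives, `old` would be the manifest from that drive's last check, so the change sets would have to accumulate over every backup run since then.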

If that's the case, I think you already get it. (It's "built-in" to
rdiff-backup.)

This is good to know. Does this happen during the backup, or only
during --verify...? I assume you're talking about something
equivalent to --verify-at-time 0B? Of course, this would only verify
the current mirror.

BTW, is this documented? I'm going to feel stupid if it is, because I
did not see it when I read the docs (multiple times) for rdiff-backup.

~ Daniel




Post Re[4]: Verify times increasing 
On Tue, Nov 24, 2009 at 04:12:16PM -0500, Daniel Miller wrote:
On Nov 24, 2009, at 3:44 PM, listserv.traffic < at > sloop.net wrote:

[snip]

In my scenario the repository is synced to an external USB drive
that gets rotated each day (i.e. each day I put yesterday's drive in
storage and bring a different drive out of storage to use for the
next backup). I use rsync to transfer my rdiff-backup repository
(which gets updated daily) to the USB drive. Then I run rdiff-backup
--verify-at-time to verify that the files on the USB drive are not
corrupt. But lately this has been taking too long.

Wouldn't the fact that you are reading lots of data from the USB
drive be slowing you down? Why not run your --verify-at-time on the
local disk repo, and then, when using rsync from local to USB, use the
-c option to let rsync checksum the files? You are still going to
run into the slowness of USB drives, though.
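The point of rsync's `-c` option is to decide whether files differ by their contents rather than by size and modification time. A post-copy check of the local repository against the USB copy can be sketched with Python's standard `filecmp` module; `mismatched_files` is a hypothetical helper, and it compares bytes directly rather than using rsync's checksums. It still has to read every file on the USB drive, so it does not avoid the USB slowness mentioned above.

```python
import filecmp
import os


def mismatched_files(src: str, dst: str) -> list:
    """Return paths (relative to src) missing from dst or differing in content.

    filecmp.cmp(..., shallow=False) compares file contents instead of trusting
    os.stat() signatures. Files that exist only in dst are not reported.
    """
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            rel = os.path.relpath(src_path, src)
            dst_path = os.path.join(dst, rel)
            if not os.path.isfile(dst_path) or not filecmp.cmp(
                src_path, dst_path, shallow=False
            ):
                mismatches.append(rel)
    return sorted(mismatches)
```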




Does that make sense?

[snip]


BTW, is this documented? I'm going to feel stupid if it is, because
I did not see it when I read the docs (multiple times) for
rdiff-backup.

~ Daniel







Post Re[4]: Verify times increasing 
On Nov 24, 2009, at 7:01 PM, Alex Samad wrote:
On Tue, Nov 24, 2009 at 04:12:16PM -0500, Daniel Miller wrote:
On Nov 24, 2009, at 3:44 PM, listserv.traffic < at > sloop.net wrote:

[snip]

In my scenario the repository is synced to an external USB drive
that gets rotated each day (i.e. each day I put yesterday's drive in
storage and bring a different drive out of storage to use for the
next backup). I use rsync to transfer my rdiff-backup repository
(which gets updated daily) to the USB drive. Then I run rdiff-backup
--verify-at-time to verify that the files on the USB drive are not
corrupt. But lately this has been taking too long.

Wouldn't the fact that you are reading lots of data from the USB
drive be slowing you down? Why not run your --verify-at-time on the
local disk repo, and then, when using rsync from local to USB, use the
-c option to let rsync checksum the files? You are still going to
run into the slowness of USB drives, though.

See my response to Greg. I think I addressed all these issues in that.

~ Daniel


