Re[8]: Verify times increasing
Post Re[8]: Verify times increasing 
I'll lay out my vision of how I see this to make sure we're on the
same page, and perhaps you'll agree with my conclusions.

I'm quite sure you understand how RDiff works, but humor me...

RDiff does a backup of the current files, moving deltas (though with
a local file system drive it's doing all the checksums etc. on the
same machine - thus "less efficient" than an SSH transfer) - when
you're done you get:

1) Regular files that are identical copies of the files that got backed
up. (Nothing needs to be done for a restore here, they're exact
copies.)

2) *Reverse* deltas (and metadata) to roll the files in (1) back to
the "last" revision.

So, a --verify isn't needed to "verify" the current files. The very
nature of RDB is that they're exact. (provided you trust the RDB
protocol...which we assume.)

A --verify IS needed when you want to check an "older" version to be
sure that something hasn't borked your repository for an older delta
set. [But the "current" files are already verified, IMO]

So, your most important data, the current data, is verified.
[IMO] Progressively older delta sets are each less certain, as they
all get layered on top of each other in reverse order to get to
"older" sets. [But in general, I consider each "older" set to be
progressively less important.]

So, I see your problem as the following.

1) Verify that the current backup completed properly.
(I do this via logs and exit codes. I don't "double" check the
current backup by doing a --verify on the current backup set. I
implicitly trust that RDB does its job properly and that at the end
the hashes will match properly and that the current "remote" files do
equal the current "local" files. [i.e. the files that were the
source of the backup equal the backup files.])

2) Verify that your older deltas are as intact as possible, i.e. that
all the metadata, deltas and current files can be merged and rolled
back to whatever end-point you want.

(This is where I use --verify - it's not perfect because there's not
a way to check every delta-set for every single file in the
repository - at least not easily. [A recursive loop checking every
version would do that, but as you say, it's going to be very resource
expensive.])

3) Verify that the data is exact from your FW800 drive to the USB
drive on the mac-mini.

(I wouldn't use a --verify for this. As long as the files are equal
from the FW drive to the USB drive, if you can --verify on the FW drive
[source] you should be able to --verify on the USB drive too. So I'd
either "trust" rsync to be sure they're equal - or do something like
you are doing - checking that the FW files are exactly equal to the
USB files.

I'd do a verify on the fastest drive on the most powerful system.
Plus you don't need to do this all the time, say once a week - over a
weekend probably works. [And perhaps a full recursive loop through
all the diffs would be possible. If you write a bash script to do
that, I'd love to have it!])
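
For what it's worth, the "recursive loop through all the diffs" could be sketched roughly as below (in Python rather than bash, so the parsing is easy to check). This is only a sketch under assumptions: that `rdiff-backup --list-increments --parsable-output` prints one "<epoch-seconds> <type>" line per revision, that `--verify-at-time` accepts an epoch timestamp, and that a non-zero exit status means the verify failed. The function names and the repository path are made up for illustration.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[Sequence[str]], int]


def parse_increment_times(listing: str) -> List[int]:
    """Pull epoch timestamps out of parsable --list-increments output.

    Assumed line format: "1259174700 directory". Lines that do not start
    with an integer are ignored.
    """
    times = []
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            times.append(int(fields[0]))
    return sorted(times, reverse=True)  # newest revision first


def list_increment_times(repo: str) -> List[int]:
    """Ask rdiff-backup for every revision time in the repository."""
    out = subprocess.run(
        ["rdiff-backup", "--list-increments", "--parsable-output", repo],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_increment_times(out)


def verify_all(repo: str, times: Sequence[int],
               run: Runner = subprocess.call) -> List[int]:
    """Run --verify-at-time once per revision; return the times that failed.

    Assumption: a non-zero exit status signals a failed verify.
    """
    failed = []
    for t in times:
        if run(["rdiff-backup", "--verify-at-time", str(t), repo]) != 0:
            failed.append(t)
    return failed
```

Something like `verify_all(repo, list_increment_times(repo))` would then walk the whole history. Each pass restores and hashes a full revision, so the cost grows with the number of revisions, which is exactly the expense discussed above.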

To recap:
** Trust RDB does the backup properly and that source = destination
without additional checks.

** --verify the backup repository on the FW drive, and as much as
possible that all the older deltas and meta-data are intact and
functioning properly.

** check that the FW drive does copy exactly to the off-site USB
drive - but don't use --verify to accomplish this task. Just make
sure that the "off-site" repository is exactly equal to the "on-site"
FW drive.

HTH

-Greg



I'm not sure what you're doing with your --verify...

It *sounds* like you want a full CRC style check of the *current*
files after the backup is complete. (i.e. File X gets updated with a
delta, and you want to verify that file X is the same both on the
source and destination locations/drives.)

Yes, although it's more of an internal consistency check within the
rdiff-backup repository itself. I'm looking for a way to quickly
verify the integrity of my entire rdiff-backup repository.

In my scenario the repository is synced to an external USB drive that
gets rotated each day (i.e. each day I put yesterday's drive in
storage and bring a different drive out of storage to use for the
next backup). I use rsync to transfer my rdiff-backup repository
(which gets updated daily) to the USB drive. Then I run
rdiff-backup --verify-at-time to verify that the files on the USB
drive are not corrupt. But lately this has been taking too long.

Does that make sense?

Yes, and the USB connection may explain the longish verify times,
since it's somewhat slow, compared to a SATA drive connected directly
to the controller...

USB probably does have something to do with how long it takes. But on
the other hand yafic can do a full verify in 1/4 of the time on the
same drive with the same data, etc. So maybe rdiff-backup could be
made to be faster?

But I see that you want to verify the "local" RDiff repository against
the "off-line" one.

I'm not sure what you mean by this statement... I want to do an
internal consistency check on my rdiff-backup repository after it's
been rsync'd to the USB disk. I need to be sure that the data on the
USB disk is valid. I am doing the verify on the USB drive because that
is the last place that the data will be copied before it goes into
secure storage (for up to a month, but normally just a few days).
Maybe an outline of my data flow will help you to understand what I'm
trying to accomplish.

First the hardware:
- Xserve with raid array - this is being backed up with rdiff-backup
- Firewire 800 drive attached to Xserve - staging location for
rdiff-backup repository, gets a new revision each night
- Mac Mini - remote backup "server"
- USB 2.0 drive attached to Mac Mini - gets a copy of the rdiff-backup
repo from the Firewire 800 drive on the Xserve

Now the data flow:
- Xserve runs rdiff-backup from raid array to local firewire drive
- Xserve runs rdiff-backup --verify-at-time 0B on local firewire drive
to verify integrity of most recent revision (this step may not be
necessary)
- Mac Mini runs rsync to copy rdiff-backup repo from Xserve firewire
800 drive to local USB drive
- Mac Mini would now like to verify the integrity of the rdiff-backup
repository that it just rsync'd to the USB drive

During this last step I would rather not tie up any resources on the
Xserve. Instead, I want to do a fully local (to Mac Mini) verification
of the rdiff-backup repository. This verification should let me know
if any link in the (hardware) chain is failing: is the firewire 800
staging drive failing? is the USB drive failing?
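
The Mac Mini side of that flow could be sketched as a short step runner that stops at the first non-zero exit status. This is a sketch only: the step names and the source and destination arguments are hypothetical placeholders, and it assumes both rsync and rdiff-backup report failure through their exit status.

```python
import subprocess
from typing import Callable, List, Optional, Sequence, Tuple

Step = Tuple[str, List[str]]  # (description, argv)


def mini_steps(src: str, usb_repo: str) -> List[Step]:
    """Steps run on the Mac Mini; src and usb_repo are placeholders."""
    return [
        # Copy the staging repository from the Xserve's firewire drive.
        ("rsync repo to USB", ["rsync", "-a", "--delete", src, usb_repo]),
        # Fully local check of the newest revision on the USB copy.
        ("verify newest revision on USB",
         ["rdiff-backup", "--verify-at-time", "0B", usb_repo]),
    ]


def run_steps(steps: Sequence[Step],
              run: Callable[[Sequence[str]], int] = subprocess.call
              ) -> Optional[str]:
    """Run steps in order; return the name of the first failing step.

    Returns None when every step exits with status 0. Later steps are
    skipped once one fails, so a bad copy is never "verified".
    """
    for name, argv in steps:
        if run(argv) != 0:
            return name
    return None
```

Keeping the steps as data makes it easy to log which link in the chain failed. Note that the verify step only covers the newest revision (0B), not the older increments.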

Not sure how to do that - I'd guess you could do it with some other
tools - not storing the hashes - just a full compare each time. (How
big is the repository? [I think you said, but I don't recall.])

100 GB mirror + 80 GB of rdiff data. So almost 200 GB

---
But I'd guess your "local" repository isn't on the same disks as the
data, right?

Right.

If so, then it's probably not a huge deal if it takes 20 hours to
check the local repository against the remote. [Though I guess all
that disk channel activity might impact other disk through-put too...]

The drive will be moved to a secure location, so it needs to happen as
quickly as possible. If we have a disaster (fire, etc.) a backup
doesn't do us much good if the most recent snapshot is still online
being verified (and hence consumed by the fire).

(Add a controller? Dunno...)

I use a similar system and I don't verify the local repository to the
remote, though perhaps I should. (I trust rsync to make sure they're
the same...since it's not just copying the files - it's doing hash
matches like RDiff...)

Even if rsync verifies that they're the same this is only a false
sense of security since the staging repo (the source that rsync copied
from) could be corrupt and you'll never know it. This corruption could
be sneaking into old revisions which you don't bother to verify
because it takes too long. There needs to be some way to verify that
ALL of the data is fully intact after it's been copied...
--verify-at-time almost gets there, but not quite. It could get you there if you
have lots of time to do a verify-at-time for each revision in the
repo, but I'm guessing that would be prohibitively expensive in most
cases.

BTW, is this on a Windows platform? (Curious...) Ah, probably not
since yafic isn't... :)

Nope. All machines are running Mac OS. I have aspirations to add some
Windows machines at some point, but that's not likely until I get a
faster verify.

~ Daniel



--
Best regards,
listserv mailto:listserv.traffic < at > sloop.net



_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users < at > nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Post Re[8]: Verify times increasing 
On Nov 25, 2009, at 12:25 PM, listserv.traffic < at > sloop.net wrote:
<snip explanation of how rdiff-backup works>

Sounds good.

So, a --verify isn't needed to "verify" the current files. The very
nature of RDB is that they're exact. (provided you trust the RDB
protocol...which we assume.)

OK, I can accept this (and this makes my backup time shorter, nice).

A --verify IS needed when you want to check an "older" version to be
sure that something hasn't borked your repository for an older delta
set. [But the "current" files are already verified, IMO]

When and why would I ever use this? If I need to restore an old backup
it might be nice to know that I have access to good data, but I'll
take whatever I can get at that point. --verify doesn't seem to be
very useful to do a general repository health check (bummer).

So, your most important data, the current data, is verified.
[IMO] Progressively older delta sets are each less certain, as they
all get layered on top of each other in reverse order to get to
"older" sets. [But in general, I consider each "older" set to be
progressively less important.]

I half agree here. I certainly agree that the most important data is
the most current data. However, I would like to keep (at least) one
year's worth of backup history, and I need to know that my history is
good.

So, I see your problem as the following.

1) Verify that the current backup completed properly.
(I do this via logs and exit codes. I don't "double" check the
current backup by doing a --verify on the current backup set. I
implicitly trust that RDB does its job properly and that at the end
the hashes will match properly and that the current "remote" files do
equal the current "local" files. [i.e. the files that were the
source of the backup equal the backup files.])

That's very trusting of you. I guess I'm a little more paranoid since
my job depends on it :)

2) Verify that your older deltas are as intact as possible, i.e. that
all the metadata, deltas and current files can be merged and rolled
back to whatever end-point you want.

(This is where I use --verify - it's not perfect because there's not
a way to check every delta-set for every single file in the
repository - at least not easily. [A recursive loop checking every
version would do that, but as you say, it's going to be very resource
expensive.])

Agreed. This is where I'd like to see a new feature in rdiff-backup.
I'm willing to write code if I ever get time and no one else does first.

3) Verify that the data is exact from your FW800 drive to the USB
drive on the mac-mini.

(I wouldn't use a --verify for this. As long as the files are equal
from the FW drive to the USB drive, if you can --verify on the FW drive
[source] you should be able to --verify on the USB drive too. So I'd
either "trust" rsync to be sure they're equal - or do something like
you are doing - checking that the FW files are exactly equal to the
USB files.

I'd do a verify on the fastest drive on the most powerful system.
Plus you don't need to do this all the time, say once a week - over a
weekend probably works. [And perhaps a full recursive loop through
all the diffs would be possible. If you write a bash script to do
that, I'd love to have it!])

The bash script would be hugely inefficient. I'd much rather spend the
time modifying rdiff-backup to support an internal consistency check.

The problem with doing it once a week is that it only ever hits one of
the drives that is normally in secure storage. It would be a matter of
weeks or possibly months to make sure that all drives have been
verified (e.g. each time a particular drive is in use on a Friday).

To recap:
** Trust RDB does the backup properly and that source = destination
without additional checks.

** --verify the backup repository on the FW drive, and as much as
possible that all the older deltas and meta-data are intact and
functioning properly.

** check that the FW drive does copy exactly to the off-site USB
drive - but don't use --verify to accomplish this task. Just make
sure that the "off-site" repository is exactly equal to the "on-site"
FW drive.

I never do a direct compare between the two drives. I just use rsync
to copy from the FW to the USB drive. Here's my concern: without some
type of regularly executed integrity check of the data on the drive
(FW or USB), how would I detect that a drive is failing before it is
catastrophic and the bad data has propagated to all of the redundant
USB drives? Will rdiff-backup and/or rsync tell me if the drive is
failing when they do a backup/copy? (I don't think so.) The only way
to know that the data is good in my setup is to run some type of
consistency check on the USB drive each day after the rsync is
complete. If that fails then I know I have a problem somewhere. BTW it
looks like yafic won't work for me now either. There seems to be a bug
that causes it to stop half-way through the check :(

So back to the drawing board (or google) to find a different utility
to do the integrity check.

Thanks a lot for your input and generously patient explanations, Greg.
I do value your input.

~ Daniel




Post Re[8]: Verify times increasing 
On Wed, Nov 25, 2009 at 02:12:37PM -0500, Daniel Miller wrote:

[snip]


I never do a direct compare between the two drives. I just use rsync
to copy from the FW to the USB drive. Here's my concern: without
some type of regularly executed integrity check of the data on the
drive (FW or USB), how would I detect that a drive is failing before
it is catastrophic and the bad data has propagated to all of the
redundant USB drives? Will rdiff-backup and/or rsync tell me if the
drive is failing when they do a backup/copy? (I don't think so.) The
only way to know that the data is good in my setup is to run some type
of consistency check on the USB drive each day after the rsync is
complete. If that fails then I know I have a problem somewhere. BTW
it looks like yafic won't work for me now either. There seems to be
a bug that causes it to stop half-way through the check :(

So back to the drawing board (or google) to find a different utility
to do the integrity check.

This has been a rather informative thread.

Can I suggest a change to what Greg was suggesting? The fastest place
for you to do your check is the FireWire drive (with rdiff-backup). Once
you are happy with this, run your file checker (on Linux I would use
md5sum or cksfv) to create a checksum for each of the files.

Transfer these checksums over from the Xserve to the Mac Mini and check
them against the files on the USB drive.

The presumption is that the Xserve + FireWire 800 is going to allow
you to verify a lot faster than the Mini + USB - hopefully within
your allotted time period. The other way to do this would be rsync -c
(let rsync compare files via checksum).
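
The checksum-manifest idea above could look roughly like this. It's a minimal sketch, not a replacement for md5sum or cksfv: it swaps in SHA-256 from Python's hashlib, the function names are made up, and it only hashes regular files (symlinks and special files are skipped), so treat those choices as assumptions.

```python
import hashlib
import os
from typing import Dict, List


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of one file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each regular file under root (relative path) to its digest.

    Run this on the fast drive once the repository there is trusted.
    """
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # assumption: only regular file contents matter
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def check_manifest(root: str, manifest: Dict[str, str]) -> List[str]:
    """Re-hash the copy under root and list missing or changed files."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            problems.append("missing: " + rel)
        elif file_digest(full) != expected:
            problems.append("mismatch: " + rel)
    return problems
```

The manifest would be built on the Xserve side and shipped along with the repository; the Mini then only has to read the USB drive once. It does not notice extra files on the copy, and it cannot tell you whether the staging repository was already corrupt when the manifest was made.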


Or maybe get FW800 drives for the Mac Mini.

Or (what I do - this depends on your internet connection), I
rdiff-backup to another machine on site and then rsync the rdiff-backup
directory offsite to 2 other geographical locations. I also use
fusecompress to sit underneath the rdiff-backup destination and find I
get pretty good compression - I actually rsync the compressed data, which
saves me a lot of time.



Thanks a lot for your input and generously patient explanations,
Greg. I do value your input.

~ Daniel





--
"In the long run, every program becomes rococo, and then rubble."
-- Alan Perlis

