same page, and perhaps you'll agree with my conclusions.
I'm quite sure you understand how RDiff works, but humor me...
RDiff does a backup of the current files, moving deltas (though with
a local file system drive it's doing all the checksums etc. on the
same machine, and is thus "less efficient" than an SSH transfer). When
you're done you get:
1) Regular files that are identical copies of the files that got backed
up. (Nothing needs to be done for a restore here, they're exact
copies.)
2) *Reverse* deltas (and metadata) to roll the files in (1) back to
the "last" revision.
So, a --verify isn't needed to "verify" the current files. The very
nature of RDB is that they're exact (provided you trust the RDB
protocol, which we assume).
A --verify IS needed when you want to check an "older" version to be
sure that something hasn't borked your repository for an older delta
set. [But the "current" files are already verified, IMO]
So your most important data, the current data, is verified.
[IMO] Progressively older delta sets are each less certain, as they
all get layered on top of each other in reverse order to get to
"older" sets. [But I generally consider each "older" set to be
progressively less important.]
So, I see your problem as the following.
1) Verify that the current backup completed properly.
(I do this via logs and exit codes. I don't "double" check the
current backup by doing a --verify on the current backup set. I
implicitly trust that RDB does its job properly, that at the end
the hashes will match, and that the current "remote" files do
equal the current "local" files, i.e. the files that were the
source of the backup equal the backup files.)
2) Verify that your older deltas are as intact as possible, i.e. that
all the metadata, deltas and current files can be merged and rolled
back to whatever end-point you want.
(This is where I use --verify - it's not perfect because there's not
a way to check every delta-set for every single file in the
repository - at least not easily. [A recursive loop checking every
version would do that, but as you say, it's going to be very resource
expensive.])
3) Verify that the data is exact from your FW800 drive to the USB
drive on the mac-mini.
(I wouldn't use a --verify for this. As long as the files are equal
from the FW drive to the USB drive, if you can --verify on the FW drive
[source] you should be able to --verify on the USB drive too. So I'd
either "trust" rsync to be sure they're equal - or do something like
you are doing - checking that the FW files are exactly equal to the
USB files.
I'd do a verify on the fastest drive on the most powerful system.
Plus you don't need to do this all the time, say once a week - over a
weekend probably works. [And perhaps a full recursive loop through
all the diffs would be possible. If you write a bash script to do
that, I'd love to have it!])
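The "full recursive loop through all the diffs" mentioned above could
be sketched roughly like this (in Python rather than bash). It assumes
that `rdiff-backup --parsable-output --list-increments` prints one
"<epoch-seconds> <type>" line per increment, and that
--verify-at-time exits non-zero when a hash check fails; check both
against your installed version. The function names and the injectable
`run` parameter are made up for illustration.

```python
import subprocess


def parse_increment_times(parsable_output):
    """Extract epoch timestamps from parsable increment-list output.

    Assumption: each useful line looks like "<epoch-seconds> <type>".
    Lines that do not start with a number are skipped.
    """
    times = []
    for line in parsable_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            times.append(int(fields[0]))
    return sorted(times, reverse=True)  # newest first


def verify_all_increments(repo, run=subprocess.run):
    """Run --verify-at-time once per increment; return the timestamps that failed."""
    listing = run(
        ["rdiff-backup", "--parsable-output", "--list-increments", repo],
        capture_output=True, text=True, check=True,
    )
    failed = []
    for when in parse_increment_times(listing.stdout):
        # One full verify per increment: expect this to be slow on a big repo.
        result = run(["rdiff-backup", "--verify-at-time", str(when), repo])
        if result.returncode != 0:
            failed.append(when)
    return failed
```

Since every increment triggers its own verify pass, this is the
"very resource expensive" option and probably only fits a weekend
run on the fastest drive.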
To recap:
** Trust RDB does the backup properly and that source = destination
without additional checks.
** --verify the backup repository on the FW drive, and as much as
possible that all the older deltas and meta-data are intact and
functioning properly.
** check that the FW drive does copy exactly to the off-site USB
drive - but don't use --verify to accomplish this task. Just make
sure that the "off-site" repository is exactly equal to the "on-site"
FW drive.
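For the first recap point (trusting the backup, but still watching
logs and exit codes), a minimal sketch might look like the following.
The helper name and the log layout are invented here, and it simply
treats exit code 0 as success, which is an assumption worth checking
for your rdiff-backup version.

```python
import subprocess


def backup_succeeded(command, log_path):
    """Run a backup command, append its output to a log, and report success."""
    result = subprocess.run(command, capture_output=True, text=True)
    with open(log_path, "a") as log:
        log.write("$ " + " ".join(command) + "\n")
        log.write(result.stdout)
        log.write(result.stderr)
        log.write(f"exit code: {result.returncode}\n")
    # Assumption: a zero exit code means the backup run completed cleanly.
    return result.returncode == 0
```

It would be called with the real backup command as a list, e.g.
["rdiff-backup", SOURCE, DEST] with your own paths, and the caller
can send mail or abort the later rsync step when it returns False.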
HTH
-Greg
I'm not sure what you're doing with your --verify...
It *sounds* like you want a full CRC style check of the *current*
files after the backup is complete. (i.e. File X gets updated with a
delta, and you want to verify that file X is the same both on the
source and destination locations/drives.)
Yes, although it's more of an internal consistency check within the
rdiff-backup repository itself. I'm looking for a way to quickly
verify the integrity of my entire rdiff-backup repository.
In my scenario the repository is synced to an external USB drive that
gets rotated each day (i.e. each day I put yesterday's drive in
storage and bring a different drive out of storage to use for the
next backup). I use rsync to transfer my rdiff-backup repository
(which gets updated daily) to the USB drive. Then I run
rdiff-backup --verify-at-time to verify that the files on the USB
drive are not corrupt. But lately this has been taking too long.
Does that make sense?
Yes, and the USB connection may explain the longish verify times,
since it's somewhat slow, compared to a SATA drive connected directly
to the controller...
USB probably does have something to do with how long it takes. But on
the other hand yafic can do a full verify in 1/4 of the time on the
same drive with the same data, etc. So maybe rdiff-backup could be
made to be faster?
But I see that you want to verify the "local" RDiff repository against
the "off-line" one.
I'm not sure what you mean by this statement... I want to do an
internal consistency check on my rdiff-backup repository after it's
been rsync'd to the USB disk. I need to be sure that the data on the
USB disk is valid. I am doing the verify on the USB drive because that
is the last place that the data will be copied before it goes into
secure storage (for up to a month, but normally just a few days).
Maybe an outline of my data flow will help you to understand what I'm
trying to accomplish.
First the hardware:
- Xserve with raid array - this is being backed up with rdiff-backup
- Firewire 800 drive attached to Xserve - staging location for
rdiff-backup repository, gets a new revision each night
- Mac Mini - remote backup "server"
- USB 2.0 drive attached to Mac Mini - gets a copy of the rdiff-backup
repo from the Firewire 800 drive on the Xserve
Now the data flow:
- Xserve runs rdiff-backup from raid array to local firewire drive
- Xserve runs rdiff-backup --verify-at-time 0B on local firewire drive
to verify integrity of most recent revision (this step may not be
necessary)
- Mac Mini runs rsync to copy rdiff-backup repo from Xserve firewire
800 drive to local USB drive
- Mac Mini would now like to verify the integrity of the rdiff-backup
repository that it just rsync'd to the USB drive
During this last step I would rather not tie up any resources on the
Xserve. Instead, I want to do a fully local (to Mac Mini) verification
of the rdiff-backup repository. This verification should let me know
if any link in the (hardware) chain is failing: is the firewire 800
staging drive failing? is the USB drive failing?
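One way to get a fully local check on the Mac Mini, sketched here
under some assumptions: record a SHA-256 for every file in the
repository on the staging drive right after the backup, carry that
manifest along with the rsync, and re-hash the copy on the USB drive
against it. This is not an rdiff-backup feature and the function
names are made up. It only shows that the bytes on the USB drive
match what was hashed on the staging drive; it does not prove that
the deltas themselves roll back correctly.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash one file in chunks so large files are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Symlinks and special files are skipped in this sketch.
            if os.path.isfile(full) and not os.path.islink(full):
                manifest[os.path.relpath(full, root)] = file_sha256(full)
    return manifest


def compare_to_manifest(root, expected):
    """Return (missing, changed, extra) relative paths for the tree under root."""
    actual = build_manifest(root)
    missing = sorted(set(expected) - set(actual))
    extra = sorted(set(actual) - set(expected))
    changed = sorted(p for p in expected if p in actual and actual[p] != expected[p])
    return missing, changed, extra
```

The manifest is a plain dict, so it can be written out with
json.dump on the Xserve and read back with json.load on the Mac
Mini. The verify step then reads every byte on the USB drive once
and needs nothing from the Xserve.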
Not sure how to do that - I'd guess you could do it with some other
tools - not storing the hashes - just a full compare each time. (How
big is the repository? I think you said, but I don't recall.)
100 GB mirror + 80 GB of rdiff data. So almost 200 GB
---
But I'd guess your "local" repository isn't on the same disks as the
data, right?
Right.
If so, then it's probably not a huge deal if it takes 20 hours to
check the local repository against the remote. [Though I guess all
that disk channel activity might impact other disk throughput too...]
The drive will be moved to a secure location, so it needs to happen as
quickly as possible. If we have a disaster (fire, etc.) a backup
doesn't do us much good if the most recent snapshot is still online
being verified (and hence consumed by the fire).
(Add a controller? Dunno...)
I use a similar system and I don't verify the local repository to the
remote, though perhaps I should. (I trust rsync to make sure they're
the same...since it's not just copying the files - it's doing hash
matches like RDiff...)
Even if rsync verifies that they're the same this is only a false
sense of security since the staging repo (the source that rsync copied
from) could be corrupt and you'll never know it. This corruption could
be sneaking into old revisions which you don't bother to verify
because it takes too long. There needs to be some way to verify that
ALL of the data is fully intact after it's been copied...
--verify-at-time almost gets there, but not quite. It could get you
there if you have lots of time to do a --verify-at-time for each
revision in the repo, but I'm guessing that would be prohibitively
expensive in most cases.
BTW, is this on a Windows platform? (Curious...) Ah, probably not
since yafic isn't...
Nope. All machines are running Mac OS. I have aspirations to add some
Windows machines at some point, but that's not likely until I get a
faster verify.
~ Daniel
--
Best regards,
listserv mailto:listserv.traffic < at > sloop.net
