About backups and increments
Post About backups and increments 
Hi everyone. I'm successfully using rdiff-backup to perform backups of several
machines. After using it for some time though, I have some questions that I
hope are not in the FAQ ;)

Is the backup procedure crash/interrupt safe? If I'm backing up some data and
for some reason the program gets killed (say, manually), in what state will the
backup be left? Does the latest snapshot reflect the old state (as if no
backup was performed), a broken state (cannot reliably recover anything), or a
mixed state? Is this state ever recorded as an increment so that I can drop it?

If the backup is left in a mixed state, what exactly is the state of the
file rdiff-backup was working on when it was killed? Old or broken? Will
the old increments still be recoverable?

I want to assess how resistant rdiff-backup is to this kind of problem.

About space requirements: I assume the space required for the backup is:

- the space of the source files themselves
- the space of all the increments
- extra space required to compute the increment?
  * Is this space stored on the source or destination drive?
  * This should be the size of the file currently being processed + its
    increments, right? So should I assume that to back up the *second*
    increment of some space X (where X can possibly be just one huge file) I
    need at least X * 2 space for the backup - just for temporary files?
  * This brings me back to my first question: what happens when the destination
    is full?

I need to be able to *reliably* set space reservations before being caught with
an incomplete backup.

About backup speed. rdiff-backup doesn't seem to support both backing up *and*
pruning the increments at the same time (yes, I've read the man page), though
this sounds like a very sensible thing to do: knowing that you will prune
several old increments, you can avoid calculating the reverse diffs. Has this
been considered? rdiff-backup is CPU-bound for me on some machines for this
reason.

A request: increment pruning. --remove-older-than is fine most of the time,
though I would like an extra knob:

--keep-increments N (where N is the number of most recent increments to keep,
regardless of time).

Let's say I always want to keep at least 2 increments (or 2 months, if that
matters). I have no way to do that directly (I could list the increments and
calculate the time myself, but that's ugly).

--remove-older-than X --keep-increments 2

would do the job. This would also make it possible to keep N increments
regardless of the date:

--remove-older-than 0D --keep-increments N

It's very important (in my opinion) that, if I have the space, I always keep at
least 2 old copies around at all times. I'm prepared to implement this myself
if that would help get this adopted.

Sorry for the long post :), and thanks for rdiff-backup.



_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users < at > nongnu.org
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Post About backups and increments 
On 08/21/2011 05:27 PM, Yuri D'Elia wrote:
Is the backup procedure crash/interrupt safe? If I'm backing-up some data and
for some reason the program gets killed (say, manually), in what state the
backup will remain? Does the latest snapshot reflect the old state (as if no
backup was performed), a broken state (cannot reliably recover anything), or a
mixed state? Is this state ever recorded as an increment so that I can drop it?

Recovery from a failed or interrupted session is reliable, but time consuming.
The next time you try to do any operation on that archive, rdiff-backup will
insist on first performing a regression of the failed session back to the
previous state. There is also the "--check-destination-dir" action, which
does only the regression, if one is needed. For a large backup set, the
regression takes a *long* time.
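
Since the regression can be slow, one option is to run it as its own step before the backup, so that its duration shows up separately in the logs. What follows is only a sketch under assumptions: `backup_after_check` and the `RDIFF_BACKUP` override variable are illustrative names, not part of rdiff-backup, and the exit status of the check is deliberately ignored because nothing here says what it is when no regression is needed.

```shell
#!/bin/sh
# Hypothetical hook: lets the helper be tried with a stub instead of the real tool.
RDIFF_BACKUP=${RDIFF_BACKUP:-rdiff-backup}

backup_after_check() {
    # $1 = source directory, $2 = archive (destination) directory
    echo "check started:  $(date)"
    # Regresses a failed previous session if one is needed; status ignored on purpose.
    "$RDIFF_BACKUP" --check-destination-dir "$2" || true
    echo "check finished: $(date)"
    # The normal backup session follows once the archive is consistent.
    "$RDIFF_BACKUP" "$1" "$2"
}

# Usage (paths are placeholders):
#   backup_after_check /home/me /mnt/nas/me
```

Keeping the two steps apart does not make the regression faster; it only makes it visible when a slow run was caused by recovery rather than by the backup itself.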


About space requirements: I assume the space required for the backup is:

- the space of the source files themselves
- the space of all the increments
- extra space required to compute the increment?
* Is this space stored on the source or destination drive? * This should be
the size of the file currently computed + it's increments right? So should I
assume that to backup the *second* increment of some space X (where X can
possibly be just one huge file) I need at least X * 2 space for the backup -
just for temporary files?
* This brings me back to my first question: what happens when the destination
is full?

I'm not aware of any extra space needed for computing the increment, but the
increment itself, of course, does need to be stored on the destination drive.
If the destination drive runs out of space, the rdiff-backup session will
fail.


About backup speed. rdiff-backup doesn't seem to support both backupping *and*
pruning the increments at the same time (yes, I've read the man page). Though
this sounds like a very sensible thing to do: knowing that you will prune
several old increments, you can avoid to calculate the reverse diffs. Has this
been considered?

There's not much point in combining those two, totally independent actions.
Computing the reverse diffs for session N vs. session N-1 is totally
independent of the existence (or lack thereof) of earlier sessions in the
archive.

A request: increments pruning. --remove-older-than is fine most of the times,
though I would like an extra knob:

--keep-increments N (where N is the number of most recent increments to keep,
irregardless of time).

You can already do that. Though the manpage doesn't mention it, you can also
use a "B" suffix to specify the number of sessions to keep:

rdiff-backup --remove-older-than 30B /path/to/archive

will retain the most recent 30 sessions. (Yes, you'll probably need to
include "--force" with that.)

Let's say I want always to keep at all times at least 2 increments (or 2
months, if that matters), I have no way to do that directly (I could list the
increments and calculate the time myself, but that's ugly).

Hey!! Some of us scripting veterans really get off on "ugly"!

# Each Sunday, delete backups older than the previous Sunday, but
# always retaining at least 6 backups.
# Assumes $BkVolume holds the archive path and that GNU date is available.
if [ "$(date +%a)" = "Sun" ]; then
    Cut=$(date -d 'last week' "+increments.%Y-%m-%d")
    CutAt=$(rdiff-backup -l "$BkVolume" | \
        awk -v "Where=$Cut" '
        BEGIN {
            stderr = "/dev/stderr"
            Dmax = -1
        }
        # Skip the first line of the listing; remember field 1 of every other line.
        FNR > 1 {
            bdate[10000+FNR] = $1
            if(Dmax < 0 && $1 > Where) Dmax = FNR
        }
        END {
            # Never cut so late that fewer than 6 backups remain.
            if(Dmax < 0 || Dmax > FNR - 5) Dmax = FNR - 5
            if(Dmax > 2) {
                if(match(bdate[10000+Dmax], "^increments\\.....-..-..T..:..:..-")) {
                    # Print only the YYYY-MM-DD part of the increment name.
                    print substr(bdate[10000+Dmax], 12, 10)
                    exit 0
                }
                else {
                    print "Unrecognized increment: \"" bdate[10000+Dmax] "\"" >stderr
                }
            }
            exit 1
        }' )
    [ $? = 0 -a -n "$CutAt" ] && rdiff-backup --remove-older-than "$CutAt" \
        --force "$BkVolume"
fi


--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Post About backups and increments 
On Mon, 22 Aug 2011 10:54:48 -0500
Robert Nichols <rnicholsNOSPAM < at > comcast.net> wrote:

Recovery from a failed or interrupted session is reliable, but time consuming.
The next time you try to do any operation on that archive, rdiff-backup will
insist on first performing a regression of the failed session back to the
previous state. There is also the "--check-destination-dir" action, which
does only the regression, if one is needed. For a large backup set, the
regression takes a *long* time.

Hi Robert, thanks for the response.

Can I ask why it takes so long? Is there a document or short explanation
somewhere that describes how rdiff-backup keeps its internal
format/sessions/etc?

I'm not aware of any extra space needed for computing the increment, but the
increment itself, of course, does need to be stored on the destination drive.
If the destination drive runs out of space, the rdiff-backup session will
fail.

OK, I think I just need to know more about how rdiff-backup stores its data
to answer that question myself.

About backup speed. rdiff-backup doesn't seem to support both backupping *and*
pruning the increments at the same time (yes, I've read the man page). Though
this sounds like a very sensible thing to do: knowing that you will prune
several old increments, you can avoid to calculate the reverse diffs. Has this
been considered?

There's not much point in combining those two, totally independent actions.
Computing the reverse diffs for session N vs. session N-1 is totally
independent of the existence (or lack thereof) of earlier sessions in the
archive.

Ok.

--keep-increments N (where N is the number of most recent increments to keep,
irregardless of time).

You can already do that. Though the manpage doesn't mention it, you can also

Whoa. Can we fix the manpage? :)

use a "B" suffix to specify the number of sessions to keep:

rdiff-backup --remove-older-than 30B /path/to/archive

will retain the most recent 30 sessions. (Yes, you'll probably need to
include "--force" with that.)

Yes, I basically always use --force with --remove-older-than. Using
--force feels "wrong" IMHO, since it *is* the intended action of
--remove-older-than to remove possibly more than one increment.

Let's say I want always to keep at all times at least 2 increments (or 2
months, if that matters), I have no way to do that directly (I could list the
increments and calculate the time myself, but that's ugly).

Hey!! Some of us scripting veterans really get off on "ugly"!

Hah, I know I do too, but I was hoping I could avoid that. Having
--remove-older-than ?B still doesn't allow me to avoid the ugliness.

# Each Sunday, delete backups older than the previous Sunday, but
# always retaining at least 6 backups.

Exactly what I intended to do.

I would love something integrated in rdiff-backup, since it seems such
an obvious scenario to me. Maybe a combined specifier?

--remove-older-than time-spec[,?B]
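
The combined behaviour proposed here (a time cutoff that never leaves fewer than N sessions) can be approximated outside rdiff-backup. The helper below is a minimal sketch, not an existing option: `pick_cutoff` is a made-up name, and it assumes the caller has already reduced the increment listing to one YYYY-MM-DD session date per line, oldest first, with the current mirror as the last line.

```shell
#!/bin/sh
pick_cutoff() {
    # $1 = wanted cutoff date (YYYY-MM-DD), $2 = minimum number of sessions to keep
    # stdin = session dates, one per line, oldest first
    # Prints the earlier of the wanted cutoff and the date of the oldest session
    # that must survive; returns 1 when there is nothing that may be removed.
    awk -v want="$1" -v keep="$2" '
        NF { d[++n] = $1 }
        END {
            if (n <= keep) exit 1
            floor = d[n - keep + 1]     # oldest of the "keep" newest sessions
            cut = (want < floor) ? want : floor
            print cut
        }'
}

# Usage (dates are examples only):
#   cutoff=$(printf '%s\n' 2011-07-01 2011-07-10 2011-08-01 | pick_cutoff 2011-07-20 2) &&
#       rdiff-backup --remove-older-than "$cutoff" --force /path/to/archive
```

ISO dates compare correctly as plain strings, which is why no date arithmetic is needed. Passing a bare date rather than a full timestamp means the oldest protected session is not older than the cutoff.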

Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.

I always loved that trick, but does it still work? ;)



Post About backups and increments 
On 08/22/2011 10:54 AM, Robert Nichols wrote:
print substr(bdate[10000Dmax], 12, 10)

Ouch! That should be "10000+Dmax". Somehow the "+" sign got
lost when I was reformatting the lines.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Post About backups and increments 
On Mon, 22 Aug 2011, Robert Nichols wrote:

About space requirements: I assume the space required for the backup is:

- the space of the source files themselves
- the space of all the increments
- extra space required to compute the increment?
* Is this space stored on the source or destination drive? * This should
be the size of the file currently computed + it's increments right? So
should I assume that to backup the *second* increment of some space X
(where X can possibly be just one huge file) I need at least X * 2
space for the backup - just for temporary files?
* This brings me back to my first question: what happens when the
destination is full?

I'm not aware of any extra space needed for computing the increment, but the
increment itself, of course, does need to be stored on the destination drive.
If the destination drive runs out of space, the rdiff-backup session will
fail.

If it is detected that a file has changed (based on file attributes), a
new file in the destination directory is created using a "temp name", and
it is synced to its new contents, using the old version to speed up the
rsync process. After that, an increment is created, and only then will the
old version be removed.
This process is followed sequentially for all files, so the total space
needed would be the space for the increments that are created during this
session, plus the size of the largest file in the repository.
Of course, you usually don't know in advance how large the increments will
be...
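
The estimate above (this session's increments plus the largest file in the repository) can be turned into a rough pre-flight check. This is a sketch under assumptions: the function names are invented for illustration, the increment size is a guess the caller has to supply, and file names containing newlines are not handled.

```shell
#!/bin/sh
largest_file_bytes() {
    # $1 = directory; prints the size in bytes of its largest regular file (0 if none).
    size=$(find "$1" -type f -exec ls -ln {} + | sort -k5,5n | tail -n 1 | awk '{ print $5 }')
    echo "${size:-0}"
}

free_bytes() {
    # $1 = path on the destination filesystem; prints the available bytes reported by df.
    kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    echo $(( kb * 1024 ))
}

enough_space() {
    # $1 = archive directory, $2 = guessed size in bytes of this session's increments
    need=$(( $(largest_file_bytes "$1") + $2 ))
    [ "$(free_bytes "$1")" -ge "$need" ]
}

# Usage (path and guess are placeholders):
#   enough_space /mnt/nas/archive 500000000 || echo "destination may fill up" >&2
```

The check is only as good as the guess for the increments, which, as noted, is usually not known in advance.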

I don't really understand what you mean by 'the second increment'.
Worst case would be that you'd need the current size of the source, plus
the total size of your last backup including all increments (if everything
in the tree is replaced by something else), plus a small metadata
overhead. If you repeat for a second increment and again all data has been
replaced by other data, you would again need the current source size plus
the total size of the backup tree.
If, however, the data you back up changes only slightly or is mostly
'append-only' data like log files, each time the space used by increments
would be quite limited.

It all depends on your data set...


About backup speed. rdiff-backup doesn't seem to support both
backupping *and* pruning the increments at the same time (yes, I've
read the man page). Though this sounds like a very sensible thing to
do: knowing that you will prune several old increments, you can avoid
to calculate the reverse diffs. Has this been considered?

There's not much point in combining those two, totally independent actions.
Computing the reverse diffs for session N vs. session N-1 is totally
independent of the existence (or lack thereof) of earlier sessions in the
archive.

Adding to that:
One will always have to calculate a reverse diff to go from the newly
synced (N) version to the previous (N-1) version. If someone wants to
avoid calculating reverse diffs for a file, that is the same as having no
history at all. Better use rsync then, instead of rdiff-backup...
If you don't calculate a reverse-diff for a file, you won't be able to
regress a backup run that failed half-way through... leaving you with a
useless backup.

But!
Maybe I now know what I didn't understand in your line of questioning.
With rdiff-backup, increments are for individual files, and only when
these individual files have been changed. So, there are no reverse diffs
if a file has not been changed. For a data set of 1000 files with only 10
files changing since the previous run, the increments dir would only
contain 10 reverse diff files for this run.
Likewise, if a file hasn't been changed for 3 months and it is changed
today, but I only want to keep 1 month of history, I can NOT simply ditch
the 3-months old version. Maybe it wasn't changed for all these months,
but it is still yesterday's version and has to be kept in history for the
coming month minus 1 day...


--keep-increments N (where N is the number of most recent increments to
keep, irregardless of time).
[snip]
Let's say I want always to keep at all times at least 2 increments (or
2 months, if that matters), I have no way to do that directly (I could
list the increments and calculate the time myself, but that's ugly).

So... let's assume you make weekly backups. (Hoping it will be more often,
but just as an example.)
You want to keep history of 2 months. That's about 8 or 9 weeks.
But sometimes you make an extra backup halfway through a week, and
sometimes you go on a vacation and don't run any backup.
So, in these cases, you might want to keep history for 2 months, but also
at least 5 increments, even if that means it will be more than 2 months?
Would it really be useful to.. eh.. keep increments from 4 months ago if
you forgot to run backups for the last 2 months? This sounds just like
"oh, I didn't make backups over the last two months, but I do happen to
have some historic versions from 3 months ago containing your PhD thesis
you've been working on... for the last 3 months....."

Let's just say that I don't think having such an option would be a really
nice thing to have ;)
And creating a small script would indeed be far easier ;)

Side note: I never automate the removal of old increments. I always do that
by hand, first without --force to check the increment dates it announces
for removal, then with --force if it looks OK. The only thing
that's automated wrt increment removal is a cron job reminding me of the
task. I could even modify it to remind me daily if increment removal is
due and wasn't done yet, but for now, I keep these reminders in my inbox
until the removal is done.
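
The two-pass habit described in this side note can be wrapped in a small helper. This is a hedged sketch: `prune_with_review` and the `RDIFF_BACKUP` override variable are illustrative names only. One caveat to keep in mind is that --force is only needed when more than one session would be removed, so if exactly one session qualifies the first pass may already delete it.

```shell
#!/bin/sh
# Hypothetical hook: lets the helper be tried with a stub instead of the real tool.
RDIFF_BACKUP=${RDIFF_BACKUP:-rdiff-backup}

prune_with_review() {
    # $1 = time spec such as 2M or 30B, $2 = archive directory
    # First pass without --force, so the affected increments are announced.
    "$RDIFF_BACKUP" --remove-older-than "$1" "$2"
    printf 'Remove the increments listed above? [y/N] '
    read -r answer
    case $answer in
        y|Y) "$RDIFF_BACKUP" --remove-older-than "$1" --force "$2" ;;
        *)   echo "Nothing removed."; return 1 ;;
    esac
}

# Usage (path is a placeholder):
#   prune_with_review 2M /mnt/nas/archive
```

Because it waits for an answer on standard input, the helper suits an interactive shell; a cron job would still only send the reminder.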


--
Maarten


Post About backups and increments 
On 08/22/2011 11:19 AM, Yuri D'Elia wrote:
On Mon, 22 Aug 2011 10:54:48 -0500
Robert Nichols <rnicholsNOSPAM < at > comcast.net> wrote:

Recovery from a failed or interrupted session is reliable, but time consuming.
The next time you try to do any operation on that archive, rdiff-backup will
insist on first performing a regression of the failed session back to the
previous state. There is also the "--check-destination-dir" action, which
does only the regression, if one is needed. For a large backup set, the
regression takes a *long* time.

Hi Robert, thanks for the response.

Can I ask why it takes so long? Is there a document or short explanation
somewhere that describes how rdiff-backup keeps its internal
format/sessions/etc?

I've never looked at the code (I've heard rumors of people being treated for
cancer of the eyeballs after prolonged viewing.), but conceptually it's just
a delicate process that must be done very carefully lest interrupting the
regression leave things in an even worse state.

There's a very basic outline of the data storage at
http://www.nongnu.org/rdiff-backup/format.html .

--keep-increments N (where N is the number of most recent increments to keep,
irregardless of time).

You can already do that. Though the manpage doesn't mention it, you can also

Whoa. Can we fix the manpage? :)

use a "B" suffix to specify the number of sessions to keep:

rdiff-backup --remove-older-than 30B /path/to/archive

will retain the most recent 30 sessions. (Yes, you'll probably need to
include "--force" with that.)

The "nnnB" notation for counting back by backup sessions is documented as
an alternative to a timestamp, just not in the section dealing with
--remove-older-than.

Yes, I basically always use --force with --remove-older-than. Using
--force feels "wrong" IMHO, since it *is* the intended action of
--remove-older-than to remove possibly more than one increment.

I would still want some way to verify what will be removed before committing
to that irreversible action. And then of course there would have to be some
way to bypass that when invoking that function from a script. Whatever.
That would rank pretty low on the list of things I'd like to see different.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Post About backups and increments 
On 08/22/2011 02:26 PM, Maarten Bezemer wrote:

Likewise, if a file hasn't been changed for 3 months and it is changed today,
but I only want to keep 1 month of history, I can NOT simply ditch the 3-months
old version. Maybe it wasn't changed for all these months, but it is still
yesterday's version and has to be kept in history for the coming month minus 1
day...

If that old file is changed today, then what is kept is today's version
(the mirror) and a diff to reconstruct what the file was yesterday. You
can indeed simply ditch any history older than 1 month. In fact, that
"three-months-old" version _did_ get discarded as soon as you made
today's backup. The only thing kept showing that the file was actually
old is the timestamp in the mirror_metadata files.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.



Post About backups and increments 
Maarten Bezemer <mcbrdiff <at> robuust.nl> writes:

If it is detected that a file has changed (based on file attributes), a
new file in the destination directory is created using a "temp name", and
it is synced to its new contents, using the old version to speed up the
rsync process. After that, an increment is created, and only then will the
old version be removed.
This process is followed sequentially for all files, so the total space
needed would be the space for the increments that are created during this
session, plus the size of the largest file in the repository.
Of course, you usually don't know in advance how large the increments will
be...

Of course. I just wanted to understand how space was managed.

Maybe I now know what I didn't understand in your line of questioning.
With rdiff-backup, increments are for individual files, and only when
these individual files have been changed. So, there are no reverse diffs
if a file has not been changed. For a data set of 1000 files with only 10
files changing since the previous run, the increments dir would only
contain 10 reverse diff files for this run.
Likewise, if a file hasn't been changed for 3 months and it is changed
today, but I only want to keep 1 month of history, I can NOT simply ditch
the 3-months old version. Maybe it wasn't changed for all these months,
but it is still yesterday's version and has to be kept in history for the
coming month minus 1 day...

Sure.

So.. lets assume you make weekly backups. (Hoping it will be more often,
but just as an example.)
You want to keep history of 2 months. That's about 8 or 9 weeks.
But sometimes you make an extra backup halfway through a week, and
sometimes you go on a vacation and don't run any backup.
So, in these cases, you might want to keep history for 2 months, but also
at least 5 increments, even if that means it will be more than 2 months?
Would it really be useful to.. eh.. keep increments from 4 months ago if
you forgot to run backups for the last 2 months? This sounds just like
"oh, I didn't make backups over the last two months, but I do happen to
have some historic versions from 3 months ago containing your PhD thesis
you've been working on... for the last 3 months....."

I'll pick one of my scenarios here, so please don't consider this as the
"only" way I'm currently using rdiff-backup.

I have a small laptop connected to a NAS drive. I have written a small
Perl tool that wraps rdiff-backup so that I can have some automation over
it: for instance, I can control which directories get backed up, per-directory
pre- and post-backup scripts, backup frequency, etc. I integrated the script
with both cron and udev, so as soon as the laptop is plugged into the NAS
the backups start to roll. If enough time passes, backups get re-run (some
directories have hourly granularity).

The backup frequency though is not guaranteed. If I move the laptop away from
the NAS, I will have holes (in that case I have more than one NAS, but I
digress).

Of course, space (although not a problem in this case) is limited. What I've
done is to allow for each tree to have a different retention mechanism. Some
trees have unlimited retention, some have lesser guarantees such as
"one month minimum, but more if there's space", some even less as "just one
month, prune everything else".

To make up the free space, before the backup, I simply speculate with a
linear predictor for each tree, and decide what to prune using a simple
proportion between the trees. Again, I want to retain as much as possible,
but within the limits.

This way, by default I can restore the machine fully from scratch at any
time. I can also possibly do a full restore of months ago (if there was
enough space), but that's not guaranteed. What I can always do is recover
any important file at any time in the past, and recover a working state
of the machine.

I want at least "X" copies of increments because you never know at
which instant some files were backed up. Having N copies allows me to try
(in emergency situations) other snapshots. Since backups are not regular,
you can never know how to prune them in terms of "time".

And yeah, that's pure Perl hackery. I also do quite well with "ugly".
I'm quite satisfied with this particular setup, but I feel like I can
reduce the hackery ;)

Let's just say that I don't think having such an option would be a really
nice thing to have
And creating a small script would indeed be far easier

Counting output lines is not my favorite, but yeah, I can do that.

Side note: I never automate the removal of old increments. Always do that
by hand, first without --force to check the increment dates it announces
that will be removed, then with --force if it looks OK. The only thing
that's automated wrt increment removal is a cron job reminding me of the
task. I could even modify it to remind me daily if increment removal is
due and wasn't done yet, but for now, I keep these reminders in my inbox
until the removal is done.

Sometimes automatic removal is a "Good thing(tm)", but again it depends.
I'm trying to maximize snapshots while still guaranteeing some directory
trees.



