what to do when an rsync fails
Post what to do when an rsync fails 
Right now, if an rsync fails, all of the backups for that machine are aborted
and the backup from the previous day is rolled back.

This seems like an overzealous discarding of what's at least partially good
data.

To that end, I modified my rsnapshot to not remove the subdir for the machine
(in preparation for a roll-back) but instead to rename it $machine.interrupted
and then allow the rollback to do its job. That way, if I end up needing a file
from that machine's backup on a day when it was rolled back to the previous
day, I can check whether there is a newer copy in $machine.interrupted before
having to fall back to the previous day's copy in the rolled-back dir.
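
The lookup order described above can be sketched as a small shell function. This is only an illustration and not part of rsnapshot: the function name `newest_copy` and its arguments are hypothetical, and it assumes that the `$machine.interrupted` directory sits next to the rolled-back `$machine` directory inside the same snapshot directory.

```shell
#!/bin/sh
# newest_copy SNAPSHOT_DIR MACHINE REL_PATH   (hypothetical helper)
# Print the path of the copy to restore: the one from the interrupted
# backup if it exists, otherwise the one from the rolled-back directory.
# Returns 1 if neither copy exists.
newest_copy() {
    snapshot_dir=$1
    machine=$2
    rel_path=$3

    for candidate in \
        "$snapshot_dir/$machine.interrupted/$rel_path" \
        "$snapshot_dir/$machine/$rel_path"
    do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

It would be called as, for example, `newest_copy /.snapshots/hourly.0 jenny etc/fstab`, where the file path is made up for illustration.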

Firstly, I would propose that when one filesystem for a machine fails, the
machine's other filesystems are not automatically removed from the backup
queue -- it's possible that while one had an error, the others will succeed.

Secondly, I would propose that rather than simply discarding interrupted backups
(of filesystems), whatever did manage to get backed up be *merged* with the
rollback from the previous day.

I have achieved this manually by using rsync to copy from the previous day
(i.e. effectively a roll-back) but including the $machine.interrupted dir as a
high-priority link-dest directory.

The result of this rsync was that it effectively achieved the desired roll-back
but preferred files from the interrupted backup where they were available, making
the backup as complete and fresh as possible -- more so than simply rolling back
from the previous day.

Anyone want to code this up into rsnapshot? :-) I'm afraid I just don't have
the cycles (yet -- planning on winning the lottery though, so will have time
then -- I just don't know when that will be :-) ).

Cheers,
b.



_______________________________________________
rsnapshot-discuss mailing list
rsnapshot-discuss < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss

Post what to do when an rsync fails 
On 4 January 2011 17:46, Brian J. Murrell <brian < at > interlinx.bc.ca> wrote:
> Right now, if an rsync fails, all of the backups for that machine are aborted
> and the backup from the previous day is rolled back.
>
> This seems like an overzealous discarding of what's at least partially good
> data.
>
> To that end, I modified my rsnapshot to not remove the subdir for the machine
> (in preparation for a roll-back) but instead to rename it $machine.interrupted
> and then allow the rollback to do its job.
> ...
> Anyone want to code this up into rsnapshot?  :-)

This sounds like a very nice way to have rsnapshot behaving, so as a
user I would welcome this.

BR Håkon Løvdal


Post what to do when an rsync fails 
On Tue, Jan 04, 2011 at 04:46:25PM +0000, Brian J.Murrell wrote:
> Right now, if an rsync fails, all of the backups for that machine are aborted
> and the backup from the previous day is rolled back.

Only if you have
    link_dest 1
in rsnapshot.conf.

> This seems like an overzealous discarding of what's at least partially good
> data.

Do you have some real life examples of situations where rsync fails
(according to rsnapshot) and there are some good data?

If this happens regularly (depending on the reason) you might want
to look at the underlying cause of rsync failure and see if you can
do something about that (for example if you have a flakey network).

--
___________________________________________________________________________
David Keegel <djk < at > cybersource.com.au> http://www.cyber.com.au/users/djk/
Cybersource P/L: Linux/Unix Systems Administration Consulting/Contracting


Post what to do when an rsync fails 
Brian J. Murrell wrote:
> Right now, if an rsync fails, all of the backups for that machine are aborted
> and the backup from the previous day is rolled back.
>
> This seems like an overzealous discarding of what's at least partially good
> data.
>
> To that end, I modified my rsnapshot to not remove the subdir for the machine
> (in preparation for a roll-back) but instead to rename it $machine.interrupted
> and then allow the rollback to do its job.
> [...]

Hi Brian.

What about using sync_first ("sync_first 1" in rsnapshot.conf) and a wrapper
script which does something like this:

1. Call "rsnapshot sync". This should give a ".sync" directory.

2. If the sync was OK, then call "rsnapshot daily" (or whatever interval)
   to do the backup rotation.

3. If the sync failed (e.g. rsync error), then rename the ".sync" to
   ".failedYYYYMMDD". Omit the backup rotation in this case.

Clemens
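
The three steps above could be sketched roughly as follows. This is an untested-in-production illustration, not something shipped with rsnapshot: the function name `sync_then_rotate` and the `RSNAPSHOT` override variable are hypothetical, the snapshot root passed in must match `snapshot_root` in rsnapshot.conf, and the sketch assumes that "rsnapshot sync" exits non-zero when rsync fails.

```shell
#!/bin/sh
# Hypothetical wrapper for the sync-first approach.
# RSNAPSHOT can be overridden (e.g. for testing); it defaults to rsnapshot.
RSNAPSHOT=${RSNAPSHOT:-rsnapshot}

# sync_then_rotate SNAPSHOT_ROOT INTERVAL
sync_then_rotate() {
    snapshot_root=$1
    interval=$2

    # Step 1: sync into $snapshot_root/.sync
    if "$RSNAPSHOT" sync; then
        # Step 2: the sync succeeded, so do the rotation
        "$RSNAPSHOT" "$interval"
    else
        # Step 3: move the partial data aside and skip the rotation
        failed="$snapshot_root/.failed$(date +%Y%m%d)"
        if [ -d "$snapshot_root/.sync" ] && [ ! -e "$failed" ]; then
            mv "$snapshot_root/.sync" "$failed"
        fi
        return 1
    fi
}
```

A cron job might then call `sync_then_rotate /.snapshots daily`. If a ".failedYYYYMMDD" directory for the same day already exists, the sketch leaves ".sync" where it is rather than overwrite it.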


Post what to do when an rsync fails 
David Keegel <djk <at> cybersource.com.au> writes:

> Only if you have
> link_dest 1
> in rsnapshot.conf.

I will defer to you on that. I don't really keep the functionality of
all of the scenarios in which rsnapshot works in my head.

> Do you have some real life examples of situations where rsync fails
> (according to rsnapshot) and there are some good data?

Sure. Network interruption is one. Another is a machine that was woken up
(i.e. from suspend) to be backed up and goes back to sleep again before
everything that needed to be backed up could be grabbed.

> If this happens regularly

Not so regularly, but it does happen. For example, networks (especially WANs)
fail and no amount of (sane -- i.e. cost effective) mitigation is possible.

> (depending on the reason) you might want
> to look at the underlying cause of rsync failure and see if you can
> do something about that (for example if you have a flakey network).

As I said, not all network-failure scenarios can be resolved cost-efficiently.
Yes, we could get out a backhoe and dig a 300km ditch (two in fact, to
mitigate a single fibre failure) and run a dedicated fibre link, but given the
infrequency of the failures, that is hardly a cost-effective solution.

Cheers,
b.




Post what to do when an rsync fails 
Clemens Feige <c.feige <at> osypkamed.com> writes:

> Hi Brian.

Hi Clemens,

> What about using SYNC_FIRST=1 and a wrapper script which does something
> like this.

I'm not familiar with that option but let me see if I understand...

> 1. Call "rsnapshot sync". This should give a ".sync" directory.

So the backups are done into the .sync dir, which contains a subdir per
machine? I.e. this is the dir that will get renamed (in my case) hourly.0?

> 2. If the sync was ok then call "rsnapshot daily" (or what ever interval)
>    to make the backup rotation.

So make .sync hourly.0?

> 3. If the sync failed (e.g. rsync error) then rename the ".sync" to
>    ".failedYYYYMMDD". Omit the backup rotation in this case.

So now I am "throwing away" (or taking out of the rotation at least) all of the
backups of all of my machines in the case of a failure. This seems like an even
worse case than simply throwing away the interrupted backup of a single machine.

Yes, I do understand that I am simply putting the backups of all of the machines
into a ".failedYYYYMMDD" dir, but (a) that dir ends up outside of the rotation
scheme and is thus orphaned (and will stick around forever without manual
pruning), and (b) the next backup is not going to be able to leverage any of
the changes that were in the .failedYYYYMMDD dir.
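
To illustrate the manual pruning mentioned in (a), here is a minimal sketch that removes ".failedYYYYMMDD" directories older than a given date. The function name `prune_failed` and the cutoff argument are hypothetical; nothing like this exists in rsnapshot itself, and it only addresses the orphaning, not point (b).

```shell
#!/bin/sh
# prune_failed SNAPSHOT_ROOT CUTOFF   (hypothetical helper)
# Remove .failedYYYYMMDD directories whose date stamp is older than CUTOFF
# (also given as YYYYMMDD). Directories with any other suffix are left alone.
prune_failed() {
    snapshot_root=$1
    cutoff=$2

    for dir in "$snapshot_root"/.failed*; do
        [ -d "$dir" ] || continue
        stamp=${dir##*/.failed}
        case $stamp in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
                if [ "$stamp" -lt "$cutoff" ]; then
                    rm -rf -- "$dir"
                fi
                ;;
        esac
    done
}
```

For example, `prune_failed /.snapshots 20110101` would delete failed-sync directories stamped before 1 January 2011 and keep everything else.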

The point of my proposition was not just to simply not throw away a partial
backup, but to integrate it properly into the rotation in which it failed.

In fact, coincidentally, one of my backups failed last night for a machine named
jenny. Here is the sequence of commands to re-integrate the failed backup into
the rotation:

# cd /.snapshots/hourly.0
# ls -l
total 28
drwxr-xr-x 27 root root 4096 2011-01-03 09:34 cmurrell
drwxr-xr-x 32 root root 4096 2011-01-07 22:07 jenny
drwxr-xr-x 32 root root 4096 2011-01-07 22:07 jenny.interrupted
drwxr-xr-x 19 root root 4096 2009-06-09 15:06 klug
drwxr-xr-x 30 root root 4096 2011-01-07 11:22 linux
drwxr-xr-x 33 root root 4096 2010-12-28 00:39 pc
drwxr-xr-x 28 root root 4096 2010-10-19 15:12 pvr
# mv jenny{,.old}
# mkdir jenny
# rsync -aiHAX --link-dest=/.snapshots/hourly.0/jenny.interrupted \
--link-dest=/.snapshots/hourly.0/jenny.old/ \
/.snapshots/hourly.0/jenny.old/ /.snapshots/hourly.0/jenny
# rm -rf jenny.old

So as you can see, since "jenny" is just a rollback from the previous backup and
jenny.interrupted is the partial backup, we want to use rsync to rebuild jenny
taking first any files that are in the interrupted backup and otherwise, any
files that are in the rollback.
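
One way to inspect the result of a rebuild like this is to list which files in the rebuilt directory share an inode with their counterpart in the interrupted backup. The sketch below is only an illustration: the function name `linked_from` is hypothetical, it relies on the `-ef` test operator (supported by bash, dash and ksh, but not required by POSIX), and it does not handle file names containing newlines.

```shell
#!/bin/sh
# linked_from MERGED_DIR OTHER_DIR   (hypothetical helper)
# Print the relative path of every regular file in MERGED_DIR that is a hard
# link to the file with the same relative path in OTHER_DIR.
linked_from() {
    merged=$1
    other=$2

    ( cd "$merged" && find . -type f ) | while IFS= read -r rel; do
        rel=${rel#./}
        # -ef is true when both paths refer to the same device and inode
        if [ "$merged/$rel" -ef "$other/$rel" ]; then
            printf '%s\n' "$rel"
        fi
    done
}
```

In the example above it would be run as `linked_from /.snapshots/hourly.0/jenny /.snapshots/hourly.0/jenny.interrupted`, before removing anything that might still be needed.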

Cheers,
b.




Post what to do when an rsync fails 
On 9 January 2011 15:37, Brian J. Murrell <brian < at > interlinx.bc.ca> wrote:
> So the backups are done into the .sync dir, which contains a subdir per
> machine?  I.e. this is the dir that will get renamed (in my case) hourly.0?

In my opinion it is much better to have different snapshot_roots for
different machines; that way problems or irregularities with one machine
will be completely independent from other machines.

In my home network my server PC is always on (and runs RAID and rsnapshot),
my desktop PC is usually on all the time, but might not be from time to time,
and my laptop is sometimes on and sometimes not. With three different
snapshot_roots it is not a problem that the laptop is backed up less frequently
than the server, because the backups are rotated separately.
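
A setup with one snapshot_root per machine could be driven by a loop like the one sketched here. It assumes each machine has its own rsnapshot config file (selected with rsnapshot's `-c` option) in a single directory; the directory layout, the function name `backup_each` and the `RSNAPSHOT` override variable are hypothetical.

```shell
#!/bin/sh
# Run one rsnapshot instance per machine, each with its own config file and
# therefore its own snapshot_root. RSNAPSHOT can be overridden for testing.
RSNAPSHOT=${RSNAPSHOT:-rsnapshot}

# backup_each CONF_DIR INTERVAL   (hypothetical helper)
backup_each() {
    conf_dir=$1
    interval=$2
    failures=0

    for conf in "$conf_dir"/*.conf; do
        [ -f "$conf" ] || continue
        # A failure for one machine does not stop the others.
        if ! "$RSNAPSHOT" -c "$conf" "$interval"; then
            echo "backup failed for $conf" >&2
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every machine's backup succeeded.
    [ "$failures" -eq 0 ]
}
```

With, say, server.conf, desktop.conf and laptop.conf in one directory, `backup_each /etc/rsnapshot.d hourly` would attempt all three and report a failure for the laptop without affecting the other two rotations.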

BR Håkon Løvdal


Post what to do when an rsync fails 
On Sun, Jan 09, 2011 at 01:29:44PM +0000, Brian J.Murrell wrote:
> David Keegel <djk <at> cybersource.com.au> writes:
>
> > Only if you have
> > link_dest 1
> > in rsnapshot.conf.
>
> I will defer to you on that. I don't really keep the various functionality of
> all of the scenarios in which rsnapshot works in my head.

I had to check the rsnapshot code to refresh my memory on when rollbacks
can happen.

> > Do you have some real life examples of situations where rsync fails
> > (according to rsnapshot) and there are some good data?
>
> Sure. Network interruption is one. Another is a machine that was woken up
> (i.e. from suspend) to be backed up and goes back to sleep again before
> everything that needed to be backed up could be grabbed.
>
> > If this happens regularly
>
> Not so regularly, but it does happen. For example, networks (especially WANs)
> fail and no amount of (sane -- i.e. cost effective) mitigation is possible.

My first suggestion is to use
link_dest 0

IIRC link_dest does not cope well with things like network interruptions,
because it builds the new backup as it goes along and so premature abort
(eg from network interruption) means that for files/directories it hasn't
gotten to yet, you could have neither the old version nor the new version.
That is why rollbacks are necessary for safety if you use link_dest 1.

If you use link_dest 0, then rsnapshot will make sure there is a copy
of an old snapshot that it can work on before it calls rsync. That
way a premature abort means you might have a mixture of old files and
new files, but at least (pretty much?) all the files will be there.

So there is some more work for rsnapshot doing copies with link_dest 0,
but I think it copes better with things like network interruptions.

My second suggestion is to look at sync_first. That will involve a few
minor changes: putting "sync_first 1" in rsnapshot.conf and calling
rsnapshot sync as well as rsnapshot hourly, as described in the man page.
rsnapshot should automatically do "cp -al hourly.0 .sync" the first time
you do "rsnapshot sync", when it notices there is no .sync directory but
there is an hourly.0.

You might want to change a cron job "rsnapshot hourly" to a script like
    if rsnapshot sync || rsnapshot sync || rsnapshot sync ; then
        rsnapshot hourly
    else
        echo "rsnapshot sync failed after 3 tries - help!"
    fi
(script untested).
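
A variation on the retry idea, sketched here under the assumption that waiting between attempts helps with short network outages: a small generic `retry` function (a hypothetical helper, not part of rsnapshot) that sleeps between tries and returns the last exit status.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Run COMMAND until it succeeds, at most ATTEMPTS times, sleeping
# DELAY_SECONDS between attempts. Returns the exit status of the last attempt.
retry() {
    attempts=$1
    delay=$2
    shift 2

    n=1
    while :; do
        if "$@"; then
            return 0
        else
            status=$?
        fi
        if [ "$n" -ge "$attempts" ]; then
            return "$status"
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}
```

The cron script could then use `retry 3 60 rsnapshot sync` as the condition in place of the three chained calls; the attempt count and the 60-second delay are arbitrary illustrative values.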

Warning: as mentioned in the FAQ there was a nasty bug in rsnapshot 1.2.9
and 1.3.0 if you try to use sync_first 1 and link_dest 1 together. This
was fixed in rsnapshot 1.3.1.

--
___________________________________________________________________________
David Keegel <djk < at > cybersource.com.au> http://www.cyber.com.au/users/djk/
Cybersource P/L: Linux/Unix Systems Administration Consulting/Contracting

