Mac Newbold wrote:
> On a separate but related note, my purpose for joining the list today was
> to ask a question related to long running backups. I've got a server that
> runs rsnapshots of all the servers (about 8 now, 5 local to the rsnapshot
> server and 3 remote over a DSL line). While the backups generally complete
> in time to stick with the schedule (an "hourly" backup every 4 hours), they
> sometimes run longer, causing rsnapshot to miss the next backup. I don't
> care much about it missing an hourly snapshot, but when it misses the
> daily or weekly snapshot, it makes things much less happy, and throws off
> the schedule for data retention.
>
> Would it be wise to add an option to rsnapshot (perhaps even a default
> behavior?) that would allow intervals other than the lowest (i.e. any
> interval that is only rotating directories, and not running a sync) to
> ignore the lockfile, if any, when performing their rotations?
>
> When I see a daily or weekly backup has been missed, I've been manually
> moving the pid file (if any) out of the way and manually running the
> appropriate rsnapshot interval, then replacing the pid file (if any). Does
> that sound like an acceptable solution to the problem?
>
> Does anyone else have this problem? If this (or another) solution sounds
> acceptable, what is the best way to help it get integrated into rsnapshot?
> I'd be willing to contribute some time to prepare a patch if that would be
> the best way to help move the process along.

My workaround for this problem was to change how rsnapshot uses the
lockfile. Rather than simply fail when there is an old lockfile, I
patched rsnapshot to wait until the lockfile is removed and then
continue. This way I still get all my intervals but some of them will
occasionally have slightly incorrect timestamps. It is much easier to
deal with irregular intervals than it is to deal with missing backups.

When rsnapshot is waiting for a lockfile, it reports to syslog, so I
only have to check the console from time to time to make sure something
isn't stuck.

I have enclosed patches for rsnapshot and rsnapshot.conf.default. I run
with:

    lockfile_wait 1440

to give myself 24 hours to fix a lockfile problem.
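
For reference, a minimal sketch of how the two settings might sit together in
rsnapshot.conf. It assumes the patch in this message has been applied, since
stock 1.2.9 does not know lockfile_wait. The lockfile path is the one shown
in rsnapshot.conf.default and is only an example. Fields in rsnapshot.conf
must be separated by tabs, not spaces.

```
# rsnapshot.conf fragment (field separators below are tabs)
# lockfile path as in rsnapshot.conf.default; adjust to your system
lockfile	/var/run/rsnapshot.pid
# assumption: needs the lockfile_wait patch from this message
# value is in minutes: 1440 minutes = 24 hours
# 0 (the patch's default) exits immediately if the lockfile exists
lockfile_wait	1440
```

Running "rsnapshot configtest" after editing should flag a blank or
non-integer value, because the patch reports those through config_err().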
John
--- rsnapshot.1.2.9.old 2006-08-27 12:05:55.000000000 -0700
+++ rsnapshot.1.2.9.new 2006-08-27 12:15:26.000000000 -0700
@@ -249,7 +249,8 @@
# this is reported to fix some semi-obscure problems with rmtree()
set_posix_locale();
-# if we're using a lockfile, try to add it (the program will bail if one exists)
+# if we're using a lockfile, try to add it
+# the program will bail or wait (depending on the config) if one exists
add_lockfile();
# create snapshot_root if it doesn't exist (and no_create_root != 1)
@@ -471,6 +472,9 @@
my $rsync_include_args = undef;
my $rsync_include_file_args = undef;
+ # default is to exit immediately on lockfile collision
+ $config_vars{'lockfile_wait'} = 0;
+
# open the config file
my $config_file = shift() || $config_file;
open(CONFIG, $config_file)
@@ -502,6 +506,23 @@
next;
}
+ # LOCKFILE_WAIT
+ if ($var eq 'lockfile_wait') {
+ if (!defined($value)) {
+ config_err($file_line_num, "$line - lockfile_wait can not be blank");
+ next;
+ }
+ if (!is_nonnegative_integer($value)) {
+ config_err(
+ $file_line_num, "$line - \"$value\" is not a legal value for lockfile_wait, must be nonnegative integer"
+ );
+ next;
+ }
+ $config_vars{'lockfile_wait'} = $value;
+ $line_syntax_ok = 1;
+ next;
+
+ }
# INCLUDEs
if($var eq 'include_conf') {
if(defined($value) && -f $value && -r $value) {
@@ -1073,6 +1094,23 @@
$line_syntax_ok = 1;
next;
}
+ # LOCKFILE_WAIT
+ if ($var eq 'lockfile_wait') {
+ if (!defined($value)) {
+ config_err($file_line_num, "$line - lockfile_wait can not be blank");
+ next;
+ }
+ if (!is_nonnegative_integer($value)) {
+ config_err(
+ $file_line_num, "$line - \"$value\" is not a legal value for lockfile_wait, must be nonnegative integer"
+ );
+ next;
+ }
+ $config_vars{'lockfile_wait'} = $value;
+ $line_syntax_ok = 1;
+ next;
+
+ }
# INCLUDE
if ($var eq 'include') {
if (!defined($rsync_include_args)) {
@@ -2055,8 +2093,9 @@
}
# accepts no arguments
+# waits until it can create a lockfile or exits with 1 as the return value
+# if the wait timed out
# returns undef if lockfile isn't defined in the config file, and 1 upon success
-# also, it can make the program exit with 1 as the return value if it can't create the lockfile
#
# we don't use bail() to exit on error, because that would remove the
# lockfile that may exist from another invocation
@@ -2075,11 +2114,23 @@
exit(1);
}
+ my $lockfile_wait = $config_vars{'lockfile_wait'};
+
# does a lockfile already exist?
- if (1 == is_real_local_abs_path($lockfile)) {
- print_err ("Lockfile $lockfile exists, can not continue!", 1);
- syslog_err("Lockfile $lockfile exists, can not continue");
- exit(1);
+ while (1 == is_real_local_abs_path($lockfile)) {
+ # check for lockfile timeout
+ if (0 == $lockfile_wait) {
+ # terminate the program
+ print_err ("Lockfile $lockfile exists, can not continue!", 1);
+ syslog_err("Lockfile $lockfile exists, can not continue");
+ exit(1);
+ } else {
+ # wait to see if the situation improves
+ print_warn ("Waiting $lockfile_wait minutes for lockfile $lockfile", 1);
+ syslog_warn("Waiting $lockfile_wait minutes for lockfile $lockfile");
+ sleep(60);
+ $lockfile_wait--;
+ }
}
# create the lockfile
@@ -2376,6 +2427,20 @@
return (0);
}
+# accepts one argument
+# checks to see if that argument is set to a nonnegative integer
+# returns 1 on success, 0 on failure
+sub is_nonnegative_integer {
+ my $var = shift(@_);
+
+ if (!defined($var)) { return (0); }
+ if ($var !~ m/^\d+$/) { return (0); }
+
+ if (0 <= $var) { return (1); }
+
+ return (0);
+}
+
# accepts string
# returns 1 if it is a comment line (beginning with #)
# returns 0 otherwise
--- rsnapshot.conf.default.1.2.9.old 2006-08-27 12:25:47.000000000 -0700
+++ rsnapshot.conf.default.1.2.9.new 2006-08-27 12:26:30.000000000 -0700
@@ -126,6 +126,14 @@
#
#lockfile /var/run/rsnapshot.pid
+# If a lockfile is enabled, it can be used to exit the program on a collision
+# or it can be used to pause a specified number of minutes until the lockfile
+# has been removed. If the timeout limit is reached before the lockfile is
+# removed, then the program will exit.
+# The default is to exit if a lockfile already exists, i.e. zero timeout.
+#
+#lockfile_wait 0
+
# Default rsync args. All rsync commands have at least these options set.
#
#rsync_short_args -a
_______________________________________________
rsnapshot-discuss mailing list
rsnapshot-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rsnapshot-discuss