I spotted this in my log file when running some backups manually. I'm
surprised that no-one's noticed it before ...
I start 'rsnapshot weekly' ...
[18/Apr/2011:23:04:41] /usr/local/bin/rsnapshot weekly: started
[18/Apr/2011:23:04:41] echo 432 > /var/run/rsnapshot.pid
... [weekly backups get rotated] ...
[18/Apr/2011:23:04:42] mv /Volumes/Backups/snapshots/daily.6/ /Volumes/Backups/snapshots/weekly.0/
[18/Apr/2011:23:04:42] rm -f /var/run/rsnapshot.pid
[18/Apr/2011:23:04:42] /bin/rm -rf /Volumes/Backups/snapshots/_delete.432
So far so good. Now, seeing that the lockfile is gone, I start
'rsnapshot daily' ...
[18/Apr/2011:23:06:30] /usr/local/bin/rsnapshot daily: started
[18/Apr/2011:23:06:30] echo 473 > /var/run/rsnapshot.pid
... [daily backups get rotated] ...
[18/Apr/2011:23:06:31] native_cp_al("/Volumes/Backups/snapshots/daily.0", "/Volumes/Backups/snapshots/daily.1")
but while that's happening, the weekly process logs ...
[18/Apr/2011:23:34:58] rm -f /var/run/rsnapshot.pid <--- AWOOGA! AWOOGA!
[18/Apr/2011:23:34:58] /usr/local/bin/rsnapshot weekly: completed successfully
while the daily process continues with native_cp_al, before getting on to
fixing up some stuff and backing up my data ...
[19/Apr/2011:00:08:10] /usr/bin/rsync -a --delete --numeric-ids /Volumes/Backups/snapshots/daily.0/ /Volumes/Backups/snapshots/daily.1/
This bug appears to be still present in CVS. If you search for
'remove_lockfile()' in CVS (http://tinyurl.com/rsnapshot-1-428-cvs) you
will see that lines 285 and 288 are:
handle_interval( $cmd );
remove_lockfile();
However, inside handle_interval we find ...
2931 # if use_lazy_delete is on, delete the _delete.$$ directory
2932 # we just check for the directory, it will have been created or not depending on the value of use_lazy_delete
2933 if ( -d "$config_vars{'snapshot_root'}/_delete.$$" ) {
2934 # this is the last thing to do here, and it can take quite a while.
2935 # we remove the lockfile here since this delete shouldn't block other rsnapshot jobs from running
2936 remove_lockfile();
So, if use_lazy_deletes is on, we correctly unlock before deleting, but
then once the delete has finished (it's the last thing in handle_interval)
we blindly unlock again, even if the lock (if one exists) is owned by
another process.
A quick and easy solution would be to only execute the remove_lockfile()
at line 288 if use_lazy_deletes is *not* turned on. The more robust
solution, which would fix any other related errors, would be for the
remove_lockfile() subroutine to check that the lock file belongs to the
correct process before unlocking. To aid in finding any other such
errors, I suggest that remove_lockfile() log when it refuses to remove
other processes' locks.
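To make the more robust option concrete, here is a rough sketch of an
ownership-checking unlock. It is not rsnapshot's actual remove_lockfile():
the name remove_lockfile_if_owned() and its two arguments are made up for
illustration, and it assumes the lockfile contains nothing but the owning
PID, as the 'echo 432 > /var/run/rsnapshot.pid' log lines suggest.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper, NOT rsnapshot's real remove_lockfile().
# Assumes the lockfile holds only the PID of the process that created it.
# Returns 1 if the lockfile is gone afterwards (removed, or never there),
# 0 if it was left alone because it belongs to someone else or on error.
sub remove_lockfile_if_owned {
    my ($lockfile, $pid) = @_;

    # Already removed (e.g. by the lazy-delete path): nothing to do.
    return 1 unless -e $lockfile;

    my $fh;
    unless (open($fh, '<', $lockfile)) {
        warn "could not read lockfile $lockfile: $!\n";
        return 0;
    }
    my $owner = <$fh>;
    close($fh);
    $owner = '' unless defined $owner;
    $owner =~ s/^\s+|\s+$//g;    # strip newline and stray whitespace

    # Refuse, and say so, if another process owns the lock.
    if ($owner ne "$pid") {
        warn "refusing to remove $lockfile: owned by PID '$owner', not $pid\n";
        return 0;
    }

    unless (unlink($lockfile)) {
        warn "could not remove lockfile $lockfile: $!\n";
        return 0;
    }
    return 1;
}

# Demo in a scratch directory, replaying the PIDs from the logs above:
# the daily run (473) holds the lock when the weekly run (432) unlocks.
my $dir  = tempdir(CLEANUP => 1);
my $lock = "$dir/rsnapshot.pid";
open(my $out, '>', $lock) or die "cannot create $lock: $!";
print $out "473\n";
close($out);

remove_lockfile_if_owned($lock, 432);    # warns, leaves the lock in place
remove_lockfile_if_owned($lock, 473);    # the owner removes its own lock
```

There is still a small window between reading the PID and calling unlink
in which the file could change hands, so this narrows the problem rather
than closing it completely.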
I don't have the time to fix this right now, but if no-one gets in with
a fix before the weekend, I'll get it done then.
--
David Cantrell | semi-evolved ape-thing
I caught myself pulling grey hairs out of my beard.
I'm definitely not going grey, but I am going vain.
