Encountering the NFS "Directory not empty" error
Post Encountering the NFS "Directory not empty" error 
After using rdiff-backup for daily backups for the last couple of weeks, I've run
into the problem described at
http://cvs.lp.se/doc/rdiff-backup/FAQ.html#dir_not_empty

It appears to be triggered by removal of a directory in the production
copy, which creates problems when the backup copy tries to duplicate the
removal.

I am not able to perform the backup outside of NFS, as we are doing this
in case of a local server failure while also conforming to existing
internal network infrastructure. Is there a known workaround for this
issue? Has it been resolved in the 1.1 series? If the answer to these is
"no", are there alternative backup systems similar in functionality to
rdiff-backup? (preferably with support for diffs of binary files, like
what rdiff-backup uses)

Thanks!

Nick

This is the error itself:

Traceback (most recent call last):
  File "/usr/bin/rdiff-backup", line 23, in ?
    rdiff_backup.Main.Main(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 283, in Main
    take_action(rps)
  File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 253, in take_action
    elif action == "backup": Backup(rps[0], rps[1])
  File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line 303, in Backup
    backup.Mirror_and_increment(rpin, rpout, incdir)
  File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 51, in Mirror_and_increment
    DestS.patch_and_increment(dest_rpath, source_diffiter, inc_rpath)
  File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 229, in patch_and_increment
    ITR(diff.index, diff)
  File "/usr/lib/python2.4/site-packages/rdiff_backup/rorpiter.py", line 281, in __call__
    if self.finish_branches(index) is None:
  File "/usr/lib/python2.4/site-packages/rdiff_backup/rorpiter.py", line 233, in finish_branches
    to_be_finished.end_process()
  File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line 574, in end_process
    self.base_rp.rmdir()
  File "/usr/lib/python2.4/site-packages/rdiff_backup/rpath.py", line 806, in rmdir
    self.conn.os.rmdir(self.path)
OSError: [Errno 39] Directory not empty: '/backup/trantor/project_local/old/trac_data/templates/0.8'
Exception exceptions.TypeError: "'NoneType' object is not callable" in
<bound method GzipFile.__del__ of <gzip open file
'/backup/trantor/project_local/rdiff-backup-data/file_statistics.2005-10-29T16:50:49-04:00.data.gz',
mode 'wb' at 0xb7bb6f50 -0x4842b234>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in
<bound method GzipFile.__del__ of <gzip open file
'/backup/trantor/project_local/rdiff-backup-data/error_log.2005-10-29T16:50:49-04:00.data.gz',
mode 'wb' at 0xb7ea1f08 -0x4842b074>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in
<bound method GzipFile.__del__ of <gzip open file
'/backup/trantor/project_local/rdiff-backup-data/mirror_metadata.2005-10-29T16:50:49-04:00.snapshot.gz',
mode 'wb' at 0xb7bb6f98 -0x48427a14>> ignored

Post Encountering the NFS "Directory not empty" error 
Hi Nick,

On Saturday 29 October 2005 23:21, Nick Parker wrote:
> After using rdiff for daily backups for the last couple weeks, I've run
> into the problem described at
> http://cvs.lp.se/doc/rdiff-backup/FAQ.html#dir_not_empty
>
> It appears to be triggered by removal of a directory in the production
> copy, which creates problems when the backup copy tries to duplicate the
> removal.
>
> I am not able to perform the backup outside of NFS, as we are doing this
> in case of a local server failure while also conforming to existing
> internal network infrastructure. Is there a known workaround for this
> issue? Has it been resolved in the 1.1 series? If the answer to these is
> "no", are there alternative backup systems similar in functionality to
> rdiff-backup? (preferably with support for diffs of binary files, like
> what rdiff-backup uses)
>
> Thanks!

I'm pretty sure the NFS problem is actually a general problem that most
kernels/filesystems hide. rdiff-backup does not close some file handles
before it tries to delete the files. On most filesystems, such files and
their parent directories can still be deleted, although they only appear
to be deleted: the data is still on disk, and only no *new* file
descriptors can be opened. Old file descriptors still have full access. NFS
is different, since it goes over the network and since the NFS protocol up
to v3 has no native file locking (NFSv4 has native file locking, but I
haven't had time to test it yet). So on NFS directories the client kernel
can't hide the deleted file as other filesystems do. Instead it renames each
deleted file that still has open file descriptors (fds) to a .nfs* file.
Those .nfs* files can't be deleted until the last fd referring to the file
has been closed. Unfortunately, the parent directory can't be deleted until
then either... *This is the actual problem you will notice.*
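
The mechanism described above can be tried out with a minimal Python 3 sketch (the thread itself concerns Python 2.4; the function name, the "0.8" directory and "somefile" are hypothetical, chosen for illustration only). It deletes a file that still has an open handle and then tries to remove the parent directory. On a local filesystem the rmdir is expected to succeed; on an NFS mount it would be expected to fail with ENOTEMPTY because of the leftover .nfs* file.

```python
import errno
import os
import tempfile


def remove_dir_with_open_file(parent):
    """Unlink a file that is still open, then try to rmdir its directory.

    Returns None if rmdir succeeded, otherwise the errno of the failure.
    """
    victim_dir = os.path.join(parent, "0.8")      # hypothetical directory name
    os.mkdir(victim_dir)
    path = os.path.join(victim_dir, "somefile")   # hypothetical file name
    with open(path, "w") as f:
        f.write("data")

    leaked = open(path)   # stands in for a handle the program forgot to close
    os.unlink(path)       # local fs: the name vanishes; NFS client: renamed to .nfs*
    try:
        os.rmdir(victim_dir)
        result = None
    except OSError as exc:
        result = exc.errno            # errno.ENOTEMPTY would be expected on NFS
    leaked.close()                    # closing the last handle releases the file
    if result is not None:
        os.rmdir(victim_dir)          # retry now that nothing holds the file open
    return result


if __name__ == "__main__":
    # Assumption: pass dir="/some/nfs/mount" to TemporaryDirectory to test on NFS.
    with tempfile.TemporaryDirectory() as tmp:
        outcome = remove_dir_with_open_file(tmp)
        print("rmdir succeeded" if outcome is None else errno.errorcode[outcome])
```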

A few months ago I already tried to fix this, but did it wrong (due to my
lack of Python knowledge). Unfortunately, my lack of time (I need to finish
my Ph.D. thesis as soon as possible, but there are still unsolved problems)
prevented me from looking deeper into it :( I will make another attempt
this weekend or next.

Well, a workaround: yes, there is one. Mount your NFS directory without
locking support (-o nolock on Linux). But be careful: other applications
that need locking support will then have problems of their own.

Cheers,
Bernd



--
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg

Post Encountering the NFS "Directory not empty" error 
Bernd Schubert <bernd-schubert < at > gmx.de>
wrote the following on Sun, 30 Oct 2005 01:23:21 +0200

> I'm pretty sure the NFS problem is actually a general problem, but
> which is hidden by most kernels/filesystems. Actually, rdiff-backup
> does not close some filehandles before it tries to delete the
> files. On most filesystems, the files and also parent directories
> still can be deleted, though to the user it only seems they are
> deleted - it's still on the disk, only no *new* filedescriptors can
> be opened. Old filedescriptors still have full access. On nfs this
> is different since it goes over the network and since the
> nfs-protocol up to v3 has no native filelocking (NFSv4 has native
> filelocking, but I still didn't have the time to test it). So on NFS
> directories, the client kernel can't hide the deleted file as on
> other filesystems, but moves each file which is deleted, but which
> still has open filedescriptors (fd), to .nfs* files. Those .nfs
> files can't be deleted until the last fd with access to this file
> has been closed. Unfortunately of course, also the parent directory
> can't be deleted then... *This is then the actual problem you will
> notice*

I had considered the possibility that rdiff-backup was failing to
close files, but I spent some time looking through the code and
couldn't find any loose files. Also from the reports I've gotten it
seems this error occurs inconsistently, and at different places, even
though rdiff-backup is single-threaded and purely deterministic.

But your paragraph explains the mechanism well, and suggests that it's
an rdiff-backup problem after all.

> A few months ago I already tried to fix this, but did it wrong (due
> to my lack of Python knowledge), unfortunately my lack of time (need
> to finish my Ph.D. thesis as soon as possible, but there are still
> unsolved problems) prevented me from looking deeper into it :( I
> will really do another attempt this or the next weekend.

From my perspective, the hard part seems to be replicating the problem
consistently, and figuring out exactly which file(s) are not getting
closed. If you can do this (which should be doable with no Python
knowledge), then it may be easy for me or someone else to fix it.

Good luck with your thesis :-)


--
Ben Escoto

Post Encountering the NFS "Directory not empty" error 
Hello Ben,

> I had considered the possibility that rdiff-backup was failing to
> close files, but I spent some time looking through the code and
> couldn't find any loose files. Also from the reports I've gotten it
> seems this error occurs inconsistently, and at different places, even
> though rdiff-backup is single-threaded and purely deterministic.
>
> But your paragraph explains the mechanism well, and suggests that it's
> an rdiff-backup problem after all.

I already checked my theory some time ago by putting a long sleep before
the rmdir call that raises the exception. Then I looked into the directory,
and it showed .nfs files. Furthermore, /proc/{PID of rdiff-backup}/fd showed
open file descriptors for those files.
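
The /proc check described above can also be scripted. The following is a minimal Python 3 sketch, assuming a Linux system with /proc mounted; the function names open_files_of and silly_renamed are hypothetical helpers, not part of rdiff-backup. It reads the symlinks under /proc/<pid>/fd and keeps those whose target name starts with ".nfs".

```python
import os


def open_files_of(pid="self"):
    """Return {fd: target path} for a process, read from /proc/<pid>/fd (Linux only)."""
    fd_dir = "/proc/{}/fd".format(pid)
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result


def silly_renamed(open_files):
    """Keep only descriptors whose target looks like an NFS .nfs* placeholder."""
    return {fd: path for fd, path in open_files.items()
            if os.path.basename(path).startswith(".nfs")}
```

While the process is paused (for example by the sleep mentioned above), calling silly_renamed(open_files_of(pid)) with its PID would list the descriptors that keep the directory from being removed. Inspecting another user's process typically needs the same user or root.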


>> A few months ago I already tried to fix this, but did it wrong (due
>> to my lack of Python knowledge), unfortunately my lack of time (need
>> to finish my Ph.D. thesis as soon as possible, but there are still
>> unsolved problems) prevented me from looking deeper into it :( I
>> will really do another attempt this or the next weekend.
>
> From my perspective, the hard part seems to be replicating the problem
> consistently, and figuring out exactly which file(s) are not getting
> closed. If you can do this (which should be doable with no Python
> knowledge), then it may be easy for me or someone else to fix it.

Well, replicating is very easy. I think I can do it within seconds; I only
need one file in one directory. So I think it's also easy to figure out which
file it is. From my point of view, the difficult part is finding the
corresponding open() in rdiff-backup. I already tried a Python debugger, but
it always fails with another exception (I seem to remember it's an open() on
a non-existent file), and I still don't understand why it doesn't raise this
exception without a debugger. If you are also interested in this issue, I can
tell you the exact problem tomorrow or on Tuesday.
Another possibility would be to monitor all open() and close() calls, but I
think there are rather a lot of them, and each would need a print or
something like that.
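
Monitoring open() calls does not necessarily require editing every call site. As a rough Python 3 sketch (not rdiff-backup code, which at the time ran on Python 2.4; the OpenTracker class is hypothetical), the built-in open() can be wrapped temporarily so that each opened file is recorded together with the caller's file name and line number. Files opened by other means, such as os.open(), would not be seen by this wrapper.

```python
import builtins
import sys


class OpenTracker:
    """Temporarily wrap the built-in open() and remember who opened what."""

    def __init__(self):
        self._original_open = builtins.open
        self._opened = []  # entries: (file object, caller filename, caller line)

    def __enter__(self):
        def tracking_open(*args, **kwargs):
            f = self._original_open(*args, **kwargs)
            caller = sys._getframe(1)  # frame of the code that called open()
            self._opened.append((f, caller.f_code.co_filename, caller.f_lineno))
            return f

        builtins.open = tracking_open
        return self

    def __exit__(self, *exc_info):
        builtins.open = self._original_open  # always restore the real open()
        return False

    def still_open(self):
        """Return (name, caller filename, caller line) for files not yet closed."""
        return [(f.name, filename, lineno)
                for f, filename, lineno in self._opened if not f.closed]
```

Running the failing operation inside "with OpenTracker() as tracker:" and printing tracker.still_open() just before the rmdir would point at the open() calls whose files were never closed.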

Another point I still don't understand is how the deleted files reappear
in the directory on the next rdiff-backup run. In principle, the .nfs files
are deleted immediately after rdiff-backup has raised its exception and
exited (that's why one usually can't see the .nfs files). So I would expect
rdiff-backup to succeed on the next run, but on the next run those already
deleted files reappear :(

Sorry, today I again had no time to look into it further (it's time to go to
sleep now), but we have a public holiday on Tuesday and I will try to look
into it again tomorrow evening or on Tuesday.


> Good luck with your thesis :-)

Thanks, I really need it (we need to find a numerical workaround for a
mathematical singularity, and now it seems only a small step is missing).

Cheers,
Bernd

--
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg

Post Encountering the NFS "Directory not empty" error 
Bernd Schubert <bernd-schubert < at > gmx.de>
wrote the following on Mon, 31 Oct 2005 00:18:19 +0100

> I already checked my theory some time ago, by just putting a long
> sleep before the rmdir exception. Then I looked into the directory
> and it showed .nfs files. Furthermore /proc/{PID of rdiff-backup}/fd
> showed open filedescriptors to those files.
...
> Well, replicating is very easy, I think I can do this within seconds,
> I only need one file in one dir.

Ahh, interesting. So can you just look at the file and tell me which
one it is? Like if the backup directory "backup" looks like

backup/foo/somefile

and the new source directory "source" doesn't have the /foo directory
in it, and then you run

rdiff-backup source backup

does that cause the error all the time? What is the file that's
hanging around, is it "somefile" itself?
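
The scenario asked about above could be scripted roughly as follows. This is only a Python 3 sketch under several assumptions: rdiff-backup is on the PATH, the root directory passed in lives on the NFS mount in question, and the helper names build_repro and run_repro are hypothetical. The pause between runs is there because rdiff-backup may refuse two sessions started within the same second.

```python
import os
import shutil
import subprocess
import time


def build_repro(root):
    """Create root/source/foo/somefile and an empty root/backup; return both paths."""
    source = os.path.join(root, "source")
    backup = os.path.join(root, "backup")
    os.makedirs(os.path.join(source, "foo"))
    with open(os.path.join(source, "foo", "somefile"), "w") as f:
        f.write("data\n")
    os.mkdir(backup)
    return source, backup


def run_repro(source, backup):
    """Back up, delete source/foo, back up again; return the second exit status."""
    cmd = ["rdiff-backup", source, backup]
    subprocess.run(cmd, check=True)             # first run mirrors foo/somefile
    shutil.rmtree(os.path.join(source, "foo"))  # foo now exists only in the backup
    time.sleep(2)                               # keep the two sessions apart in time
    return subprocess.run(cmd).returncode       # second run must remove backup/foo
```

A non-zero return value from run_repro on an NFS-mounted root, together with the traceback shown earlier in the thread, would suggest the problem has been reproduced.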

> I already tried a python-debugger, but it always fails with another
> exception (I believe to remember it's an open() to a non-existent
> file), I also still do not understand why it doesn't raise this
> exception without a debugger.

Dunno, I don't use debuggers.

> Another possibility would be to monitor all open() and close()
> calls, but I think there are rather many of them and all of them
> would need a print or something like that.

Yes, that's definitely doable if you can reproduce the problem with
such a small data set.


--
Ben Escoto
