Possible bug in status reporting
Post Possible bug in status reporting 
I have a win2000 host that I successfully did a backup of yesterday
afternoon using rsyncd (the version on the main BackupPC webpage).
Today I worked on trying out sshd to see if I could set up an SSH
tunnel for security reasons. I searched around and found
http://www.itefix.no/ . I downloaded their copSSH package (a package
containing the Cygwin dll and ssh/sshd programs with an installer that
fixes permissions etc.), installed and configured it, and created a
secondary host aliased to the win2k box (so I could be sure it worked
before using it on the actual host).

I used the version of rsync listed on the main BackupPC webpage and
did a few test rsyncs on small directories under the C: drive and
they seemed to work satisfactorily, so I changed over the original
host.

Now the original host was set to use rsync and to back up the entire C:
drive (/cygdrive/c/). I let the backup process run for around an hour
before I realized it seemed to have stalled. I stopped the backup job
and deleted the host as per the instructions in the manual.

I then created a new host dir and tried doing a full backup. Again, it
appeared as though rsync stalled (no activity on the host, sshd and
rsync processes weren't dead, just idle), so I cancelled the backup so
I could see the error log.

The error log showed that it had gotten a few files under one
directory tree, but apparently stopped there. I tried resuming the
backup by doing another full backup, but this did the same thing
except this time didn't produce any output whatsoever in the transfer
logfile.

I stopped the backup process again and again deleted and recreated the
host. When I went to the new host's status page it said that the
original (~2 hours old) backup job was still running. I reloaded the
page to see if it would go away, but each time I reloaded the status
page it (seemingly) randomly listed a different current backup job, or
none at all.

It seems there might be a bug to do with cancelling jobs perhaps. When
I looked at the process list on the host I saw several rsync and sshd
processes that were idle, when none should have still been running. I
killed all the rsync processes and all but one SSHD process was left
(the Cygwin SSH system service).
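
For anyone wanting to check a client for the same leftovers, here is a minimal sketch of that process check. It assumes a Cygwin or Linux shell on the client where `ps -ef` prints the command name on each line. The function name `leftover_transfer_procs` is made up for illustration. It only filters a process listing read from standard input and does not kill anything.

```shell
# Hypothetical helper: print process-list lines that mention rsync or sshd.
# Reads `ps` output on stdin, so it can also be tried on a saved listing.
leftover_transfer_procs() {
    # -E: extended regex; match "rsync" or "sshd" as whole words only.
    # The second grep drops any line for grep itself, in case the
    # listing includes this very pipeline.
    grep -E '(^|[^[:alnum:]_])(rsync|sshd)([^[:alnum:]_]|$)' | grep -v 'grep' || true
}

# Example use on the client (not run here):
#   ps -ef | leftover_transfer_procs
```

One sshd line is expected on a healthy client (the system service); anything beyond that after a cancelled backup is probably a leftover.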

I should also mention that I ran into this problem with old backup
jobs being listed as current and inconsistent host status yesterday
with another (Linux) host, also using rsync through SSH.

I don't know where to begin to debug this. Anyone have any ideas for
how I can figure out what's going on? And has anyone else had any luck
running rsync through SSH under windows?

--
Justin Guenther
IT Analyst
CrownAg International Inc.
250 Henderson Drive
Regina, SK, Canada S4N 5P7
Tel: (306) 522-8111
Email: justin.guenther < at > crownag.ca


-------------------------------------------------------
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
_______________________________________________
BackupPC-users mailing list
BackupPC-users < at > lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/

Post Possible bug in status reporting 
As Mike mentioned, I have never found rsync + ssh + cygwin to be
reliable, so I only use rsyncd + cygwin on WinXX machines.

Back to your earlier issues. If you kill BackupPC then it is likely
some status information will be out of date. If you send an INT signal
to BackupPC it will try to clean up gracefully, but it might not. You
don't need to kill and restart BackupPC each time you add a host.

Also, you shouldn't need to delete and recreate the host directory
each time. It's sufficient to just edit that host's config.pl file
and then re-start the backup.

Why don't you check this again with a clean start, without stopping
and restarting BackupPC and deleting host directories etc?

Craig



Post Possible bug in status reporting 
> Back to your earlier issues. If you kill BackupPC then it is likely
> some status information will be out of date. If you send an INT signal
> to BackupPC it will try to clean up gracefully, but it might not. You
> don't need to kill and restart BackupPC each time you add a host.

I didn't kill the server, I hit the 'stop/dequeue backup' button on
the host's summary page. Shouldn't cancelling a job remove the job's
entry in the server status?

> Also, you shouldn't need to delete and recreate the host directory
> each time. It's sufficient to just edit that host's config.pl file
> and then re-start the backup.

I tried deleting the host directory only after I had problems with old
backups interfering with new backups (I had changed the directory that
BackupPC was set to back up on the host). I was just trying to
eliminate possible sources of error.

I'll try doing this again; I'm going to try installing various
combinations of versions of ssh/rsync.

--
Justin Guenther
IT Analyst
CrownAg International Inc.
250 Henderson Drive
Regina, SK, Canada S4N 5P7
Tel: (306) 522-8111
Email: justin.guenther < at > crownag.ca



Post Possible bug in status reporting 
More on this issue:

I think the problem is that the cached status of deleted hosts isn't
deleted, or at least it wasn't for me.

I tried this again today. I have a host 'what' that had 4 backups done
on it. I wanted to try something out clean so I deleted the pc/what
dir, and edited out the entry in the hosts file, and reloaded the
configuration (this is what it says to do in the manual).
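
A minimal sketch of the "edit out the entry in the hosts file" step, assuming the usual hosts file layout where the host name is the first whitespace-separated field on its line. The function name `hosts_without` is hypothetical. It prints the filtered file instead of editing in place.

```shell
# Hypothetical helper: print a hosts file without the entry for one host.
#   $1 = path to the hosts file
#   $2 = host name to drop
# Comment lines, blank lines and all other hosts pass through unchanged,
# because their first field differs from the host name.
hosts_without() {
    awk -v h="$2" '$1 != h { print }' "$1"
}

# Example use (paths are placeholders, not run here):
#   hosts_without TOPDIR/conf/hosts what > /tmp/hosts.new
```

The output would go to a temporary file for review before it replaces the real hosts file.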

Today, unlike before, I ran a 'BackupPC_nightly 0 255' command to
clean up the pool as well. After this was complete I added the host
back to the hosts file and recreated the host dir and config.pl.

Now, when I view the host's status, I get a random status message. For
instance, loading the status page for the host might give:

----
This PC is used by jguenther.
Last status is state "idle" (nothing to do) as of 7/27 11:00.
Pings to what have succeeded 13 consecutive times.
Because what has been on the network at least 7 consecutive times, it
will not be backed up from 7:30 to 17:00 on Mon, Tue, Wed, Thu, Fri.
----

and then subsequently reloading the status page might give

----
This PC is used by jguenther.
Last status is state "idle" (nothing to do) as of 7/27 13:00.
Pings to what have succeeded 1 consecutive times.
----

or

----
This PC is used by jguenther.
Last status is state "" as of 7/27 13:42.
----

It's random which of these status messages shows up on the status page
for the host.

Did I do something wrong in how I removed the host, or is there
something I'm missing, or is this a bug? Let me know if there's
anything I can do to help fix this.

Thanks

--
Justin Guenther
IT Analyst
CrownAg International Inc.
250 Henderson Drive
Regina, SK, Canada S4N 5P7
Tel: (306) 522-8111
Email: justin.guenther < at > crownag.ca



Post Possible bug in status reporting 
Look in TOPDIR/data/log/status.pl

Delete the lines associated with the old "what" server and restart BPC.
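
Before deleting anything, it may help to see which lines of status.pl actually mention the host. The sketch below assumes the host name appears in that file as a quoted string, the way a dumped Perl hash key normally would. The exact layout of status.pl is an assumption here, and the function name `status_lines_for_host` is made up.

```shell
# Hypothetical helper: show numbered lines of a status file that mention
# a host as a quoted string.
#   $1 = path to status.pl
#   $2 = host name
status_lines_for_host() {
    # -n: prefix line numbers; -F: treat the patterns as fixed strings,
    # so dots in a host name are not regex wildcards.
    grep -n -F -e "\"$2\"" -e "'$2'" "$1" || true
}

# Example use (path is a placeholder, not run here):
#   status_lines_for_host TOPDIR/data/log/status.pl what
```

Keeping a copy of the file before editing it seems prudent.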

--
Tony Nelson
Director of IT Operations
Starpoint Solutions LLC
115 Broadway, 2nd Fl
New York, NY 10006



Post Possible bug in status reporting 
Looks like a bug. I could see this happening in the following case:

- you are running mod_perl
- you deleted the host, then told BackupPC to reload
- you re-added the host, but did not tell BackupPC to reload
- a backup has not yet been started or run on this host
- you then inspect the status.

When this happens I suspect you should see this error in the main
log file:

Unknown host HOST for status request

Do you see this message?
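
A quick way to check for that message is a fixed-string search of the main log. This is only a sketch: the log path is passed in because its location depends on the install, and the function name `unknown_host_count` is made up.

```shell
# Hypothetical helper: count "Unknown host ... for status request" lines
# for one host in a log file.
#   $1 = path to the main BackupPC log
#   $2 = host name
unknown_host_count() {
    # -c prints the number of matching lines; grep exits non-zero when
    # that number is 0, so "|| true" keeps the function's status at 0.
    grep -c -F "Unknown host $2 for status request" "$1" || true
}

# Example use (path is a placeholder, not run here):
#   unknown_host_count /path/to/main/LOG what
```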

If all this is true, I'd recommend trying this patch (untested).

Craig

--- lib/BackupPC/CGI/Lib.pm 2004-07-10 15:35:32.000000000 -0700
+++ lib/BackupPC/CGI/Lib.pm 2004-07-27 13:27:30.979020800 -0700
@@ -287,6 +287,8 @@
 {
     my($status) = @_;
     ServerConnect();
+    %Status = () if ( $status =~ /\bhosts\b/ );
+    %StatusHost = () if ( $status =~ /\bhost\(/ );
     my $reply = $bpc->ServerMesg("status $status");
     $reply = $1 if ( $reply =~ /(.*)/s );
     eval($reply);



Post Possible bug in status reporting 
> Looks like a bug. I could see this happening in the following case:
> - you are running mod_perl
> - you deleted the host, then told BackupPC to reload

Both of these are true.

> - you re-added the host, but did not tell BackupPC to reload

After adding the host, I did a 'touch TOPDIR/conf/hosts' to reload the info.

> - a backup has not yet been started or run on this host

This is true, yes (see below).

> When this happens I suspect you should see this error in the main
> log file:
>
> Unknown host HOST for status request
>
> Do you see this message?

2004-07-27 13:32:32 Unknown host what for status request
(there are several of these in the logfile)

Regarding my note above: I tried to do a manual full backup several
times from the host's summary page, and each time it gave me the above
"Unknown host" error message. I had thought this meant the hostname
lookup failed, which didn't make any sense because I could ping the
host.

I'll test this patch and try doing the same process again.

--
Justin Guenther
IT Analyst
CrownAg International Inc.
250 Henderson Drive
Regina, SK, Canada S4N 5P7
Tel: (306) 522-8111
Email: justin.guenther < at > crownag.ca



Post Possible bug in status reporting 
I'm a bit new to this whole 'patch' thing. Could anyone tell me what
I'm doing wrong? I pasted the patch given into a file and ran 'patch
-b -p0 <file.patch' (I was in the BackupPC main dir, where bin and lib
are).

Am I missing an option or something?

--
Justin Guenther
IT Analyst
CrownAg International Inc.
250 Henderson Drive
Regina, SK, Canada S4N 5P7
Tel: (306) 522-8111
Email: justin.guenther < at > crownag.ca



Post Possible bug in status reporting 
Justin Guenther writes:

>> Looks like a bug. I could see this happening in the following case:
>> - you are running mod_perl
>> - you deleted the host, then told BackupPC to reload
>
> Both of these are true
>
>> - you re-added the host, but did not tell BackupPC to reload
>
> After adding the host, I did a 'touch TOPDIR/conf/hosts' to reload the info.

Just touching the file won't do a reload. BackupPC doesn't know about
the new host. The problem you saw is because the CGI interface loads
the new host file, but BackupPC doesn't know about it. Because you
are using mod_perl, the CGI interface uses some old status (seemingly
random based on which apache process handles your request) since
BackupPC doesn't return anything for the new host.
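
The effect described above can be shown with a toy model. This is not BackupPC code: `page_status` is a made-up function where one call stands for one page load handled by one long-lived Apache process, and the status strings are invented. The "clear" mode mirrors what the patch posted earlier in the thread is meant to do, namely empty the cached status before evaluating the server's reply.

```shell
# Toy model of one status page load.
#   $1 = status this web server process cached from an earlier request
#   $2 = status the BackupPC server returns now (empty when the server
#        does not know the host)
#   $3 = "clear" to reset the cache before using the reply, anything
#        else to keep the old behaviour
page_status() {
    cached=$1
    reply=$2
    if [ "$3" = "clear" ]; then
        cached=""
    fi
    # A non-empty reply always overwrites whatever was cached.
    if [ -n "$reply" ]; then
        cached=$reply
    fi
    printf '%s\n' "$cached"
}

# Three processes with different leftovers, and a server that returns
# nothing for the host: each process shows its own stale value, so the
# page looks random from one reload to the next.
for leftover in "idle as of 11:00" "idle as of 13:00" ""; do
    page_status "$leftover" "" keep
done
```

With "clear" as the third argument, all three calls print an empty status, which is at least consistent.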

To tell BackupPC to reload the host file you need to do one of
the following:

- wait until the next WakeupSchedule time - BackupPC rechecks config.pl
and hosts then,

- kill -HUP BackupPC_pid, or equivalently: /etc/init.d/backuppc reload,

- restart BackupPC,

- use the "reload" button on the Admin CGI screen,

- run: /usr/local/BackupPC/bin/BackupPC_serverMesg reload.
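
The `kill -HUP` option from the list above could be wrapped roughly as follows. This is a sketch under one assumption: that the server's PID is available in a file. The path is left as a parameter because it depends on the install, and both function names are made up.

```shell
# Hypothetical helper: print the PID stored in a file, but only if it is
# a plain number that names a live process.
#   $1 = path to the PID file
live_pid_from_file() {
    [ -r "$1" ] || return 1
    pid=$(head -n 1 "$1" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) return 1 ;;    # empty, or not a plain number
    esac
    # Signal 0 sends nothing; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null || return 1
    printf '%s\n' "$pid"
}

# Hypothetical wrapper: ask the server to re-read config.pl and hosts.
reload_backuppc() {
    pid=$(live_pid_from_file "$1") || return 1
    kill -HUP "$pid"
}

# Example use (path is a placeholder, not run here):
#   reload_backuppc /path/to/BackupPC.pid
```

The check before signalling is there so that a stale PID file produces an error instead of a HUP sent to some unrelated process.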

Craig



Post Possible bug in status reporting 
On Tue, 27 Jul 2004 14:02:19 -0700, Craig Barratt
<cbarratt < at > users.sourceforge.net> wrote:
> Just touching the file won't do a reload. BackupPC doesn't know about
> the new host. The problem you saw is because the CGI interface loads
> the new host file, but BackupPC doesn't know about it. Because you
> are using mod_perl, the CGI interface uses some old status (seemingly
> random based on which apache process handles your request) since
> BackupPC doesn't return anything for the new host.
>
> To tell BackupPC to reload the host file you need to do one of
> the following:
>
> - wait until the next WakeupSchedule time - BackupPC rechecks config.pl
> and hosts then,
>
> - kill -HUP BackupPC_pid, or equivalently: /etc/init.d/backuppc reload,
>
> - restart BackupPC,
>
> - use the "reload" button on the Admin CGI screen,
>
> - run: /usr/local/BackupPC/bin/BackupPC_serverMesg reload.
>
> Craig


Ahh... this makes more sense. I had just been touching the files to
reload the info, but I'll start doing a manual reload now. Thanks!

Also, regarding the patch thing, it was because of inconsistent
whitespace. I figured it out, thanks again.
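
For others who hit the same whitespace problem with a patch pasted from email, a sketch of a more forgiving way to apply it. It assumes a `patch` that supports `--dry-run` and `-l` (GNU patch does), and the function name `apply_patch_loosely` is made up. Note that `-l` helps when tabs and spaces were swapped or runs of blanks changed length. It probably will not rescue a patch whose leading spaces were stripped entirely.

```shell
# Hypothetical helper: check that a patch would apply, then apply it.
#   $1 = directory the patch paths are relative to (where bin and lib are)
#   $2 = patch file
apply_patch_loosely() {
    # --dry-run reports whether the hunks would apply without changing
    # any file; -l matches runs of blanks loosely.
    patch -d "$1" -p0 -l --dry-run < "$2" >/dev/null || return 1
    # -b keeps a backup copy (file.orig) of each file that gets changed.
    patch -d "$1" -p0 -l -b < "$2"
}

# Example use (paths are placeholders, not run here):
#   apply_patch_loosely /usr/local/BackupPC file.patch
```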

--
Justin Guenther
IT Analyst
CrownAg International Inc.
250 Henderson Drive
Regina, SK, Canada S4N 5P7
Tel: (306) 522-8111
Email: justin.guenther < at > crownag.ca


