Bacula Waits for 40min When a Client is Down
Post Bacula Waits for 40min When a Client is Down 
Hi,

On 6/6/2007 10:57 PM, Kyle Marsh wrote:
Howdy,

I'm working on a bacula setup for my college and I have found that
when a client goes down, whether it's firewalled, turned off, or
otherwise disconnected from the network, bacula seems to hang for
about 40 minutes

Unusual timeout, in my experience... I'd expect nearly instantaneous job
failure or the IT-related two hours...

before deciding that the client isn't there and
stopping. This could become problematic if we have several machines
down each night and could cause substantial problems if some backups
don't start until people are back working. Is there a directive that
allows me to specify something sane as the timeout period, and where
does it need to go?

I prefer leaving the timeouts to Bacula, and instead use "Run Before
Job" scripts to ping the clients. Concurrent jobs are a reasonable
solution against long-running or stalled jobs.

Arno

Thanks,

Kyle Marsh
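
For anyone wanting to try the "Run Before Job" check suggested above, here is a minimal sketch in Python. It is not from the thread: instead of an ICMP ping it attempts a TCP connection to the File Daemon port (9102 by default), which also catches a client that is up but firewalled. The script name check_client.py and the 5-second timeout are assumptions.

```python
#!/usr/bin/env python3
"""Pre-job reachability check (sketch): 0 if the client answers, 1 if not."""
import socket
import sys

FD_PORT = 9102  # default Bacula File Daemon port; adjust if yours differs


def client_reachable(host, port=FD_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return False


def main(argv):
    """Return the exit status the pre-job command should report."""
    if len(argv) != 2:
        print("usage: check_client.py HOST", file=sys.stderr)
        return 2
    return 0 if client_reachable(argv[1]) else 1

# When saved as a script, finish with: sys.exit(main(sys.argv))
```

As far as the manual describes it, a non-zero exit status from a Run Before Job command stops the job, so an unreachable client fails within a few seconds instead of waiting for a connection timeout.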

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Bacula-users mailing list
Bacula-users < at > li...
https://lists.sourceforge.net/lists/listinfo/bacula-users

--
IT-Service Lehmann al < at > it...
Arno Lehmann http://www.its-lehmann.de

Post Bacula Waits for 40min When a Client is Down 
I had the same thought about time-outs just yesterday. I am new to
Bacula, and was testing my second client backup, when I realized I had
not defined the client in my host file. I run my backups on an internal
network (non-public) so the DNS for this was not available either. I was
using bconsole, and the job hung up due to not being able to find the
client, and I was wondering how to kill it immediately, as it wasn't
going to find the client. As a new user, I didn't know if there is a
timeout value I could set, or how long it was going to run (maybe even
forever?).

So, good question, Arno, and I hope someone provides the answer.

Steve


Post Bacula Waits for 40min When a Client is Down 
Hi,

On 6/7/2007 2:42 PM, Steve Campbell wrote:
I had the same thought about time-outs just yesterday. I am new to
Bacula, and was testing my second client backup, when I realized I had
not defined the client in my host file. I run my backups on an internal
network (non-public) so the DNS for this was not available either. I was
using bconsole, and the job hung up due to not being able to find the
client,

It should terminate almost immediately.

and I was wondering how to kill it immediately, as it wasn't
going to find the client.

The cancel command...

As a new user, I didn't know if there is a
timeout value I could set, or how long it was going to run (maybe even
forever?).

So, good question, Arno, and I hope someone provides the answer.

Erm, which was the question?

:-)

Arno

Steve


Post Bacula Waits for 40min When a Client is Down 
Sorry, Arno, I mistyped the OP. I meant Kyle.

I'm also trying out Thunderbird with a few Quoting extensions, so I
misread who sent the original question.

I also haven't figured out why it wants to send only to the poster,
instead of the list, for Reply, so I need to Reply All.

Please forgive me.

Steve


Post Bacula Waits for 40min When a Client is Down 
On 6/7/07, Arno Lehmann <al < at > it...> wrote:
Hi,

On 6/7/2007 2:42 PM, Steve Campbell wrote:
I had the same thought about time-outs just yesterday. I am new to
Bacula, and was testing my second client backup, when I realized I had
not defined the client in my host file. I run my backups on an internal
network (non-public) so the DNS for this was not available either. I was
using bconsole, and the job hung up due to not being able to find the
client,

It should terminate almost immediately.


And yet the fact remains that for both Steve and me, it doesn't. You
suggested pinging the client with RunBeforeJob. Is there a better way
to do this than adding a new line in each job? You cannot put it in
the JobDefs because you need the hostname, and of course you cannot
extract that from anything in the Job field as far as I know, so it
has to appear magically.

and I was wondering how to kill it immediately, as it wasn't
going to find the client.

The cancel command...


Of course, I don't believe the cancel command is particularly
effective when your console is busy with something else, like waiting
40 minutes for a response that isn't coming. You need to start a new
bconsole and run cancel from there as far as I can tell.

As a new user, I didn't know if there is a
timeout value I could set, or how long it was going to run (maybe even
forever?).

So, good question, Arno, and I hope someone provides the answer.

Erm, which was the question?

:-)

Arno

Steve


Post Bacula Waits for 40min When a Client is Down 
Hi,

On 6/7/2007 7:28 PM, Kyle Marsh wrote:
Sorry if this double posts -- I used the wrong e-mail and the first
copy is at the mercy of the moderator. You can kill that one, btw.

On 6/7/07, Arno Lehmann <al < at > it...> wrote:
Hi,

On 6/7/2007 2:42 PM, Steve Campbell wrote:
I had the same thought about time-outs just yesterday. I am new to
Bacula, and was testing my second client backup, when I realized I had
not defined the client in my host file. I run my backups on an internal
network (non-public) so the DNS for this was not available either. I was
using bconsole, and the job hung up due to not being able to find the
client,
It should terminate almost immediately.


And yet the fact remains that for both Steve and me, it doesn't. You
suggested pinging the client with RunBeforeJob. Is there a better way
to do this than adding a new line in each job? You cannot put it in
the JobDefs because you need the hostname, and of course you cannot
extract that from anything in the Job field as far as I know, so it
has to appear magically.

Hmm... right. Using a python event would perhaps work, but I haven't
investigated this.

For my setups, the clients affected by possible non-availability are
only a minority, so I just added the line to the jobs as needed.


and I was wondering how to kill it immediately, as it wasn't
going to find the client.
The cancel command...


Of course, I don't believe the cancel command is particularly
effective when your console is busy with something else, like waiting
40 minutes for a response that isn't coming.

Ah, I misunderstood the problem... I assumed it was the job that was
stalled, but it's the console itself.

This is definitely another problem, then. In my experience, the console
always returns immediately after a run command, except when the catalog
database is currently locked (which should only happen while the catalog
backup is running).

You need to start a new
bconsole and run cancel from there as far as I can tell.

Yes, when the console is stuck, you're right.

As a new user, I didn't know if there is a
timeout value I could set, or how long it was going to run (maybe even
forever?).

For this particular issue, there is nothing you can configure as far as
I can tell.

If this problem persists and is not related to the catalog database
backend, I'd suggest either running tcpdump or Wireshark to observe
what the DIR and console exchange, or running the DIR with debug output.
That might tell us where the wait time comes from.

Arno

So, good question, Arno, and I hope someone provides the answer.
Erm, which was the question?

:-)

Arno

Steve


Post Bacula Waits for 40min When a Client is Down 
Hi,

On 6/7/2007 11:05 PM, Kyle Marsh wrote:
...
That's an interesting suggestion -- I could have a Python script that
gets called and parses the config file to determine the full name from
the client name. That gets rid of the magic, at least (or replaces it
with worse magic, depending on your perspective).

Well, personally, I like bad magic, but not in production systems :-)

I was rather thinking of making the client resource available to the
python event, but I have no idea how that would be done.
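
Kyle's idea of a script that reads the Director configuration and maps a client name to its address could be sketched as below. This is an illustration only: the sample configuration and host names are made up, the parser assumes Client resources contain no nested braces, and it does not follow included files. A pre-job script could then receive just the client name (for example through the %c substitution) and look up the address itself.

```python
import re

# Matches one "Client { ... }" resource; assumes no nested braces inside it.
CLIENT_BLOCK = re.compile(r"\bClient\s*\{(.*?)\}", re.IGNORECASE | re.DOTALL)


def client_addresses(config_text):
    """Map each Client resource's Name to its Address."""
    result = {}
    for block in CLIENT_BLOCK.finditer(config_text):
        fields = {}
        for line in block.group(1).splitlines():
            line = line.split("#", 1)[0]       # drop trailing comments
            for part in line.split(";"):       # directives may share a line
                if "=" not in part:
                    continue
                key, value = part.split("=", 1)
                # Bacula ignores case and spaces in directive names.
                key = "".join(key.split()).lower()
                fields[key] = value.strip().strip('"')
        if "name" in fields and "address" in fields:
            result[fields["name"]] = fields["address"]
    return result


# Hypothetical configuration, for illustration only.
SAMPLE_CONFIG = """
Job {
  Name = "lab1-backup"
  Client = lab1-fd
}
Client {
  Name = lab1-fd
  Address = lab1.example.edu   # research box
  FDPort = 9102
}
Client {
  Name = "lab2-fd"; Address = 10.0.0.12
}
"""

addresses = client_addresses(SAMPLE_CONFIG)
```

With this in place the ping line can live in a JobDefs resource, since the host name no longer has to be written into each job.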

For my setups, the clients affected by possible non-availability are
only a minority, so I just added the line to the jobs as needed.


Unfortunately I'm not sure how the boxes are to be set up -- most are
machines for students doing research with their professors and they
will change hands and configuration every semester. I don't know if
users will decide to shut them down overnight or what, so I was hoping
to find a blanket that could cover them all.

Ok, that is a nightmare for anyone responsible for backups. I'd use the
approach with a ping, set a high number of retries with long intervals in
between (something like 1 hour interval, 12 retries), NOT rerun failed
jobs, and set a maximum job wait time etc.

This should get you usable backups from time to time, probably during
the day, but whoever wants a work day without backup load can simply
leave the computer on during the night (which is, ecologically as well
as economically, not so good...).

You could refine this with tries to wake-on-lan the machines and turn
them off after backups, if you woke them up yourself. Just a nice
practice in writing a script in your favorite language :-)
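
As a starting point for the wake-on-LAN refinement, here is a small Python sketch that builds and sends the standard magic packet (six 0xFF bytes followed by the target MAC address repeated 16 times). The broadcast address, UDP port 9, and the function names are assumptions; whether a given machine wakes up depends on its BIOS and network card settings.

```python
import socket


def magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    # bytes.fromhex raises ValueError if the digits are not valid hex.
    return b"\xff" * 6 + bytes.fromhex(hex_digits) * 16


def wake(mac, broadcast="255.255.255.255", port=9):
    """Send the magic packet as a UDP broadcast on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

A pre-job script could call wake() first, wait a minute or two for the machine to boot, and only then run the reachability check.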

...
Thanks for the help, Arno. Is there any chance of you taking a look
at my other post about the pool configuration? That's really the more
pressing now.

Sure, for money I do quite a lot :-)

I even do things for free, only I don't know which mail you refer to :-(

Arno


Post Bacula Waits for 40min When a Client is Down 
Hi,

On 6/8/2007 1:00 AM, Kyle Marsh wrote:
Hello again,

On 6/7/07, Arno Lehmann <al < at > it...> wrote:
Hi,

On 6/7/2007 11:05 PM, Kyle Marsh wrote:
...
That's an interesting suggestion -- I could have a Python script that
gets called and parses the config file to determine the full name from
the client name. That gets rid of the magic, at least (or replaces it
with worse magic, depending on your perspective).
Well, personally, I like bad magic, but not in production systems :-)

I was rather thinking of making the client resource available to the
python event, but I have no idea how that would be done.

For my setups, the clients affected by possible non-availability are
only a minority, so I just added the line to the jobs as needed.

Unfortunately I'm not sure how the boxes are to be set up -- most are
machines for students doing research with their professors and they
will change hands and configuration every semester. I don't know if
users will decide to shut them down overnight or what, so I was hoping
to find a blanket that could cover them all.
Ok, that is a nightmare for anyone responsible for backups. I'd use the
approach with a ping, set a high number of retries with long intervals in
between (something like 1 hour interval, 12 retries), NOT rerun failed
jobs, and set a maximum job wait time etc.

So a RunBeforeJob to do the ping, then RescheduleOnError = yes,
RescheduleInterval = 1 hour, and RescheduleTimes = 12?

What do you
mean by not rerun failed jobs?

Sorry, my fault... I meant "Rerun failed levels" but it was probably too
late or too hot here to actually write what I meant :-)

By the way: this is set to no as the default. Also, you should check the
manual description and think about your objectives - setting it to yes
will make sure you get your full backups eventually, but in case the
users turn off the computers while jobs run you might end up wasting
lots of space for incomplete jobs. Not rerunning failed levels might
give you longer-than-wanted times between full backups but will keep the
jobs running more smoothly overall. In my experience, and so on :-)

Don't we want to rerun it if they miss
one?

Definitely.

Or does that mean if it gets rescheduled 12 times, not to
reschedule it for another 12 and just let tomorrow's try?

No, that's already accomplished by your numbers - 12 retries in 12
hours, and then give up and wait for the next round of scheduled jobs.
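
To make the "12 retries in 12 hours" arithmetic concrete, the short Python sketch below lists when the attempts would fall. The 23:00 start time is hypothetical, and the sketch assumes that the reschedule count covers retries after the first attempt and that each failed attempt takes negligible time.

```python
from datetime import datetime, timedelta


def attempt_times(first_start, interval, reschedule_times):
    """Return the initial start time plus one entry per reschedule."""
    return [first_start + i * interval for i in range(reschedule_times + 1)]


# Hypothetical nightly job starting at 23:00, retried hourly up to 12 times.
attempts = attempt_times(datetime(2007, 6, 8, 23, 0), timedelta(hours=1), 12)
last_attempt = attempts[-1]  # 11:00 the next morning
```

Under those assumptions the last retry lands at 11:00 the next day, which is why machines that are off overnight would tend to get backed up during working hours.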

Also where
would I specify that?

In the job definition.

This should get you usable backups from time to time, probably during
the day, but whoever wants a work day without backup load can simply
leave the computer on during the night (which is, ecologically as well
as economically, not so good...).

You could refine this with tries to wake-on-lan the machines and turn
them off after backups, if you woke them up yourself. Just a nice
practice in writing a script in your favorite language :-)

Hmm...you intrigue me :). I'll have to see about this -- I'll let
you know if I do it and it works out well.

...
Thanks for the help, Arno. Is there any chance of you taking a look
at my other post about the pool configuration? That's really the more
pressing now.
Sure, for money I do quite a lot :-)

:-)

I even do things for free, only I don't know which mail you refer to :-(

I sent another mail right after this one asking how well the sample
configuration for pools of disk backups (here:
http://bacula.org/rel-manual/Automated_Disk_Backup.html) scales, since
the example only has one client and I fear horrible things if I simply
up the number of clients without changing anything else.

Hmm... I don't see that mail here. You might point me to a list archive
URL, or resend the mail.

Arno

Thanks a bunch for all your help,

~Kyle Marsh


Arno
