 |
Steven Schoch
Guest
|
 Estimate timeout
It was working for several days, then all of a sudden it stopped and hasn't
worked since.
Amcheck works fine, but amdump doesn't.
Amdump is run on homer, the system with the tape drive. Homer is a RedHat
Enterprise Linux system with amanda version 2.4.4p1. The system that fails
to dump is marge, a FreeBSD system with amanda version 2.4.4p2.
The important lines from amanda.conf:
----
etimeout 1800 # number of seconds per filesystem for estimates.
#etimeout -600 # total number of seconds for estimates.
# a positive number will be multiplied by the number of filesystems on
# each host; a negative number will be taken as an absolute total time-out.
# The default is 5 minutes per filesystem.
----
From disklist:
----
marge /var comp-user
marge /usr comp-root
marge / comp-root
----
From crontab:
----
45 0 * * 2-6 /usr/sbin/amdump OurDump
----
In /tmp/amanda on marge, these lines appear in
amandad.20040609004501000.debug:
----
amandad: debug 1 pid 22611 ruid 1001 euid 1001: start at Wed Jun 9 00:45:01 2004
amandad: version 2.4.4p2
amandad: build: VERSION="Amanda-2.4.4p2"
...
amandad: time 0.003: got packet:
--------
Amanda 2.4 REQ HANDLE 001-389B0608 SEQ 1086767104
SECURITY USER amanda
SERVICE sendsize
...
amandad: time 0.004: sending ack:
----
Amanda 2.4 ACK HANDLE 001-389B0608 SEQ 1086767104
...
amandad: time 0.009: amandahosts security check passed
amandad: time 0.009: running service "/usr/local/libexec/sendsize"
amandad: time 447.906: sending REP packet:
----
Amanda 2.4 REP HANDLE 001-389B0608 SEQ 1086767104
OPTIONS features=fffffeff9ffe0f;
/var 0 SIZE 11520
/var 1 SIZE 1580
/usr 0 SIZE 1166599
/usr 1 SIZE 18710
/ 0 SIZE 39571
/ 1 SIZE 381
----
amandad: time 457.910: dgram_recv: timeout after 10 seconds
amandad: time 457.910: waiting for ack: timeout, retrying
amandad: time 467.920: dgram_recv: timeout after 10 seconds
amandad: time 467.920: waiting for ack: timeout, retrying
amandad: time 477.930: dgram_recv: timeout after 10 seconds
amandad: time 477.930: waiting for ack: timeout, retrying
amandad: time 487.940: dgram_recv: timeout after 10 seconds
amandad: time 487.941: waiting for ack: timeout, retrying
amandad: time 497.950: dgram_recv: timeout after 10 seconds
amandad: time 497.951: waiting for ack: timeout, giving up!
amandad: time 497.951: pid 22611 finish time Wed Jun 9 00:53:19 2004
On homer, amdump.1 contains these lines:
----
amdump: start at Wed Jun 9 00:45:01 PDT 2004
amdump: datestamp 20040609
planner: pid 9813 executable /usr/lib/amanda/planner version 2.4.4p1
planner: build: VERSION="Amanda-2.4.4p1"
...
setup_estimate: marge:/var: command 0, options:
last_level 0 next_level0 21 level_days 0
getting estimates 0 (11503) 1 (0) -1 (-1)
planner: time 0.125: setting up estimates for marge:/usr
setup_estimate: marge:/usr: command 0, options:
last_level 0 next_level0 21 level_days 0
getting estimates 0 (1163201) 1 (0) -1 (-1)
planner: time 0.135: setting up estimates for marge:/
setup_estimate: marge:/: command 0, options:
last_level 0 next_level0 21 level_days 0
getting estimates 0 (39486) 1 (0) -1 (-1)
...
planner: time 223.483: got result for host homer disk /home: 0 -> 4642543K, 4 -> 899568K, -1 -> -1K
planner: time 10801.886: error result for host marge disk /: Estimate timeout from marge
planner: time 10801.886: error result for host marge disk /usr: Estimate timeout from marge
planner: time 10801.886: error result for host marge disk /var: Estimate timeout from marge
planner: time 10801.886: getting estimates took 10801.690 secs
It looks like homer waited a sufficient time for marge to reply, but the
reply was dropped.
Marge and homer are on the same switch.
--
Steve
|
| Wed Jun 09, 2004 9:00 am |
|
 |
Paul Bijnens
Guest
|
 Estimate timeout
Steven Schoch wrote:
> It was working for several days, then all of a sudden it stopped and
> hasn't worked since.
First thing to ask is: what has changed since then?
Installed something? Reconfigured something? Rebooted the system?
> amandad: time 447.906: sending REP packet:
The client took less than 450 seconds to estimate all of it.
> planner: time 10801.886: error result for host marge disk /: Estimate
And the server timed out after 3 DLEs * 2 levels * 1800 sec = 10800 seconds.
> It looks like homer was waiting a sufficient time for marge to reply, but
> the reply was dropped.
Yes, indeed.
> Marge and homer are on the same switch.
Are there other clients besides marge?
Is there a local firewall activated on homer?
Try to find out where the UDP packet got dropped, using tcpdump,
Ethereal, or another network analyzer on homer and marge.
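The server-side timeout arithmetic in this reply can be checked with a short sketch. It assumes, following the reading of the planner log given here, that a positive etimeout is multiplied by the number of disklist entries and by the number of levels estimated per entry (two in this case), and that a negative etimeout is an absolute total, as the amanda.conf comment says. The function name and parameters are illustrative and are not part of Amanda.

```python
def planner_estimate_timeout(etimeout, num_dles, levels_per_dle):
    """Return the total seconds the server would wait for one host's estimates.

    Assumption (from the thread): a positive etimeout scales with the number
    of estimates requested; a negative etimeout is an absolute total.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * num_dles * levels_per_dle


# marge has three disklist entries (/var, /usr, /), each estimated at levels 0 and 1.
total = planner_estimate_timeout(1800, num_dles=3, levels_per_dle=2)
print(total)  # 10800, in line with the planner giving up at time 10801.886
```

With the commented-out alternative from amanda.conf (etimeout -600), the same function would return 600 regardless of how many filesystems the host has.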
--
Paul Bijnens, Xplanation Tel +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
http://www.xplanation.com/ email: Paul.Bijnens < at > xplanation.com
***********************************************************************
* I think I've got the hang of it now: exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit, ZZ, :q, :q!, M-Z, ^X^C, logoff, logout, close, bye, /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt, abort, hangup, *
* PF4, F20, ^X^X,  :  , KJOB, F14-f-e, F8-e, kill -1 $$, shutdown, *
* kill -9 1, Alt-F4, Ctrl-Alt-Del, AltGr-NumLock, Stop-A, ... *
* ... "Are you sure?" ... YES ... Phew ... I'm out *
***********************************************************************
|
| Wed Jun 09, 2004 9:14 am |
|
 |
Steven Schoch
Guest
|
 Estimate timeout
On Wed, 09 Jun 2004 Paul Bijnens wrote:
> Try to find out where the UDP packet got dropped, using tcpdump,
> Ethereal, or another network analyzer on homer and marge.
Now we're getting somewhere. The tcpdump shows this:
14:54:28.697197 homer.858 > marge.amanda: udp 117 (DF)
14:54:29.176236 marge.amanda > homer.858: udp 50
14:54:29.444159 marge.amanda > homer.858: udp 83
14:54:29.444563 homer.858 > marge.amanda: udp 50 (DF)
14:54:29.445650 homer.858 > marge.amanda: udp 531 (DF)
14:54:29.525614 marge.amanda > homer.858: udp 50
15:01:56.739172 marge.amanda > homer.858: udp 184
15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
15:02:06.743312 marge.amanda > homer.858: udp 184
15:02:06.743992 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
My guess is that the ICMP message has something to do with a firewall.
--
Steve
|
| Wed Jun 09, 2004 2:29 pm |
|
 |
Paul Bijnens
Guest
|
 Estimate timeout
Steven Schoch wrote:
> On Wed, 09 Jun 2004 Paul Bijnens wrote:
>> Try to find out where the UDP packet got dropped, using tcpdump,
>> Ethereal, or another network analyzer on homer and marge.
> Now we're getting somewhere. The tcpdump shows this:
> 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
> My guess is that the ICMP message has something to do with a firewall.
"admin prohibited" is definitely a result of iptables filtering.
Have a close look at homer. Run "iptables -L".
Maybe the solution is loading the amanda iptables module,
if that is available on the machine.
--
Paul Bijnens, Xplanation Tel +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
http://www.xplanation.com/ email: Paul.Bijnens < at > xplanation.com
|
| Wed Jun 09, 2004 11:37 pm |
|
 |
Joshua Baker-LePain
Guest
|
 Estimate timeout
On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote:
> Steven Schoch wrote:
>> Now we're getting somewhere. The tcpdump shows this:
>> 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
>> My guess is that the ICMP message has something to do with a firewall.
> "admin prohibited" is definitely a result of iptables filtering.
> Have a close look at homer. Run "iptables -L".
> Maybe the solution is loading the amanda iptables module,
> if that is available on the machine.
I'd be interested to see if that fixes it. My amanda server which runs
the nightlies of the (small) home partitions has been at RH9 for a while,
and has this as the only rule it needed to get amdump working:
# If we've an established session, well, okay
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
I recently moved my other amanda server (which backs up my 4.5TB of RAID
space) to RH9. The first few nights, most of the clients were failing
with estimate timeouts. But when I tested during the day (with small
partitions), everything worked. I finally decided that the estimates on
the big partitions were taking long enough that the above rule was timing
out. I couldn't afford another night of the backups failing, so I didn't
try loading the amanda module -- I just added rules to allow incoming
UDP traffic on privileged ports from the clients.
--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
|
| Thu Jun 10, 2004 3:29 am |
|
 |
Paul Bijnens
Guest
|
 Estimate timeout
Joshua Baker-LePain wrote:
> On Thu, 10 Jun 2004 at 9:31am, Paul Bijnens wrote:
>> Steven Schoch wrote:
>>> Now we're getting somewhere. The tcpdump shows this:
>>> 15:01:56.739818 homer > marge: icmp: host homer unreachable - admin prohibited [tos 0xc0]
>>> My guess is that the ICMP message has something to do with a firewall.
>> "admin prohibited" is definitely a result of iptables filtering.
>> Have a close look at homer. Run "iptables -L".
>> Maybe the solution is loading the amanda iptables module,
>> if that is available on the machine.
> I'd be interested to see if that fixes it. My amanda server which runs
> the nightlies of the (small) home partitions has been at RH9 for a while,
> and has this as the only rule it needed to get amdump working:
> # If we've an established session, well, okay
> -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
> I recently moved my other amanda server (which backs up my 4.5TB of RAID
> space) to RH9. The first few nights, most of the clients were failing
> with estimate timeouts. But when I tested during the day (with small
> partitions), everything worked. I finally decided that the estimates on
> the big partitions were taking long enough that the above rule was timing
> out. I couldn't afford another night of the backups failing, so I didn't
> try loading the amanda module -- I just added rules to allow incoming
> UDP traffic on privileged ports from the clients.
I have been thinking about this problem, and, without any real testing
to back up my hypothesis, I believe the problem lies in the default
timeout in iptables for UDP traffic, as you decided too.
For TCP traffic, once a packet has been replied to, the timeout becomes very
large (5 days or so, I believe). But for UDP, which is a connectionless
protocol, the timeout is 180 seconds (I believe).
After this timeout the connection tracking drops the connection entry.
In my config, the estimates of the clients in the DMZ all take less than
2 minutes. And this works fine.
That means that the real solution is to compile amanda with a dedicated
UDP port range, and add that range to the iptables rules on the firewall.
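The hypothesis can be checked against the tcpdump output posted earlier in the thread. The sketch below measures how long the exchange was silent between marge's last packet before the estimate ran and its reply packet, then compares that with the 180-second UDP timeout. That figure is the belief stated in this post, not a verified value, and the helper name is illustrative.

```python
from datetime import datetime

UDP_CONNTRACK_TIMEOUT = 180  # seconds; unverified figure taken from the thread


def seconds_between(start, end):
    """Seconds between two tcpdump timestamps (HH:MM:SS.ffffff) on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Last packet seen before the estimate ran, and the reply that homer rejected.
gap = seconds_between("14:54:29.525614", "15:01:56.739172")
print(round(gap, 3))                # 447.214
print(gap > UDP_CONNTRACK_TIMEOUT)  # True: the reply arrived after the assumed timeout
```

The gap of roughly 447 seconds also agrees with the 447.906 seconds that amandad on marge reported before sending its REP packet.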
--
Paul Bijnens, Xplanation Tel +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
http://www.xplanation.com/ email: Paul.Bijnens < at > xplanation.com
|
| Thu Jun 10, 2004 3:42 am |
|
 |
Joshua Baker-LePain
Guest
|
 Estimate timeout
On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote:
> I have been thinking about this problem, and, without any real testing
> to back up my hypothesis, I believe the problem lies in the default
> timeout in iptables for UDP traffic, as you decided too.
> For TCP traffic, once a packet has been replied to, the timeout becomes very
> large (5 days or so, I believe). But for UDP, which is a connectionless
> protocol, the timeout is 180 seconds (I believe).
> After this timeout the connection tracking drops the connection entry.
Is this true even with ip_conntrack_amanda loaded?
--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
|
| Thu Jun 10, 2004 4:01 am |
|
 |
Paul Bijnens
Guest
|
 Estimate timeout
Joshua Baker-LePain wrote:
> On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote:
>> I have been thinking about this problem, and, without any real testing
>> to back up my hypothesis, I believe the problem lies in the default
>> timeout in iptables for UDP traffic, as you decided too.
>> For TCP traffic, once a packet has been replied to, the timeout becomes very
>> large (5 days or so, I believe). But for UDP, which is a connectionless
>> protocol, the timeout is 180 seconds (I believe).
>> After this timeout the connection tracking drops the connection entry.
> Is this true even with ip_conntrack_amanda loaded?
I should have a look at the source code, or find a detailed doc that
explains it, to find out.
Anyway, that module should somehow know the etimeout parameter
of amanda.conf, which of course it does not, or otherwise allow
a really large timeout, like a few hours. Or it should be tunable
somehow (in the amanda tradition, that could be hardcoded at compile time).
--
Paul Bijnens, Xplanation Tel +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
http://www.xplanation.com/ email: Paul.Bijnens < at > xplanation.com
|
| Thu Jun 10, 2004 4:26 am |
|
 |
Joshua Baker-LePain
Guest
|
 Estimate timeout
On Thu, 10 Jun 2004 at 2:11pm, Paul Bijnens wrote:
>> Is this true even with ip_conntrack_amanda loaded?
> I should have a look at the source code, or find a detailed doc that
> explains it, to find out.
> Anyway, that module should somehow know the etimeout parameter
> of amanda.conf, which of course it does not, or otherwise allow
> a really large timeout, like a few hours. Or it should be tunable
> somehow (in the amanda tradition, that could be hardcoded at compile time).
It seems to be tunable. From the header of the source code:
* Module load syntax:
* insmod ip_conntrack_amanda.o [master_timeout=n]
*
* Where master_timeout is the timeout (in seconds) of the master
* connection (port 10080). This defaults to 5 minutes but if
* your clients take longer than 5 minutes to do their work
* before getting back to the Amanda server, you can increase
* this value.
I should test it one of these nights...
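As a rough sketch of how one might pick a value: the client log earlier in the thread shows sendsize on marge taking about 448 seconds, which is longer than the 5-minute default described in the header. The helper below is hypothetical; it pads the longest observed estimate time by a safety margin, rounds up to whole minutes, and prints a load line in the syntax quoted from the header. The 50% margin is an arbitrary assumption, not a recommendation from the module's authors.

```python
import math

DEFAULT_MASTER_TIMEOUT = 300  # seconds; "5 minutes" per the quoted source header


def suggest_master_timeout(longest_estimate_secs, margin=0.5):
    """Pad the longest observed estimate time and round up to whole minutes.

    Never returns less than the module's documented default.
    """
    padded = longest_estimate_secs * (1 + margin)
    return max(DEFAULT_MASTER_TIMEOUT, math.ceil(padded / 60) * 60)


timeout = suggest_master_timeout(447.906)  # sendsize time from marge's amandad log
print(f"insmod ip_conntrack_amanda.o master_timeout={timeout}")
```

Whether a larger master_timeout actually resolves the estimate timeouts is untested in this thread.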
--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University
|
| Thu Jun 10, 2004 5:02 am |
|
 |
Paul Bijnens
Guest
|
 Estimate timeout
Joshua Baker-LePain wrote:
> It seems to be tunable. From the header of the source code:
> * Module load syntax:
> * insmod ip_conntrack_amanda.o [master_timeout=n]
> *
> * Where master_timeout is the timeout (in seconds) of the master
> * connection (port 10080). This defaults to 5 minutes but if
> * your clients take longer than 5 minutes to do their work
> * before getting back to the Amanda server, you can increase
> * this value.
> I should test it one of these nights...
Wow! Learning something new every day!
--
Paul Bijnens, Xplanation Tel +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM Fax +32 16 397.512
http://www.xplanation.com/ email: Paul.Bijnens < at > xplanation.com
|
| Thu Jun 10, 2004 5:04 am |
|
 |
Gene Heskett
Guest
|
 Estimate timeout
On Thursday 10 June 2004 07:59, Joshua Baker-LePain wrote:
> On Thu, 10 Jun 2004 at 1:40pm, Paul Bijnens wrote:
>> I have been thinking about this problem, and, without any real
>> testing to back up my hypothesis, I believe the problem lies in the
>> default timeout in iptables for UDP traffic, as you decided too.
>> For TCP traffic, once a packet has been replied to, the timeout becomes
>> very large (5 days or so, I believe). But for UDP, which is a
>> connectionless protocol, the timeout is 180 seconds (I believe).
>> After this timeout the connection tracking drops the connection entry.
> Is this true even with ip_conntrack_amanda loaded?
I wasn't even aware of such a module, and was surprised by the output
of a locate!
It's been part of the kernel's netfilter options since back in 2.4.22 or
earlier, so if he doesn't have the module built, he may
have to rebuild his kernel to get it.
I hadn't worried about it here since everything I back up with amanda
is inside the firewall, or on the firewall itself, but iptables sits
between the two NICs in the firewall that separate inside from outside.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.23% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
|
| Thu Jun 10, 2004 6:24 am |
|
 |
Steven Schoch
Guest
|
 Estimate timeout
Joshua Baker-LePain wrote:
>> "admin prohibited" is definitely a result of iptables filtering.
>> Have a close look at homer. Run "iptables -L".
>> Maybe the solution is loading the amanda iptables module,
>> if that is available on the machine.
> I'd be interested to see if that fixes it.
The following line was added to /etc/sysconfig/iptables on homer:
-A RH-Firewall-1-INPUT -m state --state NEW -m udp -p udp -s XX.XX.XX.0/24 --sport 10080 -j ACCEPT
...where XX.XX.XX is the network prefix of our local 'external' network, on
which both homer and marge are located.
The problem has been solved.
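To illustrate what the added rule accepts, here is a small sketch that mimics its match conditions: protocol UDP, source port 10080, and a source address inside the local /24. The network 192.0.2.0/24 is a documentation placeholder standing in for the elided XX.XX.XX.0/24, and the function name is made up for this example. It models the rule's address and port logic only; the "--state NEW" condition is not modeled, and nothing here talks to iptables.

```python
import ipaddress

LOCAL_NET = ipaddress.ip_network("192.0.2.0/24")  # placeholder for XX.XX.XX.0/24
AMANDA_PORT = 10080  # source port of replies from the client's amandad


def rule_accepts(proto, src_ip, src_port):
    """True if a packet matches: -p udp -s LOCAL_NET --sport 10080."""
    return (
        proto == "udp"
        and src_port == AMANDA_PORT
        and ipaddress.ip_address(src_ip) in LOCAL_NET
    )


print(rule_accepts("udp", "192.0.2.17", 10080))    # True: late reply from a local client
print(rule_accepts("udp", "198.51.100.5", 10080))  # False: outside the local network
print(rule_accepts("udp", "192.0.2.17", 858))      # False: wrong source port
```

Because the rule matches on the client's source port and network, it appears not to depend on a connection-tracking entry surviving while the estimate runs, which is presumably why it fixed the timeouts.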
--
Steve
|
| Fri Jun 11, 2004 10:10 am |
|
 |
|
|
All times are GMT - 8 Hours