SearchFAQMemberlist Log in
Reply to topic Page 1 of 1
Fatal error: Authorization key rejected by Storage daemon
Author Message
Post Fatal error: Authorization key rejected by Storage daemon 
Hi,

So having quite a frustrating issue with bacula on one of our CentOS hosts. Without pattern, I will receive this error on some days:

12-Oct 09:13 dir JobId 24468: Start Backup JobId 24468, Job=<job>
12-Oct 09:13 dir JobId 24468: Using Device "chg0_drive0"
12-Oct 09:46 fd JobId 24468: Fatal error: Authorization key rejected by Storage daemon.
Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_Asked_Questi.html#SECTION003760000000000000000 for help.
12-Oct 09:46 fd JobId 24468: Fatal error: Failed to authenticate Storage daemon.
12-Oct 09:46 dir JobId 24468: Fatal error: Bad response to Storage command: wanted 2000 OK storage , got 2902 Bad storage

If the backup actually DOES work, it takes forever for the job to complete, even for very little data (connected to 1gb switch).

Brief Network Setup:

We utilize a lot of VLANs on our network so our bacula server is accessible by a set IP on each VLAN (ex. VLAN id= 25, bacula server listens on x.x.25.100 on that VLAN). All our servers run on XenServer hosts and we've never had an issue with bacula running until this particular host (1.5+ years).

Good Host:

Below is snippets of debug output from a working host with 0 issues-

fd: job.c:263-0 <dird: Hello Director dir calling
fd: job.c:279-0 Executing Hello command.
fd: cram-md5.c:73-0 send: auth cram-md5 <1146528753.1350051814@fd> ssl=0
fd: cram-md5.c:133-0 cram-get received: auth cram-md5 <1771224273.1350052057@dir> ssl=0
fd: cram-md5.c:152-0 sending resp to challenge: u8/6LAM+5A+Jc4/nzR/NiC
fd: job.c:263-0 <dird: JobId=24472 Job=<job name> SDid=1197 SDtime=1347377831 Authorization=<authorization key>

From there, it goes into-

fd: job.c:263-0 <dird: storage address=<FQDN of bacula director server> port=9103 ssl=0
fd: job.c:279-0 Executing storage command.
fd: job.c:1474-0 StorageCmd: storage address=<FQDN of bacula server>port=9103 ssl=0
fd: bsock.c:220-0 Current host[ipv4:10.10.x.100:9103] All host[ipv4:10.10.x.100:9103]
fd: bsock.c:154-0 who=Storage daemon host=<FQDN of bacula director server> port=9103
fd: cram-md5.c:133-0 cram-get received: auth cram-md5 <1575779100.1350052057@sd> ssl=0
fd: cram-md5.c:152-0 sending resp to challenge: B9/fK0xcX6E+K4+oZl/KPB
fd: cram-md5.c:80-0 send: auth cram-md5 <786312287.1350051814@ovpn1-fd> ssl=0
fd: cram-md5.c:99-0 Authenticate OK A9/4P+/TBjg+N8dRc4/aPD
fd: job.c:263-0 <dird: backup

And the job completes in adequate time.

Problem Host:

The output on the problem host is completely different. First, for connecting, it looks like this:

clcd-fd: job.c:263-0 <dird: storage address=<FQDN of bacula server> port=9103 ssl=0
clcd-fd: job.c:279-0 Executing storage command.
clcd-fd: job.c:1474-0 StorageCmd: storage address=<FQDN of bacula server> port=9103 ssl=0
clcd-fd: bsock.c:220-0 Current host[ipv4:10.10.5x.100:9103] All host[ipv4:10.10.5x.100:9103] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.7x.100:65535] host[ipv4:10.10.x.x:65535] host[ipv4:10.10.x.100:65535] host[ipv4:10.10.x.10:65535] host[ipv4:10.10.1x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535]
clcd-fd: bsock.c:220-0 Current host[ipv4:10.10.5x.100:9103] All host[ipv4:10.10.5x.100:9103] host[ipv4:10.10.5x.100:9103] host[ipv4:10.10.7x.100:65535] host[ipv4:10.10.x.x:65535] host[ipv4:10.10.x.100:65535] host[ipv4:10.10.x.10065535] host[ipv4:10.10.1x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535]
clcd-fd: bsock.c:220-0 Current host[ipv4:10.10.7x.100:9103] All host[ipv4:10.10.5x.100:9103] host[ipv4:10.10.5x.100:9103] host[ipv4:10.10.7x.100:9103] host[ipv4:10.10.x.x:65535] host[ipv4:10.10.x.100:65535] host[ipv4:10.10.x.100:5535] host[ipv4:10.10.1x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535]
clcd-fd: bsock.c:220-0 Current host[ipv4:10.10.x.x:9103] All host[ipv4:10.10.5x.100:9103] host[ipv4:10.10.5x.100:9103] host[ipv4:10.10.7x.100:9103] host[ipv4:10.10.x.x:9103] host[ipv4:10.10.x.100:65535] host[ipv4:10.10.x.100:6555] host[ipv4:10.10.1x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.2x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535] host[ipv4:10.10.5x.100:65535]

I changed all the IP's for security purposes, but hopefully you get the idea.

Once it connects to the right IP, it passes authorization-

fd: job.c:263-0 <dird: storage address=<FQDN of bacula server> port=9103 ssl=0
fd: job.c:279-0 Executing storage command.
fd: job.c:1474-0 StorageCmd: storage address=<FQDN of bacula server> port=9103 ssl=0
fd: bsock.c:220-0 Current host[ipv4:10.10.1x.100:9103] All host[ipv4:10.10.1x.100:9103]
fd: bsock.c:154-0 who=Storage daemon host=<FQDN of bacula server> port=9103
fd: cram-md5.c:133-0 cram-get received: auth cram-md5 <1356737337.1350055466@chef-sd> ssl=0
fd: cram-md5.c:152-0 sending resp to challenge: I//Wz5ZIkS9l/9+sE8Q8cC
fd: cram-md5.c:80-0 send: auth cram-md5 <57904060.1350055466@clcd-fd> ssl=0
fd: cram-md5.c:99-0 Authenticate OK gkUm/T1qS/cEuT/4+4RasB
fd: job.c:263-0 <dird: backup


Once it finally gets through that stage of the process, I get a ton of these messages (there's a lot more than shown below)-

fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0
fd: heartbeat.c:96-0 wait_intr=0 stop=0

Then eventually -

fd: heartbeat.c:143-0 Send kill to heartbeat id
fd: backup.c:211-0 end blast_data ok=1
fd: job.c:282-0 Quit command loop. Canceled=0
fd: job.c:388-0 Calling term_find_files
fd: job.c:391-0 Done with term_find_files
fd: jcr.c:181-0 write_last_jobs seek to 192
fd: job.c:393-0 Done with free_jcr

So, after comparing the two, I'm not sure why the problem host tries to connect to every IP on each VLAN that the bacula server runs on, instead of just the appropriate one like the working hosts.

We don't have heartbeat configured on any clients. We use all the default ports for bacula.

I wasn't able to get debug output for a failure (shocking), but with what I see so far, something is really odd with the networking on the problem host.

We have numerous hosts on this VLAN that have no problem connecting to the bacula server. All other VLANs work fine as well.

Anyone have any ideas? TIA

View user's profile Send private message
Display posts from previous:
Reply to topic Page 1 of 1
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
  


Magic SEO URL for phpBB