HELP!!!!
Post HELP!!!! 
Hi All,

The situation is getting crazy. Last night 17% of our backups failed with error code 47. It happened on only 6 of the 58 media servers. All the jobs were trying to back up to one of two of the four VTL libraries. Looking at the <16> and <32> errors in /usr/openv/netbackup/logs, I see CORBA errors on 3 of the 6 and robot failures on the other 3. While we have many 47 errors on the weekends, this is the first of this magnitude for a weekday. The only change I am aware of is that last week we increased the memory of 5 of the 6 media servers from 12GB to 32GB. Is it possible to have TOO much memory?

Environment:
Red Hat Linux 64-bit running 32-bit NetBackup 6.5.6
4 EMC VTL libraries (sorry, don’t know the model #), 164 drives configured on each.
The failing clients are both UNIX and Windoze with one Oracle backup failure.

OH, and they only seem to happen between 23:00 and 04:00 (approximately)
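
For anyone wanting to narrow this down, here is a minimal sketch of counting the <16> and <32> lines that fall inside the 23:00 to 04:00 window, which wraps past midnight. It assumes each log line starts with an HH:MM:SS timestamp followed by a bracketed PID and the severity marker; that line shape, the sample messages, and the function names are illustrative assumptions, not taken from real logs.

```python
import re
from collections import Counter
from datetime import time

# Assumed line shape: "HH:MM:SS.mmm [pid] <severity> process: message".
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\S*\s+\[\d+\]\s+<(\d+)>\s+(.*)$")


def in_window(t, start=time(23, 0), end=time(4, 0)):
    """True if t is in [start, end); the window may wrap past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def count_errors(lines, severities=("16", "32")):
    """Count lines per severity marker that fall inside the time window."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed shape
        hh, mm, ss, sev, _msg = m.groups()
        if sev in severities and in_window(time(int(hh), int(mm), int(ss))):
            counts[sev] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "23:15:02.123 [4321] <16> bptm: hypothetical CORBA failure text",
    "01:30:45.000 [4322] <32> bptm: hypothetical robot failure text",
    "12:00:00.000 [4323] <16> bptm: outside the window",
    "02:10:00.500 [4324] <4> bptm: informational, ignored",
]
print(dict(count_errors(sample)))
```

Running the same count per media server would show whether the six failing servers stand out from the other 52 during that window.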

ANY suggestions would be greatly appreciated.

Regards,

Patrick Whelan
VERITAS Certified NetBackup Support Engineer for UNIX.
VERITAS Certified NetBackup Support Engineer for Windows.

netbackup < at > whelan-consulting.co.uk

Post HELP!!!! 
I have cleaned up some of our “DNS” problems, although they were not the clients in question, and will see how it goes tonight. It also turns out that the media servers had “files dns” in /etc/nsswitch.conf whereas the master had “dns host”. I’ve changed the master to match. They also changed the NICs on the master to 1Gb instead of auto-negotiate. So we will see what happens tonight. If it is a problem with hitting the DNS servers too hard, it should get worse tonight. :)
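
A mismatch like this is easy to miss by eye, so here is a rough sketch of comparing the `hosts:` lookup order across servers. The server names and file contents are hypothetical, and collecting each server's /etc/nsswitch.conf text is left out.

```python
def hosts_order(nsswitch_text):
    """Return the lookup sources on the 'hosts:' line, or [] if none is found."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def mismatches(configs):
    """Map server name -> order, for servers that differ from the first entry."""
    items = list(configs.items())
    _ref_name, ref_text = items[0]
    reference = hosts_order(ref_text)
    return {
        name: hosts_order(text)
        for name, text in items[1:]
        if hosts_order(text) != reference
    }


# Hypothetical nsswitch.conf contents keyed by hypothetical server names.
configs = {
    "media01": "passwd: files\nhosts: files dns\n",
    "media02": "hosts:      files dns   # local file first\n",
    "master": "hosts: dns files\n",
}
print(mismatches(configs))
```

With "files" first, names present in /etc/hosts are answered locally and never reach the DNS servers, which is one reason the order matters here.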

Thank all of you for your suggestions.

Regards,

Patrick Whelan
VERITAS Certified NetBackup Support Engineer for UNIX.
VERITAS Certified NetBackup Support Engineer for Windows.

netbackup < at > whelan-consulting.co.uk



From: Bahnmiller, Bryan E. [mailto:bbahnmiller < at > dtcc.com]
Sent: 27 September 2011 15:44
To: Patrick
Subject: RE: [Veritas-bu] HELP!!!!



Patrick,

That is strange. I’m wondering if something else is going on. I have seen situations where you beef up your environment and it exposes other problems that used to be masked by a limited environment. With that many drives and that much memory, you are going to be able to queue up and run more jobs. If you are creating jobs faster, I wonder if you are running into name resolution problems now. Can you find out how loaded your DNS server is during the same time frame? One of the older NBU environments I had was pounding the DNS servers to the point that they were running at 100% CPU. I thought 6.x was much better at this, but it could be related to the way your Linux servers do name caching and how hard they hit the DNS servers.
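
One way to test the name-resolution theory from a media server is to time lookups of the client names during the failure window. The sketch below is only an illustration: the host names, the 0.5 second threshold, and the stub resolver are assumptions, and the stub stands in for `socket.getaddrinfo` so the example runs without touching a real DNS server.

```python
import socket
import time


def time_lookups(hostnames, resolve=socket.getaddrinfo, clock=time.perf_counter):
    """Return {hostname: (seconds, error_or_None)} for one lookup of each name."""
    results = {}
    for name in hostnames:
        start = clock()
        try:
            resolve(name, None)
            error = None
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            error = str(exc)
        results[name] = (clock() - start, error)
    return results


def slow_or_failed(results, threshold=0.5):
    """Names whose lookup failed or took longer than threshold seconds."""
    return sorted(
        name for name, (secs, err) in results.items() if err or secs > threshold
    )


def fake_resolve(name, port):
    """Stub resolver for the demo; fails for one hypothetical name."""
    if name == "missing.example.com":
        raise socket.gaierror("Name or service not known")
    return []


results = time_lookups(
    ["client01.example.com", "missing.example.com"], resolve=fake_resolve
)
print(slow_or_failed(results))
```

Dropping the `resolve=` argument makes it use the system resolver, so it follows whatever order /etc/nsswitch.conf specifies on that host.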

One other possibility would be the VTL. I’ve had better luck with the newer Data Domains from EMC than with their older “DLs”. It may be that they are slow in responding to requests when they get busy, but I wouldn’t think that would show up as error 47s.

Does /var/log/messages show anything around the same time frame?

Bryan

_____________________________________________________________
DTCC DISCLAIMER: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error, please notify us immediately and delete the email and any attachments from your system. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email.

Post HELP!!!! 
Would it not be better to hard-code the NIC speed rather than use auto/auto?
Simon


Post HELP!!!! 
Only on the older 10/100 NIC cards. They tightened up the standard when gigabit came around. If it doesn't autoneg, something is wrong.






Post HELP!!!! 
My understanding is you cannot get 1000 FDX without auto/auto. We keep all our stuff on automatic.
------------------

Jim VandeVegt | Technical Integrator, ETG
Physicians Mutual | 2600 Dodge Street | Omaha, NE 68131
402.930.2649 | Jim.VandeVegt < at > PhysiciansMutual.com

Insurance for all of us.™
health | life | retirement

From: veritas-bu-bounces < at > mailman.eng.auburn.edu [veritas-bu-bounces < at > mailman.eng.auburn.edu] On Behalf Of scott.george < at > parker.com [scott.george < at > parker.com]
Sent: Tuesday, September 27, 2011 14:19
To: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] HELP!!!!



Only on the older 10/100 NIC cards. They tightened up the RFC when gigabit came around. If it don't autoneg, something is wrong.






Post HELP!!!! 
Patrick,
 
What network do your NetBackup servers use for inter-server communication? Is it a shared backup network or is it a dedicated network? Either way, see how much bandwidth is being used.
 
You mentioned you set the NIC on the master to 1Gb. If it wasn’t running at 1Gb before, it is quite possible that you were dropping packets, or that they were timing out, because the network was overloaded or misconfigured (half duplex). Some of those packets are the inter-NBU communication between the media servers and may be behind the CORBA errors you see.
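
A quick way to look for dropped packets on a Linux server is the per-interface counters in /proc/net/dev. The sketch below parses that file's text and lists interfaces with non-zero error or drop counters; the sample text and its numbers are made up for illustration.

```python
def parse_proc_net_dev(text):
    """Return {interface: error/drop counters} parsed from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        # Eight receive columns come first, then eight transmit columns.
        stats[name.strip()] = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def problem_interfaces(stats):
    """Names of interfaces where any error or drop counter is non-zero."""
    return sorted(name for name, c in stats.items() if any(c.values()))


# Hypothetical snapshot in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 987654321 123456 12 340 0 5 0 0 123456789 654321 0 7 0 0 0 0
"""

stats = parse_proc_net_dev(SAMPLE)
print(problem_interfaces(stats), stats["eth0"])
```

The counters are cumulative since boot, so taking one snapshot before the backup window and one after, then comparing them, says more than a single reading.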
 
We have run into this problem a couple of times before and eventually settled on a separate network for inter-server communication. It works well, but it’s a bit complex and troublesome if you don’t understand how it was set up.
 
Have you considered working with support to run an Apparenet scan on the network? (I think they changed the name of the tool, but it tests the network and gives you a report of what’s wrong.)
 
-Rusty

 

Post HELP!!!! 
I would be careful with locking a 1GbE NIC. The definition of 1GbE mandates autonegotiation, i.e. it is not valid to lock the speed. You can, of course, advertise only 1000/FDX, so that is the only result autonegotiation can arrive at.
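
To see what a link actually ended up with, the Speed, Duplex, and Auto-negotiation lines printed by `ethtool <interface>` can be checked. The sketch below parses such output; the sample is hypothetical text for a forced link, and the warning wording is mine.

```python
def parse_ethtool(text):
    """Return {setting: value} for every 'Key: value' line that has a value."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def link_warnings(settings):
    """List deviations from an autonegotiated 1000 Mb/s full-duplex link."""
    warnings = []
    if settings.get("Speed") != "1000Mb/s":
        warnings.append("speed is %s, not 1000Mb/s" % settings.get("Speed"))
    if settings.get("Duplex") != "Full":
        warnings.append("duplex is %s, not Full" % settings.get("Duplex"))
    if settings.get("Auto-negotiation") != "on":
        warnings.append("auto-negotiation is not on")
    return warnings


# Hypothetical output for a NIC that was forced to 1000/Full.
SAMPLE = """\
Settings for eth0:
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    Auto-negotiation: off
    Link detected: yes
"""

print(link_warnings(parse_ethtool(SAMPLE)))
```

Checking both ends matters, since a forced port facing an autonegotiating port is the classic way to end up with a duplex mismatch.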

DNS can be slowed down if you have lots of domain names in the domain search list, as it will try them all. You can avoid this by using fully-qualified names with a terminal dot (e.g. server.bigco.com.) but I must admit I don’t as it would confuse people who don’t know what it is for and some tools/scripts will just break with it.
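
To make the search-list cost concrete, here is a simplified model of the candidate names a resolver works through, following the search and ndots behaviour described in resolv.conf(5). The domain names are hypothetical, and real resolvers have more options than this sketch covers.

```python
def candidate_queries(name, search, ndots=1):
    """Return the fully-qualified names tried, in order, for one lookup."""
    if name.endswith("."):
        return [name]  # a trailing dot means: use the name exactly as given
    expanded = ["%s.%s." % (name, domain) for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]  # otherwise walk the search list first


# Hypothetical search list.
search = ["bigco.com", "eng.bigco.com", "corp.bigco.com"]

print(candidate_queries("server", search))
print(candidate_queries("server.bigco.com.", search))
```

A short name can cost up to one query per search domain before it resolves or fails, while the dotted form costs exactly one.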

Maybe worth checking with traceroute to your DNS servers and between your servers, to make sure it is using the NICs that you expect (if you have > 1 in any server).

You can use a tool like ‘ping plotter’ to see if there is something really slow in your network, but it is more aimed at WAN testing.

William D L Brown


From: veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto:veritas-bu-bounces < at > mailman.eng.auburn.edu] On Behalf Of Patrick
Sent: 27 September 2011 15:17
To: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] HELP!!!!



I have cleaned up some of our “DNS” problems, although they were not the clients in question, and will see how it goes tonight. It also turns out that the medias servers had “files dns” in /etc/nsswitch.conf whereas the master had “dns host”. I’ve changed the master to match. They also changed the NIC cards on the master to 1GB instead of auto negotiate. So we will see what happens tonight. If it is a problem with hitting the DNS servers too hard it should get worse tonight. J

Thank all of you for your suggestions.

Regards,

Patrick Whelan
VERITAS Certified NetBackup Support Engineer for UNIX.
VERITAS Certified NetBackup Support Engineer for Windows.

netbackup < at > whelan-consulting.co.uk ([email]netbackup < at > whelan-consulting.co.uk[/email])



From: Bahnmiller, Bryan E. [mailto:bbahnmiller < at > dtcc.com]
Sent: 27 September 2011 15:44
To: Patrick
Subject: RE: [Veritas-bu] HELP!!!!



Patrick,

That is strange. I’m wondering if something else is going on. I have seen situations where you beef up your environment and it introduces you to other problems that used to be masked by a limited environment. With that many drives and that much memory, you are going to be able to queue up and run more jobs. If you are creating jobs faster, I wonder if you are running into name resolution problems now. Can you find out how loaded your DNS server is during the same time frame? I have seen where one of the older NBU environments I had was pounding the DNS servers to the point that they were running 100% cpu. I thought 6.x was much better at this, but it could possibly be related to the way your Linux servers are doing name caching and how hard they hit the DNS servers.

One other possibility would be the VTL. I’ve had better luck with the newer DataDomain’s from EMC than their older “DL’s”. It may be possible that they are slow in responding to requests when they get busy, but I wouldn’t think those would show up as error 47’s.

Does /var/log/messages show anything around the same time frame?

Bryan

Post HELP!!!! 
Hi All,

As of this morning we had NO 47 errors. Yesterday, in addition to the NIC change, I changed the nsswitch.conf on the master to match the media servers and ensured that all relevant media servers could talk to all relevant clients. We went from an 80% success rate to 97%.
Thank you all for all your ideas and suggestions. I don’t know what I would do without this group. :)
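
For anyone wanting to check the same thing across many servers, the comparison can be sketched in Python as below. This is only an illustration: the function names are made up, the sample master entry in the usage note is hypothetical, and collecting each server's /etc/nsswitch.conf text is left to whatever tooling you already have.

```python
def hosts_order(nsswitch_text):
    """Return the source list from the 'hosts:' line, e.g. ['files', 'dns']."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return []


def mismatched(reference, others):
    """Return names of servers whose hosts order differs from the reference.

    `others` maps a server name to the text of its nsswitch.conf.
    """
    return sorted(
        name for name, text in others.items()
        if hosts_order(text) != reference
    )
```

With the media servers' order `["files", "dns"]` as the reference, `mismatched(["files", "dns"], {"master": "hosts: dns files\n"})` returns `["master"]`.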

Regards,

Patrick Whelan
VERITAS Certified NetBackup Support Engineer for UNIX.
VERITAS Certified NetBackup Support Engineer for Windows.

netbackup < at > whelan-consulting.co.uk ([email]netbackup < at > whelan-consulting.co.uk[/email])



From: veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto:veritas-bu-bounces < at > mailman.eng.auburn.edu] On Behalf Of William Brown
Sent: 28 September 2011 14:15
To: veritas-bu < at > mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] HELP!!!!



I would be careful with locking a 1GbE NIC. The definition of 1GbE mandates autonegotiation, i.e. it is not valid to lock the speed. You can of course only advertise 1000/FDX, so that would be the only possibility for autonegotiation.

DNS can be slowed down if you have lots of domain names in the domain search list, as it will try them all. You can avoid this by using fully-qualified names with a terminal dot (e.g. server.bigco.com.) but I must admit I don’t as it would confuse people who don’t know what it is for and some tools/scripts will just break with it.
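
To illustrate the search list point, this Python sketch lists the names a stub resolver would be willing to try for a given input. It is a simplified model of the behaviour described in resolv.conf(5) (search list plus the `ndots` option), not the resolver's actual code; the function name and the search domains in the usage note are hypothetical, and a real resolver stops at the first name that answers.

```python
def candidate_names(name, search_domains, ndots=1):
    """List the fully-qualified names a resolver may try, in order."""
    if name.endswith("."):
        # A terminal dot marks the name as absolute: no search list is applied.
        return [name]
    with_search = [f"{name}.{domain}." for domain in search_domains]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: the name is tried as-is first, then the search list.
        return [absolute] + with_search
    # Too few dots: every search domain is tried before the bare name.
    return with_search + [absolute]
```

For example, `candidate_names("server", ["a.example", "b.example"])` gives three candidates, while `candidate_names("server.bigco.com.", ["a.example", "b.example"])` gives exactly one, which is why a long search list costs most on short names and on names that fail to resolve.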

Maybe worth checking with traceroute to your DNS servers and between your servers, to make sure it is using the NICs that you expect (if you have > 1 in any server).

You can use a tool like ‘ping plotter’ to see if there is something really slow in your network, but it is more aimed at WAN testing.

William D L Brown

