
Issues Upgrading 4.5 FP7 to NBU 5.1 for Large Environments
Hey Len, thanks for the response.
Please see
http://support.veritas.com/docs/274544
We have 1 HP-UX Master/Media Server and 5 other Media Servers running NBU 5.1 MP2, with IBM 3494 Tape Libraries cross campus using In-Line Tape Copy. We also use NetApp Filers for D2D backups and use Vault to dupe these backups to tape. We use IBM 3590 and 3592 tape drives along with NDMP backups to the NetApps - a total of around 40 tape drives. We send primary backups cross campus for immediate vaulting, and the secondary tape gets vaulted more than 90 miles away. This gives us the local data, a cross campus tape backup, and a regionally vaulted tape.
Under 4.5 FP7 we would have anywhere from 1,000-3,000 jobs either queued or active and staggered throughout the weekend, and we could go to sleep on Friday, wake up on Sunday, and do a few reruns of failed backups. In-Line Tape Copy creates 3 jobs: a parent and the 2 tape jobs, one going to each campus. We have been testing our max jobs under 5.1, and the limit appears to be around 400 total queued and active jobs. At that point either everything hangs to the point of a reboot, or there is a 1-5 hour difference between when a job finishes writing and when it actually posts as complete and releases resources. Daily backups get backlogged when NBU finishes a backup but doesn't post the job as complete and hangs onto its resources.
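To put rough numbers on that ceiling, here is a minimal arithmetic sketch. It assumes every backup uses In-Line Tape Copy (3 jobs each) and treats the observed figure of about 400 queued plus active jobs as a hard limit. Both are simplifications, and the function names are purely illustrative.

```python
# Rough arithmetic only; the two figures below come from the message above.
JOBS_PER_BACKUP = 3         # parent job + 2 tape copy jobs (In-Line Tape Copy)
OBSERVED_JOB_CEILING = 400  # approximate queued + active total where 5.1 hangs


def max_concurrent_backups(job_ceiling, jobs_per_backup):
    """Whole backups that fit under the job ceiling at one time."""
    return job_ceiling // jobs_per_backup


def jobs_for_backups(backups, jobs_per_backup):
    """Total NBU jobs generated by a given number of backups."""
    return backups * jobs_per_backup


if __name__ == "__main__":
    fit = max_concurrent_backups(OBSERVED_JOB_CEILING, JOBS_PER_BACKUP)
    print(fit, "backups,", jobs_for_backups(fit, JOBS_PER_BACKUP), "jobs")
```

Under those assumptions only about 133 backups can be queued or active at once, which is far below the 1,000-3,000 jobs that 4.5 FP7 carried over a weekend.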
This past week we have had to bounce NBU because 1,000 jobs are queued and 400 are active; the majority of those 400 are actually done, but no new jobs can start. Our backup window for night time backups closes at 6 AM, and the daytime backups are then supposed to start. The only way to make that happen is to crash all 6 NBU instances, let the 1,000 jobs fail with a status 50, and wait about 20-60 minutes for BPSCHED to get its head on straight and get going again.
It is a vicious cycle: once the daytime backups are going, we try to resubmit 1,000 of these failed backups and can't get through them each day. The schedules also have a 12 hour delay, so if I resubmit them late in the day, they won't run again for another 12 hours even though the window is open, and it is eternal damnation.
Compound that with the fact that we have /opt/openv/netbackup/bin/admincmd/bpconfig -tries 2, so as soon as NBU is recycled, it resubmits thousands of jobs and buries BPSCHED again, requiring another recycle.
We do reach a frustration level where we set bpconfig -tries 0 and then manually submit jobs all night long and all weekend long. That brings me back to my link, where Veritas suggests baby-sitting backups and not submitting too many at a time as an Enterprise Level solution.
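For anyone who wants to automate that baby-sitting instead of doing it by hand, here is a minimal sketch of a throttled submitter. It is not a NetBackup tool and it runs no NBU commands. `submit` and `count_jobs` are hypothetical callables you would have to supply yourself (for example, wrappers around whatever you already use to start a backup and to count queued plus active jobs). The 400 job ceiling and the 3 jobs per backup are just the figures from this thread.

```python
import time
from collections import deque


def submit_throttled(pending, submit, count_jobs, ceiling=400,
                     jobs_per_backup=3, poll_seconds=60, sleep=time.sleep):
    """Submit backups while keeping queued + active jobs at or under `ceiling`.

    pending    -- iterable of backup identifiers (hypothetical; any objects)
    submit     -- hypothetical callable that starts one backup
    count_jobs -- hypothetical callable returning current queued + active jobs
    """
    queue = deque(pending)
    submitted = []
    while queue:
        headroom = ceiling - count_jobs()
        if headroom < jobs_per_backup:
            # No room for even one more backup; wait and poll again.
            sleep(poll_seconds)
            continue
        # Submit only as many backups as fit in the remaining headroom.
        for _ in range(min(len(queue), headroom // jobs_per_backup)):
            item = queue.popleft()
            submit(item)
            submitted.append(item)
    return submitted
```

In practice the ceiling would probably be set well below the point where things hang, since jobs that have finished but not yet posted as complete still count against it.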
I hope this answers your questions, and I hope my frustration doesn't deter anyone from asking me more questions or providing suggestions. I really do need your input and ideas. I would much prefer your critical and scrutinizing questions to having to tell my wife why the phone rings all night long.
Thanks to all !!!
Brian
-----Original Message-----
From: Len Boyle [mailto:Len.Boyle < at > sas.com]
Sent: Saturday, January 29, 2005 7:46 PM
To: DIVEN, BRIAN; veritas-bu < at > mailman.eng.auburn.edu
Subject: RE: [Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large Environments
Hello Brian
Two questions. First, what is the ballpark range for "submitting many backups"? I do not believe we have seen your problem with 5.1, but then maybe we do not meet the magic number. Or it may depend on the servers used to support the backup server....
Also, I searched on support.veritas.com and could not find anything using the search pattern 274544. Is there a typo, or did Veritas remove the technote?
len
________________________________
From: veritas-bu-admin < at > mailman.eng.auburn.edu on behalf of briandiven < at > northwesternmutual.com
Sent: Sat 1/29/2005 7:20 PM
To: veritas-bu < at > mailman.eng.auburn.edu
Subject: [Veritas-bu] Issues Upgrading 4.5 FP7 to NBU 5.1 for Large Environments
TechNote 274544 provides ideas to reduce the burden on the NBU 5.1 software in large environments. We are a large 24x7 environment, and since our upgrade 9 weeks ago we have been bouncing NBU almost daily due to hung backups. After 9 weeks of an open case with Veritas we have seen limited improvement - and now this TechNote appears. I find it ironic that 5.1 can be re-branded as Enterprise Server while a technote says not to stress BPSCHED in Enterprise Server environments, so I'd like to see if I'm alone here.
We have had several issues upgrading to NBU 5.1 MP1 and now MP2 where we are unable to submit many backups (queued or active) at once. The recent TechNote 274544 fits our account perfectly, and I am wondering if any other large NBU shops are experiencing similar issues. I have a hard time believing this TechNote was generated just because of us. Veritas shows no desire to address this other than to wait until release 6.x, and I could use some friends who will either state that they have the same issue or help me push a fix through.
Veritas backline also stated that they won't support us backing off of 5.1 MP1 to 4.5 FP7, where we had a stable environment. They test MP2 to MP1 uninstalls, but they don't test going back from release 5 to release 4, and there are some inherent undocumented catalog changes that could mess us up and leave us unable to restore a 5.1 backup under 4.5. They only want to fix and go forward. We had many MP2 binaries prior to their release and then moved to MP2, and we still can't get through a night if we submit all of our backups.
We have exercised every recommendation in this technote and remain unsuccessful.
I need some of my friends to contact me with similar issues to get this fixed if we are to fix and go forward. We need to push Veritas on this issue as a group of large Enterprise Server companies.
Brian