NetBackup: One policy per client (Updated)

If you're a NetBackup user, I know you think I'm crazy, but this is what I like: One policy per client and per database instance, and I'm going to do my best to convince you that it's a good idea.

What am I, nuts?  That could be thousands of policies!  That's right.  And I am suggesting that thousands of policies is now (as of 4.5) no more difficult to manage than a few dozen policies.  AND I suggest that it gives you a much more digestible, manageable set of things to work with.  And no one that I've talked into this "crazy" idea has ever regretted it.  Once they grok it, they love it.

(Update: There's a discussion on the NetBackup mailing list about this, and the support for it was a lot stronger than I thought.  One person has over 4000 policies and loves it.)  Check this out:

4500 policies

It's all about minimizing complexity and management, right? The Conventional Wisdom says the best way to do that is to make one policy for Unix, one for Windows, one for Oracle, etc.  With Unix, Windows, MacOS, NDMP, Oracle, Informix, SQL Server, Exchange, and Sybase, we've got nine policies, which sounds manageable enough. If that's the way it stayed, I'd be all for that, but it never stays that way.  Next thing you know, one or more (or all) of the following happens:

  1. The above assumes every client in each policy can do its full backup in one night.  Next thing you know, that doesn't work.  (Many of you kick the full backups off on Friday night and let them run all weekend.  Next thing you know, it doesn't fit into a weekend.)  Now we have to start spreading it out across the week or month.  Spreading them across the week (9 policies x 7 nights) turns 9 policies into 63 policies really quick.  If you spread them out across a four-week month (9 x 28), you've got 252 policies. All you need to do is create all the policies you need, and move some clients into each policy.  Of course, that means a full backup on each client that you move, since NBU doesn't share level data between policies.
  2. Next thing you know, one of your policies is too full and its backups won't fit on the night you assigned them to. All you need to do is move some of the clients to another policy. Oops.  Another full backup.
  3. Along the way, you end up changing naming conventions, and you have backups with all of the following policies in your backup history: Unix, Unix_Thursday, Unix_First_Thursday, etc. 
  4. Now it's time to pass the torch on to the new backup person.  How do they wrap their heads around this mess?    GlassHouse (the company I work for) does hundreds of backup assessments (among other services), and we've seen this over and over.

Let's compare this to my way.  Put every client in its own policy, with a naming convention that tells you what it is.  Something like Prod-FS-Unix-clientname-ALL (the FS means filesystem backups).  If it's an RMAN policy, it would be Prod-RMAN-Win-clientname-instancename (where instancename is the name of the instance that policy backs up).  If you need to change their schedules, change their schedules; no full backup is required.
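
To make the naming convention concrete, here is a minimal Python sketch that assembles policy names in the two shapes described above. The function names, the default "ALL" selection, and the sample client and instance names are assumptions for illustration only; they are not part of NetBackup.

```python
# Illustrative helpers for the naming convention described above.
# Function names and sample values are hypothetical, not NetBackup features.

def fs_policy_name(env, platform, client, selection="ALL"):
    """Filesystem policy name, e.g. Prod-FS-Unix-clientname-ALL."""
    return "-".join([env, "FS", platform, client, selection])


def rman_policy_name(env, platform, client, instance):
    """RMAN policy name, e.g. Prod-RMAN-Win-clientname-instancename."""
    return "-".join([env, "RMAN", platform, client, instance])


if __name__ == "__main__":
    print(fs_policy_name("Prod", "Unix", "elvis"))
    print(rman_policy_name("Prod", "Win", "apollo", "ABCD"))
```

One thing to watch for: if your client hostnames contain hyphens, a hyphen-separated policy name becomes harder to split back into its fields, so you may prefer a different separator.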

Here are the objections, and why I don't think they hold water:

  1. It's easier to make global changes when you have fewer policies. If you want to change a bunch of clients in one policy, you make one change to one policy.  That's got to be harder when you have a bunch of policies.
    1. If you're a GUI person, all you need to do is shift-select all the policies you want to change in the GUI, make the modification you want to make, then save.  NetBackup will update all of the policies.
    2. If you're a command line person, how hard is it to take a command that modifies one policy, and add a for loop around it to have it modify several policies?
  2. When you need to add a new client, adding them to a new policy is harder than just adding them to an existing policy that's already set up.
    1. If you're a GUI person, right click on a policy of the same type, and select "Copy to new policy."  It'll make another policy that's the same as the first one.  Then add the client to that policy.  One extra step. Big deal.
    2. If you're a command line person, the bppolicynew command has a -sameas option to do the same thing.
  3. NetBackup will choke with that many policies!
    1. Prior to 6.0, if you have 6000 policies (I've had this many) and you have it start all the backups at the same time (what I do, too, but that's a discussion for another blog entry), then it will take a while to get all the backups started.  I've had it take up to an hour and a half, and the amount of time it took was predictable.  All I did was move the window up an hour and a half, and all was well.
    2. 6.0's new scheduler doesn't have this problem.
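
The "for loop around a command" idea from the objections above can be sketched like this in Python. It only builds and prints one bppolicynew command line per new policy; it does not run anything. The -sameas option is the one mentioned above, but the argument order shown here, the template policy name, and the client names are assumptions, so check them against the command reference for your NetBackup version before using anything like this.

```python
import shlex

# Sketch: wrap a single-policy command in a loop over many policies.
# Nothing is executed; the command lines are only built and printed.
# ASSUMPTION: "bppolicynew <new_policy> -sameas <existing_policy>" is the
# argument order; verify it for your NetBackup version.

def clone_policy_commands(template_policy, new_policy_names):
    """Return one bppolicynew argument list per new policy name."""
    return [
        ["bppolicynew", name, "-sameas", template_policy]
        for name in new_policy_names
    ]


if __name__ == "__main__":
    clients = ["elvis", "apollo"]  # hypothetical client names
    names = [f"Prod-FS-Unix-{client}-ALL" for client in clients]
    # "Prod-FS-Unix-template-ALL" is a hypothetical existing policy.
    for cmd in clone_policy_commands("Prod-FS-Unix-template-ALL", names):
        print(shlex.join(cmd))
```

Adding each client to its new policy is a separate step that is not shown here. The same loop shape works for any command that modifies one policy at a time.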

This layout is very easy to understand.  There's no question as to what clients are in what policies.  When you're trying to do a bpimagelist to find a certain backup, that comes in handy.  When you change schedules for load balancing purposes you don't force a full backup.  You can see what's scheduled when by having a naming convention for schedules and looking at the "summary of all policies" window.

I can't wait to see the comments on this one. 🙂

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

18 comments
  • I thought for sure someone would jump down my throat about THIS idea! Did I actually convince everyone to do this? I doubt it!

  • cpreston wrote: "I thought for sure someone would jump down my throat about THIS idea! Did I actually convince everyone to do this? I doubt it!" Heh. From my viewpoint as a programmer who as a sysadmin is just currently a "sophisticated home user" (who’s about to set up, per the current plan, an OpenSolaris multiple RAID-Z2 home file/archive/backup/RDBMS server), and who has never had to sysadmin a site 1/100th of the size and complexity you’re talking about, I’d say this is "the exception that proves the rule" (the rule being "Don’t Repeat Yourself" (DRY)).

    As long as you rigorously use naming conventions (that’s where you’d get into trouble, two different policies with the same name), it sounds like a good pragmatic choice based upon hard experience. I’ll take such experience over abstract theory any day.

    – Harold

  • I couldn’t agree with you more! We have around 4,500 policies on one Master and 6 Media servers. We implemented this in 2002 and had to write a few scripts, but it is all self-managing and easy to audit.

  • Snooopnbu, I love you. 😉

    See what I’m talking about, folks? Doesn’t 4500 policies sound INSANE? And yet they like it!

    What kind of scripts did you write?

  • We don’t do this now. We take a hybrid approach where many small app servers share a policy, but larger file servers and DBs get their own. But I won’t argue with you. Ours works because our server naming scheme includes function, so as long as you have the decoder ring it’s pretty obvious where things go. But I can see this in our future.

  • Your premises for the one policy per client argument are flawed. You do not have to have all fulls in the same policy run on the same day.

    The standard I recommend is a policy with the window open at the same time Monday through Friday nights (sometimes Sun-Thurs depending on the business). All the clients don’t have to run fulls on the same day. Assuming a one week frequency, clients will run on the day they were added to the policy (you could “tune” it, but I’ve never found it worth the trouble). If they fail (incl 196), they will now run on the subsequent day.

    This is a more fault tolerant approach which does not require micromanagement.

    In a perfect world you could use all seven days, but there’s generally some clients that “have” to have fulls on the weekend so I tend to use the business week for everything I can.

    OTOH, there’s going to be enough special cases that I have no objection to using a policy per client if that’s what they want.

    P.S. One policy vs multiple has no effect on the performance of the scheduler. You can launch thousands of jobs concurrently without delay using nothing more powerful than my laptop. When the scheduler seems to take forever you have either clients (when using wildcards or ALL_LOCAL_DRIVES plus Multiple Data Streams) or media servers down. A quick netstat -an | grep SYN during the delay(s) will point this out. With an hour and a half delay, someone has probably cranked up one of the timeouts.

  • …the NUMBER ONE reason for one client per policy; that yahoo admin that calls up and says “can you disable the backups for client X tonight?” Ever try to manage that nightmare when there are 30 other clients in the policy? Delete the client, wait, forget to add it back, get yelled at a month later when someone needs a backup.

  • I completely understand your thinking. Getting server and DB configurations standardized so they can be grouped by policy is difficult. However that’s the route I’m taking.

    I want a logical configuration for server builds, DB deployments, and NB policy configurations that are pretty close. I’m not there yet but that’s where I’m headed.

  • I HATE frequency based backups for anything other than daily backups. They work fine for a while, but when you have some failures or outages on the server, next thing you know, backups start bunching up. I want to control when full/cumulative backups run, and that means calendar-based scheduling, and that means one schedule per policy.

    The 1.5 hour delay was a few versions ago. There’s been a lot of work with the scheduler since then.

  • Why is yahoo administering your backups? 😉

    Did I forget to mention that reason? I can’t believe it! It’s one of my favorites!

  • For UNIX clients, you cannot manage exclude lists from the GUI. The problem with putting multiple clients in the same policy is that the backup selections have to be the same for all clients. A newbie NBU administrator might see that client A is backing up ALL_LOCAL_DRIVES, but be totally unaware of its exclude list, since he can’t see it in the GUI.
    So in short, one client per policy makes it easier to specify specific backup selections per client. 🙂

  • I manage all my exclude lists totally outside of the GUI using a script that pushes them out from the master server. But I get your point. 😉

    I’m not sure if having one or many policies changes this, though. The same bad thing you’re describing could happen in the “many policy” world.

  • "I manage all my exclude lists totally outside of the GUI using a script that pushes them out from the master server. But I get your point. 😉"

    Script? What script is this? Where can I get it? There are many functionality improvements that NetBackup could use, but that’s another thread. 🙂

    Without the script, I think the best alternative is to have one policy per client backup function.
    At least if you have one policy per client backup function, you could easily set up a specific backup selection list for that client and manage that through the Administrative GUI.

  • I’d like to implement this for about 150 servers, but I agree naming convention is critical.

    What do your naming conventions look like?

    I think if you are using staggered full policy, the day of the week the full runs should be included in the policy name, along with the backup type (sql, fs, etc).

    Also, if the server is secure / PCI, that should go into the name too, right?

    Production / PCI db server full weekly:

    pci_servername_domain_sql_weekly_mon

    Doesn’t seem to look right to me.

    Ideas?

  • Your policy name looks fine to me. It may look long, but anyone with a guide to the naming convention will know exactly what’s in that policy, won’t they? That’s so much better than most policies I’ve seen like prod_unix, prod_unix1, prod_unix2, etc.

    I would change the order of things so that it groups differently in a listing. (I would put sql first.) I would also add prod, dev, test.

    I used to put the day of the full backup in the name, but changed my mind because it makes things difficult to move around, especially if you’re using NBU. (If you change the full day then you have to change the policy name, and in NBU, that forces a full backup.)

    I like phrases like this:

    Prod_FS_elvis_ALL: production filesystem backup of ALL_LOCAL_DRIVES

    Prod_FS_PCI_apollo_ALL: production filesystem backup of all of apollo’s drives and it’s encrypted.

    Dev_RMAN_elvis_ABCD: development backup of oracle using rman, backing up the ABCD instance on elvis

    Test_FS_elvis_home1: test backup of the /home1 filesystem on elvis

  • "I HATE frequency based backups for anything other than daily backups. They work fine for a while, but when you have some failures or outages on the server, next thing you know, backups start bunching up. I want to control when full/cumulative backups run, and that means calendar-based scheduling, and that means one schedule per policy."

    My users tend to always say that their application absolutely has to be 24×7. So I counter with “then it doesn’t matter when the backups run” and run full backups every night of the week, spreading out the load on my clients, servers, and tape devices. There are a few exceptions (and we do typically do most backups in the evenings) but in general, anybody can go any time.

    I have not yet had a client tell me that they’re only 5×12 and that’s why backups have to run on weekends.

  • We also have the one policy per server or DB instance. Part of the reason for this is because we use Control-M as a 3rd party scheduler.

    Also our Operations group does not have access to the NB Java Gui so they cannot restart jobs from that method, they have to use Control-M.

    Group policies don’t allow them to function in the role we need them to function in.