Disclaimer

The opinions contained within this website, its blog(s), forums, and wikis are those of the original poster and do not represent the position of my (or any other) employer. This blog is not owned by my employer nor does it officially represent any company.
Forget NDMP & use Synthetic Full backups
Written by W. Curtis Preston   
Tuesday, 20 March 2007
NDMP is a great way to get faster backups for your filers by connecting tape drives directly to them.  But combining Synthetic Full Backups and NFS/CIFS access to your filers might be a lot better.

NDMP (the Network Data Management Protocol) is designed to allow you to properly back up your filers.  The main feature of NDMP is that it allows you to connect your tape drives directly to your filer and back the filer up without having to send your backup data across the network.  If you're using NDMP to back up across the LAN (AKA three-way NDMP), you're missing the main benefit of NDMP; across the LAN, backing up via NDMP isn't usually any faster than backing up via NFS or CIFS.

NDMP also allows the filer vendor to resolve the ACL/permissions issues surrounding the fact that filers support multiple operating systems.  Without NDMP, you need to back up CIFS data via CIFS and NFS data via NFS.  NDMP allows the filer vendor to write a single backup format that can back up both types of data.

NDMP also has limitations, though.  The main limitation is that it does not support cross-filer recovery.  You cannot back up a NetApp and restore it to a BlueArc filer -- even though both support NDMP.  This is because NDMP is a communication protocol only; it does not specify the backup format.  Each filer vendor therefore creates its own format, which makes NDMP restores between filer vendors impossible.

NDMP backups (like any other traditional backup method) also place a load on the filer and can take a really long time. Finally, your backup product will probably charge you to use NDMP.

Synthetic backups, if your backup product supports them, are a great way to get a good backup of your filer without these limitations. Synthetic backups use the most recent full and incremental backups to create another full backup without having to transfer any data from the client; all the work is done on the backup server.  Some products can also create synthetic differential (AKA cumulative incremental or level 1) backups that contain all files that have changed since the last full backup.
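
To make the merge concrete, here is a minimal sketch of how a synthetic full could be assembled from catalogs the backup server already holds. The `Backup` class, its fields, and the file paths are hypothetical and exist only for illustration; real products use their own catalog formats, and the handling of deleted files shown here is an assumption, not something any particular product is known to do this way.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical catalog of one backup: path -> content version tag."""
    level: str                                 # "full" or "incremental"
    files: dict = field(default_factory=dict)  # files captured by this backup
    deleted: set = field(default_factory=set)  # paths found removed (incrementals only)


def synthesize_full(last_full, incrementals):
    """Build a new full from the last full plus later incrementals, oldest first.

    Only data already held by the backup server is read; the client is not contacted.
    """
    merged = dict(last_full.files)
    for inc in incrementals:
        for path in inc.deleted:
            merged.pop(path, None)   # drop files deleted since the previous backup
        merged.update(inc.files)     # newer versions replace older ones
    return Backup(level="full", files=merged)


full = Backup("full", {"/vol/home/a.txt": "v1", "/vol/home/b.txt": "v1"})
inc1 = Backup("incremental", {"/vol/home/a.txt": "v2"})
inc2 = Backup("incremental", {"/vol/home/c.txt": "v1"}, deleted={"/vol/home/b.txt"})

synthetic = synthesize_full(full, [inc1, inc2])
print(synthetic.files)
```

The result holds the newest version of every surviving file, which is what a "real" full taken at the time of the last incremental would have captured.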

First, the caveats. 

  1. Your product must support this feature.  NetBackup, NetWorker, & CommVault support this feature. TSM's backup sets are synthetic fulls, but they are designed to be used outside of TSM and can't be used for regular TSM restores.
  2. You will have to back up your data via the correct protocol.  Unix-style data can be backed up via NFS, and Windows-style data can be backed up via CIFS. 
  3. NDMP users will only see a benefit for typical user data.  This won't help much when backing up database data.  The idea is that you only have to perform an incremental backup; if that backup is the equivalent of a full backup, this idea won't help.
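
The third caveat is easy to check with rough arithmetic. The sketch below estimates how much data is read from the filer over a backup cycle, with and without periodic real fulls. The function name, the 1,000 GB volume, and the change rates are all made-up assumptions for illustration, not measurements.

```python
def gb_read_from_filer(full_gb, daily_change, days, real_full_every=None):
    """Estimate GB read from the filer over `days` days.

    daily_change is the fraction of data that changes per day (0.0 to 1.0).
    real_full_every=None means one initial full, then incrementals only
    (later fulls are synthesized on the backup server).
    """
    total = 0.0
    for day in range(days):
        is_full = day == 0 or (real_full_every is not None and day % real_full_every == 0)
        total += full_gb if is_full else full_gb * daily_change
    return total


# Typical user data: assume 2% of a 1,000 GB volume changes each day.
weekly_fulls = gb_read_from_filer(1000, 0.02, 28, real_full_every=7)
synthetic_only = gb_read_from_filer(1000, 0.02, 28)
print(f"user data: {weekly_fulls:.0f} GB vs {synthetic_only:.0f} GB")

# Database-style data: assume every incremental is as large as a full.
db_weekly = gb_read_from_filer(1000, 1.0, 28, real_full_every=7)
db_synthetic = gb_read_from_filer(1000, 1.0, 28)
print(f"database:  {db_weekly:.0f} GB vs {db_synthetic:.0f} GB")
```

With these assumed figures, the user-data case drops from roughly 4,480 GB to roughly 1,540 GB read from the filer over four weeks, while the database case reads 28,000 GB either way.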

So, here's the idea.

  1. Use NFS or CIFS to mount your data to your backup server.  (Please don't mount it to a client.  That sends the data across the network twice - from the filer to the client and from the client to the server.)
  2. Use your backup server to create a full backup of that network drive.
  3. Time passes.
  4. Perform an incremental backup of the filer via the same NFS/CIFS path.
  5. Time passes.
  6. Perform another incremental backup of the filer via the same NFS/CIFS path.
  7. Time passes.
  8. When you think you need another full backup, perform a synthetic full backup.  It will merge the latest full backup and the incrementals to create a new full backup that looks just like a "real" full backup.  It's just that the data will all move directly from one tape to another, without having to bother the filer. 

This way, you get all the benefits of occasional full backups (mainly faster restores) without having to actually make them.
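
The steps above can be sketched as a simple job plan. This assumes daily incrementals and a synthetic full every seven days; both intervals, and the `Job` and `plan_jobs` names, are hypothetical. The article itself only says to run the synthetic full when you think you need another full.

```python
from typing import NamedTuple


class Job(NamedTuple):
    day: int
    kind: str          # "full", "incremental" or "synthetic_full"
    reads_filer: bool  # True when data is pulled over the NFS/CIFS mount


def plan_jobs(days, synthetic_every=7):
    """One real full on day 0, then daily incrementals and periodic synthetic fulls."""
    jobs = [Job(0, "full", True)]
    for day in range(1, days):
        jobs.append(Job(day, "incremental", True))
        if day % synthetic_every == 0:
            # Built on the backup server from data it already has.
            jobs.append(Job(day, "synthetic_full", False))
    return jobs


for job in plan_jobs(15):
    where = "reads filer" if job.reads_filer else "backup server only"
    print(job.day, job.kind, where)
```

The only job that ever reads the whole data set from the filer is the one on day 0.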

Comments? 

Comments
hga   |2007-03-21 11:12:31
One is making a bet that your vendor will properly construct a synthetic full
backup. Bugs in that might be a bit subtle to detect, and it probably would be
a good idea to keep the real full backup and incrementals until you make another
real full backup.

Otherwise, if time to restore from a total failure is
important, it sounds like a great idea.

- Harold
cpreston   |2007-03-21 11:27:57
Any backup can have bugs. Are you saying that you'd prefer the
known bugs and hiccups of dump to unknown (possible) bugs
in synthetic full?

Have you ever read or heard what Linus
Torvalds is saying these days about dump? (http://lwn.net/20...) Here are a few excerpts:

"Dump was a stupid program in the first place. Leave it behind."

"Right now, the cpio/tar/xxx solutions are definitely the best ones
... Whatever problems they have, they are still better than the
_guaranteed_(*) data corruptions of dump."

(Dump is what NetApp and a few other NAS vendors use underneath
NDMP, BTW.)
hga   |2007-03-22 04:41:52
No, not at all. I have no love for dump, and I don't think I've
used it since 1987 or so (4.2 or .3 BSD on MicroVAX
IIs).

As for the comments by Linus, which I think you mention in
your newest book, I'm not at all impressed: in essence what
he's saying is that it was decided not to carry dump
along with Linux by the time of the 2.4 kernel (perhaps
a good decision, but "cpio/tar", while
good archivers, are no substitute for the real thing),
and that no replacement was even envisioned as of 2001, and
from your book, none has been made.

For someone who
worked with systems like MULTICS, history is repeating itself
as a farce, where we are developing systems that are pale
shadows of what we once did....

Anyway, in all cases
I'm assuming the same utility was used to collect the
original backups, and that the synthetic full is made to look
like it was made with the same utility.

I'm a programmer
who sometimes wears a sysadmin hat, and who knows how
easy it is to create bugs, and how few organizations
really test the software they ship.  Hopefully
commercial backup vendors do an above average job, but you would
know that much better than I.

Me, I'm with Jacques Clouseau
in I think the Return of the Pink Panther: "I suspect everyone, and I suspect no one."

- Harold
cpreston   |2007-03-22 07:00:05
You're exactly right that the same utility/format will be used to create the
synthetic backup. However, if you're going to be doing synthetic fulls, you're
not going to use NDMP; hence, you're not going to use dump.

So here are your two choices:
* dump with regular occasional "real" fulls
* NFS/CIFS -> server -> some commercial utility, followed by occasional synthetic fulls

And I'm saying that the latter has the following advantages:
* It doesn't use dump (which has had problems since it was designed)
* It allows you to perform fulls whenever you want, even during the day
* It places no load on the filer or LAN when you're doing synthetic fulls

So I'm saying give it a shot!
cpreston  - Just throwing out an idea   |2007-04-09 13:21:49
Just threw out an idea to get us talking. Thanks for the feedback.

I agree
with pretty much everything you said. (I'm not sure if you can use All with
NDMP either, depending on your backup software, but I get your point.)

I've
backed up some of the biggest NAS environments out there and have done some
pretty "kooky" things that others said wouldn't scale, and they did, so
I have a little more faith than the average joe about what will work in a large
environment. You might be surprised as to how well this will scale.

Summary:
Just think about my idea. If it works for you, then cool!
cpreston  - VTLs?   |2007-05-01 07:24:32
This actually came from ag100. Had a problem with his post..

Hey - It goes
without saying, but I think this really depends on individual environments and
requirements... From my experience, larger environments with a mix of NFS, CIFS
and multi-protocol shares, this has a hard time scaling. It's also added
complexity and removed some key benefits seen using NDMP (not that I'm
completely thrilled with NDMP, either).

Here's why:

* You generally (perhaps
not in all cases) have to specify save set names manually, instead of using an
"All", when backing up remote file systems. If you have a lot of file
systems (or a growing environment), this increases the possibility of someone
forgetting to add a file system and increases the inherent risks associated
with human interaction (typos, etc...). Now that some backup software no longer
requires you to specify each file system individually, (assuming your volumes
are sized reasonably, etc...) using "All" can be a nice feature, which
also reduces risk.

Along the same lines, if your filer has a lot of save sets,
you can run into limitations in your backup software... For example, I believe
NetWorker only supports ~900 bytes in a save set listing and 1024 bytes per
client profile. If you run over that, you're stuck managing multiple client
profiles per client, or using scripts to issue save commands manually.


* Another item that must be considered, depending on requirements, is the network.
Local NDMP backups, which take advantage of disk (via VTL) or modern tape
technology, can perform relatively well. If you had a requirement to
back up file systems with a high data change rate (PSTs, databases, etc...), you
might find that performing even daily incremental backups can put considerable
strain on backup resources.

* If you have both NFS and CIFS shares, you may
have to have UNIX and Windows tape servers available. That can be painful,
especially if you don't have both in your environment now.

* Lastly, we've been
receiving an increasing number of requests for multi-protocol shares. As you
mentioned, due to permissions requirements, this is a known limitation of
network-based NAS backups.

I'm not bashing the idea, and for some
environments, it's a perfect fit, it just hasn't scaled well for me in the
past.

Thoughts?

- Aaron
yaron   |2007-05-04 12:05:42
Hi,

From my experience, LAN-based NDMP is 2-3 times faster than NFS-based
backup. In my setup, the NetWorker server and the library are located at a DR
room, over a kilometer away, so pulling fibers just for connecting drives to the
filer is not an option.

There are a few problems with your argument about
dump. Dump correctly handles the issue of not restoring files which were
DELETED. tar/cpio cannot do that. Linus, when speaking of dump's data
corruption, is probably referring to dumps of a mounted file system. This is
something you shouldn't do in the first place. As for NDMP usage of dump, you
might recall that when running dump of a NetApp, the filer takes a snapshot and
backs it up. This should guarantee that the dump will not be corrupted.