History must repeat itself

Thirteen years ago, two companies accomplished the impossible and created NDMP.  It’s become such a standard way to back up NAS that you may have forgotten just how revolutionary it was when it came out.  I’m going to remind you of its history and say that history needs to repeat itself with dedupe & virtualization.  Click Read More to see what in the world I’m talking about.

Thirteen years ago, NetApp & PDC (the makers of BudTool, remember them?) realized they had a common problem.  NAS filers were this weird thing that no one knew how to back up, and PDC was getting a lot of requests to back up filers.  They created the Network Data Management Protocol, or NDMP.  There was FTP for file transfer, SMTP for mail transport, and then there was NDMP for backing up filers.

First, let me commend NetApp for what they didn’t do.  They didn’t create a NetApp-only solution or a PDC-only solution.  They could have, but they didn’t.  (It’s on my list of reasons why I’ve always liked that company.)  They actually got competitors to sit around a table and talk about what needed to be in (and, more importantly, outside of) a specification for a backup protocol for filers.  One big concession was that the backup format (e.g. tar, cpio, dump) was outside of the spec.  Each vendor could use their own backup format, but they would all be controlled in the same way.  While this created a problem for those wanting to restore NetApps to Celerras, it gave each vendor just enough control to agree to a standard way to back up filers.

Now NDMP is the standard way to back up filers, and no one disputes that.  Other than NFS/CIFS-mounting them to your backup server, it’s pretty much the only way to back up filers.  Other methods have been tried and have failed.  (Remember NetWorker’s ClientPak for NetApp?)  NDMP is the standard.

Now we have two very different technologies that have similar problems: target dedupe & server virtualization.   First let’s talk about dedupe.

Once you’ve backed up to a dedupe target, how do you copy its backups to another one offsite?  How can you do that in such a way that the backup software can control it, know about it, and report on it?  What about copying to tape?  If you have replicated from one dedupe box to another, how do you copy that replicated backup to tape at the replication destination?  Good luck on both of those counts.

I’ve posted previously about my thoughts on NetBackup’s Open Storage Option (OST).  The short version is that I’m a fan of what OST does, but I’m not a fan of how Symantec did it.  When they asked me years ago about the OST idea and their NDMP “direct-to-tape” mechanism (which is a completely different way to solve part of the problem, and is incompatible with OST), I told them I liked the idea, but I begged them to bring more than one ISV (independent software vendor) to the table.  I knew all the OEMs would participate, of course, because they were Symantec and they owned (and still own) the lion’s share of the backup software market.  So we knew that Data Domain, EMC, Quantum, FalconStor, SEPATON, and others would sign up to partner with them on this.  What I wanted, though, was at least one other backup software product to be in there so that they wouldn’t create a Symantec-only solution.  Well, we know what happened.  Open Storage (OST) is anything but open, and Symantec put NetBackup-specific stuff into the NDMP direct-to-tape feature as well.

This leaves me torn.  On one hand, I think that this functionality is so important that users should vote with their dollars and show the other ISVs that they agree by converting to NetBackup just to get this functionality.  On the other hand, I don’t want to reward Symantec for doing what I ultimately feel was a selfish act that was the complete opposite of what NetApp did thirteen years ago.  By the way, I’ve also heard from dedupe OEMs and other backup software products that some of them are working on their own product-specific way to solve this problem.  Great….  Thanks for starting a trend, Symantec.

Someone needs to step up to the plate and stop this madness.  We need at least one major ISV and at least one emerging ISV (or another major ISV, of course) and more than one OEM to get together and work this out.  Stop thinking of it as a competitive advantage and think of it as a common problem that you can all work out together much more easily — and then move on to more important things like making your systems faster and more reliable.

  • Will Symantec make Open Storage actually open?  I’d be fine with that, assuming other backup products could actually use that API.
  • Can NDMP be further extended to meet the needs of the dedupe community?  It’s already been extended beyond its original design to include management and cataloging of snapshots, and creation of tapes by VTLs.  Is this so far off?
  • Will anyone from CA/CommVault/EMC/HP/IBM/Arkeia/Atempo/BakBone, or any combination of the above, sit down at the same table?
  • Will Data Domain/EMC/ExaGrid/FalconStor/HP/IBM/NetApp/Overland/Quantum/NEC/SEPATON/Sun, or any combination of the above, sit down at the same table?
  • Do any of you have any ideas?  I’d be happy to listen to them, offline if necessary.  Drop me an email.

Let’s not forget server virtualization.  While VMware, Hyper-V, Virtual Iron, and XenServer are all great, backing them up stinks.  It combines the most I/O-intensive application in the datacenter (backup) with the system least able to handle intensive I/O loads (virtual servers).  VMware has created VMware Consolidated Backup, or VCB, which makes things better (albeit more complicated) for VMware customers, but what about the rest of these products?  I don’t have as much voice in the virtualization community as I do in the backup community, but it seems to me that the different virtualization products are very analogous to the early filer products, and that someone (probably VMware) should step up to the plate and create an NDMP for virtualization.  Hey, they can even use NDMP for all I care.  Just give us a better way to back up our virtual servers that isn’t tied to a specific product.  Can the NetApp of the server virtualization world (that would be you, VMware) be so magnanimous as to start such an initiative, or will they only see it as a way to help out their competitors?  Or will they see it the way NetApp saw it, as a way to further the market and therefore make everybody’s pockets heavier?

If I can facilitate any of the above happening, I’d be very glad to do so.  If you can’t wait for The BD Event in June, I’d be happy to broker a meeting at SNW or Storage Decisions.  Just let me know.

Let’s get history to repeat itself, shall we?

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

13 comments
  • Well, they do say that those who don’t remember history are condemned to repeat it. Which is particularly ironic given your subject… and that you completely forgot to attribute any of the development of NDMP to Legato. For the record Legato was one of the principals responsible for the development of NDMP. It is *not* a NetApp invention exclusively.

    With respect to VMware and VCB, I think that the present version is fairly rudimentary. You are likely to see better things in the future. Just guessing, of course. 😉

    On other relevant notes, a discussion of the NetWorker implementation of OST-like functionality can be found on my blog in the next few days.

    I agree with parts of what you are saying, but I am both less and more hopeful than you about this.

  • I believe if you check the "Legato helped invent NDMP" story, you’ll find that it was PDC that played that part (which I mentioned in the story).

    Legato was one of the ones who was trying to do anything BUT NDMP. They tried to do this Java thing (referred to in the article as ClientPak for NetApp). Check out this old newsgroup post talking about just that: http://groups.google.com/group/comp.unix.admin/browse_thread/thread/297a50878f5fa576?q=NDMP#7c958399b943460e (Unfortunately, it looks like Google is now requiring you to sign up and log into “Google Groups,” AKA Usenet, to get access to that link.)

    Then one day, Legato acquired PDC/Intelliguard, and all of a sudden they started talking about how they helped invent NDMP. As long as you’re inheriting histories, you might as well say EMC helped invent it, since they acquired Legato.

  • Is anyone aware of an open source NDMP implementation of either client or server? I can’t seem to find one anywhere. Odd that there isn’t one since NDMP is such a standard. Normally when people want to ensure adoption of a standard they release a reference implementation under something like the BSD license which everyone can easily incorporate into their own code.

  • The spec is open so anyone can write to it on either side, but no one has done so to my knowledge. Maybe Bacula or AMANDA will do that, now that they have commercial versions.

  • Well, if you want to play that game…

    Symantec didn’t “invent” their backup products (both Backup Exec and NetBackup are acquired). Nor did they invent their email (eVault was acquired). Likewise their HSM. All good products, right? And all stuff Symantec would take credit for. So I think it is totally legitimate for Legato/EMC to claim co-inventorship 🙂 for NDMP. If you talk to any of the old Legato folks in EMC today they are very passionate about this.

  • No, Symantec didn’t invent their products, and yes, a lot of products were acquired. And I don’t think it’s a game. I just think that when you’re talking HISTORY (which I was), you should use the company names that applied at the time.

    In addition, this story’s a little bit different than the examples you gave. It’s not just that Legato didn’t invent NDMP. Legato was one of the companies that was pushing AGAINST NDMP. Don’t you think it’s a little disingenuous for them to say they invented it when they were actually trying to kill it while the company they would eventually acquire WAS inventing it?

    Finally, the most I would say would be something like "PDC, which would later become Intelliguard, and then get acquired by Legato, and then get acquired by EMC…" But that data point was not germane to the story. What mattered was that competitors sat around a table and solved a common problem — and that needs to happen again.

    Oh, and going back to your original comment: EMC coming out with its OWN OST-like product isn’t what I’m looking for. What’s next, IBM’s version, then CommVault’s version, then, then, then? What about these poor IDT vendors that have to program and certify all these APIs? All of you have the same problem and you should all agree on a standard to fix it — NOT each of you come out with your own solution.

  • Whether these guys sit at the table or not, dedupe (at least target-based) will surely become a commodity very soon.

    I won’t be surprised if the open source file systems release a version with this built in.

  • It’s nowhere close to becoming a commodity. Don’t we have to have everybody even shipping it before that happens? And given how hard it was for the vendor community to ship dedupe, I’m not holding my breath for open source versions any time soon.

  • Curtis, could BackupPC be considered a small step towards it?

    Target-based dedupe (especially fixed-block) is an open technology now.

  • It’s only file-level dedupe. They do an MD5 hash of the first few blocks of the file. If it matches, they diff the two files. If those match, they replace them with a link. Heck, I can do that with a shell script (something like the sketch below).

    LONG way from that to what we call dedupe.
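
    Roughly, that file-level approach looks like this minimal sketch (Python rather than a shell script; the 128 KB prefix size and the in-memory pool are illustrative assumptions, not how BackupPC actually stores its pool):

        import filecmp
        import hashlib
        import os

        pool = {}  # hash of the first few blocks -> paths already seen

        def dedupe(path, prefix_bytes=128 * 1024):
            # Hash only the first few blocks of the file, as described above.
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read(prefix_bytes)).hexdigest()
            for candidate in pool.get(digest, []):
                # On a hash match, compare the whole files before linking.
                if filecmp.cmp(candidate, path, shallow=False):
                    os.remove(path)
                    os.link(candidate, path)  # replace the duplicate with a hard link
                    return True
            pool.setdefault(digest, []).append(path)
            return False

    The full-file comparison after the hash match is what keeps a prefix-only hash from ever linking two files that merely start the same way.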

  • In this day and age, I think you’d have to be a fool to buy into a technology that is not truly open (i.e., open source) or based on a recognised standard.
    It’s scandalous when a product gets released with ‘open’ in its name but is anything but open at all.

  • I can’t think of anything interesting that’s happened technologically in my space (backup, that is) that didn’t start out with someone doing it in a closed way first. That’s how standards get started. Someone does it, another person does it, then we realize we need a standard because everybody does it. If I only bought open source stuff, or only bought things once standards had been decided, I can think of many things that I would have had to wait YEARS to get.

    In this case, we have this problem now. Is someone a fool for using a non-open way to solve the problem, just because it hasn’t been standardized yet? I don’t think so, as long as they realize they may have to change how they do it once a standard way comes along.