My thoughts on EMC's announcements

Last week I received what appeared to be an LP in a FedEx package.  (If you don’t know what an LP is, I really don’t know what to say about that.)  But it was a cardboard facsimile from EMC, broken into pieces to tell me that they would be breaking some records today.

You’d have to have been under a news/social networking rock to not have known they were announcing some big things today.  The announcements fell into three categories: VNX, Symmetrix, and Data Domain.  VNX is a unified storage platform, Symmetrix is a smart storage array, and Data Domain is a fast dedupe disk target.  I’ll leave the storage platforms to others to cover and focus just on the Data Domain announcements.

Quick Summary: Did they have some awesome stuff to announce?  Absolutely. Did they break records? Yes.  But not as many as their announcements would suggest.  And I have some concerns about at least one of the announcements.

The announcements

There are several announcements in the Data Domain line.  I put them into three categories.

  • Faster, bigger Data Domain controllers
    • They introduced the DD 890, which ingests and dedupes backups inline at 3944 MB/s (14.2 TB/hr) and has 384 TB of raw capacity. (Scott Waterhouse’s blog said that translates into 285 TB of usable capacity.)
  • Faster, bigger GDA w/VTL support, and support for all backup apps
    • The GDA is now built using two DD 890s, which gives it 768 TB (384×2) of raw capacity.  The throughput isn’t quite doubled, at roughly 7,300 MB/s (26.3 TB/hr).
    • The GDA now supports VTL, which means it will work with any supported backup application.  (The original GDA only worked with Symantec NBU/BE & OST.  They added support for NetWorker in October.  This adds support for everything else.)
  • Introduction of the Data Domain Archiver
    • This is a tiered system with a DD 860 head and multiple independent disk arrays for long-term retention.  They’re independent in ways that make them very different from additional shelves in a typical DD system. For example, each shelf is a self-contained unit that provides fault isolation.  If one shelf fails, only the data in that shelf is lost.
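
The MB/s and TB/hr figures above are the same measurement in different units, so it is easy to sanity-check them. Here is a minimal sketch, assuming decimal units (1 TB = 1,000,000 MB); the function names are mine and purely illustrative, not anything from EMC.

```python
# Unit conversion sketch. Assumption: decimal units, 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000
SECONDS_PER_HOUR = 3600


def mb_per_s_to_tb_per_hr(mb_per_s: float) -> float:
    """Convert a throughput in MB/s to TB/hr."""
    return mb_per_s * SECONDS_PER_HOUR / MB_PER_TB


def tb_per_hr_to_mb_per_s(tb_per_hr: float) -> float:
    """Convert a throughput in TB/hr to MB/s."""
    return tb_per_hr * MB_PER_TB / SECONDS_PER_HOUR


if __name__ == "__main__":
    dd890_mb_s = 3944   # DD 890 ingest rate quoted in the post
    gda_tb_hr = 26.3    # GDA ingest rate quoted in the post

    dd890_tb_hr = mb_per_s_to_tb_per_hr(dd890_mb_s)
    print(f"DD 890: {dd890_tb_hr:.1f} TB/hr")
    print(f"GDA: {tb_per_hr_to_mb_per_s(gda_tb_hr):.0f} MB/s")
    # How close the two-controller GDA comes to twice a single DD 890.
    print(f"GDA vs. 2 x DD 890: {gda_tb_hr / (2 * dd890_tb_hr):.0%}")
```

Under that assumption, 3944 MB/s works out to about 14.2 TB/hr, and 26.3 TB/hr works out to a little over 7,300 MB/s, or about 93% of what two DD 890s would deliver if throughput scaled perfectly.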

Impressive Stuff

The DD 890 and the GDA are truly impressive systems.  Data Domain has always had very strong appliances, and with this announcement they do indeed have the fastest single controller dedupe systems available today.

The fact that the GDA now supports all backup products via a VTL interface is also very nice.  I’ve been pushing for GDA support for more backup apps and it’s finally here.  Large customers wishing to have a very fast (26.3 TB/hr) system can have it via EMC.

The claims

I always look askance at claims like “fastest,” “biggest,” or “first.”  My experience is they are rarely true unless you qualify them really, really well.  So let’s take a look at some of these “record breaking” claims and see if they match reality.  The following is a list of some of the claims that I read in these press releases:

New Record Breaking EMC Data Domain Backup Systems 7 Times Faster Than Competition

EMC Attacks Tape’s Last Major Hideout With First Storage System For Backup And Archive

“With throughput of up to 14.7 and 9.8 TB/hr respectively, the new DD890 and DD860 systems are the fastest single controller deduplication storage systems available in their respective classes.”

Definitely.  Frankly, no one else is even close.

“[The] faster throughput of up to 26.3 TB/hr (7,300 MB/s) gives the new GDA an ingest rate that is more than 7 times faster than its dual-controller competitor.”

What the quote says is true, but it’s hoping you make an assumption that isn’t true.  They’re more than 7 times faster than IBM’s dual-node ProtecTier system.  That’s not quite the same as being seven times faster than the closest competitor.  They’re less than twice as fast as the closest competitor (SEPATON, at 4440 MB/s with six controllers).  They’re four times faster than Exagrid and Falconstor (if you only count their dedupe rates) and five times faster than Quantum.

As to “record breaking,” that’s only possible if you ignore NEC.  NEC gets the credit for being the world’s fastest target dedupe system, and they’re inline at that.  At 27,500 MB/s, NEC can even say that they’re four times faster than their closest competitor, EMC.  EMC doesn’t consider NEC a real competitor because they really don’t run into them out there.  They consider NEC a “science project” that very few people actually buy.  But NEC would definitely win a throughput race if they were allowed to enter it.
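
If you want to check the "times faster" arithmetic yourself, here is a rough sketch that uses only the throughput figures quoted in this post (7,300 MB/s for the GDA, 4440 MB/s for SEPATON, 27,500 MB/s for NEC). The dictionary and function names are hypothetical, and I left out the vendors whose raw numbers I didn't list.

```python
# Throughput figures in MB/s, as quoted in this post.
# The names below are illustrative only.
THROUGHPUT_MB_S = {
    "EMC GDA": 7300,
    "SEPATON": 4440,
    "NEC": 27500,
}


def speed_ratio(faster: str, slower: str) -> float:
    """Return how many times faster one system is than another."""
    return THROUGHPUT_MB_S[faster] / THROUGHPUT_MB_S[slower]


if __name__ == "__main__":
    print(f"EMC GDA vs. SEPATON: {speed_ratio('EMC GDA', 'SEPATON'):.2f}x")
    print(f"NEC vs. EMC GDA: {speed_ratio('NEC', 'EMC GDA'):.2f}x")
```

With these inputs the GDA comes out at about 1.6 times SEPATON, and NEC at about 3.8 times the GDA, which is where the "four times faster" figure comes from once you round.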

“Today [EMC] announced the EMC Data Domain® Archiver, the industry’s first long term retention system for backup and archive.”

Huh?  Isn’t tape a unified long term retention system for backup and archive?

“cost effective retention of backup and archive data for seven or more years.”

I’ll believe that when I see it.  So far, disk (even when deduped) doesn’t come close to the cost of tape — especially when you consider the cost of power and cooling in long term retention requirements.  It’s really hard to beat the cost of a tape on a shelf.

Well, they broke one record.

The Data Domain Archiver

EMC tells me that people are asking them to store backups longer and longer on their Data Domain systems, and the Archiver is designed to do just that.

They’re saying:

  • The long term disk on the Archiver will be cheaper than regular DD disk
  • They isolate older backups so that one disk array dying doesn’t take out the whole thing (current DD systems share deduped blocks between all arrays behind a given DD head)
  • It’s got strong backup performance since it’s based on a DD 860.
  • “Operational backup and recovery typically involves data retention periods of weeks or months, whereas long term retention of data is measured in multiple months or years.”
  • “Today’s most common method of extended data retention is to keep tapes made for backups longer.”
  • “DD Archiver supports today’s most common data archiving method, which is long term retention of backups”
  • “DD Archiver can also be leveraged with popular archiving solutions such as EMC SourceOne and EMC File Management Appliance”

Do I think that having multiple disk arrays that have fault isolation is good?  Absolutely.  Do I like that they’re touting this as a place to store backups for several years?  No.

Do you want to store your backups longer on your Data Domain system?  Then the DD Archiver is for you.  Is it a good idea to store your backups for “multiple months or years?”  No.  Absolutely not.

Is “long term retention of backups…today’s most common data archiving method”?  No, it is not.  Do people use old backups as archives?  Yes.  But that doesn’t make that an archiving method.  There is a very broad line between backups and archives.  Old backups are old backups.  They are not archives. Just because you keep your backups for five or seven years does not make them archives.  It makes them really old backups.

Old backups make lousy archives. (That is, unless you’re talking backup software that supports archiving functionality, like CommVault Simpana.)  Try retrieving all of Fred’s emails that contain six different words over the last seven years from backups. You’ll spend hundreds of thousands of dollars in consulting fees and it will take forever, no matter what you store those backups on.  It is the retrieval process that takes forever, not the loading of tapes.  IMO, putting it on disk isn’t going to make it much faster.  Or you can buy archive software and do it in five minutes.  If you want archives, use archiving software, not backup software.

I’m glad to hear that it also works with archiving solutions, but I refuse to refer to old backups as archives — and that’s all I’m going to say about that.

Summary

All in all, there were some pretty impressive announcements from EMC today. They did make a few claims that are really just marketing speak and not really “true.”  And I’m not a very big fan of putting multiple years of backups on the Data Domain Archiver.  That’s my story and I’m sticking to it.


Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

1 comment
  • Really old backups aren’t archives? I supposed next you’re going to tell me that auto-tiered primary storage isn’t ILM? 😉

    Good post, Curtis. Looking forward to meeting you at Tech Field Day in a few weeks.
    -Sean Clark, @vSeanClark