It's about time — sort of — Part I: Data Domain's GDA

On Monday, EMC announced the Data Domain Global Deduplication Appliance.  I’ve had a lot to say about Data Domain’s lack of global dedupe over the years, so I’ve got to post about this.

I’ve posted a lot about the need for global deduplication here, here, and here. I believe it’s an essential feature for environments both large and small, and I’ve eagerly awaited Data Domain’s global deduplication feature for years.  If I recall correctly, it’s been promised for at least three years, maybe more.

What EMC has announced (but not released) is what I would call a good step in the right direction.  It’s a good step because Data Domain can finally provide dedupe across more than one appliance.  I want you to hear this first, because I’m about to explain why they’ve still got a long way to go.

The reason it’s only a step in the right direction is that it’s designed to meet the needs of a very small percentage of Data Domain’s customer base:
1. It’s only for two nodes, not multiple nodes
2. It’s only for the DD 880, the top of Data Domain’s line
3. It’s only for NetBackup and Backup Exec
4. It’s only for users of those products who have paid for OST

Only the very largest of EMC’s customers are going to need an 880 to start with, and only the very largest are going to need two of them.  Remember that 99.7% of all businesses are small businesses.  It supports NetBackup and Backup Exec, but I’d be extremely surprised if there is a single Backup Exec customer who is backing up a datacenter bigger than 140 TB, which is when you’d need a second 880.  (If that’s you, I’d love to hear from you.)  So essentially this product is aimed at only the largest NetBackup customers, and only those who have sprung for an OST license.

What I’ve tried to explain in previous posts is that global dedupe is not just for the largest customers.  Global dedupe (AKA multi-node dedupe) should be a standard feature on any multi-node target dedupe system.  Data Domain’s Brian Biles seems to think that if you buy a smaller Data Domain box and you need more than it can supply, all you need to do is swap the head out for a bigger head.  I think completely differently.  I think swapping a head out is a waste of money.  People pay tens of thousands of dollars (or more than $100K) just for the head.  Throwing that head away and buying a new one may sound great to the company selling you the head, but to me it seems like wasted money.

So hopefully one day Data Domain will support all backup products with their global dedupe offering, and will support the feature on any node.  They tell me that the code would support it, and it’s a matter of testing and support, and that they’ll do what the market tells them to do.  Well, market, if you agree with me, tell them.

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

7 comments
  • Curtis –

    In your opinion, what vendor do you feel does an adequate job with a global dedupe offering?

  • I agree, cause if they don’t I’ll be looking at other solutions once my existing “small” Data Domain is full. With the way dedupe is changing (meaning availability and usage in the industry) Data Domain is going to have to push really hard to maintain any perception of value for the premium dollars they demand.

  • [quote name=Natalie Voican]In your opinion, what vendor do you feel does an adequate job with a global dedupe offering?[/quote]
    Since Curtis is still a consultant, $1 says his reply will be either, “That is a billable question,” or “Sign up for TruthInIT.” =D

  • What the companies advertise is this:

    * Data Domain supports two nodes
    * Exagrid supports 10 nodes
    * FalconStor supports four nodes
    * IBM supports two nodes
    * NEC supports 55+ nodes
    * Sepaton supports six nodes

    Quantum is the only target dedupe vendor left to have no global dedupe.

    I can also tell you that these solutions are not created equal (and I’m not just talking about their numbers). There are vast differences in their offerings and the level of satisfaction that their customers have. Talking about THAT is a billable discussion 😉 (Someone owes Brinton $1.) The most affordable way to go down the “billable” path is to look at truthinit.com. It’s a very unique offering.

  • I believe Symantec’s PureDisk as well has Global Dedupe… Up to 16 nodes per Storage pool.

  • But you’re right. Symantec PureDisk and CommVault both have global target dedupe for their products. (Before someone jumps in, Avamar has global dedupe, but it’s a source dedupe product.)

  • I swear I was told that PureDisk does both source and target global dedupe. The client-side hash catalog is sent to the PureDisk pool, and PureDisk replies back with “only send these bits of data back, I’ve got the rest.” Or at least that was the high-level explanation that was fed to me.