Performance Comparison of Deduped Disk Vendors

[This article was slightly updated May 7, 2009.  New comments are in brackets.]

This blog entry is, to my knowledge, the first article or blog entry to compare the performance numbers of various [target] dedupe vendors side by side.  I decided to do this comparison while writing my response to Scott Waterhouse’s post about how wonderful the 3DL 4000 is, but then realized that this part was substantial enough to deserve a separate post.  Click Read More to see a table that compares the backup and dedupe performance of the various dedupe products.

First, let’s talk about the whole “global dedupe” thing, because it’s really germane to the topic at hand.  Global dedupe only comes into play with multi-node systems.  A quick definition of global dedupe: it’s when a dedupe system will dedupe everything against everything, regardless of which head/node the data arrives at.  So if you have a four-node appliance, and the same file gets backed up to node A and node B, the file will only get stored once.  Without global dedupe (otherwise known as local dedupe), the file would get stored twice.
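To make the local-versus-global distinction concrete, here is a toy sketch (my own illustration, not any vendor’s actual implementation); the only thing that changes between local and global dedupe is whether the nodes consult one shared index or each keep their own:

```python
# Toy illustration only -- real products chunk the data, use different fingerprints,
# and have far more machinery.  The point is just the shared vs. per-node index.
import hashlib

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha1(chunk).hexdigest()

def store(chunk: bytes, index: set) -> int:
    """Return 1 if the chunk had to be physically written (it was new), else 0."""
    fp = fingerprint(chunk)
    if fp in index:
        return 0              # duplicate: just reference the existing copy
    index.add(fp)
    return 1

same_file = b"the same Exchange data" * 1000

# Local dedupe: node A and node B each keep their own index.
node_a, node_b = set(), set()
copies_local = store(same_file, node_a) + store(same_file, node_b)

# Global dedupe: every node consults the same shared index.
shared_index = set()
copies_global = store(same_file, shared_index) + store(same_file, shared_index)

print(copies_local, copies_global)   # 2 1 -- stored twice locally, once globally
```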

Let’s talk about Data Domain, as they currently own the [target] dedupe market hands down.  But they have local dedupe, not global dedupe.  (This is not to be confused with Data Domain’s term global compression, which is what they called dedupe before there was a term for it.)  When you hit the performance limit of a single Data Domain box, their answer is to buy another box, and two DD boxes sitting next to each other have no knowledge of each other; they do not dedupe data together; they don’t share storage; you cannot load balance backups across them, or you will store each backup twice.  You send Exchange backups to the first box and Oracle backups to the second box.  If your Oracle backups outgrow the second box, you’ll need to move some of them to a third box.  It is the Data Domain way.  They are telling me they’ll have global dedupe in 2010, but they don’t have it yet.

What Data Domain is doing, however, is shipping the DDX “array,” which is nothing but markitecture.  It is 16 DDX controllers in the same rack.  They refer to this as an “array” or “an appliance” which can do 42 TB/hr, but it is neither an array nor an appliance.  It is 16 separate appliances stacked on top of each other.  It’s only an array in the general sense, as in “look at this array of pretty flowers.”  I have harped on this “array” since the day it came out and will continue to do so until Data Domain comes out with a version of their OS that supports global deduplication.  Therefore, I do not include this “array’s” performance in the table at the end of this blog article.

When talking about the DDX “array,” a friend of mine likes to say, “Why stop at 16?  If you’re going to stack a bunch of boxes together and call them an appliance, why not stack 100 of them?  Then you could say you have an appliance that does 50,000 MB/s!  It would be just as much of an appliance as the DDX is.”  I have to agree.

In contrast, Diligent, Exagrid, Falconstor, and SEPATON all have multi-node/global deduplication.  Diligent supports two nodes, Falconstor four, SEPATON five, and Exagrid six.  So when Diligent says they have “a deduplication appliance” that dedupes 900 MB/s with two nodes, or SEPATON says their VTL can dedupe 1500 MB/s with five nodes, or Falconstor says they can dedupe 1600 MB/s with four nodes, or Exagrid says they can do 450 MB/s with six nodes, I agree with those statements – because all data is compared to all data regardless of which node/head it was sent to.  (I’m not saying I’ve verified their numbers; I’m just saying that I agree that they can add the performance of their boxes together like that if they have global dedupe.)

By the way, despite what you may have heard, I’m not pushing global dedupe because I want everything compared to everything, such as getting Oracle compared with Exchange.  I just want Exchange always compared to Exchange, and Oracle to Oracle – regardless of which head/node it went to.  I want you to be able to treat deduped storage the same way you treat non-deduped storage or tape; just send everything over there and let it figure it out.

NetApp, Quantum, EMC & Dell’s [target dedupe products] have only local dedupe.  [Both EMC & Symantec have global dedupe in their source dedupe products.]  That is, each engine will only know about data sent to that engine; if you back up the same database or filesystem to two different engines, it will store the data twice.  (Systems with global dedupe would store the data only once.)  I therefore do not refer to two dedupe engines from any of these companies as “an appliance.”  I don’t care if they’re in the same rack or managed via a single interface; they’re two different boxes as far as dedupe is concerned.

Backup and Dedupe Speed

No attempt was made to verify any of these numbers.  If a vendor is flat out lying or if their product simply doesn’t work, this post is not going to talk about that.  (If I believed the FUD I heard, I’d think that none of them worked.)  I just wanted to put into one place all the numbers from all the vendors of what they say they can do.

For the most part, I used numbers that were published on each company’s website.  In the case of EMC, I used an employee’s (though unofficial) blog.  Then I applied some math to standardize the numbers.  In a few cases, I have also used numbers supplied to me via an RFI that I sent to vendors.  If the vendor has global/multi-node/clustered dedupe, then I give the throughput number for their maximum supported configuration.  But if they don’t have global dedupe, then I give the number for one head only, regardless of how many heads they may put in a box and call it “an appliance.”
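For reference, the “math” is nothing more exotic than converting the TB/hr (or TB/day) figures vendors like to quote into MB/s; a quick sketch of the conversion (decimal units, with my rounding) looks like this:

```python
# The unit conversion used to standardize vendor throughput numbers
# (decimal units; I then round, so the figures in the tables are approximate).
def tb_per_hr_to_mb_per_s(tb_per_hr):
    return tb_per_hr * 1000000 / 3600.0   # 1 TB = 1,000,000 MB; 3,600 seconds/hour

print(tb_per_hr_to_mb_per_s(2.7))        # ~750  (Data Domain DD690, 2.7 TB/hr)
print(tb_per_hr_to_mb_per_s(1.5))        # ~417, rounded to ~400 (EMC dedupe engine)
print(tb_per_hr_to_mb_per_s(4.3 / 2))    # ~597, "just under 600" (NetApp, cut in half)
print(tb_per_hr_to_mb_per_s(3.2))        # ~889, rounded to ~880 (Quantum, deferred mode)
print(tb_per_hr_to_mb_per_s(25 / 24.0))  # ~290, rounded to ~300 (SEPATON, 25 TB/day)
```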

For EMC, I used the comparison numbers found on this web page.  EMC declined to answer the performance questions of my RFI, and they haven’t officially published dedupe speeds, so I had to use the performance numbers published in this blog entry on Scott Waterhouse’s blog for dedupe speed.  He says that each dedupe engine can dedupe at 1.5 TB/hr.  The 4106 is one Falconstor-based VTL engine on the front and one Quantum-based dedupe engine on the back.  The 4206 and the 4406 have two of each, but each Falconstor-based VTL engine and each Quantum-based dedupe engine is its own entity, and they do not share dedupe knowledge.  I therefore divided the numbers for the 4206 and the 4406 in half.  The 4406’s 2200 MB/s divided by two is the same as the 4106 at 1100 MB/s.  (The 4206, by that math, is slower.)  And 1.5 TB/hr of dedupe speed translates into roughly 400 MB/s.

Data Domain publishes their performance numbers in this table.  Being an inline appliance, their ingest rate is the same as their dedupe rate.  They publish 2.7 TB/hr, or 750 MB/s for their DD690, but say that this requires OST.  It’s still the fastest number they publish, so that’s what I put here.  I would have preferred to use a non-OST number, but this is what I have.

Exagrid’s numbers were taken from this web page, where they specify that their fastest box can ingest at 230 MB/s (830 GB/hr).  They have told me that their dedupe rate per box is 75 MB/s.  They support global dedupe for up to 6 nodes, so these numbers are multiplied by 6 in the table.

For Falconstor, I originally used this data sheet where they state that each node can back up data at 1500 MB/s and that they support 8 nodes in a deduped cluster.  (However, I subsequently found out that, despite what that data sheet says, they do not yet fully support an 8-node cluster.  They have only certified a 4-node cluster, so I have updated the numbers accordingly.)  They have not published dedupe speed numbers, but they did respond to my RFI.  They said that each node could dedupe at 500 MB/s.

IBM/Diligent says here that they can do 450 MB/s per node, and they support a two-node cluster.  They are also an inline box, so their ingest and dedupe rates will be the same.  One important thing to note is that IBM/Diligent requires FC or XIV disks to get these numbers.  They do not publish SATA-based numbers.  That makes me wonder about all these XIV-based configs that people are looking at and what performance they’re likely to get.

NetApp has this data sheet that says that they do 4.3 TB/hr with their 1400.  However, this is like the EMC 4400 where it’s two nodes that don’t talk to each other from a dedupe perspective, so I divide that number in half to make 2150 GB/hr, or just under 600 MB/s.  They do not publish their dedupe speeds, but I have asked for a meeting where we can talk about them.

Quantum publishes this data sheet that says they can do 3.2 TB/hr in fully deferred mode and 1.8 TB/hr in adaptive mode.  (Deferred mode is where you delay dedupe until all backups are done, and adaptive dedupe runs while backups are coming in.)  I used the 3.2 TB/hr for the ingest speed and the 1.8 TB/hr for the dedupe speed, which translates into 880 and 500 MB/s, respectively.

Finally, with SEPATON, I used this data sheet where they say that each node has a minimum speed of 600 MB/s, and this data sheet where they say that each dedupe node can do 25 TB/day, or 1.1 TB/hr, or 300 MB/s.  Since they support up to 5 nodes in the same dedupe domain, I multiplied that times 5 to get 3000 MB/s of ingest and 1500 MB/s of dedupe speed.

Backup & dedupe rates for an 8-hour backup window

| Vendor | Ingest Rate (MB/s) | Dedupe Rate (MB/s) | Caveats |
|---|---|---|---|
| EMC | 1100 | 400 | 2-node data cut in half (no global dedupe) |
| Data Domain | 750 | 750 | Max performance with OST only; NFS/CIFS/VTL performance approx. 25% less |
| Exagrid | 1388 | 450 | 6-node cluster |
| Falconstor/Sun | 6000 | 2000 | 4-node cluster; requires FC disk |
| IBM/Diligent | 900 | 900 | 2-node cluster; requires FC or XIV disk |
| NetApp | 600 | Not avail. | 2-node data cut in half (no global dedupe) |
| Quantum/Dell | 880 | 500 | Ingest rate assumes fully deferred mode (would be 500 otherwise) |
| SEPATON/HP | 3000 | 1500 | 5 nodes with global dedupe |


However, many customers that I’ve worked with are backing up more than 8 hours a day; they are often backing up 12 hours a day.  If you’re backing up 12 hours a day, and you plan to dedupe everything, then the numbers above change.  (This is because some vendors have a dedupe rate that is less than half their ingest rate, and they would need more than 24 hours to dedupe 12 hours of data.)  If that’s the case, what’s the maximum throughput each box could take for 12 hours and still finish its dedupe within 24 hours?  (I’m ignoring maintenance windows for now.)  This means that the ingest rate can’t be any faster than twice the dedupe rate, if the dedupe is allowed to run while backups are coming in.

This meant I had to change the Quantum number, because the original number assumed that I was deferring dedupe until after the backup was done.  If I did that, I would only have 12 hours to dedupe a 12-hour backup that was ingested much faster than it can be deduped, so the dedupe would never finish in time.  Therefore, I switched to its adaptive mode, where dedupe happens while the backup is coming in.
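If you want to see the arithmetic behind these 12-hour-window adjustments, here’s a rough sketch of the simple model I’m using (it ignores maintenance windows, as noted above):

```python
# Max sustainable ingest for a 12-hour backup window when everything must be
# deduped within 24 hours.  Simple model only; ignores maintenance windows.
def max_ingest_mb_s(dedupe_mb_s, backup_hrs=12.0, total_hrs=24.0, concurrent=True):
    if concurrent:
        # Dedupe runs during and after backups:
        # backup_hrs * ingest <= total_hrs * dedupe  ->  ingest <= 2 x dedupe
        return dedupe_mb_s * total_hrs / backup_hrs
    # Fully deferred: dedupe only gets the hours left after the backup window.
    return dedupe_mb_s * (total_hrs - backup_hrs) / backup_hrs

print(max_ingest_mb_s(400))                    # EMC:     800 MB/s (down from 1100)
print(max_ingest_mb_s(450))                    # Exagrid: 900 MB/s (down from 1388)
print(max_ingest_mb_s(500, concurrent=False))  # Quantum deferred: only 500 MB/s,
                                               # so adaptive mode is no worse
```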

Backup & dedupe rates for a 12-hour backup window

| Vendor | Ingest Rate (MB/s) | Dedupe Rate (MB/s) | Caveats |
|---|---|---|---|
| EMC | 800 | 400 | 2-node data cut in half (no global dedupe) |
| Data Domain | 750 | 750 | Max performance with OST only; NFS/CIFS/VTL performance approx. 25% less |
| Exagrid | 900 | 450 | 6-node cluster |
| Falconstor/Sun | 4000 | 2000 | 4-node cluster; requires FC disk |
| IBM/Diligent | 900 | 900 | 2-node cluster; requires FC or XIV disk |
| NetApp | 600 | Not avail. | 2-node data cut in half (no global dedupe) |
| Quantum/Dell | 500 | 500 | Had to switch to adaptive mode |
| SEPATON/HP | 3000 | 1500 | 5 nodes with global dedupe |


Dedupe everything?

Some vendors will probably want to point out that my numbers for the 12-hour window only apply if you are deduping everything, and not everybody wants to do that.  Not everything dedupes well enough to bother deduping it.  I agree, and so I like dedupe systems that support policy-based dedupe.  (So far, only post-process vendors allow this, BTW.)  Most of these systems support doing this only at the tape level.  For example, you can say to dedupe only the backups that go to these tapes, but not the backups that go to those tapes.  The best that I’ve seen in this regard is SEPATON, where they automatically detect the data type.  You can tell a SEPATON box to dedupe Exchange, but not Oracle.  But I don’t want to do tables that say “what if you were only deduping 75%, or 50%,” etc.  For comparison’s sake, we’ll just say we’re deduping everything.  If you’re deduping less than that, do your own table. 😉
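Conceptually, the kind of policy I’m after is nothing fancier than this (a purely hypothetical illustration; real product interfaces vary widely, and most only give you tape-level granularity today):

```python
# Hypothetical data-type-level dedupe policy -- mirroring the example above:
# dedupe Exchange, but not Oracle.  No product exposes exactly this interface.
dedupe_policy = {
    "Exchange": True,
    "Oracle":   False,
    "images":   False,   # pre-compressed or unique data: often not worth deduping
}

def should_dedupe(data_type):
    return dedupe_policy.get(data_type, True)   # default: dedupe everything

print(should_dedupe("Exchange"), should_dedupe("Oracle"))   # True False
```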

Restore Speed

When data is restored, it must be re-hydrated, re-duped, or whatever you want to call it.  Most vendors claim that restore performance is roughly equivalent to backup performance, or maybe 10-20% less. 

One vendor that’s different, if you press them on it, is Quantum, and by association, EMC and Dell.  They store data in its deduped format in what they call the block pool.  They also store the data in its original un-deduped, or native, format in what they call the cache.  If restores are coming from the cache, their speed is roughly equivalent to that of the backup.  However, if you are restoring from the block pool, things can change significantly.  I’m being told by multiple sources that performance can drop by as much as 75%.  They made this better in the 1.1 release of their code (improving it to 75%), and will make it better again in a month, and supposedly much better in the summer.  We shall see what we shall see.  Right now, I see this as a major limitation of this product.

Their response is simply to keep things in the cache if you care about restore speed, and that you tend to restore more recent data anyway.  Yes, but just because I’m restoring the filesystem or application to the way it looked yesterday doesn’t mean I’m only restoring from backups I made yesterday.  I’m restoring from the full backup from a week ago, and the incrementals since then.  If I only have room for one day of cache, only the incremental would be in there.  Therefore, if you don’t want to experience this problem, I would say that you need at least a week of cache if you’re using weekly full backups.  But having a week of cache costs a lot of money, so I’m back to it being a major limitation.
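To put some rough, made-up numbers on that (purely illustrative; your full and incremental sizes will differ):

```python
# Back-of-envelope cache sizing for the weekly-full scenario described above.
# All figures are hypothetical.
full_tb = 20.0           # size of the weekly full backup
incremental_tb = 2.0     # size of each daily incremental
days_since_full = 6      # worst case: the night before the next full

# To restore yesterday's state entirely from the native-format cache, the cache
# must still hold the week-old full plus every incremental taken since it.
cache_needed_tb = full_tb + incremental_tb * days_since_full
print(cache_needed_tb)   # 32 TB of un-deduped cache per 20 TB full backup cycle
```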

Summary

Well, there you go!  The first table that I’ve seen that summarizes the performance of all of these products side-by-side.  I know I left off a few, and I’ll add them as I get numbers, but I wanted to get this out as soon as I could.

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

35 comments
  • Honestly, I am not that impressed with the performance analysis. You cite numbers for DD that pertain in, what, 1% of the use cases? People with both OST and 10 GigE. Without that, maximum performance is not 25% less, it is 90% less: 400 MB/s per DD690. EMC’s top end box, the DL4406 3D, does 2,200 MB/s ingest and 800 MB/s dedup. If you are willing to consider clustered speeds for others, why not EMC? I think you excuse this by saying it is not a global dedup pool?

    Depending on your perspective, neither is Diligent nor Sepaton (I don’t think). Correct me if I am wrong, but because these two dedup at an object and backup client level, if you have the same object, with a different name on a different backup client, that will not be recognized as common (and therefore deduplicated) by either system. To understand that the “issue” you identify is really not, we just need to realize that more than 90% of deduplication comes from deduplicated data from the same client over repeated full backups. The EMC system will always have the data from a given backup client going to the same dedup node–because a given client will always use the same pool of virtual tape resources. So which approach is “better”? One that can’t recognize the same object from a different location–even if it shares the same pool of virtual tape for a target–or one that can? Seems to me the difference is likely to be both trivial and non-decisive, in the sense that this issue is so minor that there are a dozen other things that will contribute to a customer decision before this.

    There are a lot of games being played with that performance table, unfortunately.

  • I didn’t expect you to like my performance analysis, especially since it puts EMC in last place. 😉

    How exactly is 400 MB/s 90% less than 750 MB/s? I believe 90% less would be 75 MB/s. Also, I believe they just had a code refresh that pushed that 400 MB/s up to 500+.

    As to OST only applying to 1% of the world, since when is NetBackup 1% of the world? I’ll agree that very few people have actually deployed OST, but don’t you think that will change once they realize it increases the performance of their DD systems and/or increases the functionality of any disk system that supports it?

    For each vendor, I used the best published numbers I had, and gave the caveats that were required. I said that DD needed OST to get that number, and FS and IBM needed FC disk to get their numbers. I think everyone realizes not to trust vendor published numbers until you test them and see them with your own eyes, but you can start with comparing published numbers.

    Then you go on to dismiss global dedupe as if it is not important. If it has no effect, why are Quantum & Data Domain working so hard to add it? Why is Diligent working very hard to go from 2 nodes to more than 2? Let me reiterate why global dedupe allows a vendor that supports it to add their nodes together, and local dedupe requires vendors that support it to only use the numbers from one node.

    In a local dedupe world (DD/Quantum/EMC/Dell/Exagrid/NetApp), each backup must be pointed to _a_ head/engine/node, such as one engine in your two-node appliance. You cannot load balance across multiple nodes/heads/engines, because if you do, you’ll end up messing up your dedupe ratio. Since each node/head/engine is a dedupe island unto itself, I do not believe it is valid to add multiple nodes together.

    In a global dedupe world (IBM/Falconstor/SEPATON), you can send backups to any head/node/engine you want to, and it will always be compared against all other backups. You can treat all the nodes in the "cluster," although I don’t like that word for this application, as one big node and send backups to wherever you want with no ill effects. Since you can treat it as one big appliance, I believe it’s valid to add the throughput of those nodes/heads/engines together.

    If you want to debate this belief of mine (and pretty much every other independent person in this space I know), please debate this belief. Don’t use what I like to call "yeah, but" tactics, as in "Yeah, we don’t have global dedupe, but they have this other problem." Explain to me why I should allow you to add your nodes’ numbers together as if they were one appliance, or why I shouldn’t add their nodes’ numbers together.

    As to the "yeah, but" issue you brought up, neither of the vendors you suggest has this problem actually has it. Both SEPATON and FalconStor will identify redundant files (or even parts of files) that happen to reside in multiple filesystems. In the case of SEPATON, they "untar" the backup and are looking at individual files. This allows them to do delta differential comparisons on the files they know are the same (e.g. repeated backups of the same Exchange database) and additional checks to look for the scenario you describe. In the case of Falconstor, they chunk up files and streams much the same way Quantum/EMC does, so they will also find redundant blocks that happen to reside in multiple places. I do believe that IBM/Diligent has the problem you describe, as they do a delta differential that compares like backups to each other, and do no other comparison. However, the increased level of granularity that delta differentials bring tends to make this whole discussion a wash, because they get more dedupe on the things they DO compare, which counters what they miss on the things they don’t.

  • Hi Curtis, I am curious where the number of 5 nodes for Sepaton comes from? I thought they could scale to 16 nodes?

    Thanks
    Lowell

  • Curtis,

    Great summary; thanks for getting this all in one place. This is a valuable resource.

    There are two comments I’d like to make. First, regarding Disraeli’s "lies, damn lies, and statistics": It looks like all the numbers you have are vendor-published. Do you have any independent figures you could add as additional columns? Deduplication is particularly prone to the "best case scenario" problem — many of these vendors can dedupe endless streams of zeroes (or any other repeating pattern) much more quickly than random data!

    Second, I’d like to stress that dedupe is for more than just backup! I understand that the name of the site is Backup Central, and you are the Backup Guru, but I think this is still an important point to make. Dedupe is a great way of cutting costs in VTL and D2D backup, but integrating deduplication much earlier in the data lifecycle can also cut costs significantly.

    Right now data is written to tier 1 primary storage at a cost of $30 to $50/GB, and then backed up at an aggregate total cost of several dollars more. Much of this data can be moved much earlier to an archive primary storage tier at $3/GB or less, and an effective cost even lower with deduplication. A perhaps sacrilegious statement here: replication can reduce or eliminate the need for backup of this tier entirely. When you’re talking about petabytes of data, you can’t always afford to be down for the restore period. With economic pressures, any business would be remiss not to look at deploying an effective archive tier.

    At Permabit we’ve developed our Enterprise Archive product specifically to serve these needs, and believe we have developed the only truly scalable deduplication solution for archive data, while also providing levels of data protection far beyond what is available with RAID. I talk a little more about the underlying economics over at my blog in the article at http://blog.permabit.com/?p=77

    Regards,
    Jered Floyd
    CTO, Permabit

  • Apologies for the strange line breaks — I can’t seem to figure out how to prevent them. Curtis?

  • The problem is annoying and it happens to me too. I’m hoping it goes away with a future software update.

  • I am aware of no independent lab-based comparisons of these products. I’m not even aware of any published non-independent comparisons of these products. No one is more frustrated about that than me. I get all kinds of stories about many of these vendors, but they come so haphazardly that it’s hard to tell truth from fiction. When I hear BAD stories about a given product, it’s also hard to tell which ones are due to the product itself or how the product was being used. I think that only independent testing would work all that out.

    FWIW, the lack of independent testing of IT products has been at the forefront of my mind and is the basis of the new company I’m trying to start right now.

    As to the other part of your post…

    I agree that a lot of stuff that shouldn’t be backed up is backed up. I agree that if we could simply get it out of the primary storage we could save a lot of dough. So you’ve got no argument with me there.

  • Curtis,

    You put a lot of trust in the numbers supplied to you by each vendor, and we have all seen how some vendors greatly exaggerate, like Sepaton’s claims of a 500:1 deduplication ratio.

    You also might want to recheck the numbers on your winner FalconStor. You said “I used this data sheet where they . . . support 8 nodes” AND “They said that each node could do 400 MB/s” for a total of 3200MB/sec performance.

    However, the FalconStor data sheet you reference clearly states “Deduplication architecture supports N+1 cluster of up to 4 nodes”

    So, right there the performance of the Falconstor solution, which you declared the winner, is only half of what you claim.

    You know what would be really valuable? A true apple-to-apple comparison based on actual test results. For example, given 100TBs of backup data (with lots of duplicates) how long does it take to completely ingest and deduplicate that data? What was the total amount of disk required to process that data? What was the final deduplication ratio? Make vendors do this test with a single node system and then let them use their largest multi-node cluster.

    As America’s most well-known backup expert, you are just the guy to get such a comparison done. I know that IBM would be happy to participate in such a comparison because we are confident our ProtecTIER (formerly Diligent) solution is the fastest on the market today.

    Victor

  • I’d be more interested in how fast they could restore the 100TB of data to the multiple appropriate locations. Backup time is nice to know, but restore time is when the user is waiting…and breathing down your neck.

    Curtis – nice post and very nice blog.

  • Backup is one thing. Restores are everything.

    That’s why I included that bit at the end about restores.

    Thanks for the compliment.

  • Some of the increased ingest values are skewed by the difference between what is inline versus what is out of band.

    I am just taking a few seconds here to drop an input.

    Do any of the solutions associated with any of these allow some type of cascading, site A to B & C or A –> B –> C? I’m looking for something that can handle large distances and keep the data synced and not duplicated.

  • I’m not sure if I would use the word “skewed.” It’s just that they can ingest faster than they can dedupe. It’s fine to list the ingest number as long as you talk about how long the dedupe process takes.

    Many of the systems on this list support cascaded replication. Some also support many to one replication.

  • EMC data de-dupe is what solution exactly? RecoverPoint? Or something else? My misguided understanding is that RecoverPoint does not really de-dupe the data. Also, RecoverPoint is not inline at all, and its removal will not shut down other read/write functionality.
    On the other side, FalconStor is an inline solution, and if removed it would not only cause loss of replication, but all other traffic would also be an issue. I am still researching this, so basically you are helping in this research.
    What specifically is a solution other than RepliStor that can do a cascading of data replication or data deduplication?

  • RecoverPoint is CDP and is not dedupe. Avamar is dedupe.

    I’m not sure what you mean by “inline,” as it definitely has nothing to do with “inline dedupe.” First, RecoverPoint is CDP, not dedupe, and is a client-side product. Inline/post-process terminology only applies to target dedupe solutions. Second, Falconstor is post-process in the target dedupe sense of the word.

    Again, most of the VTLs and IDTs (intelligent disk targets) support cascaded replication. But in your question, you’re mixing up target dedupe with source CDP, and I’m really not sure how to answer your question.

  • Curtis, I appreciate you publishing this report on performance. It was a nice first effort to help the market visualize and understand data deduplication performance. Granted, it is a bit like herding cats and even more challenging when you can’t include other factors like the underlying cost of the systems. As you mentioned, some of these systems require expensive high end processor and disk technology to achieve their results. In addition, when making performance comparisons, one has to be careful in comparing vendor claims to real world results – while I wouldn’t want to accuse anybody in the marketing community of hyperbole, what’s achievable in the lab is frequently less achievable (or even less valid) in the real world.

    Performance selling has been a long held tradition in the storage (and particularly backup) industry. Remember the benchmarking ‘wars’ of the mid 1990s? (Now I’ve dated myself) The press releases were always the same – benchmarking of a huge parallel environment of high end servers and tape libraries indicating vendor proof of a claim of at least a TB/hour, usually of an Oracle backup. Did any customer ever install any of these multi-million dollar configurations? I don’t think so. And so, while performance is clearly a key consideration for any backup system (including dedupe solutions), we need to be careful to not compare worst-case numbers in corner-case situations with best-case datasheet figures from another vendor, particularly figures from solutions that haven’t been vetted yet in real customer environments. Are there any plans to do a price-performance comparison in dedup? That might be interesting?

    I can bet that this will not be the last time performance comes up as an issue this year; as you observed, my own employer appears to have some plans in this area. And particularly with the introduction this year of major new hardware technology, I expect all the deduplication vendors will be making improvements.

    Performance is also very use case centric – while it is the case that it’s not practical to load an entire week’s worth of potential restores into the disk cache, for most customers last night’s full backup of their most critical database (and their email system) is likely to be all ready for recovery. And while files represent the vast majority of total backup capacity, the size of the average restored file is still quite small. And unlike in the bad old days, with new backup technology, if you want to recover a file, you don’t recover more than just the last full write of the file and its incrementals to achieve a complete restore.

    Finally, I observe that many of the different deduplication solutions offer choices that provide features and benefits that have a greater impact to customers than just performance. My friend, Jered, I think hit it on the head. It’s not just about backup, but increasing operational efficiencies within the data center that benefit the customer situation with the goal of reducing complexity and cost. Customers are looking for companies that offer differentiators such as single pane of glass management, as well as complete service offerings that include sizing, installation and implementation services. Many vendors also have different policies for long term retention vs. short term backups. Policy-based management optimizing the consolidation and management of deduplicated data, non-deduplicated data and tape creation for long-term retention often trumps raw performance. Customers still have a big investment in tape, and features like our path to tape offer real value to customers.

    In architecting a system like a deduplication appliance, vendors are clearly trying to address a series of customer operational and business pain points; in Quantum’s case, we purposely architected a solution with policy-based flexibility. As a result, there are always trade-offs to be made. We did it knowingly, are also continuously improving, and as a result of our architecture have many more customer beneficial capabilities on the horizon.

    Our focus and perspective is different. We take performance seriously, but we wouldn’t sacrifice the real-world benefits of a policy-based approach simply for raw speed. So far our customers seem to agree.

  • Curtis, thank you for your insight; I am a longtime reader, first-time blogger. One point I would like to make, which I don’t believe has been pointed out yet, is about your statement on the performance of the IBM/Diligent ProtecTIER VTL. While I do agree that your numbers are fairly accurate at 900 MB/s in a 2-node cluster, I would like to point out that this does NOT require FC disk. While it is true that ProtecTIER is disk intensive, versus the CPU-intensive approach of hashing algorithms, 900 MB/s can be achieved with a SATA repository. These numbers have become possible with the HW improvements of the IBM TS7650G gateway and the release of the 2.1 code. One solution option is the TS7650G Gateway with an XIV disk array repository, which is an all-SATA solution and supports 900 MB/s throughput.

  • Logan,

    I assume you’re IBM and this information is authoritative? It’s certainly different than what I’ve heard in the past.

  • @Matthew

    I do remember those tests, but I disagree that they weren’t needed. The whole point of those tests was that more and more people needed that kind of throughput. In reality, 1 TB/hr is now assumed. I would argue that those tests were all the vendors showing that they could do what only a few people needed them to do. But those companies that needed it NEEDED it. The same is true now when we’re talking 1000s of MB/s, which translates into 3-10 times faster than those tests from the 90s. Few environments (from a percentage-of-companies perspective) need that kind of throughput, but the ones that need it need it BAD.

  • Curtis, thank you for posting information and creating useful debate that can help customers sort out the various product offerings.

    I wanted to provide some additional information and clarification on the numbers you cite for ExaGrid. Your current table shows the backup rate for a single ExaGrid EX5000 at 188 MB/sec. That system would be sold to a customer with no more than 5 TB of primary data to be backed up.

    ExaGrid provides customers with a simple way to scale backup and restore throughput for larger data amounts and as their data grows. For each 5 TB of additional data a customer plans to back up, ExaGrid allows the customer to add an entire storage server, which includes not only the additional capacity but also additional network ports, processor, and memory. Therefore, as a customer’s data grows, we are providing all of the necessary elements to maintain a short backup window and fast restores.

    So, a detailed table of ExaGrid’s performance numbers would look like:

    * 5 TB of primary data – 1 x EX5000 – 188 MB/sec
    * 10 TB of primary data – 2 x EX5000 – 376 MB/sec
    * 20 TB of primary data – 4 x EX5000 – 752 MB/sec
    * 30 TB of primary data – 6 x EX5000 – 1128 MB/sec

    30 TB is currently our largest supported GRID configuration. Our GRID architecture means that customers with multiple EX5000s manage them as a single cooperative system. All configuration, monitoring, and reporting is done from a single web interface via a single log in. Customers can easily migrate data between the devices without starting the deduplication benefit from scratch. Further, the systems will automatically capacity load balance across the devices.

    Given that most of the other vendor performance numbers you cite are for their largest systems and are meant for environments of 30 TB, 40 TB, or more, our 30 TB performance number of 1128 MB/sec is a more accurate comparison as it is our largest system. For customers with 5 or 10 TB, most of the vendors you are covering have lower end, lower performing units that would be proposed. If they did not, their price would be too high for those target customers. Or, alternately, they do not target customers with that amount of data and instead focus on customers with 100 TB or more.

    Marc Crespi, VP of Product Management for ExaGrid Systems

  • I did verify Logan’s comment, BTW. Diligent’s numbers are supported with an XIV repository.

  • @Marc Crespi

    I appreciate your politely worded response to my post. I do understand that your grid is a BIT different than, say, a DDX. A Data Domain DDX is not much more than 16 DD690s stacked on top of each other that do not share storage or anything else. I understand that the nodes in your grid can move older deduped data to other nodes in the grid to help balance the STORAGE load. This is not to say that you are load balancing the I/O load (which is what this post was about), or that you have global dedupe, as you do not. Deduped data migrated in this fashion will have to be moved back to the primary node if it is needed for a restore, and this move will come at the cost of performance. (Of course, this is older data and generally people don’t do big restores of older data, so this is probably a non-issue for most.) I also hear you say that the grid is managed as a single system. I haven’t done a head-to-head YET, so I can’t speak to how much this is true.

    But this post was about performance, and whether or not a vendor should be allowed to add their numbers together (such as FalconStor, IBM, NEC & SEPATON), or not (Data Domain DDX, EMC 3D 4000, Exagrid, NetApp). Even if I acknowledge that your five node system is managed as a single entity from YOUR end, it will NOT be managed as a single entity from the backup software side, as doing so would really hurt the dedupe ratio. Customers using your six node system will suffer all of the downsides of local dedupe I laid out in this post: (https://backupcentral.com/content/view/231/47/). Therefore, I still see your 188 MB/s number as 188 MB/s. If I allowed you to add six nodes together to make 1128 MB/s, I’d have to also allow DD to display their DDX numbers.

    As to the “their largest system vs your smallest system” part of your comment, I definitely disagree. I compared their largest SINGLE NODE performance numbers with your largest SINGLE NODE performance number, unless, of course, they had global dedupe — in which case they could list multi-node numbers. It was as fair as I can make it. You think of your five or six nodes as one grid, but from a dedupe perspective, they are NOT a grid.

    Am I saying that no one should buy your product because other vendors have faster systems? Absolutely not. I would never advise buying something solely on performance. I’m just saying that customers should realize that your 1128 MB/s “system” is based on 6 separate nodes, and Data Domain’s system that does 500 MB/s is based on one node — and there’s a difference there that the customer should be aware of. I’m assuming that customers interested in both products will actually TRY both products, and make their choice based on what they feel offers the most value. You say that you’ll be easier to manage and less expensive than the other guy’s product. If that’s true, then you should do just fine. And if you’re truly less expensive, you should do really well in the market you’re aiming for, as price is king in that world.

    This post was about getting a dialogue going, and it looks like we’ve done that.

  • The problem with many (all?) of the dedupe systems is that read performance degrades over time. Simply doing an initial qualification test isn’t going to give you accurate long-term restore speeds. As the system ages, and gets a higher dedupe ratio, the data becomes more fragmented and scattered across the disk subsystem. At that point, the system starts having to frequently seek the disks to recover any single data stream. Incurring a couple of ms every few SCSI tape reads can consistently drop your read performance on a single stream into the sub-10 MB/sec range. If you have a couple of streams seeking all over the whole array, the system performance is going to be _VERY_ bad.

    When the vendors are quoting 10% performance loss doing reads, it’s because they are restoring a deduped stream that very nearly matches (or may even just be) the original data stream. In other words, that 10% is the performance loss you see when you first install the machine. After a few months it _WILL_ get worse, count on it.
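    To put rough, purely hypothetical numbers on that effect:

```python
# Hypothetical numbers: effective single-stream read rate when a fragmented
# block store forces a disk seek every chunk or two.
def effective_read_mb_s(chunk_mb=0.0625, seek_ms=5.0, reads_per_seek=2,
                        streaming_mb_s=100.0):
    transfer_s = chunk_mb / streaming_mb_s           # time to read one chunk
    seek_s = (seek_ms / 1000.0) / reads_per_seek     # amortized seek cost per chunk
    return chunk_mb / (transfer_s + seek_s)

print(effective_read_mb_s())                               # ~20 MB/s
print(effective_read_mb_s(seek_ms=8.0, reads_per_seek=1))  # ~7 MB/s -- sub-10 MB/s territory
```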

  • Jeremy,

    The phenomenon you described is something to watch out for, and it’s one of the reasons that I recommend that at least 90 days worth of data be copied into a dedupe system for testing.

    However, I do not agree that it is something that all systems will suffer from. Definitely they won’t all suffer evenly. I’d say it’s something that they all have to design around, and from what I’ve seen some have done a better job than others.

  • Curtis, Jeremy,

    With respect to restore performance, I would like to add that ProtecTIER does not suffer from performance loss when restoring. This is inherent to the design of the product.

    The product relies on random read performance of the backend disk when deduplicating, to perform a full byte-for-byte data compare. This ensures enterprise-class data integrity. It relies on the same random read performance when reassembling data from across the repository into restore streams. The restore performance is therefore just as good as the backup performance.

    Disclaimer, I work for IBM but do not speak for IBM. I present my personal opinion and knowledge of the product.

  • Jeremy,

    As Curtis identified, what you mention has a varying degree of effect based on how the solution is designed. One factor is the method of referencing used in the vendor’s deduplication algorithm. For those that use Forward Referencing (the minority, it seems, use it; SEPATON is one of the ones that does, and I believe there are others), the most recent data is kept in its undeduplicated format, as opposed to Reverse Referencing, where the reverse is often true and the most recent backup is the “most deduplicated,” for lack of a better term. Forward Referencing creates some unique challenges from a design perspective (especially when you begin to talk about deduplicated replication), but the idea is that most restores are done from the most recent backup, so that’s the one you want to be in an undeduplicated format, where the effect you mention does not exist.

  • udubplate,

    Giving preference to the most recent copy, simply reverses the performance / age curve. A number of systems use a portion of the disk space as a first level cache to keep the most recent copy fast. In those cases the performance curve is U shaped which can completely hide any advantages to “forward referencing” the data for many workloads.

    Fundamentally, “forward referencing” doesn’t solve the problem of having to seek all over the disk to build a volume stream. In fact, the problem of defragmenting reclaimed space becomes harder and more important. For large systems it’s possible the defragmenting/reclamation process becomes the system bottleneck.

    If you fail to adequately defragment the space reclaimed from existing volumes, then the space to store the new volumes ends up being scattered in non optimal ways across the array as time progresses. If you cannot find large contiguous regions to store the new data, then you end up seeking.

    If the user is backing up extremely well behaved data, where references between data streams are close in time, and those references are fairly large, the problem won’t initially be as noticeable. In that circumstance it’s probably possible to even have datasets which don’t fragment. The space is reused before the system reaches a capacity where another stream is interleaving into the space of a stream still stored on the machine. As the machine fills up, this behavior is going to be minimized. It’s also going to be minimized if the volumes are expiring and being reused at different rates.

    The problem will probably be pushed off to the point where the sales guys are long gone. I’m not sure I would want to be the guy left standing there waiting for the system to rebuild an “archive” tape, or wondering why the dedupe process no longer completes in its window.

  • Jeremy,

    As mentioned above, each solution has a varying degree of effect based on how the solution is designed as well as how it’s used. Reversing the curve may be good enough for some, but that depends on what your requirements are, where 99% of your restores are coming from (i.e., the last backup or not), how much data is being stored on the device (there’s a big difference between 1 week vs. 1 year of retention, for example), and various other factors. As should always be the case, everyone should test the solutions, and make sure they’re testing restore performance based on the desired retention period (i.e., don’t simply test restore speeds for a week’s worth of backups if you’re going to retain a year’s worth on the device, as the effect you mention may vary widely with those time parameters and the solution).

  • Hi Curtis,

    Thanks for collecting this info. Valuable and arguable, the best combo for any blog posting.

    Below I’ll delve into IBM’s TSM storagepool features and how I would see a best fit between backup, restore, storage capacity and performance.

    IMO, you should dedupe what is best suited for dedupe.

    When backing up fileserver data, TSM only backs up new and changed files, which they call forever incremental. Chances are you’d see low dedup ratios. So, instead of using expensive VTL capacity, expensive both in cost and in performance, you could best store the fileserver data in a file-based storage pool. You could call this a software VTL without any dedup or compression. Creating such a pool of cheap 1/1.5 TB SATA drives may give you both the backup and restore performance you would need for fileserver data.

    Database backups typically are full backups each night, so dedup could work out very nicely, both in terms of storage capacity and restore performance. Backup performance is somewhat trickier. Using multiple streams, each to a separate LTO4 drive, would easily outperform any dedup solution. As that is only best practice for really big databases, I’ll leave it out of consideration for now.

    The one really nice feature Diligent offers (oops, nowadays IBM!) is that it can take a LUN from almost any storage system. I myself would be very curious how Compellent would work out as disk storage for a Diligent VTL. Compellent is rather cheap, writes incoming datablocks to Tier 0 (SSD) or Tier 1 (FC disk) and is able to migrate all new datablocks overnight to SATA disks. In short, it uses its SSD/FC storage as a cache for lower cost RAID5 SATA storage layers and this could be very effective in both backup and restore performance.

    If this Diligent/Compellent combo really sings it could be a nice solution for any Backup Server (NetBackup, CommVault, TSM, etc.).

    Sadly, to date I have not had the chance to test such a solution 🙁

  • I specifically said I didn’t verify any of the numbers, that I was just compiling all of the numbers that each company published.

    I actually spoke directly to Falconstor regarding the difference between the PDF file you referenced and this page http://www.falconstor.com/en/pages/?pn=VTLFeatures, and they said that the latter was more up to date — that they had just qualified 8 nodes in their cluster, and had not updated the PDF version yet. (Hey, Falconstor! Update your stinking PDF already!)

    As to SEPATON’s "exaggerated" claims of a 500:1 dedupe ratio, consider this. When they were using those numbers, they were talking about "backup-over-backup" dedupe, meaning last night’s backup got reduced by 500:1. While the numbers they were giving were valid (when looking at them that way), I and others counseled them that it made them look silly, as no one cared about how last night’s backup got deduped. What we care about is how much ALL my backups were getting reduced. The result is that they changed their messaging about that a while ago; they don’t claim those numbers any more. Look all over their site, and the most you’ll find is 50:1, and it will have caveats that say that this is most likely to happen in an Exchange-centric environment. (Try a google of "site:www.sepaton.com 50:1" or "site:www.sepaton.com 40:1" and you will find hits. What you won’t find is "site:www.sepaton.com 500:1.") So I really wouldn’t say that they are more likely to exaggerate than anyone else.

    I actually think all of you are exaggerating. But since I can’t verify (without independent testing) how MUCH each of you are exaggerating, I’m just publishing advertised numbers.

    I completely agree with you on the need for an independent test. It will be the subject of a later blog.

  • Hi Preston,

    I’m the Co-Chair of the SNIA DMF Data Deduplication and Space Savings Special Interest Group (DDSR SIG). I currently work at EMC and have previously worked at IBM, HDS, VERITAS, and Troika Networks (acquired by QLogic).

    Some comments on a few of your statements:

    “Global dedupe only comes into play with multi-node systems”

    After a year of vigorous debate by DDSR SIG members, the industry consensus on what global data deduplication means is captured by this definition:

    Data deduplication which stores only unique data across multiple deduplication systems. For example, global data deduplication stores only unique data across multiple target appliances or sends and stores only unique data from multiple source clients.

    At first glance this agrees with your initial comment but it does not coincide with your later comment: “NetApp, Quantum, EMC & Dell, have only local dedupe.”

    Are you restricting your comments to target data deduplication only? EMC has both source and target implementations. Your statement is at odds with the facts regarding EMC Avamar (just one example) which supports global data deduplication.

    “Let’s talk about Data Domain, as they currently own the dedupe market hands down.”

    I’ve been in the storage business over 30 years in management, engineering, product management, marketing, field support, and consulting roles. I get that you need to evangelize the desires of your clients to make money. However, statements like the above do a disservice to all of our customers. Try to stick with the facts.

  • Thanks for joining the discussion and for your polite demeanor, even though it’s obvious you really didn’t like one part of the post. 😉

    I realize I didn’t specify that I’m specifically talking about target dedupe, but I am. Perhaps I’ll update it just to say that, and insert the word target in a number of places.

    I am not an analyst. I am not a paid blogger. Data Domain has not paid me a dime to do anything. None of the vendors mentioned above are my clients. In fact, I’m as much of an annoyance to Data Domain as I am to EMC and others. (They’d really rather I stop pointing out that they don’t have global dedupe.) If I say something it’s because I believe it to be fact or at a minimum I believe it as my own opinion.

    In fact, if you had continued reading the paragraph where the sentence to which you objected was found, you’d see that I gave Data Domain more crap than praise. I basically said, “Yeah, they own the market, BUT they still don’t have global dedupe.” (I point this out in advance because SOME would argue that I must be wrong on global dedupe because the market leader doesn’t have it. I want you to see that I know who they are in the marketplace, but I also want you to see that they don’t have global dedupe.)

    Now, as to the "owning the market" comment, I should have put the word "target" in there, so I will (and have edited the original comment to reflect that):

    "Let’s talk about Data Domain, as they currently own the target dedupe market hands down."

    They’ve got around 3000 customers and many more shipped systems than any vendor of which I’m aware, and that number goes up every day. The mindshare they have with end users is also unparalleled. When I talk to customers and I’m talking about target dedupe, they automatically start talking about Data Domain, as if the two are synonymous. If EMC works hard enough and long enough, and continues the practices to which I alluded in my other post (https://backupcentral.com/content/view/234/47/), they might indeed change this, but I certainly feel that the statement holds true today.

    If they don’t own the target dedupe market, I don’t know who does.

  • I noticed that you mentioned in your comments that NEC has global dedup, but they are not in your list. Is there a reason for this? Or do you just not have this information for them?

  • Curtis,
    Often dedupe performance on the first write of new data is lower than later re-writes of (substantially the same) data, so are the performance numbers you’ve discussed mostly for first writes or later writes? It would seem that the most common case is re-writing mostly the same data, so perhaps there is a reason to focus on quoting that number. I’d appreciate your viewpoint on this issue.
    Thanks,
    Matt