Target deduplication appliance performance comparison

The world of target dedupe has changed significantly since I wrote my first performance comparison of target deduplication vendors 18 months ago.  It wasn’t until I made the chart for this blog entry that I realized how much it had changed.

Caveat Emptor

As I said in the last post, please remember that I am publishing only what the vendor advertises.  Some vendors that didn’t like the numbers I published last time said that I’m validating the numbers by publishing them here.  That’s nonsense; I am only collecting information from a variety of public sources and normalizing it so that it can all be put into the same table.  If they say they support 4, 8, or 55 nodes in a global dedupe system, then that’s what I put here.  If they say they do 1000 MB/s, then that’s what I put here.  I have verified some of these numbers personally; however, since I haven’t verified them all, I do not believe it would be fair to speak publicly about the ones I have verified.  Not to mention that is actually how I make my living. 😉

Global Dedupe Matters

Before discussing the numbers, I once again feel it is important to explain why I allow some products to add the throughput of multiple nodes together, and I require other products to use only the numbers for a single node.  It’s simple.  If a vendor’s nodes behave like one node from a dedupe standpoint (i.e. they have global, multi-node, dedupe), then they can put them together.  If they behave like multiple independent nodes (i.e. they don’t have global dedupe), then why should they be allowed to put them together and call them one node?

I’ll give you two examples from the target dedupe market leader, EMC/Data Domain, to illustrate my point: their new GDA (Global Deduplication Array) and the DDX.  The GDA uses NetBackup’s OST and Data Domain’s Boost to load balance data across two DD880s, and it ensures that data is globally deduped across both nodes.  It is two nodes acting as one; therefore, it is perfectly valid to publish its throughput rate as one number.  The DDX, on the other hand, is a different story.  EMC continues to advertise the DDX as “a 16-controller DDX array [that] provides up to 86.4 TB per hour throughput.”  The problem is that it’s only an array in the general sense (e.g. an impressive array of flowers), not in the IT sense (e.g. a disk array).  The DDX Array is really just 16 Data Domain boxes in a rack.  They know nothing of each other; if you send the same exact data to each of the 16 boxes, that data will be stored 16 times.  Therefore, I do not use the numbers for the DDX in this table.

Vendors More Open

One of the things that has changed since I did this 18 months ago is that vendors now publish all their numbers.  Specifically, the post-process vendors put their dedupe rates on their websites, rather than just publishing their ingest rates.  I don’t know if it was this blog post that coerced them to do that or not, but I’d like to think I helped a little.  The result is that this table was built using only numbers that are publicly available on the vendors’ websites, and in no case did I find a vendor that didn’t have its numbers somewhere on its public site.  Bravo, vendors.

Backup Only

I am publishing only backup numbers for two reasons.  The first (and biggest) reason is that vendors tend not to publish their restore numbers, and I wanted this post to use only published numbers.  The second is that, with a few exceptions, restore performance tends to be in line with backup performance.

Having said that, backup is one thing, restore is everything.  Fast disk-based backup devices usually make fast disk-based restore devices, but do not assume that to be the case.  Test everything; believe nothing.

If I’ve told you once, I’ve told you 1024 times

I used 1000, not 1024, when dividing and multiplying these numbers.  If that bothers you, go find some movies to submit goofs on and stop picking on me.  I was consistent, and that is what matters in a table like this, IMHO.

The Comparison

The vendors are listed alphabetically, of course.  The product names are all links to the documents from which I derived the numbers.  If the product is an inline product, then I put a number in the Inline Backup Speed column.  If it is a post-process product, then I put numbers in the Ingest Speed and the Dedupe Speed columns.  The Daily Backup Capacity is my attempt to compare the two different types of products (inline & post-process) side-by-side.  Assuming you’re going to dedupe everything, you can really only ingest as much data in a day as you can dedupe in a day.  I took the value in the Inline Backup Speed column for inline vendors and the Dedupe Speed column for post-process vendors, multiplied it by 86400 (the number of seconds in a day), then divided by 1,000,000 to get the number of terabytes they could back up in a day.  The usable capacity is the maximum amount of space that you have to store deduped data on that particular appliance.  (This is net of RAID overhead and does not include any deduplication.  The amount of backup data you could store on each appliance is the usable capacity multiplied by your deduplication ratio.)
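For anyone who wants to check my math, here is a rough sketch of the two calculations in Python.  The function names are mine and purely illustrative, and the 10:1 dedupe ratio in the last line is a made-up example, not a number from any vendor.  The 950 MB/s input is the GB 4000 figure from the table.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units: 1000 * 1000, not 1024 * 1024


def daily_backup_capacity_tb(speed_mb_per_s):
    """Terabytes that can be backed up (and deduped) in one day.

    Pass the inline backup speed for inline products, or the dedupe
    speed for post-process products.
    """
    return speed_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


def stored_backup_data_tb(usable_capacity_tb, dedupe_ratio):
    """Backup data that fits on an appliance at a given dedupe ratio."""
    return usable_capacity_tb * dedupe_ratio


# GB 4000: 950 MB/s works out to 82.08 TB/day, listed as 82 TB in the table.
print(daily_backup_capacity_tb(950))

# Hypothetical: 108 TB usable at an assumed 10:1 dedupe ratio.
print(stored_backup_data_tb(108, 10))
```

Keeping the divisor in one named constant is what makes the results comparable across vendors; swap in 1024 * 1024 if you prefer binary units, as long as you do it for every row.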

Update: My first version of this table had NEC coming in at something like 16K MB/s, but it’s been updated with a much bigger number.  This is because I was using some older numbers from their website that they didn’t know were still there. I am now using the most up-to-date numbers.



| Product | Inline Backup Speed (MB/s) | Post-process Ingest Speed (MB/s) | Dedupe Speed (MB/s) | Daily Backup Capacity | Usable capacity | Notes |
|---|---|---|---|---|---|---|
| | | | | 307 TB | 384 TB (raw) | 2 nodes, NBU/OST only |
| | | | | 129 TB | 192 TB (raw) | |
| DD880 w/Boost | | | | 211 TB | 192 TB (raw) | NBU, NW only |
| | | | | 172 TB | 200 TB | 10 nodes, NBU/OST only |
| | | | | 172 TB | 200 TB | 10 nodes |
| | | | | 172 TB | 268 TB | 8 VTL nodes, 4 SIR nodes |
| GB 4000 | 950 | | | 82 TB | 108 TB | |
| | | | | 57 TB | 36 TB | |
| | | | | 86 TB | 1000 TB | 2 nodes |
| HydraStor HS8-2000 | | | | 2376 TB | 1320 TB | 55 accelerator nodes, 110 storage nodes |
| DXi 8500 | | | | 153 TB | 200 TB | |
| | | | | 200 TB | 1600 TB | 8 nodes |
| NetBackup 5000 | | | | 619 TB | 96 TB | 6 nodes, requires NBU Media Server dedupe to get this throughput |



The big winner here is NEC, coming in more than three times as fast as their closest competitor.  This is, of course, a function of the fact that they support global dedupe, and that they have the resources to certify a 55-node system.  (It helps to have an $86B company behind you.)  This is one of the reasons that I referred to them in a previous blog post as the best product you’ve never seen.  In addition to being fast, they also have a very interesting approach to availability and resiliency.  They actually got left out of the last comparison I did only due to an oversight on my part.

The big surprise to me personally is the NetBackup 5000, as it is the newest entry to this category.  It’s only for NetBackup, but it’s pretty impressive that they’re coming in second when they just entered the race.  This is also a function of global dedupe and them supporting six nodes in a grid.  I still don’t think this is a good move for Symantec, as it puts them right in competition with their hardware partners, but it is a respectable number.

Update (11/12): The NetBackup 5000 uses the NetBackup Media Server Deduplication option to get this performance number.  Like EMC’s Boost, the data is deduped before ever getting to the appliance.  They have not published what their dedupe throughput would be if you did not use this option.

Speaking of being a NetBackup customer, Data Domain is looking a lot better than they used to due to the advent of Boost, which supports NetBackup and NetWorker customers.  Boost works by running a plug-in on the NetBackup media server or NetWorker storage node, which does some of the heavy lifting and deduping before the data is ever sent across the network.  This spreads the load out over more CPUs, and gives a significant effective increase in throughput to those boxes that support it.  Notice that Boost increases the effective throughput of the single-node DD880 to faster than the 8-node Sepaton, 4-node FalconStor, 2-node ProtecTier, or 10-node Exagrid system.  Having said that, I still think global dedupe is important, and I have explained why in some older posts.  I’ve also got an article coming out next month about this.

I was kind of surprised that FalconStor doesn’t support more than four nodes yet, and their numbers might look very strange if you don’t know the reasoning behind them.  They support an 8-node VTL cluster, but they only support 4 SIR (dedupe) nodes behind that cluster (for a total of 12 nodes).  This is why they can ingest 12000 MB/s, but they can only dedupe 2000 MB/s, which severely limits their daily backup capacity to only 172 TB.
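To make that bottleneck concrete, here is a small Python sketch using the two FalconStor numbers above.  The function names are mine, and taking the minimum of the two speeds is my generalization for the sake of the example; in the table I simply used the dedupe speed for post-process products.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units, as everywhere else in this post


def tb_per_day(speed_mb_per_s):
    """Convert a sustained MB/s rate into terabytes per day."""
    return speed_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


def post_process_daily_capacity_tb(ingest_mb_per_s, dedupe_mb_per_s):
    """Daily capacity of a post-process system that dedupes everything.

    The slower of the two stages sets the limit (an assumption of
    this sketch).
    """
    return tb_per_day(min(ingest_mb_per_s, dedupe_mb_per_s))


ingest, dedupe = 12_000, 2_000  # MB/s, the FalconStor figures quoted above

print(tb_per_day(ingest))                              # what the VTL nodes could ingest
print(post_process_daily_capacity_tb(ingest, dedupe))  # what the SIR nodes can dedupe
```

The first number is roughly six times the second, which is the gap between what the VTL cluster can take in and what the SIR nodes can actually dedupe in a day.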

Another surprise was that Quantum came in with a respectable daily backup capacity of 153 TB, even though they do not support global dedupe.  That’s right behind Sepaton, which uses 8 nodes to do the same job.

Three vendors told me they were about to do major refreshes by the end of this year, but I decided I’d waited long enough to publish this table.  When they do their refreshes, I’ll refresh the table with another post.

Happy hunting!

