EMC DL3D: A post-processing target dedupe system

I remember when I first started talking to Quantum about dedupe; they were trying to call their “immediate” deduplication “inline” because it happens at the same time as the backup.  They eventually stopped referring to it as inline, as it does not meet the definition of inline dedupe that was around long before they came out with their product.  Unfortunately, now that EMC is selling their Quantum-based product, they’re apparently trying to do the same thing – or at least one of their bloggers is. As usual, I’m drawing a very thick line between inline and post-process.  Click Read More to see why.

The definition of inline deduplication (which was decided at least five years before EMC entered the target dedupe market by OEMing Quantum) is dedupe that is done in such a way that the native, non-deduped data is never written to disk – ever.  If your product ever writes the data in its native format, then it’s doing post-process.  The EMC product always writes data in its native format (according to the blog post referenced above), so it’s a post-process product.  (I have been told that the Quantum 7500 can do true inline dedupe up to about 150 MB/s, but I haven’t verified that.  See my other blog post about that.  But it appears that the EMC products based on Quantum are not configured to work that way.)

Since I have been quoted many times as saying that I don’t care whether you use post-process or inline, why do I care if EMC calls what they do inline?  The first reason is that it is confusing when a term has been used one way for many years and a newcomer comes along and starts using it to mean something else.  I’m trying to help people understand the market, and when a vendor does that (particularly a large one), it muddies the water for everyone.

The second reason that the differentiation between inline and post-process is important is that post-processing systems can get “behind” and inline systems cannot.  It’s analogous to asynchronous replication.  Since asynchronous replication acknowledges the write to the application as soon as it’s done (whether it’s been replicated or not), the replicated copy can get “behind” the primary copy – from seconds to hours.  In fact, asynchronously replicated systems sometimes get so out of synch that they cannot catch up.

Just like asynchronous replication, a post-processing dedupe system allows native data to be written faster than it can be deduplicated, creating a backlog of dedupe work at the end of the backup window that doesn’t exist in an inline system.  It is possible (depending on the environment and the dedupe system) that the system could get so far behind that it would never be able to catch up.  This is especially true of systems whose ingest rate is significantly faster than their dedupe rate.  If a system can ingest data at 4 TB/hr but can only dedupe it at 1 TB/hr, you could get it impossibly behind by backing up 4 TB/hr for 10 hours.  It would have 40 TB to dedupe in 24 hours – and it can only dedupe 24 TB in 24 hours.  This is not to say that this is BAD – it just means it’s something you have to plan for in a post-processing system that you don’t have to plan for in an inline system.
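
That arithmetic is easy to sketch.  Here is a toy model of the backlog, using the made-up rates from the example above (function and parameter names are mine, purely for illustration):

```python
# Toy model of a post-process dedupe backlog, using the example rates above.
# A post-process system keeps accepting backups even when dedupe falls behind,
# so undeduplicated native data can pile up on disk.

def backlog_tb(ingest_tb_per_hr, dedupe_tb_per_hr, backup_hours, elapsed_hours):
    """TB of native data still waiting to be deduped after elapsed_hours."""
    ingested = ingest_tb_per_hr * backup_hours
    deduped = dedupe_tb_per_hr * elapsed_hours
    return max(ingested - deduped, 0)

# 4 TB/hr for a 10-hour backup window, deduped at 1 TB/hr:
print(backlog_tb(4, 1, backup_hours=10, elapsed_hours=24))  # 16 (TB left after a full day)
```

With 16 TB still undeduped when the next 40 TB backup window opens, the system falls further behind every day – that is the “never able to catch up” case.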

The final reason that it is important to differentiate between these two types of dedupe is the difference in the amount of I/O each system must perform.  Post-processing systems actually have to perform about 300% more I/O than inline systems.  (I blogged about this here.)  This extra I/O has an infrastructure cost that will be reflected in the price of the system.

This is not to say that I prefer inline systems, or that I think post-processing systems are poorly designed.  I actually really like some of the post-processing systems.  There are advantages to both methods, and a casual reader who sees a system described as inline might assume that it has the advantages of an inline system.  If it’s truly an inline system, it will.  If it’s actually a post-processing system, it won’t.

EMC can call what they do “immediate” or “concurrent” (the way SEPATON does), but they can’t call what they do “inline.”  It’s misleading.  That’s all I’ve got to say about that…

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

12 thoughts on “EMC DL3D: A post-processing target dedupe system”

  1. Scott Waterhouse says:

    W Curtis… I welcome the debate, but… I think there are two levels to this conversation: one is what the user cares about. And honestly I think that just boils down to a performance conversation–in the sense of throughput, time to finish backup, and time to finish replication. The other level is the technical minutiae of what is really going on with the device. The second is also partly semantics in this case.

    Let me start with the second level. You say that you don’t want the market confused. Fair enough. Neither do we. Having said that, I think you are trying to fit a square peg into a round hole in order to define something. (By the way, we are fine with the "immediate" definition if you prefer that to in line.)

    Why do I say that? You further wrote: "the second reason that the differentiation between inline and post-process is important is that post-processing systems can get “behind” and inline systems cannot.” Well, the DL3D can’t either. When it is running in “in-line” mode it will not build up a backlog of data on cache. It will deduplicate data as it is received. If you send it data faster than it can deduplicate, it will bottleneck (and slow down the reception of data). Just like any other in line system.

    So honestly, the DL3D meets at least one of the tests you proposed for in line deduplication.

    Now back to the first level–and the reason I brought up the subject in the first place: why does it matter to users? Only for performance. And in this respect, the DL3D approach to in line (or immediate) deduplication has a huge advantage over competitive approaches. Our approach allows you to restore data up to 6 times faster than you can from an appliance that doesn’t employ any sort of disk cache (like Data Domain). So it has every performance characteristic of an in line solution (limit on write speeds, replication is hooked to deduplication and happens simultaneously, restore from truly deduped data is 1/4 the speed of a write to the system, etc.) except that if you are restoring from cached or non-truncated data, you can get up to a six times performance improvement. And it seemed to me that was worth mentioning.

  2. Scott Waterhouse says:

    I think your site is truncating my comments! The rest of it can be found here: http://thebackupblog.typepad.com/thebackupblog/2008/11/dl3d-the-benefits-of-immediate-deduplication.html#comment-138121910

    Note from Curtis: I cut and pasted the rest of your comment back in. The truncation was caused by cutting and pasting special characters – in this case, the fancy quote marks around the word “behind,” which aren’t in the base ASCII set. I’m going to see if an updated version of the comment tool will fix that problem. Sorry for the inconvenience.

  3. Aaron Kristoff says:

    I think there is something worth mentioning about the disk caching taking place in the EMC/QTM boxes that they are not letting us know about. Don’t get me wrong. The restore/copy performance gained is head and shoulders above the in-line de-dupe folks. I mean, it would have to be, right? If you are restoring/copying data from a cache in its native form, it must be faster than if the data has to be re-duped before restore/copy.

    However, there comes a point when the appliance will truncate that native data from cache, leaving only the metadata available for restore/copy. The QTM configuration out of the box is 70% of used disk capacity. Not sure if the DL3D is set up the same way or not, and I cannot find any reference to this, so I am going to put them in the same boat as QTM. Feel free to knock me down if I got it wrong.

    So, if you take a 36 TB capacity system, it will store all data that has been ingested in its native form up until the used capacity has reached 70% or 25.2 TB. It will then truncate the oldest 10% of data in this cache until the used capacity drops down to 60% or 21.6 TB. This gives you a cushion of about 3.6 TB of capacity that will be used for the newest data ingested to be stored in this cache. Not too bad. Best case scenario is that you will always have the most recent data in native form available in the cache for exceptional restore/copy performance.
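
That threshold arithmetic can be sketched quickly (assuming the 70%/60% thresholds above, which are my reading of the QTM defaults – the function is illustrative, not anything the product exposes):

```python
# Illustrative math for the cache-truncation thresholds described above.
# Assumes truncation kicks in at 70% of capacity and trims back to 60%.

def truncation_points(capacity_tb, high_pct=70, low_pct=60):
    """Return (truncate_at, truncate_to, cushion) in TB."""
    truncate_at = round(capacity_tb * high_pct / 100, 1)
    truncate_to = round(capacity_tb * low_pct / 100, 1)
    return truncate_at, truncate_to, round(truncate_at - truncate_to, 1)

# The 36 TB example: truncation starts at 25.2 TB, trims to 21.6 TB,
# leaving a 3.6 TB cushion for the newest ingested data.
print(truncation_points(36))  # (25.2, 21.6, 3.6)
```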

    But what happens when your data retention size exceeds this 70% capacity threshold? What if your retained data size comes to 32 TB of the 36 TB capacity? The system will continuously operate in truncation mode. That means all new data that is ingested will be cached only for a period of time until it is de-duped, and then only the metadata will be left. You will no longer have that disk cache for fast restore/copy, because truncation will have already taken place.

    The point that the in-line folks are making is that you are never going to be fooled by the performance of an in-line system. You get the same restore/copy performance whether you have used 1% or 90% of capacity.

  4. Aaron Kristoff says:

    The question then becomes: I have gotten so attached to the exceptional restore/copy performance I was getting from this disk cache, and now that is gone. How can this be fixed? I think the answer you will probably get is to add more capacity to your system. Now instead of a 36 TB system, you will need to go to the next capacity level, let’s say 54 TB. Your 70% threshold has now become 37.8 TB. You once again will have the capacity available to cache the data in its native form for best restore/copy performance. But you now need more disk (which is not free, by the way – trust me).

    So, I believe that in order for the DL3D products to sustain the performance claim of being able to “restore data up to 6 times faster than you can from an appliance that doesn’t employ any sort of disk cache,” you need more capacity than a system that does not employ any sort of disk cache. There is a cost involved here. And for some people, it makes more sense to purchase a 36 TB system for 32 TB of data retention than a 54 TB system.

    Bottom line is for EMC or QTM or whoever to educate their customers that this disk cache is a great advantage over the in-line competition but tell them up front that additional capacity may be necessary to sustain it. Do not size a 36 TB system for a customer who has 32 TB of data retention and let them find out the hard way that they will ultimately need more capacity to sustain this great restore/copy performance.

  5. Scott Waterhouse says:

    OK, so I am going to move right past the fact that the comment could have been written by Data Domain.

    Here is the simple version, in my mind. There are two options: 1) you can restore from native data; 2) you can restore from deduplicated data.

    (And what is this idea of “restore from meta data”? You restore data that has been deduplicated. Exactly as you would from any other deduplication scheme. Why does Data Domain need to make this sound bizarre or scary? It is a restore. Full stop.)

    So if #1 is true, you restore from native data, and the DL3D is much faster than DD (by about 6 times). If #2 is true, you restore from deduped data, and the DL3D is about as fast as the DD.

    Since when is this flexibility a bad thing? And I agree with you–if you consistently want to restore (fast) from native data, you need more capacity. If you have an SLA that cannot be met by a restore from deduped data, you will probably be willing and able to pay more to provision to meet that tighter SLA.

    A good dedup vendor should let you understand those choices (and give you hardware that is flexible enough so that you have choices in the first place).

  6. Aaron Kristoff says:

    Don’t worry Scott. Data Domain I am not. Glad you didn’t try to allude to that in your post. Oh wait… 😉
    And I will correct my comment about restoring from metadata. I was not trying to make anything sound bizarre or scary. I just misrepresented what I was trying to say.

    Thanks for seeing it my way regarding the extra capacity needed to maintain top performance. My point is for the vendors to let the customers know that up front, because we both know that once the thing is on the floor and in production, the customers have no option except to say, “Okay, Mr. DeDupe Vendor, we will pump more money for disk into this thing, even though we had no clue that the restore/copy performance would go to crap once we fill it up to 90% capacity.” What can the customers do? Either reduce retention or expand. These are the options. And the customer added dedupe to extend retention, so we know that is not going to happen.

    Then the you-know-what hits the fan, because managers want to know why this thing was undersized from the beginning, and why this disk cache thing wasn’t taken into consideration from the beginning, and why we are not getting tape copies offsite because the data cannot be re-duped and sent to tape in any reasonable amount of time, and why we should bother with a 2nd one of these things if we cannot get the first one right. Do not laugh. This is what goes on. This is how customers think, and how their bosses and their bosses’ bosses think.

    And it is not just disk costs but all of the other costs involved: disk, RAID Controllers, De-Dupe licensing for the expanded capacity, additional hardware/software maintenance.

    I am just saying, be up front with the capacity needs when talking about your 6 times faster restore performance.

  7. cpjlboss says:


    I actually know Aaron personally, and he is neither DD, nor a customer of DD. I don’t want to speak for him, nor give more information than he is willing to give out publicly, but let’s just say that he is speaking from experience and he’s not a shill.

  8. Scott Waterhouse says:

    OK… I just found it interesting that some of the vocabulary Aaron used is identical to a wildly misleading paper DD is circulating to anybody who will listen at the moment. I apologize for any further insinuation.

    And I can’t agree more that we all need to disclose better. I have written repeatedly on my blog about the need for accurate sizing. And there are at least two dimensions to that: performance (how do you get it, where are the thresholds, etc.) and what kind of deduplication ratios can you get.

    And yes, there are other costs. Again, all we can disclose is where those costs come. For a DL3D, there are no additional licenses required by capacity (except from your backup software vendor, perhaps). There are no additional hardware costs except the disk and trays (i.e. no RAID controllers, etc.) and no uplifts to existing maintenance.

    So yes we have the responsibility to be upfront, educational, disclose all you should know, and so forth. Couldn’t agree more.

  9. cpjlboss says:

    BTW, if he WAS from DD, I would think he would attack your “6 times faster” allegations, as I’m sure DD wouldn’t agree with that. It sure doesn’t match what I hear from DD customers.

    Consider two recent posts on the NetBackup mailing list where they came up:

    Here’s a discussion about DD vs DXi, and not one poster says “they’re great except for the fact that restore speed is 1/4th that of backup speed.”

    Here’s a guy who is getting slow performance with his DD565 (which advertises 179 MB/s). He’s getting 45-50 MB/s. Everyone seems to agree that something must be wrong with his setup, as he should be getting better than that. Of special note is the replier who says, “Speed for backup is typically gigabit speed as long as the clients can drive it, same for restore, 80-110MiB/s.” Please note that he says his restore performance is the same as his backup performance.


    You can ding them (and I do as well) for advertising throughput numbers that are “benchmark, best-case” numbers rather than steady-state, real-world numbers, but I don’t see the evidence for a claim that restore performance is 1/4th to 1/6th that of backup speed.

    BTW, if you want to see what real people think about their and your products, feel free to go to the forums at Backup Central http://www.backupcentral.com/phpBB2 and run some searches. There are 151K posts there from around the world. In my searches, I wasn’t able to find a single person who complains about what you’re saying. Sure, I found complaints about DD, but never one matching what you’re claiming. The overwhelming majority of posts are extremely positive.
