I kept reading stories that said Quantum’s dedupe is inline. Then I would hear from those “in the know” that it was post-process. Different people at Quantum said different things. Some said that they run the dedupe at the same time as the ingest, so they considered it inline, even though data hits disk before it is deduped. Their reasoning was that since the data sits on disk for only a few seconds, it’s really inline. I said, “No, it’s not.” So what’s the scoop? Read on.
If any data is written to any disk before it is deduplicated, then the vendor is using a post-process approach. In response to my comment on the Byte & Switch story, the Quantum spokesperson said that “de-duplication of ingested data typically will finish within seconds of ingest.” That is a post-process approach. “Within seconds of ingest” is not the same thing as “at the same time as ingest and before it is written to disk.”
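To make the definition concrete, here is a toy model of the two write paths. It is a minimal sketch under simplifying assumptions: fixed 4-byte chunks, SHA-256 fingerprints, and a Python list standing in for “disk.” The function names are hypothetical, and nothing here describes how Quantum or any other vendor actually implements dedupe. The only point is the test in `is_inline`: if raw, undeduplicated data is ever written, the path is post-process, no matter how soon afterwards the dedupe finishes.

```python
import hashlib

CHUNK = 4  # toy chunk size in bytes (assumption); real appliances use far larger, often variable-size chunks


def chunks(data: bytes) -> list:
    """Split a byte stream into fixed-size chunks."""
    return [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]


def inline_ingest(data: bytes, store: dict, disk_log: list) -> None:
    """Fingerprint each chunk before anything reaches disk; write only chunks not seen before."""
    for c in chunks(data):
        fp = hashlib.sha256(c).hexdigest()
        if fp not in store:
            store[fp] = c
            disk_log.append(("unique", c))


def post_process_ingest(data: bytes, store: dict, disk_log: list) -> None:
    """Land the raw stream on disk first, then deduplicate it afterwards."""
    staging = chunks(data)
    for c in staging:
        disk_log.append(("raw", c))  # undeduplicated data hits disk here
    # The dedupe pass runs later; whether "later" means seconds or hours does not change the definition.
    for c in staging:
        fp = hashlib.sha256(c).hexdigest()
        if fp not in store:
            store[fp] = c
            disk_log.append(("unique", c))


def is_inline(disk_log: list) -> bool:
    """Inline only if no raw, undeduplicated data was ever written to disk."""
    return all(kind != "raw" for kind, _ in disk_log)
```

Feeding both paths the same repetitive stream (say `b"ABCDABCDABCD"`) ends with a single unique chunk stored either way. The inline path logged one write, while the post-process path logged three raw writes before its one unique write, so only the first passes `is_inline`.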
Please note: I am not saying post-process is bad. I’m merely saying that what the Quantum spokesperson is describing is not inline; it is post-process. Quantum marketing calling it inline doesn’t make it so.
Speaking of Quantum marketing, I kept getting different messages depending on whom I was speaking to. As a result, I was given the chance to sit down with Quantum’s CTO about this issue, and he assured me that the DXi7500 does true inline deduplication (data does not hit disk until it has been deduplicated), but only at ingest rates significantly slower than the 7500’s advertised ones. Once ingest passes a certain rate (~100–160 MB/s), it switches to post-process dedupe, and that post-process dedupe runs while the data is still coming in, making it asynchronous.
Therefore, I say that if you are using the DXi7500 at anywhere near its advertised ingest rates, it is using a post-process approach. If you stay under 150 MB/s, it would be inline, but why would you buy a device that can go that fast and run it that slow?
Update: Some readers took this blog entry to mean that the DXi7500 only dedupes at 150 MB/s. That would not match what Quantum has told me, and it is NOT what this blog entry was trying to say. I was merely trying to clear up some ambiguity in what I was being told about whether Quantum does inline dedupe. The short answer: if the ingest rate is less than 150 MB/s, they use inline dedupe; if it is greater than 150 MB/s, they use post-process dedupe. They call this adaptive dedupe, because it adapts to the incoming conditions. They also offer scheduled dedupe, which runs entirely after the backups finish, and dedupe runs faster when it happens outside the backup window.
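The adaptive and scheduled behavior described above boils down to a simple decision rule. The sketch below is only an illustration of that rule as it was described to me. `INLINE_LIMIT_MB_S` and `dedupe_mode` are hypothetical names, the 150 MB/s cutoff is the approximate figure I was given (not a documented Quantum setting), and the behavior at exactly 150 MB/s is an assumption since I was never told which way that case goes.

```python
# Approximate threshold quoted to me; a hypothetical constant, not a documented product setting.
INLINE_LIMIT_MB_S = 150


def dedupe_mode(ingest_mb_s: float, policy: str = "adaptive") -> str:
    """Return where deduplication happens relative to the write to disk.

    policy="scheduled": everything is deduplicated after the backup window.
    policy="adaptive":  inline below the threshold, post-process at or above it
                        (treating exactly 150 MB/s as post-process is an assumption).
    """
    if policy == "scheduled":
        return "post-process (after backup window)"
    if policy != "adaptive":
        raise ValueError(f"unknown policy: {policy!r}")
    if ingest_mb_s < INLINE_LIMIT_MB_S:
        return "inline"
    return "post-process (concurrent with ingest)"
```

For example, `dedupe_mode(100)` returns `"inline"`, while `dedupe_mode(400)` returns `"post-process (concurrent with ingest)"`. Under the scheduled policy, the ingest rate does not matter at all.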
So how much data can the DXi7500 ingest in a day and still dedupe it before the next day’s backup? Quantum says it can dedupe 1.6 TB/hr (444 MB/s) while deduping during the backup window and 2 TB/hr (555 MB/s) while deduping outside it. If we assume a typical 12-hr backup window, that is 12 × 1.6 = 19.2 TB during the window plus 12 × 2 = 24 TB outside it, or 43.2 TB a day. This means it could ingest data at 3.6 TB/hr (1000 MB/s) for 12 hours and still dedupe it all before the next day. Since that is less than the advertised ingest rate of 8 TB/hr, it should be able to do it. (This, of course, leaves no room for error or maintenance, but I’m not sure what figures to put in for that.) If you ingest data at the advertised rate of 8 TB/hr (2222 MB/s), you could only do that for about 5.4 hours (43.2 TB ÷ 8 TB/hr) and still dedupe it all in a day.
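If you want to rerun this arithmetic with your own backup window or rates, here is a small sketch. The function names are hypothetical. It uses decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), which is what makes 1.6 TB/hr come out to about 444 MB/s. Like the paragraph above, it assumes the 12-hour window and its slower in-window dedupe rate stay fixed regardless of how long the backup actually runs.

```python
TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte


def tb_per_hr_to_mb_s(rate_tb_hr: float) -> float:
    """Convert a rate in TB/hr to MB/s (decimal units)."""
    return rate_tb_hr * TB / MB / 3600


def daily_dedupe_capacity_tb(window_hrs: float = 12.0,
                             during_tb_hr: float = 1.6,
                             outside_tb_hr: float = 2.0) -> float:
    """TB deduplicated in 24 hours: the slower rate inside the backup window, the faster rate outside it."""
    return window_hrs * during_tb_hr + (24 - window_hrs) * outside_tb_hr


def max_ingest_hours(ingest_tb_hr: float, **kwargs) -> float:
    """Hours of ingest at a given rate whose data can still be deduplicated within the same day."""
    return daily_dedupe_capacity_tb(**kwargs) / ingest_tb_hr
```

With the defaults, `daily_dedupe_capacity_tb()` gives 43.2 TB, which spread over 12 hours is 3.6 TB/hr, or 1000 MB/s via `tb_per_hr_to_mb_s(3.6)`. `max_ingest_hours(8.0)` gives 5.4 hours. If you instead assume the slower in-window rate applies only while ingest is actually running, solving 8x = 1.6x + 2(24 − x) raises that figure slightly, to about 5.7 hours.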
I hope this clears up any confusion.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.