I’ve now seen some vendors saying things to the effect that “once you get 10:1, dedupe ratio doesn’t matter.” They state that 10:1 saves 90% of disk, and 20:1 saves 95% of disk, so the difference is only 5%. So why is everyone so concerned about dedupe ratio? To this 90%/95% comment I say, “balderdash!”
This reminds me of what my father always said: “figures never lie, but liars always figure.”
Here’s how they come up with their numbers. If you back up 100 TB using a dedupe ratio of 10:1, you need 10 TB of disk to store it. If you back up 100 TB with a dedupe ratio of 20:1, you need 5 TB of disk to store it. The difference between 10 TB and 5 TB is 5% of the original 100 TB. By that math, the difference between 20:1 and 30:1 is only about 1.7%, and so on. Therefore, they say, why do some vendors use dedupe numbers like 50:1? There’s only an 8% difference between 10:1 and 50:1!
Again I say, “balderdash!”
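If you want to reproduce the vendors’ arithmetic yourself, here is a minimal Python sketch. The function names are my own, made up for illustration; the only inputs are the 100 TB backup size and the ratios quoted above.

```python
def disk_needed_tb(backup_tb: float, ratio: float) -> float:
    """Disk (in TB) needed to hold backup_tb of backups at a ratio:1 dedupe ratio."""
    return backup_tb / ratio


def percent_saved(ratio: float) -> float:
    """Savings expressed as a percentage of the ORIGINAL backup size."""
    return (1 - 1 / ratio) * 100


if __name__ == "__main__":
    backup_tb = 100  # the 100 TB example from the text
    for ratio in (10, 20, 30, 50):
        stored = disk_needed_tb(backup_tb, ratio)
        saved = percent_saved(ratio)
        print(f"{ratio}:1 -> {stored:.2f} TB on disk, {saved:.1f}% saved")
```

Every number here is measured against the original 100 TB, which is why the gaps between ratios look so small.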
The reason their math “works” is that they’re comparing the deduped data to the original size of the data. But that’s not what matters in a competitive situation, which is the scenario in which they are using it. What matters (when talking dedupe ratio) is how much disk one vendor will need versus how much disk the other vendor will need to hold the same amount of backups. If vendor A is getting 10:1 and vendor B is getting 30:1, then customers using vendor A to store their backups will need to buy three times as much disk as customers using vendor B. So saying that is insignificant is, how should I say… balderdash!
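The vendor-to-vendor comparison is just as easy to sketch. This is a rough illustration only: `relative_disk` is a hypothetical helper name, and it assumes both vendors are storing the same backups so that the backup size cancels out.

```python
def relative_disk(ratio_a: float, ratio_b: float) -> float:
    """How many times as much disk vendor A needs as vendor B for the same backups.

    Vendor A needs backup/ratio_a and vendor B needs backup/ratio_b,
    so the backup size cancels and the answer is ratio_b / ratio_a.
    """
    return ratio_b / ratio_a


if __name__ == "__main__":
    # Vendor A at 10:1 compared with vendor B at 20:1, 30:1, and 50:1.
    for ratio_b in (20, 30, 50):
        factor = relative_disk(10, ratio_b)
        print(f"10:1 vs {ratio_b}:1 -> vendor A buys {factor:.1f}x the disk")
```

Measured this way, 10:1 versus 30:1 is a factor of three in purchased disk, not a few percentage points.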
How much disk you have to buy is really important. Disk isn’t free even if you didn’t pay for it. You’re going to need to provide it power and cooling for its whole life. Suppose two competing vendors made up for their bad (or good) dedupe ratio by changing the software side of their pricing, so both 400 TB dedupe systems cost $1M. If one dedupes the data down to 20 TB (20:1), and the other dedupes it down to 40 TB (10:1), that’s 20 more TB of disk you’re going to have to provide power and cooling for. So again, don’t tell me that doesn’t matter.
I know that other things affect the amount of disk you must buy as well, like whether or not you need a landing zone or cache area, but this post is about the claims from some vendors that dedupe ratio doesn’t matter. The way I see it, saying dedupe ratio doesn’t matter is another way of saying that you have a bad dedupe ratio.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technologist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.