I’m surprised that Dipesh Patel of CommVault joined the “dedupe ratio doesn’t matter” chorus with his blog post “How to analyze dedupe ratios and its impact on cost savings.” I’ve met him and know him to be an intelligent person, so I’ll post this and hope for the best.
The basic argument in Dipesh’s blog (and other blogs before his) is that since a 10:1 dedupe ratio reduces data by 90% and a 20:1 dedupe ratio reduces it by 95%, the incremental savings between the two is only 5%. Dipesh says that vendors that argue “that a doubling of dedupe ratios is a doubling of savings” are using “sleight of hand.”
I argue exactly the opposite and did so in a blog post over a year ago. I say that vendors who are using incremental savings are the ones using sleight of hand. The question in the end is how much disk you will have to buy, manage, power, and cool. I believe that the manage, power, and cool parts of that equation are extremely important, and they are completely absent from Dipesh’s calculations.
A customer that is able to dedupe their data at 10:1 will buy twice as much disk as if they were able to dedupe that same data at 20:1. (If you have 100 TB of backups and you dedupe it at 10:1, you need 10 TB of disk. If you dedupe it at 20:1, you need 5 TB of disk.) That’s twice as much disk to manage (monitor, replace on failure, etc.), power, and cool. The IT department is the largest part of most companies’ power bills, and the storage department is often the largest part of the IT department’s power bill. Since a backup system typically holds 10-20 GB for every 1 GB on primary storage, I argue that the backup system’s disk power bill could possibly be the biggest percentage (backups) of the biggest percentage (storage) of the biggest percentage (IT) of the power bill. Cutting that in half (or not) is kind of a big deal.
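To make the two ways of looking at the numbers concrete, here is a minimal sketch of the arithmetic. The function names are mine, made up for illustration, and the 100 TB figure is just the example from above. It only models disk capacity; it says nothing about what the power, cooling, or management of that disk actually costs.

```python
def percent_reduced(ratio: float) -> float:
    """Percentage of the original data eliminated at a ratio:1 dedupe ratio."""
    return (1 - 1 / ratio) * 100


def disk_needed_tb(backup_tb: float, ratio: float) -> float:
    """Disk (in TB) needed to hold backup_tb of backups deduped at ratio:1."""
    return backup_tb / ratio


backup_tb = 100  # the example figure used above

for ratio in (10, 20):
    print(
        f"{ratio}:1 -> {percent_reduced(ratio):.0f}% reduction, "
        f"{disk_needed_tb(backup_tb, ratio):.0f} TB of disk"
    )
# 10:1 -> 90% reduction, 10 TB of disk
# 20:1 -> 95% reduction, 5 TB of disk
```

The "percent reduced" column moves by only five points, but the "TB of disk" column, which is the thing you actually buy, power, and cool, is cut in half.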
Update: Jay Livens of SEPATON posted his thoughts on this subject on his aboutrestore.com blog. In addition to the power/cooling costs I posted, he pointed out the same thing can be said about replication costs and bandwidth.
So much for the incremental savings argument.
Dipesh’s blog post also makes the argument that most vendors save about the same amount of disk and that the characteristics of your data are what really determine your dedupe ratio. While I completely agree with the latter, I do not agree with the former unless I emphasize the word “most.” I do argue that most of the time it is not the vendor you buy that determines your dedupe ratio; it is the characteristics of your data and how you back it up. Having said that, I have seen scenarios where one vendor got 100 times more dedupe than another vendor with the same data! This is why I think you should always test more than one vendor when evaluating dedupe solutions.
Since this blog is talking about calculating costs, I think it’s important to point out that most customers are using something other than CommVault Simpana. Why is that important? CommVault makes the argument that their dedupe is superior to Data Domain, SEPATON, Quantum, IBM & ExaGrid’s target dedupe solutions. They are able to do things with their dedupe solution that the target vendors cannot do (such as encrypt & compress data before sending it over the network). But unlike with the target dedupe vendors, you have to switch your backup software from whatever you’re using to CommVault in order to get the benefits they’re offering. That conversion comes at a huge cost and risk that includes your initial purchase, education classes, possible professional services for installation, and hours spent poring over new manuals and on support calls to understand your new backup solution, all while your backup system goes through a possibly very long period of instability. I don’t care how good a backup software product is; the above things are going to happen. You may still feel that the change is worth the cost and the risk. Just make sure that you factor all these costs into your TCO analysis.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technologist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.