My good friend, Steve Duplessie, wrote a blog post arguing that the problem dedupe was designed to solve is “NO LONGER VALID” (caps his). He didn’t say dedupe is bad, but he did say that he can buy JBOD for 1/9th the cost of a deduped system; “therefore, using deduplication to solve an economic scarcity issue is no longer legitimate.” He also said that “if I’m off by 100%, I’m still 1/4 the cost.”
With all due respect to my good friend, I don’t agree with him any more on this than I do on whether or not you should root for the Patriots or the Chargers. His logic is sound, but his numbers are off — off by a whole lot more than 100%. And there’s another “scarcity” factor that he’s not considering.
Steve’s post is a follow-on to another post of his that was inspired by the book “Free” by Chris Anderson. Among other things, the book apparently discusses the concept of “scarcity” and how things that solve scarcity problems make money. He says, and I agree, that backing up to disk solved the scarcity of time when backing up to tape. Dedupe was then created to solve the scarcity of money, since backing up to disk cost too much. But then he says that this is no longer an issue because cheap disk costs so much less now, and he makes his point by comparing the alleged cost of a 30 TB Data Domain system (which he says costs $90K) to a 30 TB JBOD system (which he says costs $10K). His conclusion is that the problem dedupe was trying to solve is no longer valid.
First, let’s talk about the Data Domain pricing used in the blog post, as it definitely doesn’t match what I’m seeing. I verified today that the list price of a Data Domain system capable of holding 30 TB of backups is $32K, not $90K. That’s for a DD510 with one expansion tray, which gives you 2.7 TB of usable capacity. With an 11:1 dedupe ratio, which is a realistic ratio, that will hold 30 TB of backups. For comparison, Quantum has a 1.9 TB system that lists for $12.5K. Two of those at a 10:1 dedupe ratio give you roughly 40 TB of backup capacity for $25K. Prorating out the extra capacity makes the effective list price for 30 TB of Quantum about $19K ($25K * 30 / 40).
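The arithmetic in the paragraph above can be checked with a short sketch. The function names are mine and purely illustrative; the capacities, ratios, and list prices are the ones quoted in this post, and the proration uses the post's rounded 40 TB figure.

```python
def effective_capacity_tb(usable_tb, dedupe_ratio):
    """Backup data a deduped system can hold: usable disk times dedupe ratio."""
    return usable_tb * dedupe_ratio


def prorated_price(list_price, effective_tb, target_tb):
    """Scale a list price down (or up) to the target effective capacity."""
    return list_price * target_tb / effective_tb


# Data Domain DD510 with one expansion tray: 2.7 TB usable at 11:1
print(round(effective_capacity_tb(2.7, 11), 1))      # 29.7, i.e. about 30 TB

# Two Quantum 1.9 TB systems at 10:1
print(round(effective_capacity_tb(2 * 1.9, 10), 1))  # 38.0, rounded to 40 in the text

# Effective Quantum price for 30 TB, using the rounded 40 TB figure
print(prorated_price(25_000, 40, 30))                # 18750.0, i.e. about $19K
```

Using the unrounded 38 TB instead of 40 TB nudges the prorated Quantum figure closer to $20K, which does not change the comparison.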
As to the JBOD system, I have no idea where I’m supposed to buy a 30 TB disk array for $10K. Let’s look at a few disk systems. A Dell MD 3000 configured with fifteen 1 TB drives sells direct for $16.5K, so two of them cost $33K for 30 raw TB (before RAID 5 overhead). But I need 30 TB usable, which means I’ll need to buy three of these for a total cost of about $50K.
If you’re OK with not buying a big brand name, the least expensive arrays I’ve seen in midrange enterprise environments are NexSAN arrays, and they list for about $30,000 for 30 TB raw. A reseller that I know says that their rule of thumb (based on the way customers usually configure them) puts them at about $40K-$45K for 30 usable TB.
OK, forget any kind of brand name and let’s just go for cheap. The least expensive arrays that I’ve heard of (but never seen at a customer site) are Promise arrays. A Promise VTE310F array sells (direct) for $6789 with no hard drives. Filling it with twelve 1 TB disk drives from buy.com at $90 apiece adds $1080. So that’s $7869 for 12 raw TB using the cheapest arrays and disks I can find. I would need three of them to get to 30 usable TB, costing about $23K.
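Here is a rough sketch of how the "three units for 30 usable TB" figures fall out. The post does not spell out the RAID layout, so the code assumes (my assumption, not a vendor spec) a single RAID 5 group per enclosure with one drive's worth of parity and no hot spares or formatting overhead. The helper names are hypothetical.

```python
import math


def units_needed(target_usable_tb, drives_per_unit, tb_per_drive, parity_drives=1):
    """Enclosures required to reach the target usable capacity.

    Assumes one RAID 5 group per enclosure (one parity drive), which is an
    illustrative simplification rather than a configuration from the post.
    """
    usable_per_unit = (drives_per_unit - parity_drives) * tb_per_drive
    return math.ceil(target_usable_tb / usable_per_unit)


def total_cost(target_usable_tb, drives_per_unit, tb_per_drive, unit_price):
    """Total price of enough enclosures to reach the target usable capacity."""
    return units_needed(target_usable_tb, drives_per_unit, tb_per_drive) * unit_price


# Dell MD 3000: fifteen 1 TB drives, $16.5K per unit
print(total_cost(30, 15, 1, 16_500))         # 49500, i.e. about $50K

# Promise VTE310F: $6789 chassis plus twelve $90 drives
promise_unit_price = 6789 + 12 * 90          # 7869
print(total_cost(30, 12, 1, promise_unit_price))  # 23607, i.e. about $23K
```

Under that assumption two enclosures fall short of 30 usable TB in both cases (28 TB for the Dell, 22 TB for the Promise), which is why a third unit is needed.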
It’s also important to point out that there are many classes of storage in the enterprise, and comparing a Promise array to a Data Domain or Quantum system ignores those classes completely.
So, that’s $19K-$32K list price for the dedupe option and $23K-$50K street price for the JBOD option. Without looking at anything else, deduped storage is cheaper than JBOD disk, even if you build the JBOD system yourself.
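To put both ranges on the same footing, here is a quick per-TB calculation. It is only a sketch: the price ranges are the rounded figures from this post, and the `cost_per_tb` helper and the `options` table are names I made up for illustration.

```python
TARGET_TB = 30  # backup capacity every option is sized for


def cost_per_tb(price, tb=TARGET_TB):
    """Acquisition cost per TB of backup capacity."""
    return price / tb


# (low, high) price ranges in dollars, taken from the figures in this post
options = {
    "dedupe (list price)": (19_000, 32_000),
    "JBOD (street price)": (23_000, 50_000),
}

for name, (low, high) in options.items():
    print(f"{name}: ${cost_per_tb(low):,.0f} to ${cost_per_tb(high):,.0f} per TB")
```

The two ranges overlap in the middle, but the cheapest dedupe option undercuts the cheapest JBOD option, and the same holds at the expensive end.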
And that’s just the acquisition price. Remember that the deduped system will use one-tenth or less of the power and cooling of the JBOD system, since it spins roughly one-tenth of the disk, and that’s a huge deal. Steve said that power isn’t scarce and hasn’t been for a long time. Are you kidding me? I’ve been in datacenters that have been told by the power company that they can’t have any more power. And even when it’s not scarce, it’s expensive, so anything we can do to reduce it is a good (and a green) thing.
Secondly, deduping backups enables replicating those backups, which simply isn’t possible in most datacenters without dedupe. Trust me — bandwidth is scarce. Now you can have onsite and offsite disk-based backups without moving tapes around. If you want to make tape copies, make them offsite — and they never get moved.
Finally, Steve’s post missed another important point. Disk is not in competition with deduped disk; it’s a component of deduped disk. So as disk gets cheaper, so does deduped disk. As disk prices have fallen over the last several years, so has the per-GB pricing of dedupe systems.
So even if dedupe cost more (which it doesn’t), it would still be a Good Idea for those reasons. And that’s all I have to say about that.
I like Steve a lot, but I think he needs to check his pricing numbers again.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.