Dedupe to tape is definitely crazy. But is it crazy good or crazy bad? I spent two days in (lovely?) Oceanport, New Jersey, surrounded by a bunch of CommVault-Kool-Aid-drinking, Frank Slootman-hated, but seriously technical people who knew their product very well. Over those two days, every question I had about CommVault was answered, and one of those questions was: “Why the heck would you want to dedupe to tape?”
In case anyone’s interested, I don’t get paid to blog, and this post is no exception. CommVault is not paying me to write this.
My position on deduping to tape has been a consistent one: unconvinced. I’ve read Dave West’s blog entries about it and seen some of their sales presentations on it, and I’ve always responded with the following thought: if I dedupe to tape, I’m going to need multiple tapes to restore one file! I don’t care how much money I save, that’s going to have a significant impact on restore performance and I’m just not interested.
They took a tack I didn’t see coming: they agreed with me. No one I talked to at CommVault’s corporate headquarters wanted to do restores from deduped tape. That really surprised me.
First, let me explain how their dedupe to tape works. If you’re going to dedupe to tape, you first have to dedupe to disk. You create what they call a silo on disk, which is a full backup plus a set of incrementals based on (and deduped against) that full backup. The retention on that silo should be long enough to satisfy most of your operational restore requests. (Typically that’s 30 days, but it could be longer in your environment.)
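To make the silo idea concrete, here’s a toy Python model. It’s a sketch under my own assumptions, not CommVault’s implementation: it uses fixed-size chunks hashed with SHA-256, and the `Silo` class, `CHUNK_SIZE`, and the method names are all hypothetical. The point is that the full and every later backup in the silo share one chunk store, so a later backup only adds the chunks the silo hasn’t already seen.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical; deliberately tiny so the example is easy to follow


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks (real products may chunk differently)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


@dataclass
class Silo:
    """Hypothetical silo: one chunk store shared by a full and its later backups."""
    store: dict = field(default_factory=dict)    # chunk hash -> chunk bytes
    backups: list = field(default_factory=list)  # each backup is a list of chunk hashes

    def backup(self, data: bytes) -> int:
        """Record a backup, deduped against the silo; return the new bytes stored."""
        recipe = []
        new_bytes = 0
        for chunk in chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.store:  # only chunks this silo hasn't seen are kept
                self.store[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        self.backups.append(recipe)
        return new_bytes

    def restore(self, index: int) -> bytes:
        """Rebuild one backup from the shared chunk store."""
        return b"".join(self.store[d] for d in self.backups[index])


silo = Silo()
print(silo.backup(b"AAAABBBBCCCCDDDD"))  # full backup: 16 new bytes stored
print(silo.backup(b"AAAABBBBXXXXDDDD"))  # later backup: only 4 new bytes stored
print(silo.restore(1))                   # b'AAAABBBBXXXXDDDD'
```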
Once the silo’s time period has passed, they migrate the previous silo to tape. Once the silo has been migrated to tape, it is deleted from disk to make room for new backups. The idea is that most restores should come from disk, but in the rare case that you need to restore something that is no longer on disk, they can get it from tape.
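Here’s a minimal sketch of that migrate-then-delete policy. It assumes (my assumption, not something CommVault told me) that each silo records the date of its last backup and moves to tape once that date falls outside a 30-day disk retention window. `SiloRecord`, `closed`, `RETENTION`, and `migrate_expired` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION = timedelta(days=30)  # hypothetical disk retention; the post says 30 days is typical


@dataclass
class SiloRecord:
    name: str
    closed: date  # hypothetical field: date of the last backup in the silo


def migrate_expired(disk: list, tape: list, today: date, retention: timedelta = RETENTION) -> None:
    """Move every silo whose last backup is older than the retention window to tape.

    Modifies both lists in place: expired silos are copied to tape first,
    then dropped from disk to free space for new backups.
    """
    still_on_disk = []
    for silo in disk:
        if today - silo.closed > retention:
            tape.append(silo)           # copy the expired silo to tape...
        else:
            still_on_disk.append(silo)
    disk[:] = still_on_disk             # ...then delete it from disk


disk = [SiloRecord("jan", date(2024, 1, 31)), SiloRecord("feb", date(2024, 2, 29))]
tape = []
migrate_expired(disk, tape, today=date(2024, 3, 15))
print([s.name for s in disk], [s.name for s in tape])  # ['feb'] ['jan']
```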
Restoring a single file may mean loading multiple tapes, but they don’t have to read all those tapes end to end. They track a file’s locations on tape at a much more granular level than most products do, so a restore mostly involves a lot of fast-forwarding.
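The fast-forwarding point is easier to see with a toy restore planner. Assume (hypothetically) that the index maps each chunk of a file to a tape barcode and a block offset on that tape. Grouping the chunks by tape and sorting the offsets means each tape gets mounted once and only spun forward. `plan_restore`, the chunk IDs, and the barcodes are invented for illustration.

```python
from collections import defaultdict


def plan_restore(file_chunks: list, chunk_locations: dict) -> dict:
    """Build a hypothetical restore plan for one file.

    file_chunks: ordered chunk IDs that make up the file.
    chunk_locations: chunk ID -> (tape barcode, block offset on that tape).
    Returns {tape: [offsets in ascending order]}, so each tape is mounted once
    and only fast-forwarded to the offsets it needs, never read end to end.
    """
    plan = defaultdict(set)
    for chunk_id in file_chunks:
        tape, offset = chunk_locations[chunk_id]
        plan[tape].add(offset)  # a set, so a repeated chunk is read only once
    return {tape: sorted(offsets) for tape, offsets in sorted(plan.items())}


locations = {
    "c1": ("TAPE01", 120),
    "c2": ("TAPE03", 9000),
    "c3": ("TAPE01", 45),
    "c4": ("TAPE02", 7),
}
print(plan_restore(["c1", "c2", "c3", "c4", "c1"], locations))
# {'TAPE01': [45, 120], 'TAPE02': [7], 'TAPE03': [9000]}
```

Putting the file back together in the right order is a separate step that walks the original chunk list; the plan only decides which tapes to load and where to seek.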
Everyone (including the CommVault folks) agrees that no one would want to do any significant portion of their restores from deduped tape. But I also agree that if nearly all my restores come from within the last 30 days and someone asks me for a 31-day-old file, that is generally the kind of request where a restore taking several minutes is not a huge deal. (If you did need to do a large restore from a deduped tape set, you could bring the whole set back to disk before you start the restore.)
Now here’s the business case. Anyone who has done consulting in this business for a while has met the customer where everyone knows that 99% of restores come from the last 30-60 days, and yet they keep their backups for 1-7 years. What a waste of resources. CommVault is saying, “Hey. If you’re going to do that, at least dedupe the tapes.” They showed me business cases from two customers for whom doing this was saving over $500K per year on their Iron Mountain bill. WOW.
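The business case is mostly arithmetic: fewer tapes offsite means a smaller vaulting bill. Here’s a back-of-the-envelope estimator. Every input is hypothetical (the silo count, the silo size, the dedupe ratio, the tape capacity), `offsite_tapes` is a name I made up, and it assumes each silo is written to its own set of tapes. Plug in your own numbers rather than trusting mine.

```python
import math


def offsite_tapes(copies_retained: int, logical_tb_per_copy: float,
                  dedupe_ratio: float, tape_capacity_tb: float) -> int:
    """Hypothetical estimate of how many tapes sit offsite.

    copies_retained: silos kept for the long-term retention period.
    logical_tb_per_copy: size of one silo before dedupe, in TB.
    dedupe_ratio: 1.0 means no dedupe; 5.0 means data shrinks to a fifth.
    Assumes each silo gets its own tapes (partial tapes are not shared).
    """
    tapes_per_copy = math.ceil(logical_tb_per_copy / dedupe_ratio / tape_capacity_tb)
    return copies_retained * tapes_per_copy


# Hypothetical example: 7 years of monthly silos (84), 50 TB each, 2.5 TB per tape.
print(offsite_tapes(84, 50, 1.0, 2.5))  # 1680 tapes without dedupe
print(offsite_tapes(84, 50, 5.0, 2.5))  # 336 tapes at an assumed 5:1 dedupe ratio
```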
Call me convinced.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technologist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.