When I first heard about the EMC disk archiver, I blew my stack. I don’t remember exactly how it was presented to me, but what I heard was that EMC was coming out with a disk product that was designed to hold backups for seven years or more. Since storing backups for seven years or more is fundamentally wrong (and no one — and I mean no one — argues with that), the idea that EMC was coming out with a product that was designed specifically to do that angered me. Brian Biles, VP of Product Management for EMC’s BRS division, said with a wry smile, “so you’re saying we’ve become a tobacco company.”
I replied saying, “No, you’ve become a cigarette case manufacturer. You shouldn’t smoke, kids, but here’s a really pretty gold case to hold your ciggies in.” I had a similar conversation with Mark Twomey (@storagezilla) on Twitter.
Since that time, I have come to a detente. I still wouldn’t buy one of these for my long term storage needs, but I can see why some other people might want to do so — and I don’t think those people are wrong or committing evil or data treason. This blog post is about how I got here from there.
Here were my arguments against this product:
There’s no way that this could cost less than tape
Some of the messaging that I saw for the Archiver suggested that it was as affordable as tape. That’s simply not possible. First, let’s talk about what we’re competing with. (For these comparisons, I am assuming you have either a tape system or a Data Domain box, and that what we’re talking about is adding the cost of extra capacity to support long term storage of backups or archives.)
A backup or archive that is kept for seven years or more is not kept in the tape library; it’s put on a shelf. (This is because chances are that it’s never going to be read.) Therefore, the cost for tape is about $0.02/GB, which is the cost of an LTO-5 tape cartridge. The daily operational cost of that tape’s existence is negligible, assuming it’s onsite.
The last time I checked target dedupe appliances, they were about $1/GB after discounting. I also saw a slide that this archiver is supposed to be about 20% cheaper than a regular Data Domain. That puts it at around $.80/GB — 40 times greater than the cost of a tape on a shelf. And the daily operational cost of that disk is higher than the tape because it is going to be powered on. (The Archiver does not currently support powering down unused shelves, although it may in the future.)
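To make the arithmetic above concrete, here is a minimal Python sketch of the per-GB comparison. The three figures come straight from the text; the function and constant names are made up for illustration, and the numbers are rough, point-in-time estimates rather than vendor pricing.

```python
# Per-GB figures quoted in the post; all are rough estimates, not vendor pricing.
TAPE_COST_PER_GB = 0.02         # LTO-5 cartridge sitting on a shelf
DEDUPE_DISK_COST_PER_GB = 1.00  # target dedupe appliance, after discounting
ARCHIVER_DISCOUNT = 0.20        # Archiver said to be about 20% cheaper


def archiver_cost_per_gb(disk_cost: float, discount: float) -> float:
    """Apply the quoted discount to the regular dedupe-disk price."""
    return disk_cost * (1.0 - discount)


def cost_multiple(disk_cost: float, tape_cost: float) -> float:
    """Return how many times more expensive disk is than tape, per GB."""
    return disk_cost / tape_cost


archiver = archiver_cost_per_gb(DEDUPE_DISK_COST_PER_GB, ARCHIVER_DISCOUNT)
multiple = cost_multiple(archiver, TAPE_COST_PER_GB)
print(f"Archiver: ${archiver:.2f}/GB, about {multiple:.0f}x tape")
```

This covers acquisition cost only. Power and cooling for spinning disk would widen the gap further, but the text gives no figures for those, so they are left out.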
Then there is the issue of dedupe ratio. The deduped disk price above is assuming a 20:1 dedupe ratio. Dedupe ratios do not go up over time; they actually decrease. This is because eventually we start making new data. (The full backup you take today is going to contain quite a bit of new data when compared to the full backup from a year ago.) Then there’s the fact that the Archiver needs to start each tier (a collection of disks) with a new full backup, thus decreasing the overall dedupe ratio of the entire unit. (It must do this in order to keep each tier self-contained.) The result is that you will probably get a much lower dedupe ratio on your long term data than on your short-term data. This increases your cost.
If you’re going to do the right thing and use archive software to store data for several years (instead of backup software), any good archive software has single-instance-storage. So if you’re using archive software, you’re going to get an even lower dedupe ratio.
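A rough way to see how a falling dedupe ratio feeds into price is to rescale the quoted per-GB figure. The sketch below assumes the cost per physical GB is fixed, so the cost per logical GB scales inversely with the ratio actually achieved. The $0.80/GB and 20:1 inputs come from the text; the 10:1 and 5:1 ratios are hypothetical values chosen only to show the trend, not measurements.

```python
def effective_cost_per_gb(quoted_cost: float,
                          assumed_ratio: float,
                          actual_ratio: float) -> float:
    """Rescale a per-GB price quoted at one dedupe ratio to another ratio.

    Assumes the cost per physical GB is constant, so the cost per logical GB
    is inversely proportional to the dedupe ratio achieved.
    """
    if assumed_ratio <= 0 or actual_ratio <= 0:
        raise ValueError("dedupe ratios must be positive")
    return quoted_cost * assumed_ratio / actual_ratio


QUOTED_COST = 0.80    # $/GB for the Archiver, from the text
ASSUMED_RATIO = 20.0  # ratio the quoted price is based on, from the text

# 10:1 and 5:1 are hypothetical long-term ratios, used only for illustration.
for ratio in (20.0, 10.0, 5.0):
    cost = effective_cost_per_gb(QUOTED_COST, ASSUMED_RATIO, ratio)
    print(f"{ratio:>4.0f}:1 -> ${cost:.2f}/GB")
```

Under this simple model, halving the dedupe ratio doubles the effective cost per GB, which is why lower long-term ratios matter so much to the comparison with tape.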
Which brings me back to my belief that there is no way this can be anywhere near as inexpensive as tape.
The good news is that I didn’t hear EMC saying that the Archiver is as cheap as tape when I saw them speak about it at EMC World. When I talked to the EMC people at the show, I told them I had heard stories of EMC sales reps showing this unit as cheaper than tape by using dedupe ratios of 100:1. (The idea is that you’re going to store 100 copies of the same full backups.) They told me that any sales rep quoting ratios like that is not speaking on behalf of EMC and is talking out of his … Well, you know.
There’s nothing that this unit offers that justifies that difference in price
Disk offers a lot of advantages when used for day-to-day backups. It’s a whole lot easier to stream during both backups and restores. There is no question that it adds a lot of value there. However, the idea of backups or archives that are stored long term is that no one reads them. If someone does read them, it’s for an electronic discovery request, where you have much more time to retrieve the data than you typically have for a restore. Tape easily meets that longer deadline. Disk offers no real advantage here.
When I said this, Mark Twomey pointed out that this unit offers regular data integrity checking of backups stored on it. I informed him that if this were important, there are now two tape library manufacturers (Quantum and Spectra Logic) that will be glad to do this for your tapes.
I will concede that disk does offer an advantage if you’re using backups as your archives. Having backups that will load instantly helps mitigate the issue of how many restores you’re going to be doing to satisfy a complicated ediscovery request.
It’s just wrong to store backups for many years
You should not be using your backups as archives. You should not be using backups as archives. If you ever get an ediscovery request for all of Joe Smith’s emails for the last seven years — and you happen to have a weekly full for each of the 364 weeks of that time frame — you will remember what I said.
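The scale of that ediscovery problem is easy to put a number on. The sketch below counts the weekly fulls that would have to be restored and searched. The seven-year, weekly-full scenario is from the text; the hours-per-restore figure is a hypothetical placeholder, so substitute whatever your own environment actually takes.

```python
WEEKS_PER_YEAR = 52


def weekly_fulls_to_search(retention_years: int) -> int:
    """Number of weekly full backups covering the retention period."""
    return retention_years * WEEKS_PER_YEAR


def total_restore_hours(restores: int, hours_per_restore: float) -> float:
    """Total time if every full must be restored before it can be searched."""
    return restores * hours_per_restore


HOURS_PER_RESTORE = 2.0  # hypothetical; not a figure from the post

restores = weekly_fulls_to_search(7)
hours = total_restore_hours(restores, HOURS_PER_RESTORE)
print(f"{restores} restores, roughly {hours:.0f} hours of restore time")
```

Even with an optimistic per-restore time, one request turns into hundreds of restores. That is the work archive software avoids by indexing the data once and letting you search it directly.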
The thing is that EMC agrees. In fact, the EMC Archiver presentation starts with a few slides about how you should be doing real archiving; you should not be using your backups as archives.
They also said that they see this device as a transition device that can store both backups and archives. Just because this device can store backups doesn’t mean you have to store backups on it. You can use proper archive software. (But, if you did, I once again point out that your dedupe ratio will go down and therefore your effective cost per GB will go up.)
So what’s changed, then?
I had a number of good conversations with EMC folks at last week’s EMC World. (Which, for the record, was a really big show.) Some of those comments are above. They know that this is not going to be cheaper than tape, and they’re saying that anyone who claims otherwise is not being truthful. They know that storing backups for years is wrong; they also know that more than half of the world does it that way.
The reason for the detente, however, is that I realize that many people hate tape. I think they’re wrong, as I’ve stated more than a few times. Even so, there are plenty of IT departments that have a “get rid of tape” edict. If the goal is to get rid of tape, the fact that the alternatives are much more expensive is not really an issue. And if you’re going to store backups for a really long time on disk, then at least EMC put some thought into what a disk system would need to do in order to do that right. This includes things like fault isolation: if you lose one tier for whatever reason, you only lose the data on that tier. It also includes things like scanning data occasionally to make sure it’s still good.
Finally, Index Engines also announced an important product at EMC World that will help increase the value of the Archiver for those using it to store backups. They already have a box that can scan tape backups and basically turn them into archives. (One of the coolest products I’ve ever seen, BTW.) They now support NFS, so you can point an Index Engines box at a DD Archiver and voila! Those backups that you are storing on disk magically become fully searchable, ediscovery-ready archives.
Don’t use your backups as archives. Use archive software instead. Tape is still the most economical destination for long term storage of backups or archives, and it’s a pretty reliable one, too. However, if you’re going to store your backups or archives on disk for many years, there are worse places to put them than the EMC Data Domain Archiver.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.