


Written by W. Curtis Preston
Thursday, 21 January 2010 19:25
I've been following cloud backup vendors (e.g., Mozy, Carbonite, CrashPlan) quite closely -- and am generally a big fan -- but have not spent a lot of time looking at primary cloud vendors. That is, I haven't spent much time looking at those who would like you to store the only copy of a given piece of data on their storage. Vendors like Amazon, Iron Mountain, and Nirvanix want you to put things like your "persistent" data in their cloud and claim that they can store this data for you more cheaply than you can. Some of these vendors are telling potential customers that the data in their cloud doesn't need to be backed up, because they're replicating it all over the place. I've got one word for that: balderdash.
Again, I am not talking about backup vendors. I'm talking about cloud storage vendors, which are a very different beast. And what I'm finding is that they all (with one exception) seem to think that having your data replicated to multiple locations is good enough. You don't need "backup" if you're doing that -- or so they say. I say otherwise.
Remember that replication constantly copies data from one place to another. You change/delete a file on your replicated storage, and that change/deletion is immediately replicated to everywhere else you are replicating that storage. That means if you're replicating it to 10 different places and you delete a file, it will immediately get deleted from all 10 places.
Since 90% or more of restores are done due to accidental erasure/corruption of data (rather than disk loss), how does replication protect you from that? It doesn't! You accidentally delete your favorite spreadsheet and the replication system will replicate that deletion as fast as you can say "oops!" Get a virus in a file? Guess what? So does your replicated copy! The only things replication protects you from are a failed RAID array and a geographical disaster, both of which hardly ever happen. (I'm not saying they never happen, and you DO have to protect against those things, but what about the thing that happens all the time -- stupid user errors?)
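To make that concrete, here is a toy sketch in Python. It is not any vendor's actual implementation; the `ReplicatedStore` class and its method names are made up for illustration, and real systems replicate blocks or objects over a network rather than updating dictionaries in memory. What it does capture is the behavior described above: a delete is just another change, so it lands on every replica.

```python
class ReplicatedStore:
    """Toy model of replication: every change is mirrored to all replicas.

    Hypothetical illustration only, not a real product's API.
    """

    def __init__(self, replica_count):
        # Each replica is just a dict mapping file name -> contents.
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, name, data):
        # A write is applied to every replica.
        for replica in self.replicas:
            replica[name] = data

    def delete(self, name):
        # A deletion is replicated exactly like any other change.
        for replica in self.replicas:
            replica.pop(name, None)

    def copies_of(self, name):
        # Count how many replicas still hold the file.
        return sum(1 for replica in self.replicas if name in replica)


store = ReplicatedStore(replica_count=10)
store.write("budget.xls", "Q1 numbers")
copies_before = store.copies_of("budget.xls")

store.delete("budget.xls")  # "oops!"
copies_after = store.copies_of("budget.xls")

print(copies_before, copies_after)  # prints "10 0"
```

Ten copies before the mistake, zero after it. More replicas would not change the outcome, because none of them holds an older version of the data.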
The only cloud storage vendor I've found that has an answer to this question is Iron Mountain, as they are apparently using NetApp storage. NetApp storage has snapshots built in, so you can easily change into the snapshot directory (.snapshot over NFS, ~snapshot over CIFS) and grab that file you just messed up. But all the other cloud storage vendors (that I know about) have decided that "regular" storage (like NetApp) is too expensive to use for the cloud, so they built their own purpose-built platform to do it. This platform has replication and security and all that stuff built in, but it doesn't have snapshots -- and they're not backing it up!
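The difference a snapshot makes can be sketched the same way. This is a simplified illustration with assumptions: `SnapshotStore` and its methods are hypothetical names, and it keeps full point-in-time copies, whereas real snapshot implementations typically share unchanged blocks instead of duplicating everything. The point is only that a point-in-time copy is not touched by later changes to the live data.

```python
class SnapshotStore:
    """Toy model of storage with point-in-time snapshots.

    Hypothetical illustration only, not NetApp's (or anyone's) real API.
    """

    def __init__(self):
        self.live = {}        # current file name -> contents
        self.snapshots = []   # frozen point-in-time copies, oldest first

    def write(self, name, data):
        self.live[name] = data

    def delete(self, name):
        # Deleting only affects the live data, never existing snapshots.
        self.live.pop(name, None)

    def take_snapshot(self):
        # Copy the current state so later changes cannot alter it.
        self.snapshots.append(dict(self.live))

    def restore(self, name):
        # Walk snapshots newest-first and bring back the first copy found.
        for snap in reversed(self.snapshots):
            if name in snap:
                self.live[name] = snap[name]
                return True
        return False


store = SnapshotStore()
store.write("budget.xls", "Q1 numbers")
store.take_snapshot()

store.delete("budget.xls")  # "oops!"
gone = "budget.xls" not in store.live

recovered = store.restore("budget.xls")
print(gone, recovered, store.live["budget.xls"])
```

Here the accidental delete is recoverable because the snapshot was taken before the mistake. A file created and deleted between two snapshots would still be lost, which is why snapshot frequency and retention matter.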
Hey cloud storage vendors: if you also do more than replication, let me know and I'll be glad to update this article. (There's got to be more than just Iron Mountain.)
They argue that backups (as I'm defining them) cost too much. No one else is doing backups, they say, and if they did them they wouldn't be cost-competitive. No wonder they're cheaper! They don't back it up!
There is no way in this world that I am putting a single gigabyte of my data on a cloud storage vendor that thinks replication is backup.
Comments
The net of this was that (cloud) replication is not backup and never can be as long as you need to factor in user errors, virus infections and anything else that would constitute a rolling disaster.
If interested see my original post and comments at silvertonconsulting.com/blog/2009/07/28/does-cloud-storage-need-backup/.
Thanks for the link.
Joseph's comments about the TCO study don't surprise me. The cloud as a backup destination instead of tape, disk, or VTL might make economic sense and it might not. It depends on your particular environment. I use Mozy to back up my home computer to the cloud. It gives me a cheap, off-site copy of my data. But if I had a 25TB storage array at home with corporate RPOs and RTOs, this model wouldn't make sense.
The key is to think about ways to utilize the cloud to provide the same data protection mechanisms you get with traditional backup processes, but without the cost and complexity. What's needed is the ability to have your primary instance of data stored in the cloud and innately protected against the same gotchas that traditional backups guard against, but without having to perform them.