I’ve been following cloud backup vendors (e.g. Mozy, Carbonite, Crashplan) quite closely — and am generally a big fan — but have not spent a lot of time looking at primary cloud vendors. That is, I haven’t spent much time looking at those who would like you to store the only copy of a given piece of data on their storage. Vendors like Amazon, Iron Mountain, and Nirvanix want you to put things like your “persistent” data in their cloud and claim that they can store this data for you cheaper than you can. Some of these vendors are telling potential customers that the data in their cloud doesn’t need to be backed up, because they’re replicating it all over the place. I’ve got one word for that: balderdash.
Again, I am not talking about backup vendors. I’m talking about cloud storage vendors, which are a very different beast. And what I’m finding is that they all (with one exception) seem to think that having your data replicated to multiple locations is good enough. You don’t need “backup” if you’re doing that — or so they say. I say otherwise.
Remember that replication continuously copies every change from one place to another. You change/delete a file on your replicated storage, and that change/deletion is immediately propagated to every other location you are replicating to. That means if you’re replicating it to 10 different places and you delete a file, it will immediately get deleted from all 10 places.
Since 90% or more of restores are done due to accidental erasure/corruption of data (rather than disk loss), how does replication protect you from that? It doesn’t! You accidentally delete your favorite spreadsheet and the replication system will replicate that deletion as fast as you can say “oops!” Get a virus in a file? Guess what? So does your replicated copy! The only things replication protects you from are a failed RAID array and a geographical disaster, both of which hardly ever happen. (I’m not saying they never happen, and you DO have to protect against those things, but what about the thing that happens all the time: stupid user errors?)
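To make that concrete, here is a toy sketch in Python of what replication does with a mistake. It is not any vendor's actual implementation; the `ReplicatedStore` class and the file names are made up for illustration, and real systems replicate asynchronously over a network rather than in a loop. The point is only that a delete or a bad write is treated like any other change.

```python
class ReplicatedStore:
    """Toy model (hypothetical): every change is applied to all replicas."""

    def __init__(self, replica_count):
        # Each replica is just a dict of file name -> contents.
        self.replicas = [dict() for _ in range(replica_count)]

    def write(self, name, data):
        # A write, good or bad, goes to every replica.
        for replica in self.replicas:
            replica[name] = data

    def delete(self, name):
        # A delete is just another change, so it goes everywhere too.
        for replica in self.replicas:
            replica.pop(name, None)

    def copies_of(self, name):
        # Count how many replicas still hold the file.
        return sum(1 for replica in self.replicas if name in replica)


store = ReplicatedStore(replica_count=10)
store.write("budget.xls", b"good data")
copies_before = store.copies_of("budget.xls")

store.delete("budget.xls")  # the "oops" moment
copies_after = store.copies_of("budget.xls")

# Corruption behaves the same way: the bad version replaces the good one.
store.write("report.doc", b"clean")
store.write("report.doc", b"infected")
infected_copies = sum(
    1 for replica in store.replicas if replica.get("report.doc") == b"infected"
)
```

Ten copies before the mistake, zero good copies after it. More replicas only means the mistake lands in more places.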
The only cloud storage vendor I’ve found that has an answer to this question is Iron Mountain, as they are apparently using NetApp storage. NetApp storage has snapshots built in, so you can easily change directory to .snapshot (~snapshot over CIFS) and grab the previous version of that file you just messed up. But all the other cloud storage vendors (that I know about) have decided that “regular” storage (like NetApp) is too expensive to use for the cloud, so they built their own purpose-built platforms to do it. These platforms have replication and security and all that stuff built in, but they don’t have snapshots, and they’re not backing them up!
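Here is a minimal Python sketch of why a point-in-time copy changes the picture. To be clear, this is not how NetApp implements snapshots (real snapshots preserve pointers to existing blocks rather than copying whole files), and the `SnapshotStore` class and its method names are hypothetical. It only illustrates the idea that an older version survives a later mistake.

```python
class SnapshotStore:
    """Toy model (hypothetical): live data plus read-only point-in-time copies."""

    def __init__(self):
        self.live = {}       # current file name -> contents
        self.snapshots = []  # frozen copies of the live view

    def write(self, name, data):
        self.live[name] = data

    def delete(self, name):
        self.live.pop(name, None)

    def take_snapshot(self):
        # Freeze the current view; later changes do not touch it.
        self.snapshots.append(dict(self.live))
        return len(self.snapshots) - 1

    def restore(self, snapshot_id, name):
        # Copy the old version of one file back into the live view.
        self.live[name] = self.snapshots[snapshot_id][name]


store = SnapshotStore()
store.write("budget.xls", b"good data")
hourly = store.take_snapshot()

store.write("budget.xls", b"garbage")  # corruption
store.delete("budget.xls")             # followed by deletion
missing_after_delete = "budget.xls" not in store.live

store.restore(hourly, "budget.xls")
recovered = store.live["budget.xls"]
```

The file is gone from the live view after the mistake, but the snapshot still holds the good version, so it can be pulled back. Replication alone gives you nothing to pull back from.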
Hey cloud storage vendors: if you also do more than replication, let me know and I’ll be glad to update this article. (There’s got to be more than just Iron Mountain.)
They argue that backups (as I’m defining them) cost too much: no one else is doing them, and any vendor that did wouldn’t be cost-competitive. No wonder they’re cheaper! They don’t back it up!
There is no way in this world that I am putting a single gigabyte of my data with a cloud storage vendor that thinks replication is backup.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technologist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.