Mr. Backup

Cloud vendors: Replication is not backup


I've been following cloud backup vendors (e.g. Mozy, Carbonite, Crashplan) quite closely -- and am generally a big fan -- but have not spent a lot of time looking at primary cloud vendors. That is, I haven't spent much time looking at those who would like you to store the only copy of a given piece of data on their storage. Vendors like Amazon, Iron Mountain, and Nirvanix want you to put things like your "persistent" data in their cloud and claim that they can store this data for you cheaper than you can.  Some of these vendors are telling potential customers that the data in their cloud doesn't need to be backed up, because they're replicating it all over the place.  I've got one word for that: balderdash.

Again, I am not talking about backup vendors.  I'm talking about cloud storage vendors, which are a very different beast.  And what I'm finding is that they all (with one exception) seem to think that having your data replicated to multiple locations is good enough.  You don't need "backup" if you're doing that -- or so they say.  I say otherwise. 

Remember that replication constantly replicates data from one place to another.  You change/delete a file on your replicated storage, and that change/deletion is immediately replicated to anywhere else you are replicating storage. That means if you're replicating it to 10 different places and you delete a file, it will immediately get deleted from all 10 places.
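To make that concrete, here is a minimal sketch in Python. It is a toy model, not any vendor's actual system: the `ReplicatedStore` class and the file name are made up for illustration, and each "site" is just an in-memory dictionary. The point is only that a delete is an ordinary change, so it is fanned out to every replica exactly like a write.

```python
class ReplicatedStore:
    """Toy model (hypothetical): every change is applied to all replicas at once."""

    def __init__(self, replica_count):
        # Each replica is a simple mapping of file name -> contents.
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, name, data):
        for replica in self.replicas:
            replica[name] = data

    def delete(self, name):
        # Replication cannot tell an intended delete from a mistake.
        for replica in self.replicas:
            replica.pop(name, None)

    def copies_of(self, name):
        return sum(1 for replica in self.replicas if name in replica)


store = ReplicatedStore(replica_count=10)
store.write("budget.xls", b"Q1 numbers")
copies_before = store.copies_of("budget.xls")

store.delete("budget.xls")  # the "oops" moment
copies_after = store.copies_of("budget.xls")

print(copies_before, copies_after)
```

Ten copies before the mistake, none after it. Adding more replicas changes the first number but never the second.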

Since 90% or more of restores are done due to accidental erasure/corruption of data (rather than disk loss), how does replication protect you from that?  It doesn't!  You accidentally delete your favorite spreadsheet and the replication system will replicate that deletion as fast as you can say "oops!"  Get a virus in a file?  Guess what?  So does your replicated copy!  The only things replication protects you from are a failed RAID array and a geographical disaster, both of which hardly ever happen. (I'm not saying they never happen, and you DO have to protect against those things, but what about the thing that happens all the time -- stupid user errors?)

The only cloud storage vendor I've found that has an answer to this question is Iron Mountain, as they are apparently using NetApp storage. NetApp storage has snapshots built into the system, so you can easily change directory to .snapshot (~snapshot on a CIFS share) and grab that file you just messed up. But all the other cloud storage vendors (that I know about) have decided that "regular" storage (like NetApp) is too expensive to use for the cloud, so they built their own purpose-built platform to do it. This platform has replication and security and all that stuff built in, but it doesn't have snapshots -- and they're not backing it up!
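Here is a minimal sketch of why a snapshot helps where replication alone does not. It is a toy model in Python, not how NetApp implements snapshots: the `SnapshotVolume` class, the `hourly.0` label, and the file name are all illustrative assumptions. What matters is that the snapshot is a read-only, point-in-time view that later changes to the live data cannot touch.

```python
from types import MappingProxyType


class SnapshotVolume:
    """Toy model (hypothetical): live data plus read-only point-in-time views."""

    def __init__(self):
        self.live = {}       # file name -> contents
        self.snapshots = {}  # snapshot label -> frozen, read-only view

    def write(self, name, data):
        self.live[name] = data

    def delete(self, name):
        self.live.pop(name, None)

    def take_snapshot(self, label):
        # Copy the current state and expose it read-only, so nothing that
        # happens to the live volume afterwards can alter it.
        self.snapshots[label] = MappingProxyType(dict(self.live))

    def restore(self, label, name):
        self.live[name] = self.snapshots[label][name]


vol = SnapshotVolume()
vol.write("budget.xls", b"Q1 numbers")
vol.take_snapshot("hourly.0")

vol.write("budget.xls", b"garbage")  # corruption
vol.delete("budget.xls")             # then accidental deletion

vol.restore("hourly.0", "budget.xls")
print(vol.live["budget.xls"])
```

Replicating a volume like this one carries the snapshots along with it, which is what makes the combination worth calling a backup.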

Hey cloud storage vendors: if you also do more than replication, let me know and I'll be glad to update this article.  (There's got to be more than just Iron Mountain.)

They argue that backups (as I'm defining them) cost too much.  No one else is doing backups, and if they did them they wouldn't be cost competitive.  No wonder they're cheaper!  They don't back it up!

There is no way in this world that I am putting a single Gigabyte of my data on a cloud storage vendor that thinks replication is backup. 


0 #14 Online Backup Servic 2012-02-09 16:51
The business case for storage virtualization software isn’t new. System administrators have always sought best of breed data storage from a choice of storage vendors and a centralized management framework. With a variety of data storage devices available today, ranging from high-performance SSD and Flash cards to low-cost SATA arrays, a seamless way to automatically tier data across the diversity of storage devices and purpose-built arrays can maximize utilization and cost-efficiency.
0 #13 PR Distribution 2011-07-06 10:50
I followed the vendors for the Protection of the clouds (like Mozy, Carbonite, CrashPlan) very closely - and I am a big fan in general - but I have not spent much time looking at the primary cloud providers.
0 #12 Ray Lucchesi 2010-02-03 19:29
Curtis, my original post on this matter was due to talking with at least two cloud storage providers in under two weeks which mentioned they had no need for backup due to replication. As I couldn't understand how this was feasible I thought it deserved some discussion. I got quite a few comments to my post as did you.

The net of this was that (cloud) replication is not backup and never can be as long as you need to factor in user errors, virus infections and anything else that would constitute a rolling disaster.

If interested see my original post and comments at silvertonconsulting.com/blog/2009/07/28/does-cloud-storage-need-backup/.

Thanks for the link.
0 #11 Josh Goldstein 2010-01-22 18:02
Full disclosure - I am VP of Marketing for Cirtas, a storage vendor solving the very problems (and others) with the public cloud that Curtis wrote about.

Joseph's comments about the TCO study don't surprise me. The cloud as a backup destination instead of tape, disk, or VTL might make economic sense and it might not. It depends on your particular environment. I use Mozy to back up my home computer to the cloud. It gives me a cheap, off-site copy of my data. But if I had a 25TB storage array at home with corporate RPOs and RTOs, this model wouldn't make sense.

The key is to think about ways to utilize the cloud to provide the same data protection mechanisms you get with traditional backup processes, but without the cost and complexity. What's needed is the ability to have your primary instance of data stored in the cloud and innately protected against the same gotchas that traditional backups guard against, but without having to perform them.
+1 #10 joseph martins 2010-01-22 17:36
The money quote from your post is this: "They argue that backups cost too much. No one else is doing backups, and if they did them they wouldn't be cost competitive. No wonder they're cheaper! They don't back it up!"

Two years ago we did a TCO comparison study of in-house tape, disk and off-site cloud "backup" (for a "cloud" vendor). We had concluded that, with the expense of an adequate backup plan (among other things), the cost of the cloud option would not be significantly different from traditional in-house alternatives in the long run. The study projected the costs over a period of 5 years.

They seemed shocked and were quite unhappy with the results. The MBA they assigned to the project disagreed with our numbers and our calculations - and went so far as to cite the inflated numbers used by other analyst firms as if to imply that we're idiots. They refused to pay...as if they're not obligated to pay for our work simply because we're not drinking the same kool aid. Believe it or not, I had to have our legal counsel send a demand letter to the vendor in order to collect payment for our work. I wasn't about to let it slide. I can only imagine the hell a customer will go through dealing with them in the event of a severe data loss.

Anyway, all this because - at the time - they failed to recognize that what they were doing was a) not genuine backup, b) still required backup on their end, and c) the added cost of actually backing up on their end stripped away much of a customer's cost benefit for using their service.

We are now working on an update of that study using current numbers. It'll be interesting to see how things work out. I suspect the cost for each method will not be significantly different relative to the others.

It's a case of client beware - the devil is in the numerous hidden costs of cloud services.
0 #9 W. Curtis Preston 2010-01-22 17:25
It sounds like you just want control, reporting, management, etc. I'd argue that their Data Protection Manager product has come a long way towards giving you what you want. But I also still want more. But then again, I still want more from NBU/NW/TSM... ;-)

I don't know how you could get more granular than what NetApp does, as far as recoveries are concerned. They even let you do file level restores of VMs you snapped with SM4VI without having to mount the vmdk.
0 #8 Stephen Foskett 2010-01-22 16:46
I'm not saying that snapshots and replication can't be the storage platform for backups. Yes, that's exactly what I use! Time Machine to a Drobo and rsynced to replicated cloud storage.

What I'm saying is that snapshots are "part of this nutritious breakfast" and do not alone make up a backup solution. They're merely a technology for data protection, not a complete solution...

Yes, some people (you, me) can manually put together a real solution with snaps and replication. But it's nothing we should be out there praising. I'm glad NetApp can protect data, but I'd like it better if it was integrated into a real backup system with some concept of data being protected, better granularity, etc...
0 #7 W. Curtis Preston 2010-01-22 15:03
First, absolutely. It meets the definition and the sniff test, AFAIC. I even refer to snapshots and replication as "near-CDP." (A term that 'zilla and some others hate, but that's a different argument.)

Yes, I know they need to be managed, reported on, and all that. So does everything.

I used to fight this a lot with NetApp. I used to say "I don't care how many replicated snapshots you have, I want it in one other form." My problem was: what happens if you get a rolling code disaster? NetApp releases a code version that wipes out your primary, all its snapshots, and the secondary, and so on. After long consideration, I've come to believe that even this concern can be mitigated by having at least one copy that isn't kept constantly up to date. For example:

1. Primary filer w/snapshots
2. Filer 1 is replicated to another filer onsite for HA/BC purposes
3. Filer 2 is replicated offsite to another filer

A delay can be put into Filer 3's replication so that it only receives updates in batches every few hours, so that even the worst case doesn't affect it right away. But you don't wait too long in between updates because it shouldn't be TOO out of date. Code updates are done in stages so you never wipe out the whole thing in one shot, etc.
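A minimal sketch of that delayed third copy, in Python. This is a toy model and not NetApp's replication: the `Filer` and `DelayedReplica` classes, the 4-hour interval, and the use of plain integers for hours are all assumptions made for illustration.

```python
class Filer:
    """Toy model (hypothetical) of a filer: a name plus a file mapping."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class DelayedReplica:
    """Copies the source's state to the target only once per `interval` hours."""

    def __init__(self, source, target, interval):
        self.source = source
        self.target = target
        self.interval = interval
        self.last_sync = 0

    def tick(self, now):
        # Ship a batch only if enough time has passed since the last one.
        if now - self.last_sync >= self.interval:
            self.target.files = dict(self.source.files)
            self.last_sync = now
            return True
        return False


primary = Filer("filer1")
onsite = Filer("filer2")
offsite = Filer("filer3")
link = DelayedReplica(onsite, offsite, interval=4)


def sync_mirror():
    # Filer 1 -> Filer 2 is modeled as immediate replication.
    onsite.files = dict(primary.files)


primary.files["budget.xls"] = b"Q1 numbers"
sync_mirror()
link.tick(now=4)            # scheduled batch reaches the offsite filer

primary.files.clear()       # rolling disaster wipes the primary at hour 5
sync_mirror()               # ...and the onsite mirror follows immediately
synced = link.tick(now=5)   # too soon for the next offsite batch

print(synced, sorted(offsite.files))
```

The delay only buys a window. Someone still has to notice the problem and stop the next batch before hour 8, or the wipe reaches Filer 3 as well.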

NetApp loves this story (and so do other vendors that don't use copy-on-write snapshots), because they can do a complete DP story -- including dedupe and as much retention as you're willing to buy disk for -- in a single product line. EMC hates this story because they can't do it. They (and many other vendors) use copy-on-write snapshot technology. COW snapshots by design degrade performance as you have more and more of them and keep them for longer periods of time. (One EMC SE told me that if I kept 90 days of snaps on a Celerra, the performance would drop by at least half.)

Here's what I know. In my house I have 4 TBs of DVD and Blu-Ray images. I use rsync and snapshots to back them up. See blog post http://www.backupcentral.com/content/view/282/47/. In fact, Time Machine (which I believe you're a fan of) is basically replication with snapshots. So are you saying that Time Machine is not a backup?

I get why EMC doesn't like me calling snapshots and replication a backup. What recovery scenario doesn't it address in your opinion?

BTW, what does that mean "merely a data protection technology?" Traditional backups are "merely a data protection technology." So?
0 #6 Stephen Foskett 2010-01-22 14:11
I have a question.

On my blog, you said "I will count read-only, replicated snapshots as backup, BTW." Do you really consider simple read-only replicated snapshots to be sufficient backups? Yes, they meet the SNIA definition, but do they meet your gut feel definition?

I do not consider read-only replicated snapshots alone to be anything close to acceptable backup. In fact, they're so far from a real solution (being merely a data protection technology) that they might lull one into a false sense of security! I believe backup is not just a copy of data.

What do you think?
0 #5 W. Curtis Preston 2010-01-22 13:43
@Stephen Foskett

If you have read-only snapshots and they are replicated, I feel just fine calling them a "backup." It meets the SNIA definition and it meets the requirements that backup is made for. You mention in your blog that they must be managed. Well, of course.
