Is AWS Ready for Production Workloads?

Yes, I know they’re already there.  The question is whether or not Amazon’s infrastructure is ready for them.  And when I say “ready for them,” I mean “ready for them to be backed up.”  Of course that’s what I meant.  This is, right?

But as I prepare to go to Amazon Re:Invent after Thanksgiving, I find myself asking this question. Before we look at the protections that are available for AWS data, let’s look at why we need them in the first place.

What are we afraid of?

There is no such thing as the cloud; there is only someone else’s datacenter.  The cloud is not magic; the things that can take out your datacenter can take out the cloud.  Yes, it’s super resilient and time-tested.  I would trust Amazon’s resources over any datacenter I’ve ever been in.  But it’s not magic and it’s not impenetrable – especially by stupidity.

  • Amazon zone/site failure
    • This is probably the thing Amazon customers are most prepared for.  Many AWS services continuously replicate data across multiple physically separate locations, although not every resource gets this by default (an EBS volume, for instance, is replicated only within its own Availability Zone).  Something like 9/11, or even a massive hurricane or flood, should not affect the availability or integrity of data stored that way.  Caveat: where replication is asynchronous, you may lose some data.  But you should not lose your dataset.
  • Accidental deletion/corruption of a resource
    • People are, well, people. They do dumb things.  I’ve done dumb things. I can’t tell you the number of times I’ve accidentally deleted something I needed. And, no, I didn’t always have a backup.  Man, it sucks when that happens.  Admins can accidentally delete volumes, VMs, databases, and any kind of resource you can think of.  In fact, one could argue that virtualization and the cloud make it easier to do more dumb things.  No one ever accidentally deleted a server when that meant pulling it out of the rack.  Backups protect against stupidity.
  • Malicious damage to a resource
    • Hackers suck. And they are out there. WordPress tells me how many people try to hack my server every day.  And they are absolutely targeting companies with malware, ransomware, and directed hacking attacks.  The problem that I have with many of the methods that people use to protect their Amazon resources is that they do not take this aspect into account  – and I think this danger is the most common one that would happen in a cloud datacenter.  EC2 snapshots and RDS snapshots (which are actually copies) are stored in the same account they are backing up.  It takes extra effort and extra cost to move those snapshots over to another account.  And no one seems to be thinking about that.  People think about the resiliency and protection that Amazon offers – which it does – but they forget that if a hacker takes control of their account they are in deep doodoo.  Just ask  Oh wait, you can’t.  Because a hacker deleted them.
  • Catastrophic failure of Amazon itself
    • This is extremely unlikely to happen, but it could happen. What if there were some type of rolling bug (or malware) that somehow affected all instances in all AWS accounts?  Even cross-account copies of data would go bye-bye.  Like I said, this is extremely unlikely to happen, but it’s out there.

How do we protect against these things?

I’m going to write some other blog posts about how people protect their AWS data, but here’s a quick summary.

  • Automated Snapshots
    • As I said before, these aren’t snapshots in the traditional sense of the word.  These are actually backups.   You can use the AWS Ops Automator, for example, to regularly and automatically make a “snapshot” of your EC2 instance.  The first “snapshot” copies the entire EBS volume to S3.  Subsequent “snapshots” are incremental copies of blocks that have changed since the last snapshot.  I’m going to post more on these tools later.  Suffice it to say they’re better than nothing, but they leave Mr. Backup feeling a little queasy.
  • Manual copying of snapshots to another account
    • Amazon provides command-line and PowerShell tools that can be used to copy snapshots to another account.  If I were relying on snapshots for data protection, that’s exactly what I would do.  I would have a central account that is used to hold all my snapshots, and that account would be locked down tighter than any other account. The downside to these tools is that they aren’t automated.  We’re now in scripting and manual scheduling land. For the Unix/Linux folks among us this might be no big deal. But it’s still a step backward for backup technology, to be sure.
  • Home-grown tools
    • You could use rsync or something like that to back up some of your Amazon resources to something outside of Amazon.  Besides relying on scripting and cron, these tools are often very bandwidth-heavy, and you’re likely going to pay heavy egress charges to pull that data down.
  • Third-party tools
    • For some Amazon resources, such as EC2, you could install a third-party backup tool and back up your VMs as if they were real servers.  This would be automated and reportable, and probably the best thing from a data protection perspective. The challenge here is that this is currently only available for EC2 instances.  We’re starting to see some point tools to back up other things that run in AWS, but I haven’t seen anything yet that tackles the whole thing.
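
To make the "first snapshot is full, later ones are incremental" behavior concrete, here is a toy model in Python. It is only a sketch of the idea, not how EBS is implemented: the class name SnapshotChain, the 4-byte block size, and the assumption that the volume never changes size are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is readable


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split a volume image into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


@dataclass
class SnapshotChain:
    """Toy full-then-incremental snapshot chain (hypothetical, not the EBS API)."""

    # Each snapshot maps block number -> block contents copied at that time.
    snapshots: list[dict[int, bytes]] = field(default_factory=list)
    # Block contents as of the most recent snapshot, used to detect changes.
    _last_seen: list[bytes] = field(default_factory=list)

    def take(self, volume: bytes) -> int:
        """Record a snapshot and return how many blocks had to be copied."""
        blocks = split_blocks(volume)
        changed = {
            i: block
            for i, block in enumerate(blocks)
            if i >= len(self._last_seen) or self._last_seen[i] != block
        }
        self.snapshots.append(changed)
        self._last_seen = blocks
        return len(changed)

    def restore(self, index: int) -> bytes:
        """Rebuild the volume as of snapshot `index` by replaying the chain."""
        merged: dict[int, bytes] = {}
        for snapshot in self.snapshots[: index + 1]:
            merged.update(snapshot)  # newer blocks overwrite older ones
        return b"".join(merged[i] for i in sorted(merged))
```

With a three-block volume, the first take() copies all three blocks; changing one block and calling take() again copies only that block, and restore() can still rebuild either point in time.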
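
For a sense of what the scripting in the "copy snapshots to another account" option involves, here is a minimal Python sketch, assuming boto3-style EC2 clients are passed in. The function names are made up for this example; modify_snapshot_attribute and copy_snapshot are the EC2 calls I would expect to use, but treat the parameters as something to check against the current documentation rather than a finished tool.

```python
def share_snapshot(source_ec2, snapshot_id: str, backup_account_id: str) -> None:
    """Grant the backup account permission to use the snapshot.

    `source_ec2` is assumed to be an EC2 client created with credentials
    for the account that owns the snapshot.
    """
    source_ec2.modify_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=[backup_account_id],
    )


def copy_into_backup_account(backup_ec2, snapshot_id: str, region: str) -> str:
    """Copy the shared snapshot so the backup account owns its own copy.

    `backup_ec2` is assumed to be an EC2 client created with credentials
    for the locked-down backup account. Returns the new snapshot ID.
    """
    response = backup_ec2.copy_snapshot(
        SourceRegion=region,
        SourceSnapshotId=snapshot_id,
        Description=f"Cross-account copy of {snapshot_id}",
    )
    return response["SnapshotId"]


def protect_snapshot(source_ec2, backup_ec2, snapshot_id: str,
                     backup_account_id: str, region: str) -> str:
    """Share, then copy. Something (cron, a scheduler) still has to call this."""
    share_snapshot(source_ec2, snapshot_id, backup_account_id)
    return copy_into_backup_account(backup_ec2, snapshot_id, region)
```

The two clients could be built with something like boto3.Session(profile_name=...).client("ec2", region_name=...), one profile per account (the profile names would be your own). Encrypted snapshots need extra work, since the encryption key has to be shared with the backup account as well, and nothing here schedules the copy or cleans up old ones.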

So is it ready?

As I said earlier, an AWS datacenter is probably more resilient and secure than most datacenters.  AWS is ready for your data. But I do think there is work to be done on the data protection front.  Right now it feels a little like deja vu.  When I start to think about shell scripts and cron, I start feeling like it’s the 90s.  It’s been 17 years since I’ve revisited, the tool I wrote to automatically back up filesystems on a whole bunch of Unix systems.  I really don’t want to go back to those days.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.
