Is your data in the cloud getting backed up?

Roaming the aisles of Dreamforce 2014 taught me one thing: backups are here to stay.  You can move everything you can into the cloud, but your data still has to be protected against natural and human-created disasters.  Moving it to the cloud doesn’t change that.

I’ve always felt that way, but I thought for a while that maybe I was just a lone reed in the wind; only I was worried about data that had been moved to the cloud.  Everyone else was happy with the backups of their mission-critical data being put into the hands of the cloud provider. 

It was with some joy that I welcomed Backupify to the world when I first heard about them a few years ago.  (To my knowledge, they were the first vendor to offer backup of your data, and the first to backup Facebook, Gmail, and others.)  But I wondered whether or not there would be enough people concerned about their cloud-based data to justify adding that expense to their cloud infrastructure bill.  They might think, for example, that a company the size of is backing up their data – so why should they pay to do it as well.   Only time would tell.

Walking around Dreamforce 2014, though, put my fears to rest.  There were three other companies exhibiting backup solutions for (that I could see), and there are a few others that I found via a simple “backup salesforce”  search.  By the way, I’ll cover these companies in another post.

The key concept I wanted to cover here is that some people believe that by moving their data to the cloud, it’s automatically going to get backed up.  That simply isn’t the case.

Consider, for example.  It is well documented that they back up your data – but not so you can restore it!  Their backup is for them to restore a datacenter that gets destroyed by a disaster, malicious attack, or even just plain human error of one of their many humans.   However, if you need to use that backup to restore your salesforce instance due to error on your end, it will cost you a minimum of $10,000, and it is a best effort restore that might take several days.  In addition, it’s an all-or-nothing restore, so you are forced to roll back your entire salesforce instance to the last good backup they took, which could be several days ago!  Suffice it to say that relying on this service is a really, really bad idea.

This is still better than  They do not back up customer data at all.  Their method of protecting against disasters is to replicate everything all over the place. However, if something catastrophic happens on your end, their replication will simply make it more catastrophic by immediately replicating it to multiple locations.  There is no way to recover your AWS instance if you or someone else manages to take it you.  If you don’t believe me, read my post about the death of

The general rule is that backup of the data you place in the cloud is your responsibility – just like it is in the datacenter.  Moving it to the cloud does not change that.


The first thing you need to do is to figure out what data you actually have in the cloud.  Good luck with that.  I’ve got some ideas, but we’ll save those for another post.

The next thing you need to do is find out what the cloud vendor’s policies are in this area.  Do they backup your data at all, or are backups entirely your responsibility?  Please note that I believe that backups are entirely your responsibility, I just want to know if you’re going to get any help from them or not in meeting that responsibility.  Even if you develop your own backup system, it would be nice to know whether or not there is a Plan B.

If they do backup your data, are you allowed to use it?  If so, is there an extra fee like, or can you use it at will?  It would be really nice to test this backup once in a while so you know how it will work when and if you need it.  But you’re not going to test a backup that costs $10K just to try it.

Finally, since the goal here is to have your own independent backup, make sure to investigate the feasibility and costs of doing that.  With, you’ll probably need more API calls, as a regular backup is likely to exceed your base amount.  With hosting providers, you’re talking about bandwidth.  How much will it cost to perform your first off-host backup of your data, and how much will each incremental backup cost you?  You need to know these numbers before investigating alternatives.

If you’re talking a hosted system of any kind, whether a physical machine in a colo somewhere or a VM inside AWS, you need to find out if regular backup software will run inside that machine, or if you are prevented in any way from running a backup application in that machine.  This could be anything from “we have a customized Linux kernel that doesn’t run regular apps” to “you are not allowed to make outgoing connections on non-standard ports.”  Find out the answers to these questions now.

Examine alternatives

If we’re talking about an application like salesforce, you can start by googling “backup application name.”  If you do that with salesforce, you will find several apps that you can investigate and compare the pricing for. You will find that each has set their pricing structure so they are more or less attractive to small or larger instances.  For example, they may have a base price that includes 50 users.  That’s great if you have 50 users, but not if you have 5.  If you have 500 users, though, you might not want an app that charges by individual user if they don’t start giving discounts at larger numbers.

If you’re talking any kind of hosted system running Windows or Linux, you can use most any cloud backup application that uses either source deduplication, continuous data protection (CDP), or near-CDP (otherwise known as snapshots and replication).  This is because after the first full backup is done, each of these will only send new, unique blocks every time they backup.  Since you are likely paying your cloud provider by the bit, this is both financially wise and doesn’t put you at odds with physics.

If you find yourself running an app that there is no way to backup, see if there is an API that can be used to get some of the data out.  For example, even though there are several apps that backup salesforce, what if there weren’t?  There are other apps that can connect via the API to at least grab your leads and contacts and put them into other systems such as databases or even spreadsheets.  It would be better than nothing if you found yourself running such an app that did not have any automated backup options.

Speaking of that, it’s not really a backup if it’s not automated, and it also needs to be stored in some system other than where the primary data is stored.   Again, I hate to keep using as an example, but please don’t tell me you do a weekly manual export of your various salesforce object using Dataloader.  That is better than nothing, but not by much.  Too much human involvement means too much chance for human error.  Automate it and get it offsite.

Just do it

I can’t explain all the options in an article like this, but I can hopefully get you thinking and asking questions about this.  Is your data being backed up? What about those apps you have running in a Linux VM in AWS?  You can’t fix what you don’t acknowledge, so it’s time to start looking.






Continue reading

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is a Copy a Backup?

Are we breaking backup in a new way by fixing it?  That's the thought I had while interviewing Bryce Hein from Quantum. It made me think about a blog post I wrote four years ago asking whether or not snapshots and replication could be considered a backup.  The interview is an interesting one and the blog post has a lot of good points, along with quite a bit of banter in the comments section.
What I mean when I say, "is a copy a backup" is this: traditionally, a "backup" changed form during the backup process.  It was put into tar/cpio/dump format, or the format of some commercial backup tool.  In this process, it made it slightly harder for it to be monkeyed with by a black hat.
I'm a fan of putting operational backup and recovery on disk.  I'm even a bigger fan of backing up in such a way that a "recovery" can simply be done by using the backup as the primary while the real primary is being repaired.  It offers the least amount of downtime in some sort of disaster.

But this does beg the question of whether or not leaving the backup in the same format as the original leaves it vulnerable in some way that putting it into a backup format doesn't.  I think the answer is a big fat no.  Specifically, I'd say that a copy is no more of less susceptible than a file on disk that's in some kind of "backup" format.  Either one could be deleted by a malicious admin, unless you were storing it on some kind of WORM filesystem.  The same is true of backups stored on tape.  If someone has control of your backup system, it doesn't take a rocket scientist to quickly relabel all your tapes, rendering them completely useless to your backup system.

As mentioned in my previous post on snapshots and replication, what makes something a backup (versus just a copy) is not its format.  The question is whether or not it has management, reporting, and cataloging built around it so that it is useful when it needs to be.

In that sense, a CDP or near-CDP style backup is actually more of a backup than a tar tape, assuming the tar tape is just the result of a quick tar command.  The tar tape has not management, reporting, or cataloging, other than what you get on the tape itself.  

I just want to close out by saying that backup products that are making instant recovery a reality are my favorite kind of products.  These include CDP and near-CDP style products like SimpanaZerto, Veeam, AppAssure, RecoverPoint, and any of the storage array or storage virtualization products that accomplish backup via snapshots and replication. This is the way backup should be done.  Backup continuously or semi-continuously, and recover instantly by being able to use the backup as the primary when bad stuff happens.

One thing's for sure: you can't do that with tape. 😉


Continue reading

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Evangelist at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.