Roaming the aisles of Dreamforce 2014 taught me one thing: backups are here to stay. You can move everything you can into the cloud, but your data still has to be protected against natural and human-created disasters. Moving it to the cloud doesn’t change that.
I’ve always felt that way, but I thought for a while that maybe I was just a lone reed in the wind; only I was worried about data that had been moved to the cloud. Everyone else was happy with the backups of their mission-critical data being put into the hands of the cloud provider.
It was with some joy that I welcomed Backupify to the salesforce.com world when I first heard about them a few years ago. (To my knowledge, they were the first vendor to offer backup of your salesforce.com data, and the first to back up Facebook, Gmail, and others.) But I wondered whether there would be enough people concerned about their cloud-based data to justify adding that expense to their cloud infrastructure bill. They might think, for example, that a company the size of salesforce.com is already backing up their data, so why should they pay to do it as well? Only time would tell.
Walking around Dreamforce 2014, though, put my fears to rest. There were three other companies exhibiting backup solutions for salesforce.com (that I could see), and there are a few others that I found via a simple “backup salesforce” search. By the way, I’ll cover these companies in another post.
The key concept I wanted to cover here is that some people believe that by moving their data to the cloud, it’s automatically going to get backed up. That simply isn’t the case.
Consider salesforce.com, for example. It is well documented that they back up your data – but not so you can restore it! Their backup is for them to restore a datacenter that gets destroyed by a disaster, malicious attack, or even just plain human error of one of their many humans. However, if you need to use that backup to restore your salesforce instance due to error on your end, it will cost you a minimum of $10,000, and it is a best effort restore that might take several days. In addition, it’s an all-or-nothing restore, so you are forced to roll back your entire salesforce instance to the last good backup they took, which could be several days ago! Suffice it to say that relying on this service is a really, really bad idea.
This is still better than Amazon.com. They do not back up customer data at all. Their method of protecting against disasters is to replicate everything all over the place. However, if something catastrophic happens on your end, their replication will simply make it more catastrophic by immediately replicating it to multiple locations. There is no way to recover your AWS instance if you or someone else manages to take it out. If you don’t believe me, read my post about the death of codespaces.com.
The general rule is that backup of the data you place in the cloud is your responsibility – just like it is in the datacenter. Moving it to the cloud does not change that.
The first thing you need to do is to figure out what data you actually have in the cloud. Good luck with that. I’ve got some ideas, but we’ll save those for another post.
The next thing you need to do is find out what the cloud vendor’s policies are in this area. Do they back up your data at all, or are backups entirely your responsibility? Please note that I believe backups are entirely your responsibility; I just want to know whether you’re going to get any help from them in meeting that responsibility. Even if you develop your own backup system, it would be nice to know whether or not there is a Plan B.
If they do back up your data, are you allowed to use that backup? If so, is there an extra fee, as there is with salesforce.com, or can you use it at will? It would be really nice to test this backup once in a while so you know how it will work when and if you need it. But you’re not going to test a backup that costs $10K just to try it.
Finally, since the goal here is to have your own independent backup, make sure to investigate the feasibility and costs of doing that. With salesforce.com, you’ll probably need more API calls, as a regular backup is likely to exceed your base amount. With hosting providers, you’re talking about bandwidth. How much will it cost to perform your first off-host backup of your data, and how much will each incremental backup cost you? You need to know these numbers before investigating alternatives.
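Here’s a rough back-of-the-envelope sketch of that math in Python. Every number in it (the 500 GB data set, the 2% daily change rate, the $0.09 per GB transfer price) is a made-up placeholder, not a quote from any provider, and the function name is my own invention. Plug in your own figures.

```python
def backup_transfer_cost(total_gb, daily_change_rate, price_per_gb, days=30):
    """Estimate transfer costs for one full backup plus daily incrementals.

    Returns (first_full_cost, cost_per_incremental, total_over_period).
    Assumes each incremental sends only the changed fraction of the data.
    """
    first_full = total_gb * price_per_gb
    per_incremental = total_gb * daily_change_rate * price_per_gb
    total = first_full + per_incremental * days
    return first_full, per_incremental, total


# Placeholder inputs: 500 GB of data, 2% daily change, $0.09 per GB transferred.
first_full, per_incremental, first_month = backup_transfer_cost(
    total_gb=500, daily_change_rate=0.02, price_per_gb=0.09
)
print(f"first full backup: ${first_full:.2f}")
print(f"each incremental:  ${per_incremental:.2f}")
print(f"first 30 days:     ${first_month:.2f}")
```

The same shape works for API calls instead of bandwidth: swap gigabytes for calls per backup and the per-GB price for whatever your vendor charges for extra calls.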
If you’re talking a hosted system of any kind, whether a physical machine in a colo somewhere or a VM inside AWS, you need to find out if regular backup software will run inside that machine, or if you are prevented in any way from running a backup application in that machine. This could be anything from “we have a customized Linux kernel that doesn’t run regular apps” to “you are not allowed to make outgoing connections on non-standard ports.” Find out the answers to these questions now.
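One of those questions, whether the machine can make outgoing connections on a given port, is easy to check yourself. The sketch below is a minimal Python test using only the standard library; the function name is mine, and any host and port you pass it are whatever your backup product actually uses.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

Run from inside the hosted machine, something like `can_connect("backup.example.com", 8443)` (a hypothetical host and port) tells you whether that path is open. A False result only says the connection failed, not why, so follow up with your provider.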
If we’re talking about an application like salesforce, you can start by googling “backup application name.” If you do that with salesforce, you will find several apps that you can investigate and compare the pricing for. You will find that each has set their pricing structure so they are more or less attractive to small or larger instances. For example, they may have a base price that includes 50 users. That’s great if you have 50 users, but not if you have 5. If you have 500 users, though, you might not want an app that charges by individual user if they don’t start giving discounts at larger numbers.
If you’re talking any kind of hosted system running Windows or Linux, you can use most any cloud backup application that uses either source deduplication, continuous data protection (CDP), or near-CDP (otherwise known as snapshots and replication). This is because after the first full backup is done, each of these will only send new, unique blocks every time they back up. Since you are likely paying your cloud provider by the bit, this is both financially wise and doesn’t put you at odds with physics.
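To show why “only new, unique blocks” matters, here is a toy illustration of the idea behind source deduplication. It is a sketch under my own assumptions (fixed 4 KB blocks, SHA-256 hashes, an in-memory set of hashes already sent), not how any particular product works; real products often use variable-size chunks and a persistent index.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


def blocks_to_send(data: bytes, seen_hashes: set) -> list:
    """Return the blocks of data whose hashes are not yet in seen_hashes.

    seen_hashes is updated in place, so the next call skips these blocks.
    """
    new_blocks = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            new_blocks.append(block)
    return new_blocks


seen = set()

# First backup: three distinct blocks, so all of them must be sent.
day_one = bytes(BLOCK_SIZE) + b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE
first_backup = blocks_to_send(day_one, seen)

# Next backup: only the last block changed, so only it is sent.
day_two = bytes(BLOCK_SIZE) + b"a" * BLOCK_SIZE + b"c" * BLOCK_SIZE
next_backup = blocks_to_send(day_two, seen)

print(len(first_backup), "blocks sent on the first backup")
print(len(next_backup), "block(s) sent on the next backup")
```

The first run ships everything; after that, the bytes that cross the wire track how much actually changed rather than how much data you have.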
If you find yourself running an app that there is no way to back up, see if there is an API that can be used to get some of the data out. For example, even though there are several apps that back up salesforce, what if there weren’t? There are other apps that can connect via the API to at least grab your leads and contacts and put them into other systems such as databases or even spreadsheets. It would be better than nothing if you found yourself running such an app that did not have any automated backup options.
Speaking of that, it’s not really a backup if it’s not automated, and it also needs to be stored in some system other than where the primary data is stored. Again, I hate to keep using salesforce.com as an example, but please don’t tell me you do a weekly manual export of your various salesforce objects using Dataloader. That is better than nothing, but not by much. Too much human involvement means too much chance for human error. Automate it and get it offsite.
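As a rough sketch of what “automate it and get it offsite” can look like for an API export, the Python below writes a set of records to a timestamped CSV file in a destination directory. The `fetch_contacts` function is a hypothetical stand-in that returns canned sample data; in real life it would be replaced by calls to whatever API your application exposes. The destination is assumed to be storage that lives somewhere other than the primary system.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def fetch_contacts():
    """Hypothetical stand-in for a real API call; returns a list of dicts."""
    return [
        {"id": "001", "name": "Ada Example", "email": "ada@example.com"},
        {"id": "002", "name": "Bob Example", "email": "bob@example.com"},
    ]


def export_to_csv(records, dest_dir, prefix="contacts"):
    """Write records to a timestamped CSV file in dest_dir and return its path."""
    if not records:
        raise ValueError("no records to export")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp in the file name keeps each run as a separate copy.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = dest / f"{prefix}-{stamp}.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
    return path
```

A call such as `export_to_csv(fetch_contacts(), "/mnt/offsite/exports")` (the path is a placeholder) run from cron or another scheduler takes the human out of the loop. It is still closer to “better than nothing” than to a real backup product, but at least nobody has to remember to do it.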
Just do it
I can’t explain all the options in an article like this, but I can hopefully get you thinking and asking questions about this. Is your salesforce.com data being backed up? What about those apps you have running in a Linux VM in AWS? You can’t fix what you don’t acknowledge, so it’s time to start looking.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.