What it was like presenting at Cloud Field Day

Presenting at a Gestalt IT “Field Day” (Cloud Field Day in this case) was very different than being a delegate. So I thought I’d blog about it – just like a delegate.

What is Cloud Field Day?

Cloud Field Day is an event put on by Gestalt IT, a company founded by Stephen Foskett (@sfoskett). They put on a variety of “Field Day” events, originally just called Tech Field Day.  They branched out into Storage Field Day, Networking Field Day, Wireless Field Day, and Cloud Field Day.

They bring in a group of 10+ influencers from around the world, each of which has some type of audience. Delegates, as they are called, can be anything from someone with a “day job” in IT who just blogs part time for fun, to someone who is now making money full time as a blogger, speaker, or analyst.  The one thing they have in common is that they are independent; they cannot be employed by a vendor in the space.

I’ve been a delegate to a number of Field Days, and it’s definitely easier being on that side of things. It’s easier to listen to a vendor’s pitch and ask questions than it is to be the vendor making that pitch and answering questions. It’s easy to question why they’re doing something, or to “poke holes” in their strategy. I can remember a few times when my fellow delegates and I thought the vendor was way off base. I can even remember one time when it was so bad that the consensus among the delegates was that the start-up in question should immediately go out of business and return any remaining investor money. (For the record, we were right, and that actually happened to that particular company.)

In all of those field days as a delegate, I was never stressed about being there. It’s quite enjoyable as a delegate.  You’re flown in, driven around in a limo, and constantly fed and catered to.  The only time I remember feeling any “stress” (if I can call it that) is when I found myself on the delivery end of a heated discussion. Even though I felt I was very justified in what I was saying, it’s still stressful being the center of attention in a heated argument that is being streamed live.

Presenting is very different

Being a former delegate made me more nervous, not less. I knew how probing delegates could be. I knew the messages I wanted to get across, but I wondered how those messages would be received. I also knew that the delegates often drive the presentation, and they can be difficult to “redirect” once they latch onto a concept they want to discuss.

For example, I watched as one Cloud Field Day 3 presenter before me “lost the room” for quite a while as the delegates debated a related technical topic. I remember thinking about how I would handle that if it happened to us. You don’t want to stifle discussion, but you also need to make sure you communicate your message.

We survived

Based on the coverage we received, I think we communicated the core messages we wanted to get across, although they didn’t resonate equally with every delegate. Each delegate comes with their own experiences and biases, and you can’t cater to them all.

For example, I think the delegates understood how we are the only data protection product designed for the AWS infrastructure, automatically scaling the resources we use up and down via AWS’ native load-balancing services. We are also the only ones using AWS’ DynamoDB to hold all metadata and S3 to hold all backups. (Other products can copy some backups to S3; we store all of them there.) And I think they understood how that should drastically affect the costs we pass on to the customer.
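To make that architecture a little more concrete, here is a minimal, hypothetical sketch of the “metadata in DynamoDB, backup data in S3” idea using boto3. This is not Druva’s actual code; the table name, bucket name, and key schema are all assumptions for illustration.

```python
# Hypothetical sketch: all backup data goes to S3, all metadata to DynamoDB.
# Table/bucket names and the key schema are made up for this example.
import hashlib
import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

metadata_table = dynamodb.Table("backup-metadata")   # assumed table name
backup_bucket = "example-backup-bucket"              # assumed bucket name

def store_backup_object(client_id: str, path: str, data: bytes) -> None:
    """Write the backup payload to S3 and record its metadata in DynamoDB."""
    digest = hashlib.sha256(data).hexdigest()
    s3_key = f"{client_id}/{digest}"

    # All backup data lands in S3 (object storage)...
    s3.put_object(Bucket=backup_bucket, Key=s3_key, Body=data)

    # ...and everything needed to find it again lands in DynamoDB.
    metadata_table.put_item(Item={
        "client_id": client_id,   # assumed partition key
        "path": path,             # assumed sort key
        "sha256": digest,
        "s3_key": s3_key,
        "size_bytes": len(data),
    })
```

The point of the sketch is the division of labor: the object store holds the (large) backup data, and the key-value store holds the (small) metadata needed to find and restore it, with no backup server or block storage in the middle.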

What I didn’t anticipate was that being designed for the AWS infrastructure would not resonate with those who are proponents of the Azure infrastructure. We were asked why we aren’t running there as well. The answer is two-fold. First, because we are actually designed to use AWS’ native tools (e.g. load balancer, DynamoDB, S3), we can’t just move our software over to Azure. We would need an entirely separate code stack on Azure, so the level of effort is significantly different from that of vendors who just run their software in VMs without using native tools. Their approach is more expensive to deliver; ours requires additional coding to move platforms. Second, we just don’t see a big demand for native support in Azure. Most customers don’t care where we run our infrastructure – and don’t have to. But I can understand how that would fall on the deaf ears of an Azure advocate.

But the biggest challenge we ran into with this crowd was that not everyone was convinced that backing up to the cloud is the way to go for datacenters. As a former delegate, I should have known that technical types are going to think about these things, and that we need to address them before they can think about anything else.

If I had a do-over

I would explain the benefits of our approach before explaining what it is. I did cover the benefits, but after the architecture. I should have done it the other way around. Cover the problem and what parts of it we solve – then cover how we solve it. Pretty standard stuff, really. But I got excited talking to a technical audience and went technical first. While the Field Day audience doesn’t want 10 slides on how data is growing, etc., they do want some description of the problem you’re solving before you explain how you solve it.

I would also address the “elephant in the room” first, and explain that our model of backing up everything to the cloud will probably not scale to back up a single datacenter with multiple petabytes. (We do have customers who store double-digit numbers of petabytes with us, but not all from one datacenter.)

I could then explain that we scale farther than you probably think. And since most companies don’t have datacenters like that, why force them to use an approach (onsite hardware) that is designed for that scale? If you could meet your backup and recovery needs without any onsite hardware, why wouldn’t you?

Cloud Field Day is awesome

It’s a bit of a public trial by fire, but it’s a refining fire.  I learned a lot about how to present our solution by presenting at Cloud Field Day. I’d recommend it to everyone.  I know we’ll be back.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Protect data wherever it lives

It’s more true now than at any other time in history: data really can be anywhere. Which means you need to be able to protect it anywhere and everywhere. And the backup architecture you choose will either enable that process or hamper it.

Data, data everywhere, and not a byte to sync

Any product should be able to back up a typical datacenter. Install a backup server, hook up some disk or tape, install the backup client, and transfer your data. Yes, I realize I drastically simplified the situation; remember, I did build a career around the complexities of datacenter backup. I’m just saying that we have decades of experience with this use case, and being able to back up a datacenter should be table stakes for any decent backup product or service.

The same is mostly true of a datacenter in a colocation/hosting facility. You can use much of the same infrastructure to back it up. One challenge is that it may be remote from the people managing the backup, which requires a hands-off way of getting data offsite or someone on site to manage things like tape. Another challenge is that the hosted infrastructure may not be big enough to warrant its own backup infrastructure.

This is similar to another data source: the ROBO, or Remote Office/Branch Office. While some of them may have enough data to warrant their own backup infrastructure, they usually don’t warrant IT personnel. Historically this meant you either trained someone in the office to swap tapes (often at your own peril), or you hired Iron Mountain to do it for you (at significant cost). Deduplication and backup appliances have changed this for many companies, but ROBOs still plague many others that haven’t updated their backup infrastructure.

The truly remote site is a VM – or a bunch of VMs – running in a public cloud provider like AWS, Azure, or Google Cloud. There is no backup infrastructure there, and putting any type of traditional backup infrastructure there will be very expensive. Cloud VMs are very inexpensive – if you’re using them part time. If you’re running them 24×7 like a typical backup server, they’re going to be very expensive indeed. This means that the cloud falls into a special category: a truly remote office without backup infrastructure or personnel. You have to have an automated remote backup system to handle this data source.
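To put rough numbers on that (the hourly rate here is purely hypothetical and not a quote for any particular instance type or provider), the gap between a part-time VM and an always-on backup server is easy to see:

```python
# Back-of-the-envelope only; real cloud pricing varies by provider,
# instance type, and region. The $0.10/hour rate is an assumption.
hourly_rate = 0.10

always_on = hourly_rate * 24 * 365    # backup server running 24x7
part_time = hourly_rate * 2 * 365     # same VM spun up ~2 hours a day

print(f"24x7 server:  ${always_on:,.2f}/year")   # ~$876/year
print(f"part-time VM: ${part_time:,.2f}/year")   # ~$73/year
```

The exact numbers don’t matter; the order-of-magnitude difference is why parking a traditional, always-on backup server in the cloud works against the cloud’s pricing model.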

Even more remote than a public cloud VM is a public cloud SaaS app. With a SaaS app you don’t even have the option of running an expensive VM to back up your data. You are forced to interact with whatever APIs the provider offers for this purpose. You must be able to protect this data over the Internet.

Finally, there are end user devices: laptops, desktops, tablets, and phones. There is no getting around the fact that most people do the bulk of their work on such devices, and it’s also pretty easy to argue that they’re creating data that they store on their laptops. Some companies handle this problem by converting to cloud apps and telling users to do all their work in the cloud. But my experience is that most people are still using desktop apps to do some of their work. Even if they’re using the cloud to store the record of authority, they’re probably going to have a locally cached copy that they work on. And since there’s nothing forcing them to sync it online, it can often be days or weeks ahead of the protected version stored in the cloud. This is why it’s still a good idea to protect these systems. Mobile devices are primarily data consumption devices, but they still may create some data. If it’s corporate data, it needs to be protected as well.

All things to all bytes

The key to backing up from anywhere is to reduce as much as possible the number of bytes that must be transferred to get the job done, because many of these backups will be done over slow or expensive connections. The first way to do this is to perform a block-level incremental backup, which transfers only the bytes that have changed since the last backup.  Once we’ve reduced the backup image to just the changed bytes, those bytes should be checked against other clients to see if they have the same changed bytes — before the data is sent across the network.  For example, if you’re backing up Windows systems, you should only have to back up the latest Windows patches once.
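Here is a minimal sketch of that idea, not any vendor’s actual protocol: split the changed data into blocks, hash each block, check the hash against an index shared by all clients, and send only blocks the index has never seen. The block size, the known_hashes index, and the send_block callback are illustrative assumptions.

```python
# Minimal sketch of block-level, source-side dedupe.
# BLOCK_SIZE, known_hashes, and send_block are illustrative assumptions.
import hashlib
from typing import Callable, Iterable, Set

BLOCK_SIZE = 4 * 1024 * 1024   # assumed 4 MiB blocks


def blocks(data: bytes) -> Iterable[bytes]:
    """Split the changed data into fixed-size blocks."""
    for offset in range(0, len(data), BLOCK_SIZE):
        yield data[offset:offset + BLOCK_SIZE]


def backup(changed_data: bytes,
           known_hashes: Set[str],
           send_block: Callable[[str, bytes], None]) -> int:
    """Send only blocks whose hashes no client has sent before.

    known_hashes stands in for the dedupe index shared across all clients;
    send_block stands in for the network transfer. Returns blocks sent.
    """
    sent = 0
    for block in blocks(changed_data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in known_hashes:   # new to every client, so transfer it
            send_block(digest, block)
            known_hashes.add(digest)
            sent += 1
    return sent
```

In this sketch the Windows-patch case falls out naturally: once any one client has sent a block of the patch, its hash is in the shared index and every other client skips it.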

The only way to do this is source deduplication, also known as client-side deduplication. Source dedupe is done at the backup client before any data is transferred across the network. It does not require any local hardware, appliance, or virtual appliance/VM to work.  In fact, the appliance or system to which a given system is backing up can be completely on the other side of the Internet.

In my opinion, source-side dedupe is the way backup always should have been done.  We just didn’t have the technology. It saves bandwidth, it increases the speed of most backups, and it makes the impossible (like backing up a server across the Internet) possible.

You can back up some of the previously mentioned data sources with target dedupe (where you put a dedupe appliance close to the data and it does the deduping), but it can’t handle all of them. Target dedupe also comes at a significant cost, as it means you have to install an appliance or virtual appliance at every location you plan on backing up. This means an appliance in every remote datacenter, even if it only has a few dozen gigabytes of data, a virtual appliance (or more) in every cloud, an appliance in every colo – and mobile data gets left out in the cold. Source dedupe is cheaper and scales farther out to the edge than target dedupe – without the extra cost of appliances in every location.
