Someone else driving your car doesn’t make it an Uber

Imagine if you had to lease a car in order to use Uber. That’s the logic behind how many infrastructure software vendors sell their “service” in the cloud. These “services” come in a couple of different flavors, but both of them come down to the same thing: infrastructure set aside for your exclusive use – which means you pay for it.

Private Cloud vendors

There are a lot of “XaaS” vendors that will create a dedicated system for you in their cloud, manage it for you, and then charge it to you as a service. It is definitely “Something as a Service,” as the management of it, including OS and application upgrades, hardware upgrades, and reporting, are all handled for you.  The key to this idea is that you get one bill that includes compute, storage, networking, and management costs.

This is definitely an improvement over managing your own system – from a level of effort perspective.  You don’t have to manage anything but the relationship with the vendor. Depending on your circumstances, you can even make the argument that the vendor in question is better at providing service X than you are. This is especially true of “forgotten” apps like data protection that don’t get the attention they deserve.  You could argue that using a private cloud vendor is better for your data protection than doing it yourself.

What you can’t argue is that it’s less expensive.  There are very few economies of scale in this model. Someone is still paying for one or more servers, some storage, some compute, and some personnel to manage them. They are then marking those things up and passing the cost to you.  There is no way this is cheaper than doing it yourself.

In addition, it’s also important to say that vendors who use the private cloud model don’t come with the same security advantages of those using established public cloud vendors. I know of one vendor that sells their services exclusively via a huge network of MSPs, each of which has a completely different level of capabilities, redundancies, and security practices.  Using a private cloud model requires a customer to look very closely at their infrastructure.

Hosted Software Vendors

Suppose you say you want to use the public cloud for economies of scale, an enhanced security model when compared to private cloud vendors, or maybe someone up higher simply said you needed to start using the public cloud. There are a number of infrastructure vendors that will run their software in VMs in the public cloud, and then offer you a service type of agreement for that software.

Now you are paying two bills: the “service” bill to the infrastructure software vendor, and the cloud provider bill for the compute, storage, and networking services required by this infrastructure vendor. Often in this model, the only service is that the vendor is selling you their software as a subscription.  But the moniker “as a Subscription” doesn’t sound as good as “as a Service,” so they still call this a service.

The problem with this model is that you aren’t getting any of the benefits of the cloud. Typical benefits of the cloud include partial utilization, cloud native services, automated provisioning, and paying only for what you use.  But you’re getting none of those in this model.

Infrastructure products – especially data protection products – are designed around using servers 24×7.  A backup server that isn’t performing any backups is still running 24×7, in case any backup clients request a backup.  That means those VMs you’re running the software on in the cloud have to run 24×7 – so much for partial utilization. A 24×7 cloud VM is very expensive indeed.

Such products are also written to use traditional infrastructure services, like filesystems, block devices, and SQL databases. They don’t know how to use services like S3 and NoSQL databases available the cloud.  In the case of backup software, they might know how to archive to S3 or Glacier, but they don’t know how to store the main set of backups there.

Such products also require manual scaling efforts when your capacity needs grow. You have configure more VMs, configure the software to run on those VMs, and adjust your licensing as appropriate. You’re not able to take advantage of the automated scaling the public cloud offers.

Finally, because you have to provision things in advance, you are often paying for infrastructure before you need it. If you know you’re going to run out of compute, you have to configure a new VM before you do. As you start using that VM, a good portion of it is completely unused. The same is true of filesystems and block storage, especially with backup systems. If your backups and metadata are stored on a filesystem or block storage, you have to manually configure additional capacity before you need it.  This means you’re paying for it before you need it. If the product could automatically make compute available only when you needed it, and use S3 for its storage, you would only pay for compute and storage as you consume it.

Don’t lease a car to take an Uber

See what I mean? In both of these models, you are leasing a car so you can take an Uber.  In the private cloud model, the cost of leasing the system is built into the price of the service, but you’re still paying for that infrastructure 24×7, since it is dedicated to you.  In the public cloud model, you’re paying for the service and you’re leasing the infrastructure 24×7 – even though the service isn’t using the infrastructure 24×7.  Examples of infrastructure products that work like this are Hosted Exchange, SAP Cloud and almost every BaaS/DRaaS/DMaaS vendor.

If you’re going to use the public cloud effectively, you need partial utilization, automated provisioning, and pay-only-for-what-you-use pricing. A true cloud-native product, such as Salesforce.com, Office365, G-Suite, or the Druva Cloud Platform, offers all of those things.  Don’t lease a car to take an Uber.

I don’t often directly push my employers products, but it’s World Backup Day tomorrow so I’m making an exception.  Celebrate it by checking out my employer’s announcement of the Druva Cloud Platform, the only cloud-native data management solution.  It can protect data centers, laptops, mobile devices, SaaS apps like Office 365, G-Suite, and Salesforce.com, and workloads running in the cloud – all while you gain all of the benefits of the cloud, including partial utilization, automated provisioning, and full use of cloud-native tools like S3 and DynamoDB.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

GDPR Primer #2: What is personal data?

Last week I wrote the first of what will probably be a few articles about GDPR, EU’s General Data Protection Regulation.  It governs the protection of “personal data” that your company is storing from EU citizens living in the EU.  (They must be EU citizens, and they must be currently living in the EU for the regulation to apply.)

Note: This article is one in a series about GDPR.  Here’s a list of articles so far: 

As mentioned in my last article, US companies are subject to the regulation if they have personal data from EU citizens. Nexus or a physical presence is not required, only that you have data from people living there.

Is Personal Data the same as PII?

In the US we have a term we like to use called Personally Identifiable Information (PII), which includes certain data types that can be used to identify a person.  Examples  include social security numbers, birthdays, names, employers, physical addresses, and phone numbers.  It’s usually the combination of two data elements that makes something PII, for example knowing someone’s name and their birthday puts you one data point away from being able to steal their identity.  All you need is the social security number and you’re off to the races.

Personal Data, as defined by the GDPR, includes what we call PII, but it includes “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.”  This is interpreted so far to include things like IP addresses, social media profiles, email addresses, and other types of data that we don’t think of as PII in the US.

Someone filling out a basic marketing form on your website has submitted what the GDPR considers personal data to your company. If there’s enough for the person to be identified in any way – which a marketing form would most certainly have – then it’s considered personal data as far as GDPR is concerned.

GDPR Is coming

GDPR goes into effect May 28th.  If you haven’t talked to your backup company about it, it’s time to start having that conversation.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.

Is your data protection company worried about GDPR? They should be.

If you haven’t looked into how your data protection vendors are preparing for the General Data Protection Regulation (GPDR), you’re already behind the power curve.  It goes into effect May 25, 2018. Hopefully this article can get you up to speed so you can start pressuring your vendors about how they are going to help you comply with this incredibly big regulation.

Note: This article is one in a series about GDPR.  Here’s a list of articles so far: 

Disclaimer: I’m not a lawyer or a GDPR expert. My goal of this blog is to get you thinking and maybe scare you a little bit.  Nothing in this blog should be construed as legal advice.

Disclaimer #2: There is no such thing as a GDPR-compliant product, and definitely no such thing as GDPR-certified. A product can help you comply with GDPR.  A product can say “we are able to help you comply with articles 15 and 17,” but a product alone will not make you GDPR compliant.  And there is no certification body to provide a GDPR certification. Anyone who says that is making it up.

US companies must comply with GPDR if

Although this is a European Union (EU) regulation, you are subject to it if you are storing personally identifiable information (PII) (referred to by GPDR as “personal data”) about European citizens (referred to by GDPR as “data subjects”) living within the EU. Where your company is headquartered is irrelevant.

A business transaction is not required.  A marketing survey targeting EU residents appears sufficient to require your company to comply with GDPR.  An EU resident (who was not targeted specifically) filling out a form on your US website that does not have an EU domain might not trigger GDPR protection for that person.  My non-legal advice is that you should look into how you’re preparing for the requirements.

Not complying with GPPR can cost you dearly

Companies not complying with the data privacy aspects of GDPR can be fined 4% of annual revenue, or 20 million Euros, whichever is greater.  It hasn’t gone into effect yet, and no one has been fined yet, so we don’t yet know just how tough the courts are going to be. But that’s what the regulation allows.

How does GDPR affect data protection?

There are several aspects to GDPR protection, but only a few of them affect your data protection system. For example, there is a requirement to gain consent before storing personal data. That responsibility falls way outside the data protection system. But let’s look at some parts that many systems are going to have a really hard time with.

GDPR has articles that talk about general data security, but I think any modern backup system should be able to comply with those articles. The things about GDPR that I think data protection people will struggle with are articles 15, 16 and 17: the right to data access by the subject, the right to correction, and the right to erasure (AKA “right to be forgotten”).

Article 15: Right to data access by subject

If you have data on a data subject (i.e. EU citizen), and assuming that data is subject to GDPR, the subject has a right to see that data. This includes any and all data stored on primary storage, snapshots, backup storage, and archives.  Try to think about how you would comply with that request today and you see where my concern is. Archive software might be ready for this, but most backup systems are incapable of delivering information in this manner.

Article 16: Right to correction

A data subject has the right to have incorrect data corrected. This may not directly affect the backup and archive systems, but it might.

Article 17, Right to erasure (AKA “the right to be forgotten”)

This one is the one that truly scares me as a data protection specialist.  If a company cannot prove they have a legitimate business reason for continued storage of a particular data subject’s personal data, the data subject has the right to have it deleted. And that means all of it.

As previously mentioned, we don’t have any case law on this yet, and we don’t yet know the degree to which the EU courts will order a company to delete someone’s data from backups and archives. But this is article that has me the most worried.

Update: 05/29: I’ve changed a bit in how I think about this.  Make sure to check out this blog post and this one about this topic.

I told you so

The customers that are in real trouble are those that use their backup systems as archive systems, since most backup systems are simply incapable of doing these things. They will be completely incapable of complying with Articles 15-17 of GPDR.

I’ve been telling customers for years to not use their backup system as an archive system. If you are one of the ones who listened to me, and any long term data is stored in an archive system, you’re pretty much ready for GDPR.  A good archive should be able to satisfy these requirements.

But if you’ve got data from five years ago sitting on a dedupe appliance or backup tapes, you could be in a world of hurt. There simply isn’t a way to collect in one place all data on a given subject, and there’s definitely no way to delete it from your backups. Each record is a tiny record inside a filesystem backup stored in some kind of blog, such as tar file or the equivalent for your backup system.

What are your vendors saying?

Has anyone had any conversations about this with their data protection vendors?  What are they saying?  I’d really love to hear your stories.

----- Signature and Disclaimer -----

Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.