I am not a lawyer. I’m not even what I consider a GDPR specialist. But I’ve read a lot of the text of the GDPR, and I’ve read a lot about it and watched a lot of videos. So I’d like to offer my layman’s interpretation of an important aspect of GDPR – the right to be forgotten – and whether or not it means we have to delete data from our backups. For some background info on GDPR and why you should care about it, feel free to reference my other blogs on the subject here and here.
Let’s talk about this
I have an opinion on this issue, but it’s not a legal opinion. I’d love to hear your opinion, especially if it differs from mine. Let’s see some comments on this one, shall we? Here’s the official GDPR website where you can read it for yourself.
The easy stuff
There are all kinds of GDPR articles about making sure we have consent and a reason to store personal data, making sure we take care of it when we do, making sure we store it securely, etc. There’s even a line that says we need “the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident.” My personal opinion is that you should have been doing all this stuff already, which is why I’m calling it “the easy stuff.” (The easy stuff isn’t easy if you haven’t been doing it, but it’s easy in that all the technology is there to do it. All you have to do is implement it.)
A little bit harder
You need a way to search any systems that store personal data. You need to be able to query for any records referencing a given email address, IP address, physical address, etc. Hopefully you have that already, but if not, that will require some work to comply with. This is needed to satisfy the data request and right to be forgotten provisions.
If you’re using “natural keys” as the primary keys in your database, that’ll have to change. Any information that could be deemed personal information by the GDPR should not be used as the primary key in a database.
The first reason is what happens if you are asked to delete a given record that uses the primary key of the IP address where the user filled out a form, or the email address they used to do so. If you reference that primary key in other records, you’ll have to do a cascading delete of any records that reference that key, in addition to deleting the primary record. I’ll discuss the other reason this is important later in the article. Suffice it to say this may require a significant design change in your database system.
It goes way beyond employees
I’ve heard a lot of people talking about employees as if they are the main data subjects under the GDPR. They are covered under GDPR, but I think employees (IMO) fall under the easy stuff. It’s easy to prove consent when you have an employment contract. You’re probably already securely storing that data, and you probably also have a pretty simple way of searching for those records to comply with any requests for that data. You also have a valid reason to not comply with any erasure requests, because you can say that you’re keeping it to be able to defend against any lawsuits, which is an exception to the erasure requirement. (There are several reasons you don’t have to erase data; one of them is if you are keeping it to protect against lawsuits.) My opinion is that everything I just said also applies to customers. You have a contract with them, you have a reason to keep their information, you can easily search for it, and you have a reason to not delete it. Easy peasy. (Remember, I’m not a lawyer, and I’m curious about your take on this.)
The rub comes when you’re storing data about non-employees and non-customers. You will have to prove that you got affirmative consent to store the information, you’ll need to supply it when asked, and you’ll need to delete it when asked. Now things get a little hairy. It’s out of the scope of this blog, but this means you have to do things like have an unchecked checkbox that they have to check to give you permission to store the info. And you should be storing any personal data in a system that allows you to easily search for the data if someone asks for it.
But what about backups? Do I have to delete the backups?
No one knows for sure because there’s no case law on it it and the GDPR itself is somewhat unclear on the issue. We won’t know until someone gets sued under the GDPR for not deleting data from their backups. If a court rules that backups are part of what we’re supposed to delete, we’re all in a world of hurt. If they rule in line with what I say below, then we can breathe easier. Let’s see what the GDRP says about the subject.
The GDPR seems more concerned with live copies of data
This is more a general feeling than anything I can directly quote, but it seems to be interested primarily in online, live copies of data that can be easily accessed. I’m guessing it’s because these are the copies that tend to get hacked and accidentally released to the public. You don’t really see any stories about how some hacker broke into someone’s backup system and restored a bunch of stuff to make it public. Heck, most companies can’t restore their own data properly. How’s a hacker going to do that?
The GDPR doesn’t mention backups.
Go ahead. Search the entire text of the GDPR for phrases like “backup,” or “back up.” You won’t find it. So no help on that front.
The GDPR does mention restores
The writers of the GDPR knew about backups and restores, because they mentioned that you need “the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident.” So they knew the concept of a backup exists, but chose not to mention it in the erasure section.
It does use the words archive and archival, but
When it uses the word archival, it seems to be referring to a large collection of information for a long period of time. And if you can prove you’re doing something like that “for the public good,” then it’s also exempt from erasure. For example, you can’t ask that CNN erase a story about you getting arrested.
The GDPR does mention copies
There’s a section that says you should take reasonable steps to erase any “links to, or copies or replications of those personal data” if you’ve made it public. But, again, this seems focused primarily on online copies of data that are replicated copies of the same data we’re trying to erase.
The GDPR uses the word “reasonable” and “excessive”
There GDPR is filled with phrases like “reasonable” and “excessive”. They understand that not everything is possible and that some things will require an excessive amount of effort. One example of this is in Recital 66 about Article 17 (the right to be forgotten article). It says that if a controller has made the personal data public, it “should take reasonable steps, taking into account available technology and the means available to the controller, including technical measures.”
The GDPR doesn’t use the word “reasonable” in the erasure section
Interestingly enough, right where we’d like to see a “reasonable” section, there isn’t one. There is one when it talks about what you have to do if you’ve already made the data public and are asked to delete it, but it doesn’t mention reasonability when talking about deleting the main source of the data or any backups of that data.
You do have to make sure data stays deleted
If you are asked to delete a particular piece of personal data, you do need to make sure it is deleted – and stays deleted. But it’s virtually impossible (and certainly not reasonable) to delete records out of most backup systems, so how are going to ensure a given record stays deleted if you do a restore?
Now we’re back to natural keys. You’ll need a way to find records pertaining to Steve Smith living at 123 anywhere lane, without storing the values of Steve Smith and 123 anywhere lane. (Because doing that would be violating the deletion request.) This is why you need to use something other than natural keys. If you’re not using natural keys, you can determine that Steve Smith at 123 anywhere lane is lead number 9303033138. That is a unique value that is tied to his record, but is not personal data if you get rid of the other values. You can then create a separate table somewhere that tracks the lead numbers that must stay deleted from the marketing database – even if it’s restored.
If you restore the marketing database, you just need to make sure you delete lead number 9303033138 and any other leads listed in the DeletedLeads table – before you put that database back online. Because if you put the marketing database back online with Steve Smith’s address and email address still there – and then someone kicks off a marketing campaign that contacts Steve Smith after you said his records are deleted – you’re going to have a very easily provable GDPR violation on your hands. Then we’re back to talking about those potentially huge fines.
I don’t think you have to delete data from your backups
My personal non-legal opinion is that as long as you have a process for making sure that deleted records stay deleted even after a restore – and you make sure you follow that process – you have a pretty defensible position. My personal opinion would also be to be upfront about this in your notification to the data subject.
Dear Steve Smith,
We have deleted all references to your personal data in our marketing database. For technical reasons we are unable to delete this information from our backup system, but that system is only used to restore the marketing database if it is damaged. We also have a system in place to ensure that your records will be immediately deleted if the marketing database is ever restored from the backup system.
Backup vendors can and should be part of this process moving forward. Maybe in a few years’ time, we’ll have the ability to surgically remove records from a backup. That would be very nice, and would be more elegant than having to do what I’m suggesting above. This may indeed become a competitive differentiator for one or more backup companies moving forward.
What do you think? Am I being too hopeful here?
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.