I’m still just thinking out loud here. Again… not an attorney. I have read the GDPR and done some analysis of it, primarily around the right to be forgotten (RTBF) and how it pertains to data protection systems. I just want to start a conversation about these very important topics and see what people are thinking.
Note: This article is one in a series about GDPR. Here’s a list of articles so far:
- Worried about GDPR?
- What is personal data?
- Some hope about GDPR & backups
- Keeping a copy of deleted data
- More thoughts on GDPR
No one is scrubbing backups
As I mentioned in my previous post, my opinion is that it is not reasonable to expect companies to delete data from their backups in order to satisfy an RTBF request. It’s simply not technically feasible given modern technology. I do believe companies should switch to non-natural values for the primary keys of their databases. It’s the latter that I want to talk about, based on some comments I received on my last post.
I stand by my opinion about non-natural keys for databases that store personal information. This allows you to delete a record while storing the record identifier, which isn’t personal data. That way you could easily check in the future if you have data that’s supposed to be deleted, such as if you restore the database to a point before the data is deleted.
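To make that concrete, here is a minimal sketch of the idea in Python using the standard sqlite3 module. Everything in it is an assumption for illustration: the customers table, the forgotten_ids ledger, and the function names are hypothetical, and the ledger is kept in a separate database so that restoring the main database doesn't roll the ledger back too.

```python
import sqlite3
import uuid


def create_customers(db):
    # Personal data keyed by a random, non-natural identifier.
    db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, email TEXT)")


def create_ledger(ledger):
    # Holds only the surrogate keys of erased records; no personal data.
    ledger.execute("CREATE TABLE forgotten_ids (id TEXT PRIMARY KEY)")


def add_customer(db, name, email):
    customer_id = str(uuid.uuid4())  # non-natural key that carries no meaning
    db.execute("INSERT INTO customers VALUES (?, ?, ?)", (customer_id, name, email))
    return customer_id


def forget(db, ledger, customer_id):
    # Honor the RTBF request: delete the record, remember only its key.
    db.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
    ledger.execute("INSERT OR IGNORE INTO forgotten_ids VALUES (?)", (customer_id,))


def purge_forgotten(db, ledger):
    """After a restore, delete any rows whose key is in the ledger.

    Returns the number of rows removed.
    """
    forgotten = [row[0] for row in ledger.execute("SELECT id FROM forgotten_ids")]
    removed = 0
    for customer_id in forgotten:
        cur = db.execute("DELETE FROM customers WHERE id = ?", (customer_id,))
        removed += cur.rowcount
    return removed
```

After restoring to a point before a deletion, you would run purge_forgotten against the restored database and the ledger; a nonzero result means the restore brought back records that were supposed to stay deleted.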
But the commenter on my last article has a good point. What if you restore the database to a point before you started using non-natural keys? Suppose you follow the suggestion and stop using natural keys today. But you still have backups from before today that don’t have non-natural keys, and you may have to keep those backups for a long period of time. (You shouldn’t, as you should only be keeping archives for that amount of time, but we all know that at least half of you are keeping your backups for years. Even if you were using archives, the problem of scrubbing them is just as hard, so they could cause the same problem.)
But what about this?
So, it’s three years from now and you need to restore a database from a backup you took before you switched to non-natural keys. In the past three years you have received hundreds of RTBF requests that you need to continue to honor, but you just restored a database that has those records in it, and it doesn’t have that non-natural key you stored in order to make sure the data stays deleted. How are you going to find and delete those records if you didn’t keep the natural keys you were using before you switched away from them?
Again, my opinion is that you’re going to have to keep enough data to identify a unique person in order to continue to honor RTBF requests after they’ve been fulfilled. Get rid of all other data about the person, store just enough to identify them, and put that in the most secure database you have. You could then use that database in one or both of the following two ways.
One way would be to have an app that could read the data in the database, never display it to anyone, but occasionally check whether any of its records are found in one or more other databases. The main use case for this method would be after a restore from an older backup. You could point this app to that restored database so it could clean it. You could also use it proactively to periodically check your entire environment for deleted records and delete them if they are found.
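Here is a rough sketch of what such an app might look like, again in Python with sqlite3. The forgotten table, its identifier column, and the customers/ssn defaults are all hypothetical names I made up for the example. The one design point that matters is that the functions return only counts, so the identifiers themselves are never displayed or logged.

```python
import sqlite3


def scrub_database(target, forgotten, table="customers", column="ssn"):
    """Delete rows in `target` whose `column` matches a forgotten identifier.

    Returns only the number of rows removed; identifiers are never shown.
    """
    # Table and column names cannot be bound as SQL parameters, so validate them.
    if not (table.isidentifier() and column.isidentifier()):
        raise ValueError("table and column must be plain identifiers")
    removed = 0
    for (identifier,) in forgotten.execute("SELECT identifier FROM forgotten"):
        cur = target.execute(
            f'DELETE FROM "{table}" WHERE "{column}" = ?', (identifier,)
        )
        removed += cur.rowcount
    target.commit()
    return removed


def scrub_environment(targets, forgotten):
    """Scrub every database in `targets` (name -> connection); report counts."""
    return {name: scrub_database(conn, forgotten) for name, conn in targets.items()}
```

You could call scrub_database once against a freshly restored database, or run scrub_environment on a schedule across everything you have as the proactive check.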
Another way to use it would be to set it up so that you could only query it by the unique identifier; data is never exported or sent to another app. So you could run a query to see if SSN 123-34-3222 is in it. If a record is found, it is supposed to be forgotten, so it should be deleted. So, again, in the case of a restored database you could check every record in the restored database against the deleted records, and delete any that are found. It’s less efficient than the previous method, but it’s more secure.
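A minimal sketch of the query-only approach might look like this. The ForgottenRegistry class, its method names, and the customers/ssn schema are hypothetical. Note that Python can't really stop a determined caller from reaching the underlying connection, so in practice I'd assume the restriction has to be enforced by the database's own permissions, not just by the wrapper.

```python
import sqlite3


class ForgottenRegistry:
    """Holds identifiers of forgotten people; answers only yes/no lookups."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS forgotten (identifier TEXT PRIMARY KEY)"
        )

    def add(self, identifier):
        self._conn.execute("INSERT OR IGNORE INTO forgotten VALUES (?)", (identifier,))
        self._conn.commit()

    def is_forgotten(self, identifier):
        # The only read path: no method lists or exports the stored identifiers.
        row = self._conn.execute(
            "SELECT 1 FROM forgotten WHERE identifier = ?", (identifier,)
        ).fetchone()
        return row is not None


def clean_restored(restored, registry):
    """Check every restored record against the registry, one lookup each."""
    identifiers = [row[0] for row in restored.execute("SELECT ssn FROM customers")]
    removed = 0
    for ssn in identifiers:
        if registry.is_forgotten(ssn):
            cur = restored.execute("DELETE FROM customers WHERE ssn = ?", (ssn,))
            removed += cur.rowcount
    restored.commit()
    return removed
```

The one-lookup-per-record loop in clean_restored is where the inefficiency comes from: the work grows with the size of the restored database, not with the number of RTBF requests.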
I think this is defensible. Do you?
On one hand, it looks like keeping the unique identifier, the very thing the GDPR is meant to protect, goes against the letter of the law for an RTBF request. Yes, it does. But the GDPR also allows you to keep information required to protect against a lawsuit. Not honoring RTBF requests could cost your company big time, so my personal, non-legal opinion is that this is a perfectly valid thing to do after you’ve honored an RTBF request, in order to make sure they stay forgotten.
How are you going to deal with this problem? What do you think of my idea?
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.