The GDPR is unclear about backups

No one knows for sure whether backups are going to be included in the right to be forgotten (RTBF). Even the UK’s Information Commissioner’s Office (ICO) isn’t being entirely clear about it yet. But that hasn’t stopped people from expressing very strong opinions on the subject.

I’ve now written several articles on this topic, and I’ve seen a variety of responses to what I’ve said so far, especially in the comments on Chris Mellor’s article in The Register that mentioned my articles. The ones that crack me up are the people who are absolutely sure about how the GDPR works, even if the ICO isn’t. This is especially true when they’re trying to sell me something to fix a GDPR problem. Be wary of GDPR fearmongers trying to sell you something.

Just in case you’re wondering, even though I work at a data management as a service (DMaaS) vendor, I’ve got no axe to grind here. I’m not aware of any product in this space with a stronger GDPR story than Druva’s, so I have no reason to convince you this isn’t a problem. (We can help you find files that match certain parameters and delete them from backups. Like everyone else, however, we can’t yet delete data from within a backup of a structured database, and I’m still not aware of any product that solves this problem.)

My only goal here is to start a conversation about this difficult topic, and that I’ve clearly done. So I’m going to keep talking about it until we know more.

Experts abound

There are what I would call the “GDPR purists,” whose position sounds something like “what part of forgotten do you not understand?” Clearly these people feel that backups and archives are included in the RTBF, and they really don’t care how much it would cost a company to comply. Most importantly, they’re certain that any company that disagrees with them will be out of compliance and subject to huge fines.

I also get comments that say the complete opposite: people who are certain that backups are not (and will not be) included in RTBF requests. While my opinion is that these people are closer to what is likely to happen, they are just as dangerous as the purists, because they are just as certain.

It’s all conjecture

The only thing I know for certain is that anyone who says they “know” the answer to this question is definitely wrong. Consider what happened when Chris Mellor contacted the ICO for his article. At first glance, their response seems to favor the “definitely included” camp. They said, “Merely because it may be considered ‘technically difficult’ to comply with some of its requirements does not mean organisations can ignore their obligations.”

The ICO knows that the RTBF is going to be hard (even without the backup part of the problem), and they want you to know that you can’t just say “erasing every reference to a given person is really hard” as a defense for not doing it.  Don’t even think about trying that one, they’re saying.  They need everyone to know they mean business.

But I don’t think we’re talking about technically difficult; we’re talking about technically impossible. After 25 years in this field, I can say with confidence that it is technically impossible to erase data from inside a structured database inside a backup without corrupting the backup. Almost all backups are image-based, not record-based. Even if you could identify the blocks pertaining to a given record inside a database, deleting those blocks would corrupt the rest of the backup. You literally cannot do that.
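To make the image-versus-record point concrete, here is a toy Python sketch. It assumes a hypothetical backup image built from fixed 8-byte blocks, located by offset, with a SHA-256 checksum recorded at backup time. Real database backups also carry page checksums, transaction logs, and index entries pointing at each row, none of which this toy models. It tries the two obvious ways of “deleting” one person’s block and shows how each one breaks the backup.

```python
import hashlib

BLOCK_SIZE = 8  # hypothetical, unrealistically small block size


def make_image(blocks):
    """Concatenate fixed-size blocks into one image and checksum it at backup time."""
    if any(len(b) != BLOCK_SIZE for b in blocks):
        raise ValueError("every block must be exactly BLOCK_SIZE bytes")
    image = b"".join(blocks)
    return image, hashlib.sha256(image).hexdigest()


def read_block(image, n):
    """Read block n by its offset, the way an image-level restore locates data."""
    return image[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE]


def verify(image, checksum):
    """Integrity check a backup tool might run before trusting a restore."""
    return hashlib.sha256(image).hexdigest() == checksum


# Block 0 is a header; blocks 1-3 each hold one person's record.
blocks = [b"HEADER00", b"alice001", b"bob00002", b"carol003"]
image, checksum = make_image(blocks)

# Attempt 1: cut Bob's block out. Every later block shifts to a new offset,
# so a lookup of block 3 (Carol) now finds nothing.
cut = image[:2 * BLOCK_SIZE] + image[3 * BLOCK_SIZE:]

# Attempt 2: overwrite Bob's block with zeros. Offsets survive,
# but the image no longer matches the checksum taken at backup time.
zeroed = image[:2 * BLOCK_SIZE] + bytes(BLOCK_SIZE) + image[3 * BLOCK_SIZE:]

print(verify(image, checksum))   # True
print(read_block(cut, 3))        # b''
print(verify(cut, checksum))     # False
print(verify(zeroed, checksum))  # False
```

Zeroing looks less destructive, but in a real database the page checksums and index entries that still expect Bob’s row would fail as well, and a backup tool that verifies images before restoring would likely reject the whole image.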

I also want to say that the idea of restoring every copy of the database you have, deleting the record in question, and then backing the database up again is simply ludicrous. And you would do that every time you got such a request? That’s just nonsense. Besides the fact that the process would be so expensive that it would be cheaper to pay the fine, it carries a huge amount of risk. A mistake in the process could corrupt every backup of every database, which could put you in violation of another part of the GDPR: the requirement that you be able to safely restore personal data that was deleted or corrupted.

Another proposed solution is to convert your backups into archives by scanning and indexing them, and then delete the backup tapes. This sounds interesting until you learn that it doesn’t solve the problem I keep bringing up: personal data stored inside an RDBMS. So it might help, but it’s not a full solution.

My opinion is that erasing all references to a given person in your production system, while also having a process in place to make sure that person never “resurrects” from the backup system, accomplishes the goal of the RTBF without placing the backup data at risk or attempting the impossible. If Vegas were taking bets on this topic, that’s where I’d place mine. I think the ICO is going to say that as long as data is not being used to support any current business decisions, and isn’t directly accessible to production systems, it can be excluded from the RTBF process. But you need to have a process to make sure it never comes back.
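One way to picture that “never comes back” process is a register of fulfilled erasure requests that every restore is checked against before the restored data goes back into production. The Python below is a minimal sketch under stated assumptions: the class and function names are hypothetical, subjects are keyed by email address, and the register stores SHA-256 fingerprints rather than raw identifiers. It is not how any particular product works, nor anything the ICO has prescribed.

```python
import hashlib
from dataclasses import dataclass, field


def _fingerprint(subject_id: str) -> str:
    # Assumption: normalize, then store a hash instead of the raw identifier.
    return hashlib.sha256(subject_id.strip().lower().encode("utf-8")).hexdigest()


@dataclass
class ErasureRegister:
    """Hypothetical register of subjects whose erasure requests were fulfilled."""
    fingerprints: set = field(default_factory=set)

    def record_erasure(self, subject_id: str) -> None:
        self.fingerprints.add(_fingerprint(subject_id))

    def is_erased(self, subject_id: str) -> bool:
        return _fingerprint(subject_id) in self.fingerprints


def reapply_erasures(restored_rows, register, key="email"):
    """Drop rows for erased subjects before restored data re-enters production."""
    kept, purged = [], 0
    for row in restored_rows:
        if register.is_erased(row[key]):
            purged += 1
        else:
            kept.append(row)
    return kept, purged


# Usage: Bob asked to be forgotten; an old backup still contains his row.
register = ErasureRegister()
register.record_erasure("bob@example.com")
restored = [{"email": "alice@example.com"}, {"email": "Bob@Example.com"}]
kept, purged = reapply_erasures(restored, register)
print(kept, purged)
```

Storing a fingerprint keeps the register from becoming yet another copy of the personal data it exists to suppress. A plain unsalted hash of an email address can be reversed by guessing, though, so a keyed hash (for example, Python’s `hmac` module with a secret key) would be a sensible upgrade.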

No one knows for sure, though, and that’s my point.  Anyone who tries to tell you they know the answer for sure either has no idea what they’re talking about, is trying to sell you something, or both.

The ICO says they’re going to make it clearer soon

In their comment for Chris’s article, the ICO could have said, “Backups are included. Period.” They didn’t. Instead, they said, “The key point is that organisations should be clear with individuals as to what will happen to their data when their erasure request is fulfilled, including in respect of both production environments and backup systems. We will be providing more information on backups and the right to erasure soon.”

I for one am looking forward to that guidance.

What do you think?

I’m really curious.  Anyone else want to make a guess as to how this all shakes out?

What about this? Do you think that adopting a wait-and-see approach is risky?  Should you spend millions now even if we’re not sure how this is going to end up?

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

3 comments
  • As you say, until the ICO releases definitive guidance, we are all flying blind. I would also agree with your odds on the likely final outcome. One would hope that a common-sense, proactive approach that demonstrates a willingness to comply with the legislation would be sufficient. An organisation that applies appropriate, proactive policies and procedures covering the backup and restore of targeted, personally identifiable data (i.e., data subject to a deletion request), audited internally or externally for compliance and at minimal cost, is likely to be in better shape than one that doesn’t and opts for a wait-and-see approach.

  • “But you need to have a process to make sure it never comes back.”
    My money is on requiring accountability for every data restore.
    Modern police services have extremely stringent controls that make them accountable for every access to their police databases. Think of the Hollywood trope in which two cops recite each other’s curriculum vitae and say, “Yeah, I looked up your file too…” In reality, cops can get sacked for that and even go to prison.
    Until now, backup admins have had enormous power to restore anything, from anywhere, at any time, to any location, circumventing all access controls. My guess is that every restore action will have to be ticketed, approved by an RTBF compliance officer, audited, and justified.

    • It might indeed go that way, especially for some companies.

      “Backup admins have had enormous power…” I could say the same for any administrator or person with root powers. Once you are logged in as Administrator or root, there’s not much that can stop you from doing what you want, and then cleaning up after yourself.

      This is why separation of powers is really important: you can do this, but not that. And we have to do everything we can to prevent unrestricted access to root/Administrator, for example by requiring sudo for everything and NOT allowing “sudo su” without setting off all kinds of bells and whistles.

      The same should be true with any decent backup software. All actions are done as an individual user, and all actions are logged. So if someone does restore something they’re not supposed to, it would at least be in a log somewhere. If that log is stored in a third-party system (e.g. cloud backup provider), even better.

      After all, it does little good to have a policy that says “you must” or “you mustn’t” if you can’t check whether they did or didn’t.