In my continuing series on the challenges the General Data Protection Regulation (GDPR) poses for backup, I thought I’d look at snapshots and the unique problem they present. They may be even more problematic than traditional backups.
Note: This article is one in a series about GDPR. Here’s a list of articles so far:
- Worried about GDPR?
- What is personal data?
- Some hope about GDPR & backups
- Keeping a copy of deleted data
- More thoughts on GDPR
In these previous posts I have defined what the GDPR is and how it applies to your company. I’ve also discussed whether backups are in scope when someone makes a “right to be forgotten” request under the GDPR. As I discussed in several of those posts, I do not believe that companies are going to be able to delete such data from their backup systems, nor do I think that the GDPR is going to require them to do it. (But we just won’t know for sure until the ICO clarifies its position.)
The reasoning is two-fold. The first part is that backups aren’t being used to support any current decisions, nor are they accessible via standard IT systems and queries. The second part is that it’s simply not possible today to selectively delete data from a backup like that.
But what about snapshots?
Someone asked about this on Twitter. Unlike backups, snapshots are visible to regular IT systems and could be used to support current decisions. For example, NetApp snapshots appear under the ~snapshot subdirectory of the volume they are protecting. They may not be in the normal path of things, but a user could easily search and access them. That’s kind of the point of how snapshots work.
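To make the "a user could easily search them" point concrete, here is a minimal sketch of what such a search might look like with nothing but ordinary file access. The function name `find_in_snapshots`, the mount path in the usage note, and the sample email address are all hypothetical, and the snapshot directory name is a parameter that defaults to the `~snapshot` name mentioned above; your storage may expose it under a different name.

```python
import os


def find_in_snapshots(volume_root, needle, snapshot_dir="~snapshot"):
    """Return sorted paths of files under the snapshot directory that contain `needle`.

    Assumptions (not from any vendor documentation): the snapshots are
    exposed as a plain directory tree under `volume_root/snapshot_dir`,
    and the files are small enough to read into memory in one go.
    """
    root = os.path.join(volume_root, snapshot_dir)
    needle_bytes = needle.encode("utf-8")
    hits = []
    # os.walk yields nothing if the snapshot directory does not exist.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if needle_bytes in fh.read():
                        hits.append(path)
            except OSError:
                # Unreadable file: skip it rather than abort the search.
                continue
    return sorted(hits)
```

Calling something like `find_in_snapshots("/mnt/vol1", "jane@example.com")` would list every snapshot copy of a file that still mentions that address. Finding the data is the easy part; nothing in this sketch can remove it, which is exactly the problem.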
But guess what? Snapshots are read-only by design. You don’t want people to be able to delete data from your snapshots if you’re using them for backup. But since they’re accessible via a standard IT process, are they now considered primary data?
Out of curiosity, I reviewed the NetApp whitepaper on how they handle this issue, and it became unclear when it got to the part about actually forgetting the data. It mentioned that you can’t delete something if you don’t know where it is, but it didn’t really go into how you would selectively delete something from a snapshot once you found it.
I’m not picking on NetApp here. I’ve always been a fan. I’m simply saying that – like backups – selectively deleting data from snapshots goes against their nature. And I’m pointing out that because they are accessible as regular IT data, they might not get the pass that I believe backups will get.
What is your plan for snapshots?
Have you looked at the GDPR right to be forgotten (RTBF) and how it relates to snapshots at your company? Has your storage vendor given you any guidance as to how to solve this problem? Is there a GDPR “back door” that you can selectively use to delete data from a snapshot? Would you want to use it, considering you could corrupt the thing you are using for backup?
I’d really love to hear from you on this.
----- Signature and Disclaimer -----
Written by W. Curtis Preston (@wcpreston). For those of you unfamiliar with my work, I've specialized in backup & recovery since 1993. I've written the O'Reilly books on backup and have worked with a number of native and commercial tools. I am now Chief Technical Architect at Druva, the leading provider of cloud-based data protection and data management tools for endpoints, infrastructure, and cloud applications. These posts reflect my own opinion and are not necessarily the opinion of my employer.