You cannot just keep user information forever "just in case" it is useful again.
To be honest, in the age of modern overprovisioned storage drives that remap blocks frequently, I'm not really sure you can implement genuine "hard" deletes without choosing significantly unorthodox hardware (or destroying a drive every time you need a single bit erased), no matter how much you want to in software. It's one of those details that, both surprisingly and unsurprisingly, doesn't seem to have been addressed legally. I feel like a court ought to at least buy this aspect of the argument, so maybe they'll buy that it can be difficult in terms of the software too? Who knows. My guess is that a reasonable court would accommodate something that's reasonable for a given company, but there are lots of variations that could fall into that category.
The law is clear: you can't keep user data if the user decides that you should no longer have it. It's your own responsibility to find a way to do that.
> To be honest, in the age of modern overprovisioned storage drives that remap blocks frequently, I'm not really sure you can implement genuine "hard" deletes without choosing significantly unorthodox hardware (or destroying a drive every time you need a single bit erased), no matter how much you want to in software.
The deletion should be done in reasonable terms, i.e. such that you can't reasonably get the data back. Of course, when you delete something from a drive, whether an SSD or an HDD, the data remains recoverable until that block is reused, but that is another matter.
By the way, it's also not really difficult to securely erase user data: you don't need to erase all of the user's data, you need to encrypt all of it with a symmetric-key algorithm (such as AES, which is very fast) and store the key only in a system that supports secure erasure. When the user deletes their account, you don't have to delete all the data, just the encryption key.
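A minimal sketch of this per-user-key approach (often called crypto-shredding). The names `CryptoShredStore`, `keystream` and `xor` are hypothetical, and the in-memory `keys` dict stands in for the secure-erase key store. Python's standard library has no AES, so the sketch substitutes HMAC-SHA256 in counter mode as a stream cipher and does no integrity checking; a real system would use a vetted AEAD such as AES-GCM from an established crypto library.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode: a stand-in stream cipher (use AES-GCM in practice)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


class CryptoShredStore:
    """Bulk data is stored encrypted; only the small per-user key must be erasable."""

    def __init__(self) -> None:
        # Hypothetical: in a real system this lives in a key store with secure-erase support.
        self.keys: dict[str, bytes] = {}
        # Ciphertext can live anywhere: SSDs with remapped blocks, backups, replicas.
        self.blobs: dict[str, list[tuple[bytes, bytes]]] = {}

    def put(self, user_id: str, plaintext: bytes) -> None:
        if user_id not in self.keys:
            self.keys[user_id] = secrets.token_bytes(32)
        key = self.keys[user_id]
        nonce = secrets.token_bytes(16)  # fresh per record so keystreams never repeat
        ciphertext = xor(plaintext, keystream(key, nonce, len(plaintext)))
        self.blobs.setdefault(user_id, []).append((nonce, ciphertext))

    def read(self, user_id: str) -> list[bytes]:
        key = self.keys[user_id]  # raises KeyError once the user has been forgotten
        return [xor(ct, keystream(key, nonce, len(ct)))
                for nonce, ct in self.blobs.get(user_id, [])]

    def forget(self, user_id: str) -> None:
        # Erase only the key; leftover ciphertext is unreadable without it.
        self.keys.pop(user_id, None)


store = CryptoShredStore()
store.put("alice", b"alice@example.com")
store.put("alice", b"221B Baker Street")
before = store.read("alice")
store.forget("alice")
```

After `forget`, the ciphertext is still in `store.blobs`, just as it may still sit in remapped SSD blocks, but without the key it can no longer be decrypted.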
When the GDPR rules came out our company (pretty small at the time) pulled out all the stops to get ourselves into compliance, due to the above and other teeth put into the regulations. It seemed utterly irrational not to. Were we the exception, or the rule?
Probably almost all companies, actually, including yours (just to a different extent). What did you do about the storage remapping issue I mentioned? Did it come up? Did you discuss it? Do you believe you're in compliance even though your hardware (most likely) doesn't guarantee erasure or overwriting of existing data? I'm curious how your assessment of that went, because I doubt one can be in strict compliance without guarantees from the hardware.
Expensive, maybe, but not risky.
If you're rolling the dice on a GDPR fine, the expected value of vague compliance is still largely positive, while the expected value of actual compliance is still slightly negative.
The expected value of the fine needs to be a larger negative than the expected value of vague compliance.
You don't even need to raise the fine (which is scaled as a percentage of revenue); you just need to make the probability of paying it approach 1.
In fact, with a higher probability of paying the fine, you could even lower the fine itself and it would still carry a larger negative expected value than vague compliance.
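The argument can be made concrete with a few lines of arithmetic. This is a deliberately simplified model under stated assumptions: the only quantities are the compliance cost a company saves by cutting corners, the size of the fine, and the probability of actually being fined. The function names and all figures are hypothetical; actual compliance is modeled as simply costing `compliance_cost`, i.e. a slightly negative expected value.

```python
def vague_compliance_ev(compliance_cost: float, fine: float, p_fined: float) -> float:
    """Money saved by skipping real compliance work, minus the expected fine."""
    return compliance_cost - p_fined * fine


def break_even_probability(compliance_cost: float, fine: float) -> float:
    """Enforcement probability above which cutting corners stops paying off."""
    return compliance_cost / fine


# Hypothetical figures: doing compliance properly costs 500k, the fine is 10M.
rare_enforcement = vague_compliance_ev(500_000, 10_000_000, p_fined=0.01)  # positive: corners get cut
near_certain = vague_compliance_ev(500_000, 10_000_000, p_fined=0.9)       # strongly negative
smaller_fine = vague_compliance_ev(500_000, 1_000_000, p_fined=0.9)        # still negative, fine cut 10x
threshold = break_even_probability(500_000, 10_000_000)                    # 0.05
```

The break-even line is `p_fined * fine == compliance_cost`, which is why raising the probability of enforcement works just as well as raising the fine, and why a likely-enough fine can even be made smaller.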
This is not the purpose of cookies. The purpose of cookies is to store state on the browser side. Does GDPR treat every possible use of cookies as storing and communicating privacy-violating data? Because that would be extremely unfortunate.
It leads to having to do crazy things like per-user keyed encryption, key-escrow hilarity, etc.
GDPR does differentiate between structured data (I believe it uses the term “identifiable records” or similar) and a huge heap of unstructured data where an individual’s data can’t be quickly retrieved as its own atomic unit, with much stricter requirements for anything structured.
So data on an HDD that could be recovered, but is an unstructured mess, is probably OK, as long as you took reasonable steps to protect the data, like full-disk encryption (which could occur below the file system, and thus still allow you to recover data deleted from the FS with a full-disk scan, as long as you still had the keys). If you just had people’s data unencrypted on an HDD and didn’t securely erase it before disposing of the drive, then that’s probably still a GDPR violation.
Instead of hard-deleting a structured record for Bob Smith, can you leave the record intact and scrub it of identifying data so that I don't, e.g., break all of the records for orders that Bob Smith made?
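A minimal sketch of what such an in-place scrub could look like, assuming a hypothetical `users`/`orders` schema in SQLite; the table, column and function names are illustrative only. Note, as later replies in the thread point out, that the surviving `id` and order history may still be linkable to Bob, so this is pseudonymization rather than guaranteed anonymization.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT,
    address TEXT,
    deleted_at TEXT
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    total_cents INTEGER NOT NULL
);
"""


def scrub_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Blank out identifying columns but keep the row, so orders.user_id still resolves."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "UPDATE users SET name = 'deleted user', email = NULL, address = NULL, "
            "deleted_at = CURRENT_TIMESTAMP WHERE id = ?",
            (user_id,),
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users (id, name, email, address) "
             "VALUES (1, 'Bob Smith', 'bob@example.com', '1 Main St')")
conn.executemany("INSERT INTO orders (user_id, total_cents) VALUES (?, ?)",
                 [(1, 4200), (1, 1999)])
conn.commit()

scrub_user(conn, 1)
rows = conn.execute(
    "SELECT u.name, u.email, o.total_cents FROM orders o "
    "JOIN users u ON u.id = o.user_id ORDER BY o.id"
).fetchall()
# rows == [('deleted user', None, 4200), ('deleted user', None, 1999)]
```

The join still works because the primary key survives; only the columns that identify Bob directly are cleared.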
This is obviously not always true. No European can, for example, delete their online account with their mortgage company and demand that the company delete the records showing they owe it money, so that they get their house for free.
Nor can anyone call up their previous employer and require them to delete all the work they ever did from their company's computer systems.
A company can't respond to a tax audit with "well we lost a billion dollars this year but all our customers wanted their accounts deleted so we can't provide any documentation of it." I mean, you aren't even going to get to the audit if you can't fill out your taxes in the first place because you had to delete all your financial records.
Most computer systems and software products in existence aren't free media apps, yet the "privacy circlejerk" constantly talks in generalities as if they were. I'm sick of this: most applications have a business justification for using some form of soft delete for at least some items.
You can’t have them delete IP (intellectual property), since that isn’t covered by GDPR, but you certainly can ask them to delete your identifying data from their records, as GDPR also applies to employees, not just customers. This is a problem with, for example, VCS history.
Unless another law takes precedence, GDPR is the baseline. It covers any identification of an individual, regardless of usage. Most “media apps” aren’t subject to any other relevant law, so GDPR dictates what you need to do. (IANAL, but I worked on implementing GDPR with lawyers.)
The examples in the GP were mostly around the data being necessary for fulfilling a contract (can't request deletion of your mortgage records) or meeting legal obligations (can't delete tax records). But there's also legitimate interest.
Legitimate interest is vague and open to interpretation, but I'd be really surprised if there were any decisions from a DPA suggesting that a VCS commit log would be problematic. The legitimate interest seems really strong in that case: being able to audit who actually made each change, e.g. for security reasons.
You can ask, but the right to have data erased is not an absolute right. In fact, there's an enumerated list of the circumstances in which the data must be deleted; otherwise data controllers are under no specific obligation to delete data, and there is even a series of criteria that disapply that obligation.
It’s almost impossible to guarantee instant deletion; folks just care that it’s gone predictably, within a documented and/or reasonable amount of time.
Correct?
You can’t just declare that it takes years to delete data; GDPR sets reasonable limits on how long a company can delay true deletion.
Various types of data have various retention times for various reasons, some being legal reasons.
A user has a long history of participating on your forum and other users have quoted their messages far and wide. Collectively, all of the messages posted on your forums (with or without timestamps) reveal some PII about the user. Do you have to delete those?
The user filed a bug report about a functionality not working, do you have to delete the text of the bug report?
Arguably, if your user table looks like [user_id, creation_date, deletion_date, status, account_type], then this table does not contain any PII.
Assuming that user content is not automatically PII, whose responsibility is it to track where PII can be?
Sometimes users doxx themselves (like mistakenly sharing tax return forms instead of cat pics); in such a case it is the user's responsibility to signal this to you.
If the user filed an issue saying "when I insert my name (Abe Cox) the input validation fails", who needs to read through all the issues to find this case: you or the user?
My point is that GDPR + right to be forgotten cannot make it look like you never had an account at all, especially without user assistance.
Technically yes.
> The user filed a bug report about a functionality not working, do you have to delete the text of the bug report?
Depends on the privacy policy of the bug-tracking application. If it states that reports shouldn't contain any personal data, you don't have to delete it. If the user asks for deletion, stating that it contains personal data, you have to delete it.
> Arguably if your user table look like [user_id, creation_date, deletion_date, status, account_type] then this table does not contain any PII.
It doesn't. As long as you no longer have any way to associate the user_id with the user's personal data, you can keep it.
> Sometimes users doxx themselves (like mistakenly sharing tax return forms instead cat pics), in such a case it is the user responsibility to signal this to you.
In that situation, if the user asks for deletion of the content, you have to comply. You are obviously not required to monitor for the user's own mistakes.
For example, if you created a social media app and later left the company, it could still be determined that if you had an account it likely has one of the lowest user ID values if those were assigned sequentially. So even if your name was removed from it and none of your posts provided any identifying information about you, the user ID could still qualify as PII.
Likewise if the user ID is public and non-PII content associated with that ID is linked elsewhere identifying the author of that content (e.g. a news article embedding a post) the identity of the author is still compromised.
The problem is not just that users may submit PII in unexpected places, the problem is that even if the data type could not possibly allow them to, metadata beyond your control may actually still taint your supposedly anonymous data.
As an extreme example: user A takes a screenshot of their own user settings page on an image-sharing site and sends it to user B, who uploads it to the same site. Who is responsible for this piece of data? Does share-a-pic.web need to scan all images for the username of a user who is deleting their account?
This is a convoluted example where the answer is obviously no, but it shows that a purely principled answer might be unsuitable.
Especially since non-PII data you own, mixed with non-PII data someone else owns, can become PII, and you do not know what data others have.
(You could also replace the user ID in all locations with a new tagged UUID to preserve referential integrity in the DB, but there are only about 2^33 humans in total, so roughly 33 bits of information are enough to single someone out; it's very hard not to be deanonymizable.)
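A minimal sketch of the re-keying idea from the parenthetical, assuming a hypothetical SQLite schema where `posts.author_id` and `likes.user_id` reference `users.id`; the `anon-` tag, `USER_REFERENCES` list and `rekey_user` function are illustrative names, not an established API. As the comment warns, this only breaks the direct link to the old ID: the user's posts and behavior can still re-identify them.

```python
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id TEXT NOT NULL, body TEXT);
CREATE TABLE likes (post_id INTEGER NOT NULL, user_id TEXT NOT NULL);
"""

# (table, column) pairs that reference users.id; hypothetical, and must be kept complete.
USER_REFERENCES = [("posts", "author_id"), ("likes", "user_id")]


def rekey_user(conn: sqlite3.Connection, old_id: str) -> str:
    """Swap a user's id for a fresh, tagged random UUID everywhere it appears, in one transaction."""
    new_id = "anon-" + uuid.uuid4().hex  # the "anon-" tag marks rekeyed users
    with conn:  # commits on success, rolls back if any UPDATE fails
        conn.execute("UPDATE users SET id = ?, name = NULL WHERE id = ?", (new_id, old_id))
        for table, column in USER_REFERENCES:
            # Identifiers come from the constant above, never from user input.
            conn.execute(f"UPDATE {table} SET {column} = ? WHERE {column} = ?", (new_id, old_id))
    return new_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO users VALUES ('u1', 'Alice'), ('u2', 'Bob')")
    conn.executemany("INSERT INTO posts (author_id, body) VALUES (?, ?)",
                     [("u1", "first post"), ("u2", "hello"), ("u1", "second post")])
    conn.execute("INSERT INTO likes VALUES (2, 'u1')")

new_id = rekey_user(conn, "u1")
```

Because the new ID is random rather than derived from the old one, nothing in the database maps back to `u1` unless a copy of the old data survives elsewhere.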
This is the actual text of the law[1]:
> The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay
There is a one-month limit on the right of access[2] and the right to be informed[3], which flows into some of the articles of the law, and certain articles also mention a one-month limit on informing the data subject, e.g. Article 12(3) and (4) [4]:
> The controller shall provide information on action taken on a request under Articles 15 to 22 to the data subject *without undue delay and in any event within one month* of receipt of the request. That period may be extended by two further months where necessary, taking into account the complexity and number of the requests. The controller shall inform the data subject of any such extension within one month of receipt of the request, together with the reasons for the delay. Where the data subject makes the request by electronic form means, the information shall be provided by electronic means where possible, unless otherwise requested by the data subject. If the controller does not take action on the request of the data subject, the controller shall inform the data subject without delay and at the latest within one month of receipt of the request of the reasons for not taking action and on the possibility of lodging a complaint with a supervisory authority and seeking a judicial remedy.
I've marked out those phrases because they show the language of the law that you will see again and again in the articles - without undue delay and within one month. The latter does not obviate the former.
Hence, and in fact, there is no part of the law that explicitly gives you a month to delete something. If you could delete something right now but you leave it hanging around because you have arbitrarily chosen a job to run once a month and there's a leak in that time, that would be undue delay.
[1] https://gdpr-info.eu/art-17-gdpr/
[2] https://gdpr-info.eu/issues/right-of-access/
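To make the quoted Article 12(3) limits concrete, here is a small date calculation. It is a sketch under a stated assumption: "one month" is taken to mean the same calendar date in the following month, clipped to the last day of a shorter month; how such periods are counted exactly is a legal question. The function names are hypothetical. In line with the argument above, the results are outer ceilings, not a grace period.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Same day of the month `months` later, clipped to the end of a shorter month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def response_limits(received: date) -> tuple[date, date]:
    """Outer limits under Art. 12(3): one month, or three if extended by two further months.

    These are ceilings, not grace periods: the duty is still "without undue delay".
    """
    return add_months(received, 1), add_months(received, 3)


normal, extended = response_limits(date(2024, 1, 31))
# normal == date(2024, 2, 29), extended == date(2024, 4, 30)
```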
The 30 days wasn't taken from the regulation specifically; the actual grace period for the deletion should be 90 days in total, but I'm neither a lawyer nor certain about it.
What I am certain about is that your original thesis, that soft deletes are illegal under GDPR, is complete bullshit.
Pseudonymous data is not anonymous data.
But yes, the GDPR obviously allows for legal requirements of record keeping. It does however require you to delete the data once those requirements no longer apply (i.e. you have to ensure data is still deleted after those several years, you can't just not delete it because the deletion time is years in the future).
Long before GDPR was a thing we had this implemented allowing us to securely analyze production dumps.
This is one of those things that I think libraries need to mature to support and then it will be effortless. Right now it requires a modest, but hardly overwhelming, engineering effort to implement.
That way log and system context is available for debugging, but the system no longer holds any PII. It does delete some logs, but anonymizing data seemed better than hard deletion. It wasn't hard to do: it took legal longer to approve it than it took us to write and test the code, and it has come in handy a few times for debugging errors.
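The comment doesn't say how its anonymization works, so this is only one possible approach: a minimal redaction sketch that swaps identifiers in log lines for fixed placeholders while keeping the rest of the context. The two regexes and the `redact` name are hypothetical and far from complete PII detection. Fixed placeholders lose the ability to correlate lines for the same user; replacing identifiers with keyed hashes would keep that, but, as another reply notes, pseudonymous data is not anonymous data.

```python
import re

# Hypothetical patterns; real PII detection needs far more than two regexes.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<email>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ipv4>"),
]


def redact(line: str) -> str:
    """Replace matched identifiers with fixed placeholders, keeping the rest of the log line."""
    for pattern, placeholder in REDACTIONS:
        line = pattern.sub(placeholder, line)
    return line


sample = "2024-05-01 12:00:03 ERROR login failed for bob@example.com from 203.0.113.7: timeout"
print(redact(sample))
# 2024-05-01 12:00:03 ERROR login failed for <email> from <ipv4>: timeout
```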
So yeah, you're right that not deleting user data just because it's inconvenient is wrong and illegal, but that doesn't mean there aren't legitimate usecases for soft deletions of users.
Users can retract willing consent, but willing consent is not the only thing that allow companies to store user data.
Soft deletion of users who had an account with transactions on it is fine.
Art. 17, point (b), explains this quite clearly.
GDPR is amazingly readable.
Soft delete immediately when requested. Hard delete (if still soft deleted) when GDPR requires it.
Besides it's not like all data is the same. Use what's appropriate when it's appropriate. Nobody's saying "you must always soft delete, and then immediately tweet the audit trail, print that tweet, put that printout onto a rocket, and launch it to the moon for later retrieval. with a blockchain for extra GDPR hate".
There's a grace period of 30 days. We hard-remove after 2 weeks allowing a user to change their mind if they deleted it by accident. More than enough.
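A minimal sketch of the "soft delete now, hard delete later" pattern described in this thread, assuming a hypothetical SQLite `accounts` table with a `deleted_at` column holding Unix seconds. The 14-day window mirrors the two weeks mentioned in the last comment and is not a figure taken from the regulation; whether a given window is acceptable is exactly what the thread disputes. The function names are illustrative.

```python
import sqlite3

DAY = 24 * 3600
GRACE_SECONDS = 14 * DAY  # e.g. a two-week window to undo an accidental deletion


def soft_delete(conn: sqlite3.Connection, account_id: int, now: int) -> None:
    """Mark the account deleted immediately; normal queries should filter on deleted_at IS NULL."""
    with conn:
        conn.execute("UPDATE accounts SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
                     (now, account_id))


def undo_delete(conn: sqlite3.Connection, account_id: int) -> None:
    """Let a user who changed their mind recover the account during the grace window."""
    with conn:
        conn.execute("UPDATE accounts SET deleted_at = NULL WHERE id = ?", (account_id,))


def purge_expired(conn: sqlite3.Connection, now: int, grace: int = GRACE_SECONDS) -> int:
    """Hard-delete accounts whose grace window has passed; run this from a scheduled job."""
    with conn:
        cur = conn.execute("DELETE FROM accounts WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
                           (now - grace,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT, deleted_at INTEGER)")
with conn:
    conn.executemany("INSERT INTO accounts (id, email) VALUES (?, ?)",
                     [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")])

soft_delete(conn, 1, now=0)                 # deleted on day 0
soft_delete(conn, 2, now=10 * DAY)          # deleted on day 10
purged = purge_expired(conn, now=15 * DAY)  # only account 1 is past its 14 days
remaining = [r[0] for r in conn.execute("SELECT id FROM accounts ORDER BY id")]
```

Running the purge frequently (daily, say) keeps the actual time to hard deletion close to the stated window, rather than the window plus however long the job waits between runs.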