0 points · dataflow · 3y ago · 0 comments

I don't know if you were the exception or the rule, but I know for a fact some do go the forgiveness approach for requirements that they believe would be disproportionately burdensome.

Probably almost all companies actually, including yourself (just to a different extent) - what did you do about the storage remapping thing I mentioned? Did it come up/did you guys discuss it? Do you believe you're in compliance despite your hardware (most likely) not guaranteeing erasure or overwriting of existing data? I'm curious how your assessment of that went, because I doubt one can be in strict compliance without guarantees from the hardware.

10 comments · 2 top-level

coder543 · 3y ago · 6 in thread

Remapping of blocks isn't the problem here. IANAL, but you should be encrypting all the data for GDPR compliance to begin with. With it encrypted, how the storage device chooses to map its blocks is completely irrelevant. When the key goes away, all the data is erased by definition. Throwing away the key is one of the easiest ways to comply with Right to be Forgotten, from what I've seen.

"Oh, but the key is still on the hard drive!" you would likely complain. No... the key could be stored any number of places that won't leave a copy. If you store the key on a TPM/HSM, then tell the TPM to delete the key, short of finding out that the TPM manufacturer failed to perform their duty in erasing the key, and simultaneously discovering a 0-day exploit to go in and retrieve that deleted key, it is gone. Full stop. It is unreasonable to ask everyone to do something impossible like prove that their TPMs are working properly, when even the TPM manufacturers cannot definitively prove this, so that is clearly not what the law is asking. You might as well ask companies to prove that they've never been hacked. How can anyone prove that?
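
To make the "throw away the key" argument concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `KeyVault` is a hypothetical stand-in for a TPM/HSM, the "disk" is a Python dict, and `toy_cipher` is a hash-based XOR stream used only so the example runs with the standard library (a real system would use an authenticated cipher such as AES-GCM from a vetted library).

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream (same call encrypts and decrypts).

    Illustration only, not a vetted cipher.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class KeyVault:
    """Hypothetical stand-in for a TPM/HSM: keys never touch the data disk."""

    def __init__(self):
        self._keys = {}

    def create(self, key_id):
        self._keys[key_id] = secrets.token_bytes(32)

    def get(self, key_id):
        return self._keys[key_id]  # raises KeyError once the key is destroyed

    def destroy(self, key_id):
        del self._keys[key_id]


class EncryptedStore:
    """Only ciphertext is ever written to the 'disk' (a dict here)."""

    def __init__(self, vault, key_id):
        self.vault = vault
        self.key_id = key_id
        self.disk = {}  # name -> (nonce, ciphertext)

    def put(self, name, plaintext):
        nonce = secrets.token_bytes(16)
        key = self.vault.get(self.key_id)
        self.disk[name] = (nonce, toy_cipher(key, nonce, plaintext))

    def get(self, name):
        nonce, ciphertext = self.disk[name]
        return toy_cipher(self.vault.get(self.key_id), nonce, ciphertext)
```

In this model, calling `vault.destroy(key_id)` leaves every ciphertext block on the disk wherever the device happened to map it, but `store.get(...)` can no longer decrypt any of it. Whether that counts as erasure still depends on trusting the vault, which is the point argued above.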

Conceptually, one of the simplest ways to comply with this would be to perform database failover once a month, since you have a month to comply with these requests. Before the failover, you would ensure that all data is deleted that has been requested to be deleted. Then you bring up a new database replica, stream the remaining data to it, failover, and then destroy the key stored in the TPM on the primary and any other replicas. All that deleted data is now gone forever.
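
The monthly failover described above can be sketched roughly as follows. This is an assumption-laden model, not a real database procedure: `Replica` and `monthly_failover` are hypothetical names, each replica's key is a plain attribute standing in for a key held in a TPM, and `toy_cipher` is a stdlib-only stand-in for real disk encryption.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Hash-based XOR stream; illustration only, not a vetted cipher."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Replica:
    """A database replica whose disk only ever holds ciphertext under one key."""

    def __init__(self):
        self.key = secrets.token_bytes(32)  # stands in for a TPM-held key
        self.disk = {}                      # row_id -> (nonce, ciphertext)

    def _require_key(self):
        if self.key is None:
            raise RuntimeError("key destroyed; disk contents are unreadable")
        return self.key

    def write(self, row_id, plaintext):
        nonce = secrets.token_bytes(16)
        self.disk[row_id] = (nonce, toy_cipher(self._require_key(), nonce, plaintext))

    def read(self, row_id):
        nonce, ciphertext = self.disk[row_id]
        return toy_cipher(self._require_key(), nonce, ciphertext)

    def destroy_key(self):
        self.key = None


def monthly_failover(primary, pending_deletions):
    """Stream surviving rows to a fresh replica, then destroy the old key."""
    fresh = Replica()
    for row_id in primary.disk:
        if row_id not in pending_deletions:
            fresh.write(row_id, primary.read(row_id))
    primary.destroy_key()  # old disk, including "deleted" rows, is now noise
    return fresh
```

The old replica's disk is never scrubbed in this sketch. The rows requested for erasure simply never get re-encrypted under the new key, so they disappear with the old one.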

There are more complicated architectures where you could have per-user keys which are stored on an encrypted volume, and then you would be managing a handful of keys in the TPM to handle rotation of the encrypted volumes. Once a month, you copy the per-user keys that haven't been deleted onto a new encrypted volume, then tell the TPM to delete the key for the old encrypted volume: garbage collecting the deleted keys in an unrecoverable fashion. This avoids the hassles with having to failover a database so frequently, although having that level of automation around database failover has advantages too.
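
A rough sketch of that per-user-key rotation, under stated assumptions: `KeyVolume` and `rotate` are hypothetical names, the volume key attribute stands in for a key held in the TPM, the `wrapped` dict stands in for the encrypted volume on ordinary disk, and `toy_cipher` is a stdlib-only placeholder for a real key-wrapping scheme.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Hash-based XOR stream; illustration only, not a vetted cipher."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class KeyVolume:
    """Per-user keys stored wrapped (encrypted) under a single volume key."""

    def __init__(self):
        self.volume_key = secrets.token_bytes(32)  # stands in for a TPM-held key
        self.wrapped = {}                          # user_id -> (nonce, wrapped key)
        self.deleted = set()

    def add_user(self, user_id, user_key=None):
        user_key = user_key or secrets.token_bytes(32)
        nonce = secrets.token_bytes(16)
        self.wrapped[user_id] = (nonce, toy_cipher(self.volume_key, nonce, user_key))

    def user_key(self, user_id):
        if self.volume_key is None:
            raise RuntimeError("volume key destroyed")
        if user_id in self.deleted:
            raise KeyError(user_id)
        nonce, wrapped = self.wrapped[user_id]
        return toy_cipher(self.volume_key, nonce, wrapped)

    def forget_user(self, user_id):
        # Logical delete only: the wrapped bytes are left in place on purpose,
        # modelling data that may linger in remapped blocks.
        self.deleted.add(user_id)


def rotate(old):
    """Copy surviving per-user keys to a new volume, then drop the old volume key."""
    new = KeyVolume()
    for user_id in old.wrapped:
        if user_id not in old.deleted:
            new.add_user(user_id, old.user_key(user_id))
    old.volume_key = None  # garbage-collects every forgotten user's key at once
    return new
```

Each user's data would be encrypted under that user's key (not shown). After `rotate`, a forgotten user's wrapped key may still sit on the old volume, but nothing remains that can unwrap it.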

I'm sure some companies are content with treating the deleted data as erased, as long as their drives are properly encrypted. A rogue employee could potentially recover the data while the hard drives and the keys are still together, but the hard drives will be worthless once they're separated from the keys after they fail and are thrown away. A rogue employee could have stolen the data before it was deleted anyways, so how do you prove that rogue employees don't exist? Is this level of erasure good enough? This one I'm a little iffy on, but it still completely resolves your issues with the block remapping.

GDPR makes reference to "taking account of available technology and the cost of implementation", so I think it is fair to say that GDPR is not asking you to erase any butterfly-effect waves that have propagated out from the read/write head of your hard drive. If the data is not recoverable by any known means, that is the purpose of the law, isn't it?

I will repeat that I'm not a lawyer, but your argument feels like a strawman designed to paint the Right to be Forgotten as something that no one can comply with. Your comments come off as snarky and acting like you think you're so clever. As if no one has ever thought about these problems before.

jandrewrogers · 3y ago

Per user encryption such as you suggest does not address the problem. Systems that work this way have existed for decades and they have pathologically poor performance and scalability. Many classes of major optimization in databases don't work under these constraints. This isn't a novel idea, having been implemented for decades; it has been broadly rejected because it doesn't actually work in real systems without tradeoffs no one accepts, even in highly security sensitive environments.

You are overly dismissive of the technical challenges of actually deleting data, particularly in large systems. Technically naive solutions like having separate encryption keys for individual entities sound good, but they don't actually address the problem.

coder543 · 3y ago

I feel like you’re overly dismissive of the other things I mentioned. Database failover comes with inconveniences, but it does allow you to use a single key and get great performance. For massive databases, it might be impossible to do it that way, but most companies don’t have massive databases. I also raised a question about how “erased” deleted data needs to be to be compliant, since I would imagine a lot of companies consider deletion on an encrypted drive to be good enough, even if the data may theoretically be recoverable under extreme circumstances until that hard drive is completely overwritten multiple times or destroyed.

Per user keys are also very useful for use cases like a personal file storage service, even if traditional RDBMS don’t work well with them. Techniques are situationally dependent.

I was not being dismissive of the existence of technical challenges, but rather dismissive of comments like:

> To be honest, in the age of modern overprovisioned storage drives that remap blocks frequently, I'm not really sure you can implement genuine "hard" deletes without choosing significantly unorthodox hardware (or destroying a drive every time you need a single bit erased), no matter how much you want to in software.

Which seem to completely deride the possibility of compliance.

Compliance is feasible, even if it has challenges.

dcow · 3y ago

Is there literature on this topic? I'd like to learn at what scale per-user/tenant keys becomes untenable and the characteristics of systems that exhibit pathologically poor performance due to such a design.

Naively, it sounds like individual entity keys do solve the problem of deletion, but that your argument is that the tradeoffs aren't generally worth it?

closewith · 3y ago

With respect, your own comment reads as overly dismissive of the regulatory environment introduced by the GDPR. Systems (and companies) have for decades collected and stored personal data with little or no regard for the lifecycle of the data.

Since the introduction of the Regulation, companies must now refrain from the collection of personal data if they cannot satisfy the rights of the data subject, including the right to erasure (which is obviously not an absolute right).

If the technical solutions aren't performant or scalable enough for data controllers, then the result is that they cannot collect personal data. The idea that companies have in the past rejected solutions which protect the rights of data subjects because they were unwilling to accept the technical tradeoffs is exactly why strong regulation such as the GDPR is required.

Frankly, your comment reads as contemptuous of the rights of the end-user. Thankfully, the EU offers protection to its residents from people with that attitude.

dataflow (OP) · 3y ago

I have no idea where the sudden hostility in your last paragraph came from, but it's ascribing nonexistent malice to me. I'll try to address a couple of your points the best I can regardless.

> Once a month, you copy the per-user keys that haven't been deleted onto a new encrypted volume, then tell the TPM to delete the key for the old encrypted volume: garbage collecting the deleted keys in an unrecoverable fashion. This avoids the hassles with having to failover a database so frequently

This ignores the existence of database indexes.

What I was saying (I thought quite explicitly) was that strict compliance to GDPR appears unreasonably burdensome to most companies, and thus most companies stop at a line they draw earlier (that line frequently being the lack of hardware guarantees re: block remapping). This is not the same as saying no one can comply with GDPR, and I had no intention of making GDPR look impossible to comply with. Quite the contrary in fact—I believe its current requirements can and should be complied with quite cheaply and efficiently, but the most effective path to that requires support from hardware manufacturers (and possibly others) that customers and regulators currently don't demand.

This is not a take-down of GDPR or a malicious strawman mockery of its requirements, as you're (quite unkindly and wrongly) imputing to me. If anything, it was a plea for customers, regulators, and/or lawmakers to realize they need to expand their GDPR demands to require that vendors of other parts of the hardware/software stack (particularly storage device manufacturers) provide better support for GDPR compliance.

coder543 · 3y ago

You made several comments in a row that were extremely dismissive of how GDPR compliance could be achieved, and your comments repeatedly questioned whether other people considered basic concepts of data erasure. I apologize if I misattributed malice, but I do feel those comments could have been worded differently.

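Also, this quote:

> To be honest, in the age of modern overprovisioned storage drives that remap blocks frequently, I'm not really sure you can implement genuine "hard" deletes without choosing significantly unorthodox hardware (or destroying a drive every time you need a single bit erased), no matter how much you want to in software.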

That quote feels incompatible with your statement that…

> Quite the contrary in fact—I believe its current requirements can and should be complied with quite cheaply and efficiently

How can it be complied with cheaply and easily if it can’t be complied with without hardware support no matter how much software wants to? Surely you can see my confusion. Your last comment is far more in line with what I would expect to see in a discussion like this.

Regardless, I think I understand the argument in your last comment better now, but I still think destroying keys is the most effective way to achieve compliance. Even if a storage device manufacturer guaranteed that they would attempt to erase the specific block you want erased, there are problems with whether that data is still recoverable by reading the block repeatedly.

dinvlad · 3y ago · 2 in thread

Please, don't give ideas to whoever among their lawyers reads this later :-). We don't need another Schrems II.

dataflow (OP) · 3y ago

Sorry!

dinvlad · 3y ago

I think lawyers probably miss a lot of these technicalities (which explains why they can require something as technically unreasonable as Schrems II), but I just wonder if someone more technical can actually push for something like that, to win themselves some questionable points, in the name of improving privacy.
