
0 points · walterbell · 12y ago · 0 comments

Information diffusion is a function of time. Some people have the data now. Many more will use it over time, and most of those new users will simply click a link from a blog, Google, or HN. The data may also be stored in a canonical open-data location. Each of those copies could be anonymized data instead.

What's the difference between distributing open-source with a known vulnerability and distributing open-data that knowingly violates the privacy of many people? If this were source code, there would be "responsible disclosure", giving the software author time to issue a new release. One could similarly work with the NYC city government's digital team to anonymize the data properly and have them reissue an official dump, possibly with additional data from 2014. That would give developers an incentive to switch to the newer data.

Yes, malicious analysts can find the old data. But that is no reason for non-malicious analysts to keep replicating data that violates privacy. If this were data where the loss of privacy had significant financial or legal consequences, then naive data distributors and analysts would be inadvertently contributing to those consequences.

One should try to do the right thing, even if it seems technically pointless; in this case, that means working with the people who shared the data to fix the mistake. Otherwise, one could imagine future city government publications requiring much slower and more expensive review of data before release, e.g. by lawyers who still won't catch the next technical mistake. It's in the interest of all parties to make this particular instance right, so that privacy-protecting data can continue to be published openly.


4 comments · 2 top-level

vijayp · 12y ago · 2 in thread

Yeah, this is a really good point. I'm going to try to reach out to someone in the government on Monday. I don't really have many contacts over there, so if anyone has suggestions on how to navigate the bureaucracy, I'm all ears.

walterbell (OP) · 12y ago

Might be worth trying the email address on the page of NYC Digital:

digital@cityhall.nyc.gov http://www.nyc.gov/html/digital/html/about/contact.shtml

saraid216 · 12y ago

I'd recommend talking to Chris Whong and seeing if he has any advice, actually.

hnha · 12y ago

> What's the difference between distributing open-source with a known vulnerability and distributing open-data that knowingly violates the privacy of many people?

The difference is that software is something people choose to use and update, while data is something other people have. The only people who would use a properly anonymised version of this dataset are whitehats, who would not do anything malicious with the current version either.
