Dr Teague was also part of the team that found flaws in the Swiss e-voting system used in Australian state elections. Nothing was done about it and she was written off; the attack was deemed impractical because it required a corrupt official.
She's a national treasure and a regular source of embarrassment for the technologically illiterate bureaucrats responsible for such poor decisions.
I think that hit a bit too close to home for most of the government.
> I can't believe @healthgovau is still saying "The dataset does not contain the personal information of patients." We have shown many of the patients' records can be easily and confidently identified from a few points of medical or childbirth info.
As far as I can tell, 'personal information' is potentially the only thing this data set contains. Further, the information is so personal that the Australian government hoped that it would be infeasible to cross-reference it with other data and use it to identify the persons involved.
I did some work with it a few years ago, and you can easily generate the key.
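To illustrate why "easily generate the key" is plausible: when a small ID space (like provider or patient numbers) is put through a deterministic transform, an attacker can simply enumerate every possible input and invert the mapping. This is a hypothetical sketch, not the actual scheme used on the dataset; plain SHA-256 stands in for whatever transform was applied.

```python
import hashlib

def encode(real_id: int) -> str:
    """Stand-in for a deterministic 'anonymizing' transform of an ID."""
    return hashlib.sha256(str(real_id).encode()).hexdigest()

# The "anonymized" value as it would appear in the published dataset.
published = encode(4217)

# The attack: enumerate the entire (small) ID space and build a reverse table.
rainbow = {encode(i): i for i in range(10_000)}
recovered = rainbow[published]
print(recovered)  # 4217
```

The fix isn't a stronger transform; it's making the mapping non-deterministic or unrecoverable, as discussed further down the thread.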
If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?
The letter sent to the university [0] claims that re-identifying information is actually illegal, according to the department's understanding. (Never mind that they also admit that particular law is completely irrelevant to the researcher's work.)
[0] https://www.righttoknow.org.au/request/correspondence_on_re_...
Anyone wanting to abuse the information (i.e. a criminal) would hardly balk at committing a further crime by re-identifying it, so this law would only prevent people who want to help from doing so.
In the end, there are always intelligent criminals (or foreign states acting against your country in their own interest), so you will always be able to buy de-anonymized data on the black market. What's more, people de-anonymizing for those reasons can fold leaked or stolen datasets into the attack instead of just public data, making it potentially much easier.
I.e., do they mean that nobody they talked to could think of a way to recover the identity of even one individual in the set with 100% certainty? Or is there some information-theoretical or legal standard of anonymization they're claiming to have met?
For "organizations" in general? It means approximately nothing, or if you're feeling particularly generous, it means "we probably remembered to drop the column containing your social security number before publishing this data... this time". You're asking for exact specifics of a vague and broad category.
There are some legal standards, information-theoretic standards, and non-legal organizational standards that might be met in some cases, typically involving adding noise or removing data / making it sparse. https://en.wikipedia.org/wiki/Data_re-identification goes into all the ways this can go wrong despite the best of intentions. My basic take on all this: data "always" gets more identifying, not less. Two datasets that were successfully anonymized individually can still be correlated to de-anonymize some or all of the data when combined. Even organizations applying information theory with the best of intentions and proper diligence will eventually make a mistake.
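The cross-dataset correlation point can be shown in a few lines. This is a toy sketch with made-up records: a health dataset with direct identifiers stripped, and a second public dataset (say, an electoral roll) that still carries names. Joining on shared quasi-identifiers re-attaches the names.

```python
# Published health data: names removed, quasi-identifiers kept.
medical = [
    {"birth_year": 1975, "postcode": "2600", "sex": "F", "condition": "diabetes"},
    {"birth_year": 1982, "postcode": "3000", "sex": "M", "condition": "asthma"},
]
# A second, independently "harmless" public dataset that includes names.
electoral = [
    {"name": "Alice Smith", "birth_year": 1975, "postcode": "2600", "sex": "F"},
    {"name": "Bob Jones",   "birth_year": 1982, "postcode": "3000", "sex": "M"},
]

def link(records_a, records_b, keys):
    """Join two datasets on their shared quasi-identifier columns."""
    index = {tuple(r[k] for k in keys): r for r in records_b}
    out = []
    for a in records_a:
        key = tuple(a[k] for k in keys)
        if key in index:
            out.append({**index[key], **a})  # name + condition reunited
    return out

reidentified = link(medical, electoral, ["birth_year", "postcode", "sex"])
for r in reidentified:
    print(r["name"], "->", r["condition"])
```

Real linkage attacks are fuzzier (typos, overlapping cohorts), but the mechanism is exactly this join.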
> Indeed, encryption was not necessary – a randomly chosen unique number for each person would have worked.
Scroll down from here: https://www.oaic.gov.au/privacy/privacy-decisions/investigat...
[0] The data had ids for providers (e.g. doctors) as well as patients.
I'm not aware of any legal definitions, but given the thorniness of reidentification I would assume they're insufficient.
[1] https://en.wikipedia.org/wiki/K-anonymity
[2] https://www.wired.com/2007/12/why-anonymous-data-sometimes-i...
The common example is the one-legged child with cancer from a remote town. You can remove all the PII columns and it's still pretty easy to find that person.
(The downside is that rare diseases might fall through the cracks.)
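That "one-legged child" intuition is what k-anonymity [1] formalizes: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k rows. A minimal sketch with made-up records (the field names are illustrative):

```python
from collections import Counter

# Direct identifiers already removed; quasi-identifiers remain.
records = [
    {"age": 9,  "town": "Remoteville", "amputee": True,  "diagnosis": "cancer"},
    {"age": 34, "town": "Bigcity",     "amputee": False, "diagnosis": "flu"},
    {"age": 35, "town": "Bigcity",     "amputee": False, "diagnosis": "flu"},
]

def k_anonymity(rows, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(counts.values())

# k == 1 means at least one person is uniquely identifiable.
print(k_anonymity(records, ["age", "town", "amputee"]))  # 1
```

Achieving k > 1 usually means generalizing or suppressing the rare rows, which is exactly how rare diseases fall through the cracks.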
... all the while the government forgets that it's all available on the internet.
Now it has hit Australia, but it could have been any other country, since data collection seems to be en vogue. It probably gives the impression of control, as usual.
and now a standard: "Privacy (Australian Government Agencies – Governance) APP Code 2017"
See https://www.oaic.gov.au/privacy/privacy-decisions/investigat...
Edit: I've taken a crack at fixing it now.
Off the top of my head, only the latter is necessary: throwing away a random key makes the former equivalent (or run the plaintext through SHA-3 20 times in feedback instead). Say 100 rounds of AES-256 in feedback. Fixed integer-only fields could be XORed with a private key the length of the field (a one-time pad).
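The throw-away-key idea amounts to this: assign each real ID a fresh random token, keep the mapping only long enough to process the dataset, then destroy it. Once the table is gone there is no key to recover and nothing to brute-force, unlike a deterministic transform. A minimal sketch (function name is mine):

```python
import secrets

def pseudonymize(real_ids):
    """Replace each real ID with a random token; same input -> same token,
    so longitudinal linkage within the dataset is preserved."""
    mapping = {}
    for rid in real_ids:
        if rid not in mapping:
            mapping[rid] = secrets.token_hex(16)
    return [mapping[rid] for rid in real_ids], mapping

tokens, table = pseudonymize(["p01", "p01", "p02"])
# In practice you would now destroy `table`: with it gone, the tokens
# carry no information about the original IDs at all.
```

This matches the quoted point that encryption wasn't even needed; a randomly chosen unique number per person would have worked.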
Any other ideas, please add a comment.