Dr Teague was also part of the team that found flaws in the Swiss e-voting system used in Australian state elections. Nothing was done about it and she was written off; the attack was deemed impractical because it required a corrupt official.
She's a national treasure and a regular source of embarrassment for the technologically illiterate bureaucrats responsible for such poor decisions.
I think that hit a bit too close to home for most of the government.
> I can't believe @healthgovau is still saying "The dataset does not contain the personal information of patients." We have shown many of the patients' records can be easily and confidently identified from a few points of medical or childbirth info.
As far as I can tell, 'personal information' is potentially the only thing this data set contains. Further, the information is so personal that the Australian government hoped that it would be infeasible to cross-reference it with other data and use it to identify the persons involved.
I did some work with it a few years ago, and you can easily generate the key.
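To illustrate why "easily generate the key" is plausible: when a small ID space (like provider or patient numbers) is put through a deterministic transform, an attacker can simply enumerate every possible input and invert the mapping. This is a hypothetical sketch, not the actual scheme used on the dataset; plain SHA-256 stands in for whatever transform was applied.

```python
import hashlib

def encode(real_id: int) -> str:
    """Stand-in for a deterministic 'anonymizing' transform of an ID."""
    return hashlib.sha256(str(real_id).encode()).hexdigest()

# The "anonymized" value as it would appear in the published dataset.
published = encode(4217)

# The attack: enumerate the entire (small) ID space and build a reverse table.
rainbow = {encode(i): i for i in range(10_000)}
recovered = rainbow[published]
print(recovered)  # 4217
```

The fix isn't a stronger transform; it's making the mapping non-deterministic or unrecoverable, as discussed further down the thread.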
If Australia makes it illegal to re-identify information, what about information that has been re-identified outside Australia then distributed into Australia?
The letter sent to the university [0] claims that re-identifying information is actually illegal, according to the department's understanding. (Never mind that they also admit that particular law is completely irrelevant to the researcher's work.)
[0] https://www.righttoknow.org.au/request/correspondence_on_re_...
Anyone wanting to abuse the information (i.e. a criminal) would hardly balk at committing a further crime by re-identifying it, so this law would only prevent people who want to help from doing so.
In the end, there are always intelligent criminals (or foreign states acting against your country in their own interest), so you will always be able to buy de-anonymized data on the black market. What's more, people de-anonymizing for those reasons can fold leaked or stolen datasets into the attack instead of just public data, making it potentially much easier.
I.e., do they mean that nobody they talked to could think of a way to recover the identity of even one individual in the set with 100% certainty? Or is there some information-theoretical or legal standard of anonymization they're claiming to have met?
For "organizations" in general? It means approximately nothing, or if you're feeling particularly generous, it means "we probably remembered to drop the column containing your social security number before publishing this data... this time". You're asking for exact specifics of a vague and broad category.
There are some legal standards, information-theoretic standards, and non-legal organizational standards that might be met in some cases, typically involving adding noise or removing data / making it sparse. https://en.wikipedia.org/wiki/Data_re-identification goes into all the ways this can go wrong despite the best of intentions. My basic take on all this: data "always" gets more identifying, not less. Two datasets that were successfully anonymized individually can still be correlated to de-anonymize some or all of the data when combined. Even organizations applying information theory with the best of intentions and proper diligence will eventually make a mistake.
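The cross-dataset correlation point can be shown in a few lines. This is a toy sketch with made-up records: a health dataset with direct identifiers stripped, and a second public dataset (say, an electoral roll) that still carries names. Joining on shared quasi-identifiers re-attaches the names.

```python
# Published health data: names removed, quasi-identifiers kept.
medical = [
    {"birth_year": 1975, "postcode": "2600", "sex": "F", "condition": "diabetes"},
    {"birth_year": 1982, "postcode": "3000", "sex": "M", "condition": "asthma"},
]
# A second, independently "harmless" public dataset that includes names.
electoral = [
    {"name": "Alice Smith", "birth_year": 1975, "postcode": "2600", "sex": "F"},
    {"name": "Bob Jones",   "birth_year": 1982, "postcode": "3000", "sex": "M"},
]

def link(records_a, records_b, keys):
    """Join two datasets on their shared quasi-identifier columns."""
    index = {tuple(r[k] for k in keys): r for r in records_b}
    out = []
    for a in records_a:
        key = tuple(a[k] for k in keys)
        if key in index:
            out.append({**index[key], **a})  # name + condition reunited
    return out

reidentified = link(medical, electoral, ["birth_year", "postcode", "sex"])
for r in reidentified:
    print(r["name"], "->", r["condition"])
```

Real linkage attacks are fuzzier (typos, overlapping cohorts), but the mechanism is exactly this join.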
> Indeed, encryption was not necessary – a randomly chosen unique number for each person would have worked.
Scroll down from here: https://www.oaic.gov.au/privacy/privacy-decisions/investigat...
[0] The data had ids for providers (e.g. doctors) as well as patients.
I'm not aware of any legal definitions, but given the thorniness of reidentification I would assume they're insufficient.
[1] https://en.wikipedia.org/wiki/K-anonymity
[2] https://www.wired.com/2007/12/why-anonymous-data-sometimes-i...
The common example is the one-legged child with cancer from a remote town. You can remove all the PII columns and it's still pretty easy to find that person.
(The downside is that rare diseases might fall through the cracks.)
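That "one-legged child" intuition is what k-anonymity [1] formalizes: a dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k rows. A minimal sketch with made-up records (the field names are illustrative):

```python
from collections import Counter

# Direct identifiers already removed; quasi-identifiers remain.
records = [
    {"age": 9,  "town": "Remoteville", "amputee": True,  "diagnosis": "cancer"},
    {"age": 34, "town": "Bigcity",     "amputee": False, "diagnosis": "flu"},
    {"age": 35, "town": "Bigcity",     "amputee": False, "diagnosis": "flu"},
]

def k_anonymity(rows, quasi_ids):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(counts.values())

# k == 1 means at least one person is uniquely identifiable.
print(k_anonymity(records, ["age", "town", "amputee"]))  # 1
```

Achieving k > 1 usually means generalizing or suppressing the rare rows, which is exactly how rare diseases fall through the cracks.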
... all the while the government forgets that it's all available on the internet.
Now it has hit Australia, but it could have been any other country, since data collection seems to be en vogue. It probably gives the impression of control, as usual.
and now a standard: "Privacy (Australian Government Agencies – Governance) APP Code 2017"
See https://www.oaic.gov.au/privacy/privacy-decisions/investigat...
Edit: I've taken a crack at fixing it now.
Off the top of my head, only the latter is necessary: throwing away a random key makes the former equivalent (or run the plaintext through SHA-3 20 times in feedback instead). Say 100 rounds of AES-256 in feedback. Fixed integer-only fields could be XORed with a private key the length of the field (a one-time pad).
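The throw-away-key idea amounts to this: assign each real ID a fresh random token, keep the mapping only long enough to process the dataset, then destroy it. Once the table is gone there is no key to recover and nothing to brute-force, unlike a deterministic transform. A minimal sketch (function name is mine):

```python
import secrets

def pseudonymize(real_ids):
    """Replace each real ID with a random token; same input -> same token,
    so longitudinal linkage within the dataset is preserved."""
    mapping = {}
    for rid in real_ids:
        if rid not in mapping:
            mapping[rid] = secrets.token_hex(16)
    return [mapping[rid] for rid in real_ids], mapping

tokens, table = pseudonymize(["p01", "p01", "p02"])
# In practice you would now destroy `table`: with it gone, the tokens
# carry no information about the original IDs at all.
```

This matches the quoted point that encryption wasn't even needed; a randomly chosen unique number per person would have worked.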
Any other ideas, please add a comment.