Interactive Text Prediction Explainer

(pudding.cool)

55 points · codenberg · 7y ago · 10 comments


9 comments · 1 top-level

elicash · 7y ago · 8 in thread

The Mueller report got me thinking.

How hard would it be to put together a machine learning tool that guessed at the redacted material based on:

(a) the context of the surrounding words, and (b) in cases where just a couple of words in a sentence are redacted, the number of pixels in the gap, which could point to a likely combination of letters that would perfectly "fit" that space?

And what would be the legality?

keyle · 7y ago

I haven't read it, but I suspect they blacked out full sentences. So your sample data would have to include thousands of other reports from the same author for your suggestion to even remotely make sense.

Even then, that would be beyond what's currently possible.

Some forensic operations on the blacked-out sections are potentially more viable.

Interesting thought though.

nmstoker · 7y ago

I'm no lawyer, but it's hard to imagine it could be illegal, as it would be based on supposition rather than fact, and only someone in possession of the unredacted report would know for sure whether it was right.

delish · 7y ago

While we're speculating:

If the algorithm predicted, "Then CIA extraordinary-rendition'd $particular_person_of_interest_to_people_with_top_secret_clearance to a black site"

you'd hope to get a judge who's technical enough to understand that the algorithm didn't "know;" it just "predicted."

Point being, I don't personally have much faith that the justice system evaluates tech the way we would.

gattilorenz · 7y ago

I don't know about the legality, but with a good/recent language model it could be quite feasible. The problem is getting the good language model.

elicash · 7y ago

I assume you'd have to also give it a list of names that are possibly associated with the specific item.


polm23 · 7y ago

You can't do this because language models are not magic.

Let's say you write a sentence like: "[your name] picked up the book." Censor it to take out your name and let a language model fill it in. It might give you "John picked up the book" or "Mary picked up the book", which are grammatically correct, but it has no way of reliably guessing your name because it has no information about the real-world situation. Language models work by predicting the most likely filler for a slot, so anything they can predict is, by definition, unsurprising.
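To illustrate the point, here is a minimal sketch of slot filling with a toy count-based model rather than a neural language model. The corpus and the `fill_slot` helper are hypothetical. The model can only rank fillers it has already seen in that position, so it proposes the most frequent name, not the name of whoever actually picked up the book.

```python
from collections import Counter

# Hypothetical toy corpus standing in for a language model's training data.
CORPUS = [
    "john picked up the book",
    "john picked up the book",
    "mary picked up the book",
    "alice picked up the book",
    "john read the book",
]


def fill_slot(template: str, corpus: list[str]) -> list[tuple[str, float]]:
    """Rank single-word fillers for the '___' slot in template by how often
    each one appears in that position in the corpus."""
    prefix, suffix = template.split("___")
    counts = Counter()
    for sentence in corpus:
        fits = (
            sentence.startswith(prefix)
            and sentence.endswith(suffix)
            and len(sentence) > len(prefix) + len(suffix)
        )
        if fits:
            middle = sentence[len(prefix):len(sentence) - len(suffix)]
            if " " not in middle:  # keep single-word fillers only
                counts[middle] += 1
    total = sum(counts.values())
    # Relative frequency of each filler, most common first.
    return [(word, n / total) for word, n in counts.most_common()]


print(fill_slot("___ picked up the book", CORPUS))
# [('john', 0.5), ('mary', 0.25), ('alice', 0.25)]
```

Whatever the real name was, it can never come out of this model unless it was already in the data, and even then it only wins if it happens to be the most common choice.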

Emily Bender wrote about this on Twitter.

https://twitter.com/emilymbender/status/1119081131234611201

If you want to use pixel data to fill in text, that's a different approach that could work if they did a poor job with the black bars, though it seems unlikely they'd do that.

elicash · 7y ago

Would they need to have done a poor job with the black text? For example, let's say in the sentence: First, ______________ Next, this happened.

If you know where the comma ends and the next sentence begins, you still know the exact number of pixels in between. Let's then assume that there are no proper nouns in the missing text. There probably aren't very many combinations of letters that fit the space perfectly while still making for valid words, given different letters have different pixel widths.
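A minimal sketch of that width constraint. It assumes made-up per-character pixel widths (`CHAR_WIDTHS`), a hypothetical vocabulary, and a measured redaction width `gap_px`; real fonts may also apply kerning, so summing character widths is only an approximation. It enumerates every word sequence whose total width matches the gap within a tolerance:

```python
# Hypothetical per-character widths in pixels; NOT measured from any real font.
CHAR_WIDTHS = {c: 7 for c in "abcdefghijklmnopqrstuvwxyz"}
CHAR_WIDTHS.update({"f": 4, "i": 3, "j": 3, "l": 3, "r": 5, "t": 4, "m": 11, "w": 10, " ": 4})


def text_width(text: str) -> int:
    """Rendered width of text under the assumed per-character widths."""
    return sum(CHAR_WIDTHS[c] for c in text)


def fitting_phrases(vocab, gap_px, tolerance=1, max_words=3):
    """Return every sequence of up to max_words vocab words, joined by single
    spaces, whose total width is within tolerance pixels of gap_px."""
    results = []

    def extend(words, width):
        if words and abs(width - gap_px) <= tolerance:
            results.append(" ".join(words))
        if len(words) == max_words:
            return
        for w in vocab:
            # Every word after the first is preceded by a space.
            new_width = width + text_width(w) + (CHAR_WIDTHS[" "] if words else 0)
            # Widths only grow as words are added, so prune overlong branches.
            if new_width <= gap_px + tolerance:
                extend(words + [w], new_width)

    extend([], 0)
    return results


print(fitting_phrases(["we", "met", "him", "lunch", "at"], gap_px=43))
# ['we met', 'we him', 'met we', 'him we']
```

Even with this tiny vocabulary, nonsense like "we him" fits as well as "we met", so the width filter narrows the options but still leaves something to rank.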

But the idea wouldn't even be to come up with the CORRECT answer. It would be to assign a score to different options of what it could possibly say.
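A minimal sketch of that kind of scoring, assuming a toy add-one-smoothed bigram model in place of a real language model. The training text, `train_bigrams`, `score`, and the candidate list (imagined as phrases that already passed a pixel-width filter) are all hypothetical, and the context word "first" comes from the "First, ______" example. Each candidate gets a log-probability instead of a yes/no verdict:

```python
import math
from collections import Counter

# Hypothetical training text standing in for a real language model.
TRAINING_TEXT = "first we met him . then we met her . we met at lunch . him we saw later ."


def train_bigrams(text):
    """Count unigrams and adjacent-word bigrams in whitespace-tokenized text."""
    tokens = text.split()
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def score(phrase, unigrams, bigrams, context="first"):
    """Add-one-smoothed bigram log-probability of phrase following context."""
    vocab_size = len(unigrams)
    tokens = [context] + phrase.split()
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        logp += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return logp


unigrams, bigrams = train_bigrams(TRAINING_TEXT)
# Candidate phrases assumed to have already passed a pixel-width filter.
candidates = ["we met", "we him", "met we", "him we"]
ranked = sorted(candidates, key=lambda p: score(p, unigrams, bigrams), reverse=True)
for phrase in ranked:
    print(f"{score(phrase, unigrams, bigrams):7.3f}  {phrase}")
```

With this made-up text, "we met" ranks first and "met we" last. A stronger model would sharpen the ranking, but the scores still only reflect what is likely, not what was actually written.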

I agree you couldn't do paragraphs.

ShamelessC · 7y ago

https://threadreaderapp.com/thread/1119118085443559425.html

This guy appears to be attempting exactly this.
