How hard would it be to put together a machine learning tool that guessed at the redacted material based on:
(a) the context of the surrounding words, and (b) in cases where just a couple of words in a sentence are redacted, the number of pixels, to inform a likely combination of letters that would perfectly "fit" that space?
And what would be the legality?
Even so, that would be beyond what's currently possible.
Some forensic analysis of the blacked-out sections is potentially more viable.
Interesting thought though.
If the algorithm predicted, "Then CIA extraordinary-rendition'd $particular_person_of_interest_to_people_with_top_secret_clearance to a black site",
you'd hope to get a judge who's technical enough to understand that the algorithm didn't "know;" it just "predicted."
Point being, I don't personally have much faith that the justice system evaluates tech the way we would.
Let's say you write a sentence like: "[your name] picked up the book." Censor it to take out your name and let a language model fill it in. It might give you "John picked up the book" or "Mary picked up the book", which are grammatically correct, but it has no way of guessing your name reliably because it has no information about the situation of the real world. Language models work by predicting the most likely filler for a slot - if they can predict something it's not surprising.
Emily Bender wrote about this on Twitter.
https://twitter.com/emilymbender/status/1119081131234611201
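To make the slot-filling point concrete, here is a minimal sketch of a frequency-based filler. It is not a real language model: the corpus, the names in it, and the function `fill_first_slot` are all made up for illustration. It only shows that a predictor ranks fillers it has already seen, so a name absent from its training text can never come out.

```python
from collections import Counter

# Hypothetical toy corpus; a real language model is trained on far more text.
CORPUS = [
    "John picked up the book",
    "Mary picked up the book",
    "John picked up the phone",
    "Mary put down the book",
    "John picked up the book",
]

def fill_first_slot(context, corpus=CORPUS):
    """Rank first words seen in front of `context` by relative frequency."""
    counts = Counter()
    for sentence in corpus:
        first, _, rest = sentence.partition(" ")
        if rest == context:
            counts[first] += 1
    total = sum(counts.values())
    # Empty list if the context was never seen; otherwise (word, share) pairs.
    return [(word, n / total) for word, n in counts.most_common()]

guesses = fill_first_slot("picked up the book")
for word, share in guesses:
    print(f"{word}: {share:.2f}")
```

Whoever actually picked up the book, this will only ever say "John" or "Mary", because those are the only fillers it has statistics for.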
If you want to use pixel data to fill in text, that's a different approach, and one that could work if they did a poor job with the black bars, though it seems unlikely they'd do that.
If you know where the comma ends and the next sentence begins, you still know the exact number of pixels in between. Let's then assume that there are no proper nouns in the missing text. There probably aren't very many combinations of letters that fit the space perfectly while still making for valid words, given different letters have different pixel widths.
But the idea wouldn't even be to come up with the CORRECT answer. It would be to assign a score to different options of what it could possibly say.
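A rough sketch of that idea: keep only the candidate phrases whose rendered width matches the redacted gap, then rank the survivors by a plausibility score. Everything here is an assumption for illustration: the per-character widths in `CHAR_WIDTHS` are invented rather than measured from a real font, the candidate phrases and their scores are made up, and the scores would in practice have to come from some context model.

```python
# Hypothetical advance widths in pixels for a proportional font.
# Real values would have to be measured from the document's actual typeface.
CHAR_WIDTHS = {ch: 10 for ch in "abcdefghijklmnopqrstuvwxyz"}
CHAR_WIDTHS.update(
    {"i": 4, "l": 4, "j": 5, "t": 6, "f": 6, "r": 7, "m": 15, "w": 14, " ": 5}
)

def text_width(text):
    """Sum the per-character widths (lowercase letters and spaces only)."""
    return sum(CHAR_WIDTHS[ch] for ch in text.lower())

def rank_candidates(candidates, gap_px, tolerance_px=1):
    """Keep phrases that fit the gap, sorted by score (highest first).

    candidates: dict mapping phrase -> plausibility score (assumed given).
    """
    fitting = {
        phrase: score
        for phrase, score in candidates.items()
        if abs(text_width(phrase) - gap_px) <= tolerance_px
    }
    return sorted(fitting.items(), key=lambda item: item[1], reverse=True)

# Made-up candidates and scores for a 59-pixel redaction.
candidates = {"the book": 0.5, "a letter": 0.3, "the film": 0.2}
ranked = rank_candidates(candidates, gap_px=59)
print(ranked)
```

With these invented widths, "the book" is 71 pixels wide and gets dropped despite having the highest score, while "a letter" (58) and "the film" (60) both fit within the tolerance. The output is a ranked shortlist, not a single answer.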
I agree you couldn't do paragraphs.
This guy appears to be attempting exactly this.