
edouard-harris · 5y ago

> If you pay just a little attention, you absolutely can: GPT-3 is not saying anything.

Check out Figure 3.13 in the GPT-3 paper, https://arxiv.org/pdf/2005.14165.pdf.

The authors experimented with 200-word news articles, to see whether 80 human judges could tell the difference between human-generated and GPT-3-generated ones.

It turns out they could not: the human judges correctly identified GPT-3-generated content only 52% of the time, essentially no better than random guessing. (And no, the machine-generated articles were not cherry-picked for the experiment.)
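For anyone who wants to check what "no better than random guessing" means numerically, here is a minimal sketch of an exact two-sided binomial test against a 50% chance rate. The sample size of 100 judgments is a hypothetical placeholder, not a number from the paper, and the function names are my own; plug in the real count of judgments to get a meaningful answer, since a 52% hit rate can become distinguishable from chance when the sample is large enough.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probability of every outcome that is at most as likely as
    the observed one under the null hypothesis (success rate == p).
    """
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so symmetric outcomes are not lost to rounding.
    total = sum(
        pm
        for i in range(n + 1)
        if (pm := binom_pmf(i, n, p)) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


# HYPOTHETICAL numbers: 52 correct identifications out of 100 judgments.
n_judgments = 100
n_correct = 52
p_value = two_sided_p_value(n_correct, n_judgments)
print(f"accuracy = {n_correct / n_judgments:.0%}, two-sided p-value = {p_value:.3f}")
```

A large p-value here means the observed accuracy is consistent with coin-flipping at that sample size; it does not by itself prove the judges were guessing.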


YeGoblynQueenne · 5y ago

>> The authors experimented with 200-word news articles, to see whether 80 human judges could tell the difference between human-generated and GPT-3-generated ones.

I disagree with tsimionescu that this is a good counterpoint. The comment you reply to says that "GPT-3 is not saying anything". The figure you refer to shows that human judges could not tell the difference between human-generated and GPT-3-generated text. That's apples and oranges. That some humans weren't able to detect autogenerated text doesn't tell us anything about whether the autogenerated text said anything.

FeepingCreature · 5y ago

However, the comment also said this is a way to tell GPT-3 text from human text.

tsimionescu · 5y ago

This is a good counterpoint.

I do wonder, though, how close the articles that GPT-3 produced were to the articles it had been trained on. For example, the Methodist Church Split article that it produced has a lot of very specific facts about the Methodist Church and about the split, which shows that it had texts about that specific event in the training set.

It also has a sentence which contains a pretty obvious non-sequitur, but it's easy to miss or to assume that it's a mistake a human made.

So overall, I'm guessing GPT-3 may actually be pretty decent at re-telling a story with different words, which is sometimes very hard to distinguish from a human doing the same thing.

They also don't describe the way they programmatically selected the output, though I am willing to believe that they more or less randomly sampled the output from each model.
