0 points · smilespray · 3y ago · 0 comments

...which then outsourced itself to GPT-2...

4 comments · 2 top-level

kmeisthax · 3y ago · 1 in thread

Given the way that OpenAI sources their training corpus and the number of people using GPT-3, I would not be surprised if GPT-4 winds up getting trained on a large amount of GPT-3 output.

Think about it - the best niche GPT-3 has is generating plausible spam. If you just need a lot of text, but you don't care about what it says[0], you're going to write it using the cheapest possible tool. OpenAI's training corpus is sourced through web crawls, so all of that spam is destined to get recycled back into the next generation of GPT.

[0] For example, if you want to be able to post a bunch of political spam that looks like genuine comments on a web forum. See GPT-4chan[1] as a practical example of this.

[1] A fine-tuned version of GPT-J using 4chan's politics board as its training corpus.

planetsprite · 3y ago

Same with most AI image-generating models. In 10 years 99% of images on the internet will be AI-generated. Would it not be regressive for the models to train on their own outputs? Should regulation require AI-generated media to label itself, or is it the responsibility of those training the models to build detectors which can intuit the difference between the models and reality better than humans can?

TaylorAlexander · 3y ago · 1 in thread

GPT-3 is lounging on a beach somewhere while GPT-2 does all the hard work!

coldtea · 3y ago

or, judging from the usual IQ and age difference between those doing the actual work and their managers, probably the inverse...
