Better HN

0 points · rcme · 3y ago

How would you score them at scale without training some model to differentiate human-written from AI-generated content? And if you do need to train such a model, where would you get the data?


3 comments · 1 top-level

CuriouslyC · 3y ago · 2 in thread

We don't need to differentiate AI from human writing, just accurate, well-written text from everything else. We'd do that the same way we've scored content at scale so far: grad students and crowdsourcing.
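The comment doesn't say how crowdsourced scores would be combined. Here is a minimal sketch, assuming each document gets several ratings on a 1-5 quality scale and that a document is kept when its median rating clears a threshold. The function name, the `min_raters` and `threshold` parameters, and the sample data are all hypothetical.

```python
from statistics import median


def aggregate_ratings(ratings, min_raters=3, threshold=4.0):
    """Keep documents whose median crowd rating meets `threshold`.

    ratings: dict mapping a (hypothetical) document id to a list of
    integer scores on an assumed 1-5 scale. Documents with fewer than
    `min_raters` ratings are skipped as too unreliable to judge.
    """
    kept = {}
    for doc_id, scores in ratings.items():
        if len(scores) < min_raters:
            continue  # not enough independent judgments yet
        score = median(scores)  # median is robust to a single careless rater
        if score >= threshold:
            kept[doc_id] = score
    return kept


ratings = {
    "doc-a": [5, 4, 4],  # median 4 -> kept
    "doc-b": [2, 3, 5],  # median 3 -> dropped
    "doc-c": [5, 5],     # only two raters -> skipped
}
print(aggregate_ratings(ratings))
```

The median is one choice among several. Agreement-weighted schemes that down-weight unreliable raters are common alternatives when rater quality varies.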

rcme (OP) · 3y ago

The amount of data these LLMs are trained on is far beyond what crowdsourcing can produce.

CuriouslyC · 3y ago

That just isn't true. It's expensive, but entirely doable. Also, it's standard practice to do an initial round of training on a large dataset to capture the statistical properties of language, then a second stage of training on a smaller, more carefully curated dataset to get the model to actually do what you want.
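To illustrate the two-stage idea, here is a toy sketch. It stands in a bigram count model for a neural LLM, which is a deliberate simplification and not how real fine-tuning works (real fine-tuning continues gradient descent from pretrained weights). Stage 1 "pretrains" on a large, uncurated corpus. Stage 2 continues from those counts on a small curated corpus, with an assumed up-weighting factor so the curated data can shift behavior. All names and example sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus, counts=None, weight=1.0):
    """Accumulate weighted bigram counts.

    Passing an existing `counts` continues training from it (the
    object is updated in place and also returned).
    """
    if counts is None:
        counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += weight
    return counts


def most_likely_next(counts, word):
    """Return the highest-weight continuation of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Stage 1: broad, uncurated data captures general statistics.
web_corpus = ["the model is bad", "the model is bad", "the model is good"]
counts = train_bigrams(web_corpus)
print(most_likely_next(counts, "is"))  # "bad" (2 vs 1)

# Stage 2: a small curated set, up-weighted, steers the model.
curated_corpus = ["the model is good"]
counts = train_bigrams(curated_corpus, counts, weight=3.0)
print(most_likely_next(counts, "is"))  # "good" (4.0 vs 2)
```

The up-weighting here is only a rough analogue of the choices made in real fine-tuning, such as learning rate and number of epochs.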

