0 points · rcme · 3y ago · 0 comments

Yea… despite the massive amounts of data being fed into these models, the model quality is still data-bound. There is no way to produce models like GPT-3 with manually annotated data.

3 comments · 1 top-level

lumost · 3y ago · 2 in thread

This is suddenly a highly debatable claim. Traditional ML was "data hungry": it required massive datasets for features, plus "clean" labels that were difficult to acquire.

You can ask ChatGPT to generate these datasets now. However, it's unclear whether future models will rely on such datasets if large models already have the answer. Here is an example of a ChatGPT-generated dataset for a misinformation classifier. ChatGPT can generate 10 examples every time I hit enter; I asked it to create this dataset using the following format:

sentence, misinformation (1,0), notes

---

The earth is flat, 1, This claim is widely debunked by scientific evidence and research.

Vaccines are harmful and cause autism, 1, This claim has been disproven by multiple scientific studies and is not supported by medical evidence.

The Holocaust never happened, 1, This claim is a historical denial and disregards the overwhelming evidence of the genocide of Jewish people during World War II.

Humans and dinosaurs coexisted, 1, This claim is not supported by scientific evidence and is a popular conspiracy theory.

Water boils at 212 degrees Fahrenheit, 0, This is the standard boiling point of water at sea level.

The moon landing was faked, 1, This claim is a conspiracy theory and is not supported by any evidence.

Climate change is not caused by human activity, 1, This claim is not supported by the overwhelming majority of scientific evidence and research.

The sun revolves around the earth, 1, This claim was disproven by scientific evidence in the 16th century and is now considered a flat-earth theory.

HIV does not cause AIDS, 1, This claim is not supported by scientific evidence and has been disproven by multiple studies.

Vaccines are safe and effective, 0, This claim is supported by the majority of scientific evidence and research.
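For anyone who wants to feed rows like these into a classifier, here is a minimal sketch of parsing the format in Python. It assumes the sentence itself never contains ", 0," or ", 1,", since the rows are not quoted CSV and the notes may contain commas. The names Row and parse_row are hypothetical and made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    sentence: str
    label: int  # 1 = misinformation, 0 = not misinformation
    notes: str


# Split on the first ", 0," or ", 1," so commas inside the notes are preserved.
# Assumption: the sentence never contains that pattern itself.
ROW_RE = re.compile(r"^(?P<sentence>.+?),\s*(?P<label>[01]),\s*(?P<notes>.*)$")


def parse_row(line: str) -> Optional[Row]:
    """Return a Row, or None for headers, separators, and blank lines."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return Row(m["sentence"].strip(), int(m["label"]), m["notes"].strip())


sample = [
    "sentence, misinformation (1,0), notes",
    "---",
    "The earth is flat, 1, This claim is widely debunked by scientific evidence and research.",
    "Water boils at 212 degrees Fahrenheit, 0, This is the standard boiling point of water at sea level.",
]

# Keep only the lines that parse as data rows.
rows = [r for r in map(parse_row, sample) if r is not None]
for r in rows:
    print(r.label, r.sentence)
```

The header and the "---" separator do not match the pattern, so they are skipped rather than treated as data. Generated labels would still need spot checks before training on them.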

rcme · OP · 3y ago

I was talking about ChatGPT itself. It could be made better with more data.

lumost · 3y ago

However, that data may not come from human labels.
