0 points · anon291 · 1y ago · 0 comments

> ChatGPT has literally been trained on so much more data than a human would ever see

This is a common fallacy. The average human ingests a few dozen GB of data a day [1] [2].

ChatGPT 4 was reportedly trained on 13 trillion tokens. Say a token is 4 bytes (it's more like 3, but we're being conservative). That's 52 trillion bytes, or 52 terabytes.

Say the average human only consumes the lower estimate of 30 GB a day. That means it would take a human about 1,733 days to consume the number of tokens ChatGPT was trained on, or roughly 4.7 years. Assuming humans and the LLM start from the same spot [3], the proper question is: is ChatGPT smarter than a 4.7-year-old? If we use the higher estimate, then we have to ask whether ChatGPT is smarter than a 2-year-old. Does ChatGPT hallucinate more or less than the average toddler?
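
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes decimal units (1 GB = 10^9 bytes) and a 365-day year, neither of which is stated above, and the function names are purely illustrative. Since the higher daily estimate is not given as a number here, the sketch instead works backwards to the daily intake that a 2-year figure would imply.

```python
# Back-of-the-envelope check of the figures in this comment.
# Assumptions (not stated in the comment): 1 GB = 1e9 bytes, 1 year = 365 days.

TOKENS = 13e12          # training tokens cited in the comment
BYTES_PER_TOKEN = 4     # the comment's "conservative" bytes-per-token figure
GB = 1e9                # decimal gigabyte (assumption)
DAYS_PER_YEAR = 365     # ignoring leap years (assumption)


def training_bytes(tokens=TOKENS, bytes_per_token=BYTES_PER_TOKEN):
    """Total size of the training set in bytes."""
    return tokens * bytes_per_token


def days_to_consume(total_bytes, gb_per_day):
    """Days a human needs to take in total_bytes at gb_per_day."""
    return total_bytes / (gb_per_day * GB)


def implied_gb_per_day(total_bytes, years):
    """Daily intake (in GB) needed to take in total_bytes within `years`."""
    return total_bytes / (years * DAYS_PER_YEAR) / GB


total = training_bytes()
days_low = days_to_consume(total, 30)  # the lower estimate of 30 GB/day

print(f"training set: {total / 1e12:.0f} TB")
print(f"at 30 GB/day: {days_low:.0f} days, {days_low / DAYS_PER_YEAR:.2f} years")
print(f"intake implied by 2 years: {implied_gb_per_day(total, 2):.0f} GB/day")
```

Under these assumptions, 30 GB a day works out to roughly 1,733 days (about 4.7 years), and the 2-year figure corresponds to an intake of about 71 GB a day.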

The cognitive bias I've seen everywhere is the idea that humans are trained on a small amount of data. Nothing is further from the truth. Humans require training on an insanely large amount of data. A 40-year-old human has been trained on orders of magnitude more data than I think we even have available as data sets. If you prevent a human from being trained on this amount of data through sensory deprivation, they go crazy (and hallucinate very vividly too!).

No argument about energy, but this is a technology problem.

[1] https://www.tech21century.com/the-human-brain-is-loaded-dail...

[2] https://kids.frontiersin.org/articles/10.3389/frym.2017.0002...

[3] This is a bad assumption, since LLMs are randomly initialized, whereas humans seem to be born with some biases that significantly aid in the acquisition of language and social skills.

vlovich123 · 1y ago

Almost all the data that humans are trained on is about how to participate in society and how to acquire basic motor and language skills. That's what all that observation amounts to.

A student consumes only ~6 hours of relevant material a day on various subjects, in textual form (textbooks), with minimal guidance from a domain expert and some guidance from peers.

Have you read the studies backing your links? The methodology they use to come up with that estimate is highly questionable on its own, let alone when it comes to comparing with LLMs. Domain experts in the field are pretty confident that LLMs are trained on more actual information than humans.

> If you prevent a human from being trained on this amount of data through sensory deprivation they go crazy (and hallucinate very vividly too!).

People who are deaf & blind experience a significant amount of sensory deprivation compared with the typical human but do not go crazy or start hallucinating. This suggests that your analysis is flawed. For humans, communication is the important bit: as long as we have some kind of communication mechanism, we can achieve quite a fair bit.
