
0 points · readitalready · 1mo ago · 0 comments

Pretraining + RL itself is the scaling limit. If you feed it the entire dataset before 1905, LLMs aren't going to come up with general relativity. It has no concept of physics, or time even.

AGI happens when you DON'T need to scale pretraining + RL.


acuozzo · 1mo ago

> If you feed it the entire dataset before 1905, LLMs aren't going to come up with general relativity.

Link?

Jensson · 1mo ago

You don't need a source for that; an LLM with such little data is barely able to form proper sentences.

acuozzo · 1mo ago

> an LLM with such little data

There is a mountain of data pre-1905. Certainly enough to train a decent 30B parameter model.

Now, digitizing & OCRing all of that data... THAT is a challenge.

rishabhaiover · 1mo ago

AGI maybe not, but it is reaching disruption-level intelligence in the SWE domain.
