Better HN

0 points · gchamonlive · 23h ago · 0 comments

This could be right for the current architecture of LLMs, but you can come up with specialized large language models that use tokens more efficiently for a specific subset of problems by encoding the information differently (https://www.nature.com/articles/d41586-024-03214-7).

So if, instead of text, we come up with a different representation for mathematical or physical problems, that could both improve the quality of the output and reduce the number of transformers needed for decoding and encoding I/O and for internal reasoning.
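To make the token-efficiency point concrete, here is a minimal sketch comparing two ways of splitting the same expression. Both tokenizers are hypothetical toys (real LLM tokenizers typically use subword schemes such as BPE), and `char_tokens`, `symbolic_tokens`, and the sample expression are assumptions made only for illustration.

```python
import re

def char_tokens(expr: str) -> list[str]:
    """Naive baseline: one token per non-whitespace character."""
    return [c for c in expr if not c.isspace()]

def symbolic_tokens(expr: str) -> list[str]:
    """Hypothetical domain-aware encoding: each identifier, number,
    or operator becomes a single token."""
    pattern = r"\*\*|[A-Za-z_]\w*|\d+(?:\.\d+)?|\S"
    return re.findall(pattern, expr)

expr = "sin(theta)**2 + cos(theta)**2"

# The domain-aware split needs fewer tokens for the same expression.
print(len(char_tokens(expr)), len(symbolic_tokens(expr)))
```

The gap here comes entirely from the choice of representation, not from the model, which is the kind of saving the comment is pointing at.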

There are also different inference methods, like autoregressive and diffusion, and maybe others we haven't discovered yet.

Combine those variables with the internal arrangement of layers, the parameter count, and the actual dataset, and you get such a large search space of possible models that no one can reliably tell whether LLM performance is going to flatline or continue to improve exponentially.
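As a rough sketch of how quickly that search space grows, the number of distinct configurations is the product of the options on each axis. The axis names and counts below are made-up placeholders, not measurements.

```python
from math import prod

# Hypothetical number of options per design axis (illustrative values only).
design_axes = {
    "input_encoding": 5,
    "inference_method": 3,
    "layer_layout": 20,
    "parameter_count": 10,
    "dataset": 50,
}

def count_configurations(axes: dict[str, int]) -> int:
    """Number of distinct models if every axis is chosen independently."""
    return prod(axes.values())

total = count_configurations(design_axes)
print(total)
```

Even with these small invented counts the product is already in the hundreds of thousands, and each added axis multiplies it again.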


ifdefdebug · 20h ago

> So if instead of text we come up with a different representation for mathematical or physical problems, that could both improve

But then, wouldn't we first have to translate all of our current math and physics knowledge into that new representation in order to be able to train a model on it? Looks like a tremendous amount of work to me.

gchamonlive (OP) · 20h ago

Yes, but by then you already have general LLMs capable of helping with the work. And even if you didn't, if that's what it would take to advance research in these fields, that would be a justifiable effort.

coldtea · 21h ago

>This could be right for the current architecture of LLMs, but you can come up with specialized large language models that can more efficiently use tokens for a specific subset of problems by encoding the information differently.

That's precisely what happens on the bad side of an S-curve.

gchamonlive (OP) · 20h ago

Progress doesn't stop, however, and the S-curve resets, because then you are optimizing a new architecture.
