These reasoning models can compute for extended durations, reportedly using exponentially more compute for linear performance gains (per OpenAI), yet their outputs, while better, are not necessarily any longer (in tokens) than before. Taken together, these facts point to a different architecture - some type of iterative calling of the underlying model, essentially a reasoning agent built on top of it.
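As a rough sketch of what "iterative calling of the underlying model" might look like in practice (everything here - the `call_llm` stub, the prompts, the stopping rule - is a hypothetical illustration, not OpenAI's actual design):

```python
# Hypothetical sketch of a "reasoning agent" wrapping a fixed LLM.
# call_llm is a stand-in for one call to the underlying model; all
# names and prompts here are assumptions for illustration only.

def call_llm(prompt: str) -> str:
    """Placeholder for a single call to the fixed underlying model."""
    raise NotImplementedError("wire this to a real model API")

def reasoning_agent(question: str, max_steps: int = 8) -> str:
    scratchpad = ""
    for step in range(max_steps):
        # Each iteration is another full call to the same fixed model,
        # so total compute scales with the number of steps taken, not
        # with the length of the final answer.
        thought = call_llm(
            f"Question: {question}\n"
            f"Reasoning so far: {scratchpad}\n"
            "Continue reasoning, or reply DONE if you can answer."
        )
        if "DONE" in thought:
            break
        scratchpad += thought + "\n"
    # One final call produces the user-visible answer; the scratchpad
    # itself is discarded, which is why more compute need not mean a
    # longer output.
    return call_llm(
        f"Question: {question}\nReasoning: {scratchpad}\nFinal answer:"
    )
```

A loop like this naturally spends variable compute per query (more steps for harder questions) while the model underneath stays fixed.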
A plain LLM does not use variable compute - it runs a fixed number of transformer layers and spends a fixed amount of compute on every token it generates.
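A quick back-of-the-envelope calculation makes the fixed-compute point concrete, using the common approximation of roughly 2 FLOPs per parameter per generated token (the 70B parameter count is an arbitrary example, and the estimate ignores attention's smaller context-length-dependent term):

```python
# Illustration: a plain transformer spends roughly the same compute on
# every generated token, regardless of how "hard" that token is.
# Uses the common ~2 FLOPs per parameter per token forward-pass
# approximation; the parameter count is a made-up example.

PARAMS = 70e9  # hypothetical 70B-parameter model

def forward_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2 * n_params

# The estimate is identical for an easy token and a hard one - the
# model has no mechanism to think longer about any single token.
for token in ["the", "therefore", "42"]:
    print(f"{token!r}: ~{forward_flops_per_token(PARAMS):.1e} FLOPs")
```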