mrob 4mo ago

>No implementations of models you’re talking to today are just raw autoregressive predictors, taking the most likely next token.

Set the temperature to zero and that's exactly what you get. The point is the randomness is something applied externally, not a "core concept" for the LLM.
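
A minimal sketch of the point above, assuming a toy three-token vocabulary with made-up logits. `sample_next_token` is a hypothetical helper for illustration, not any real library's API. At temperature zero it reduces to a plain argmax; at any other temperature the randomness comes from the sampler that sits outside the model's output.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Pick a token index from raw logits (hypothetical helper)."""
    if temperature == 0:
        # Greedy decoding: take the most likely token, no randomness involved.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by 1/temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # The randomness is supplied by the sampler (rng), not by the logits.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores, assumed for illustration
rng = random.Random(0)    # seeded so the sampled run is repeatable

greedy = [sample_next_token(logits, 0, rng) for _ in range(5)]
sampled = [sample_next_token(logits, 1.0, rng) for _ in range(5)]
print(greedy)   # always the index of the largest logit
print(sampled)  # may mix indices, depending on the sampler's draws
```

The logits are identical in both calls; only the step applied after them differs.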

CamperBob2 4mo ago

>Set the temperature to zero and that's exactly what you get.

In some NN implementations, randomness is actually pretty important to keep the gradients from getting stuck at local minima/maxima. Is that true for LLMs, or is it not something that applies at all?

eru 4mo ago

Are you talking about training?

CamperBob2 4mo ago

I'm not sure, hence the question. AFAIK temperature only comes into play at inference time once the distribution is known, but I don't know if there are other places where random numbers are involved.

nostrebored 4mo ago

The number of problems where people choose a temperature of 0 is negligible, though. The reason I chose the wording “implementations of models you’re talking to today” was that, in reality, this is almost never where people land, and it is certainly not what any popular commercial surface is using (Claude Code, any LLM chat interface).

And regardless, turning this into a system that has some notion of strategic consistency or contextual steering seems like a remarkably easy problem. Treating it as one API call in, one deterministic and constrained choice out is wrong.
