0 points · 13years · 1y ago · 0 comments

> If I ask sonnet what's under my bed it tells me it can't know and tells me to look under it myself.

The problem with most such questions is that these answers are likely patterns from the training data. "I can't know" is the typical reply.

The calculator question was interesting because the training data is unlikely to contain that kind of dialog as a typical pattern. People don't usually ask for a calculator, or mention one, for simple problems. Everyone has one and its use is more or less implied.

I tried variations of "provide accurate answers" and "accuracy is important". None of them led the model to ask for or mention a calculator. But as we know, results are partially random and not always consistent, especially in areas that lack strong patterns.

If I mentioned a calculator myself as part of the conversation, it would sometimes mention the need for one. But every time we add more context, we change the probabilities for what will be generated.

We know the training data contains the association between LLMs being poor at math and calculators, but the references are weak. With some changes in prompting, the model makes the association.

For other examples of weak data and how LLMs respond, check out these other tests I did - https://www.mindprison.cc/p/the-question-that-no-llm-can-ans...

IanCal · 1y ago

I got good responses from some models simply by telling them to be aware of their limitations.

And my first test, asking for the episode of Gilligan's Island, sort of worked with sonnet: no prefix, no system prompt, temp 0. It got the episode number wrong, and sometimes the season, but named the right episode. Higher temperatures seemed unreliable at getting the right episode name. When I split it into asking for the name, then the season, then the episode number, it worked correctly, but that may be partly chance.
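
To illustrate why temp 0 and higher temperatures can behave so differently on a weakly attested fact, here is a minimal sketch of temperature-scaled sampling probabilities. The function name `softmax_with_temperature` and the three scores are made up for illustration and are not real model output. Treating temperature 0 as a pure argmax is also an assumption; hosted APIs may not be strictly deterministic even there.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 is treated as greedy decoding,
        # so all probability mass goes to the highest-scoring option.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate episode titles (illustrative only).
logits = [2.0, 1.5, 0.5]
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

When the top candidate leads by only a small margin, as with a weak pattern in the training data, raising the temperature moves a lot of probability onto the runners-up, which would be consistent with the unreliable episode names at higher temperatures.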