The problem with most such questions is that the answers are likely just patterns from the training data. They are typical replies.
The calculator question was interesting because the training data is unlikely to contain much dialog like that. People don't typically ask for a calculator or mention one for simple problems. Everyone has one, and its use is somewhat implied.
I tried some variations of "provide accurate answers" or "accuracy is important". These did not result in the model asking for or mentioning a calculator. But as we know, results can be partially random and not always consistent, especially in areas lacking strong patterns.
If I mentioned a calculator myself as part of the conversation, the model would sometimes mention the need for a calculator. But every time we add more context, we are changing the probabilities for what will be generated.
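To make the "more context changes the probabilities" point concrete, here is a rough toy sketch. It is not how an LLM works internally; it just counts replies in a made-up corpus. The corpus, the reply labels, and the function name reply_distribution are all hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical toy "training data": (prompt, kind of reply) pairs.
# These are invented for illustration, not taken from any real dataset.
CORPUS = [
    ("what is 12 times 7", "direct_answer"),
    ("what is 12 times 7", "direct_answer"),
    ("please be accurate what is 12 times 7", "direct_answer"),
    ("i have a calculator here what is 12 times 7", "mentions_calculator"),
    ("should i use a calculator for 12 times 7", "mentions_calculator"),
    ("i have a calculator here what is 9 plus 4", "direct_answer"),
]

def reply_distribution(prompt_word):
    """Estimate P(reply | prompt contains prompt_word) by simple counting."""
    counts = Counter(
        reply for prompt, reply in CORPUS if prompt_word in prompt.split()
    )
    total = sum(counts.values())
    return {reply: n / total for reply, n in counts.items()}

# Asking for accuracy never co-occurs with a calculator reply in this corpus.
print(reply_distribution("accurate"))
# Putting "calculator" in the prompt shifts the odds, but not to certainty.
print(reply_distribution("calculator"))
```

In this toy, the word "accurate" has no association with a calculator reply, while mentioning "calculator" in the prompt makes that reply more likely without guaranteeing it, which is roughly the behavior I saw.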
We know the training data contains the association between "LLMs are poor at math" and calculators. But the references are weak. With some changes in prompting, the model does make the association.
For other examples of weak data and how LLMs respond, check out these other tests I did: https://www.mindprison.cc/p/the-question-that-no-llm-can-ans...