Which is why humans use calculators. That is the key point being made, secondary to the reliability issue. The LLM "knows" it is bad at math. It knows the purpose of calculators. However, it doesn't use this information to inform the user.
It could also offer to work out the answer by writing code. It doesn't do that either.
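To make "write the answer using code" concrete, here is a minimal sketch of the kind of snippet a model could offer instead of guessing at the digits. The function names are hypothetical and chosen only for illustration. The arithmetic is done by the Python interpreter, not by the model.

```python
from fractions import Fraction


def exact_product(a: int, b: int) -> int:
    # Python integers have arbitrary precision, so the product is exact.
    return a * b


def exact_decimal_sum(values: list[str]) -> Fraction:
    # Parse decimal strings as exact fractions to avoid float rounding.
    return sum((Fraction(v) for v in values), Fraction(0))


if __name__ == "__main__":
    print(exact_product(123456789, 987654321))
    print(exact_decimal_sum(["0.1", "0.2"]))
```

The point is that the model only has to get the code right, which is easy to check, and the interpreter does the calculation.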