Did exactly that for the actual filing — Python, mentioned in the post. The 23 numbers were a probe, not the goal: I wanted to understand how it works.
This kind of thing beats me. Why should a "Large Language Model" be expected to act as a calculator? Clue one is in the name; clue two is understanding that it is based on statistics, so it is not the deterministic tool you need.