Fine-tuning the model to make sense of the insane tokenization it's forced to use for its vocabulary, just so it can add accurately, sounds great until you start trying very large numbers or messing with whatever decoding settings you're using (and let me guess, it was beam search).
Or you can simply ask it to use a tool, like a calculator, for you. That is more reliable than fine-tuning a model built on current BPE tokenization ever will be.
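
For what it's worth, the "calculator" side of this can be tiny. Here is a minimal sketch in Python, assuming the model emits a plain arithmetic string and your own code runs it; the `calculator` function and its name are hypothetical, not part of any particular tool-calling framework. It only accepts integer `+`, `-`, and `*`, so the arithmetic is exact at any length and has nothing to do with tokenization or decoding settings.

```python
import ast
import operator

# Hypothetical tool for illustration; not tied to any specific framework.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def calculator(expression: str) -> int:
    """Evaluate integer +, -, * exactly; reject everything else."""

    def ev(node):
        # Plain integer literal (bools and floats are rejected).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        # Binary operation from the allow-list above.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        # Unary minus, e.g. "-5".
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


if __name__ == "__main__":
    # The model would produce this string as the tool-call argument.
    print(calculator("123456789012345678901234567890 + 987654321098765432109876543210"))
```

Walking the parsed expression with an allow-list, instead of calling `eval`, means whatever text the model produces cannot run arbitrary code.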