Yeah, they seem to be there on high school math problems today. There aren't that many variations on them, and there are billions of examples of them in the data, so LLMs can saturate those.
Just don't assume they're that reliable at solving real-world math tasks yet; those are still more varied and stump models.