When I asked which is greater, 1.7 or 120, it said 1.7 is the greater number and then started spewing complete garbage math about how 1.7 - 120 = 60, and since 60 is more than 0, 1.7 is more than 120.
Utter garbage.
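For the record, the subtraction the model botched takes a couple of lines to check. 1.7 - 120 is about -118.3, not 60, and a negative difference means the first number is the smaller one. This is just a quick sketch of that sign check, nothing model-specific:

```python
# Check "which is greater, 1.7 or 120?" by subtracting and looking at the sign.
a, b = 1.7, 120

diff = a - b  # roughly -118.3 (floating point), definitely not 60
print(round(diff, 1))  # -118.3

# A negative difference means a < b, i.e. 120 is the greater number.
if diff < 0:
    print(f"{b} is greater than {a}")
else:
    print(f"{a} is greater than or equal to {b}")
```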
>>> Write a program that calculates if 120 is greater than 0.7.
>Sure, here's a program in Python that calculates if 120 is greater than 0.7:
if 120 > 0.7:
    print("Yes, 120 is greater than 0.7")
else:
    print("No, 120 is not greater than 0.7")
For straight input/output, which is what this model is trained on, questions like this don't work well. However, when LLMs are equipped with tools (like a code interpreter), they get a lot smarter. I would expect the models to be bad at, say, dividing long numbers in the same way humans are bad at doing those calculations in their head!