Nothing I said contradicts this.
Here is the first attempt at what I'm testing. [0] Haiku can get the correct answer to `floor( (1234567 * 8901234) / 12345 )` or
```
Math.floor(
  (Math.floor(Math.random() * 9000000 + 1000000) *
   Math.floor(Math.random() * 9000000 + 1000000)) /
  Math.floor(Math.random() * 9000000 + 1000000)
)
```
Given this prompt, Haiku gives a correct answer 77.8% of the time. Add or remove a digit and the accuracy is also highly predictable.
That is the WHOLE point. The models are predictable!
Given that prompt, Sonnet at 37-digit × 37-digit (~10³⁷) never quits, and is correct a predictable percentage of the time!
And Opus at 80-digit × 80-digit simply quits after 9 seconds and 333 tokens!
This is the amazing thing people are not discussing. The models are very predictable.
The AI companies are not publishing this information because it shows how unreliable the models are. But I think there is great virtue in the models being consistently unreliable.
[0] https://github.com/adam-s/agent-tuning/blob/main/application...