Better HN

0 points · mihaic · 1y ago · 0 comments

I mean, given that o1-preview sometimes takes a minute to answer, I'd imagine they could append the prompt "Write a program and run it as well" so it double-checks itself. It seems like they just don't trust the model enough to run the code it generates, even sandboxed.

5 comments · 2 top-level

afro88 · 1y ago · 3 in thread

o1 gets this right through pure reasoning, without a program. OP was likely using GPT-4o or GPT-4o-mini.

mihaic (OP) · 1y ago

For the example "strawberrystrawberry" (the word concatenated with itself), o1 counted 4 r's. The correct count is 6.
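For reference, this is the kind of check the top comment has in mind: a few lines of Python that count letters deterministically instead of reasoning about them. It's a minimal sketch, and `count_letter` is a hypothetical helper name, not anything o1 is known to run internally.

```python
def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text (hypothetical helper)."""
    return text.lower().count(letter.lower())

word = "strawberry"
doubled = word + word  # "strawberrystrawberry": the word concatenated with itself

print(count_letter(word, "r"))     # 3
print(count_letter(doubled, "r"))  # 6
```

Because `str.count` works on characters rather than model tokens, the doubled word comes out to exactly twice the single-word count.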

afro88 · 1y ago

https://chatgpt.com/share/66eb38c0-22cc-8004-9d29-024de2e39d...


flimsypremise · 1y ago

Yeah, because now that we've all been asking about it, that answer is in its training data. The trick with LLMs is always: "Is the answer in the training data?"

j_maffe · 1y ago

I think it would just be too expensive to incorporate code writing into the chain of thought (CoT). Maybe it'll work out once they can use a cluster of different model sizes within a single answer.
