What do you even mean by a "mechanic" way to reason here?
And what do you expect it'd replicate? As I wrote, I tried looking to see if there were similar pieces of code online, and came up empty. I did that exactly because I was curious about the huge gap in quality between what I'd found before and what GPT-4 came up with, not least because a result that good certainly is not something that happens every time.
> I also think code creation isn't a good area because it is narrower and more mechanically linked by probability than a lot of other areas (so token probability is potentially more informative).
I don't see why that would make it worse. Not least because it also makes it far easier to evaluate the outcome. If anything, we ourselves grasp for formalisms and structure when we want to ensure our reasoning is sound.
Again, your use of "mechanically" here also makes absolutely no sense to me.