Agreed, this is exciting, and has me thinking about completely different orchestrator patterns. You could begin to approach the solution space much more like a traditional optimization strategy such as CMA-ES. Rather than expect the first answer to be correct, you diverge wildly before converging.
How about if you run this loop (one year from now) on this kind of hardware but with something like Claude/Kimi K2. How about that? Because that's where it'll go.
This is what people already do with “ralph” loops using the top coding models. It’s slow relative to this, but still very fast compared to hand-coding.
This doesn't work. The model outputs the most probable tokens. Running it again and asking for less probable tokens just results in the same but with more errors.