You have to be joking. I tried Codex for several hours and it has to be one of the worst models I’ve seen. It was extremely fast at spitting out the most broken code possible. Claude is fine, but what they said is completely correct. At a certain point, no matter what model you use, LLMs cannot write good working code. This usually happens after they’ve written thousands of lines of relatively decent code. Then the project gets large enough that if they touch one thing, they break ten others.