undefined | Better HN

0 pointsunsupp0rted1mo ago0 comments

Many of us prefer OpenAI's Codex, because we think it's a better product.

No comment on the CEO: I just find the product superior in everything but UI/UX and conversation. It's better at quality code.

0 comments

mliker1mo ago

Who is “us”? It does seem that some scientists prefer Codex for its math capabilities but when it comes to general frontend and backend construction, Claude Code is just as good and possibly made better with its extensive Skills library.

Both codex and Claude code fail when it comes to extremely sophisticated programming for distributed systems

keldaris1mo ago

As a scientist (computational physicist, so plenty of math, but also plenty of code, from Python PoCs to explicit SIMD and GPU code, mostly various subsets of C/C++), I can confirm - Codex is qualitatively better for my usecases than Claude. I keep retesting them (not on benchmarks, I simply use both in parallel for my work and see what happens) after every version update and ever since 5.2 Codex seems further and further ahead. The token limits are also far more generous (and it matters, I found it fairly easy to hit the 5h limit on max tier Claude), but mostly it's about quality - the probability that the model will give me something useful I can iterate on as opposed to discard immediately is much higher with Codex.

For the few times I've used both models side by side on more typical tasks (not so much web stuff, which I don't do much of, but more conventional Python scripts, CLI utilities in C, some OpenGL), they seem much more evenly matched. I haven't found a case where Claude would be markedly superior since Codex 5.2 came out, but I'm sure there are plenty. In my view, benchmarks are completely irrelevant at this point, just use models side by side on representative bits of your real work and stick with what works best for you. My software engineer friends often react with disbelief when I say I much prefer Codex, but in my experience it is not a close comparison.

Scene_Cast21mo ago

Have you tried the latest (3.1 pro) Gemini? In my experience, it's notably better for a similar type of problems than Opus 4.6. However, I don't really use OpenAI products to compare.

1 more reply

physicsguy1mo ago

I've tried both against similar and haven't found it such a clear cut difference. I still find neither are able to fully implement a complex algorithm I worked on in the past correctly with the same inputs. Not sharing exactly the benchmark I'm using but think about something for improving performance of N^2 operations that are common in physics and you can probably guess the train of thought.

2 more replies

ricksunny1mo ago

>As a scientist (computational physicist,

Is there one that you prefer for, i dunno, physics?

zeroxfe1mo ago

I'm in that camp -- I have the max-tier subscription to pretty much all the services, and for now Codex seems to win. Primarily because 1) long horizon development tasks are much more reliable with codex, and 2) OpenAI is far more generous with the token limits.

Gemini seems to be the worst of the three, and some open-weight models are not too bad (like Kimi k2.5). Cursor is still pretty good, and copilot just really really sucks.

the__alchemist1mo ago

Claude Code, Codex, and Cursor are old news. If you're having problems, it's because you're not using the latest hotness: Cludge. Everyone is using it now - don't get left behind.

outside12341mo ago

Cludge has been left behind by Clanker, that’s the new hotness. 45B valuation!

1 more reply

unsupp0rtedOP1mo ago

Us = me and say /r/codex or wherever Codex users are. I've tried both, liked both, but in my projects one clearly produces better results, more maintainable code and does a better job of debugging and refactoring.

sampullman1mo ago

That's interesting, I actively use both and usually find it to be a toss up which one performs better at a given task. I generally find Claude to be better with complex tool calls and Codex to be better at reviewing code, but otherwise don't see a significant difference.

3 more replies

rocketpastsix1mo ago

yea Im not in this "us" you speak of.

1 more reply

zem1mo ago

I've found claude startlingly good at debugging race conditions and other multithreading issues though.

josephg1mo ago

My rule of thumb is that its good for anything "broad", and weaker for anything "deep". Broad tasks are tasks which require working knowledge of lots of random stuff. Its bad at deep work - like implementing a complex, novel algorithm.

LLMs aren't able to achieve 100% correctness of every line of code. But luckily, 100% correctness is not required for debugging. So its better at that sort of thing. Its also (comparatively) good at reading lots and lots of code. Better than I am - I get bogged down in details and I exhaust quickly.

An example of broad work is something like: "Compile this C# code to webassembly, then run it from this go program. Write a set of benchmarks of the result, and compare it to the C# code running natively, and this python implementation. Make a chart of the data add it to this latex code." Each of the steps is simple if you have expertise in the languages and tools. But a lot of work otherwise. But for me to do that, I'd need to figure out C# webassembly compilation and go wasm libraries. I'd need to find a good charting library. And so on.

I think its decent at debugging because debugging requires reading a lot of code. And there's lots of weird tools and approaches you can use to debug something. And its not mission critical that every approach works. Debugging plays to the strengths of LLMs.

DeathArrow1mo ago

Many paying customers say that Anthropic degraded the capability of Opus and Claude Code in the last months and the outcomes are worse. There are even discussions on HN about this.

Last one is from yesterday: https://news.ycombinator.com/item?id=47660925

lhl1mo ago

As some other people mentioned, using both/multiple is the way to go if it's within your means.

I've been working on a wide range of relatively projects and I find that the latest GPT-5.2+ models seem to be generally better coders than Opus 4.6, however the latter tends to be better at big picture thinking, structuring, and communicating so I tend to iterate through Opus 4.6 max -> GPT-5.2 xhigh -> GPT-5.3-Codex xhigh -> GPT-5.4 xhigh. I've found GPT-5.3-Codex is the most detail oriented, but not necessarily the best coder. One interesting thing is for my high-stakes project, I have one coder lane but use all the models do independent review and they tend to catch different subsets of implementation bugs. I also notice huge behavioral changes based on changing AGENTS.md.

In terms of the apps, while Claude Code was ahead for a long while, I'd say Codex has largely caught up in terms of ergonomics, and in some things, like the way it let's you inline or append steering, I like it better now (or where it's far, far, ahead - the compaction is night and day better in Codex).

(These observations are based on about 10-20B/mo combined cached tokens, human-in-the-loop, so heavy usage and most code I no longer eyeball, but not dark factory/slop cannon levels. I haven't found (or built) a multi-agent control plane I really like yet.)

kasey_junk1mo ago

Codex won me over with one simple thing. Reliability. It crashed less, had less load shedding and its configuration is well designed.

I do regular evaluation of both codex and Claude (though not to statistical significance) and I’m of the opinion there is more in group variance on outcome performance than between them.

baq1mo ago

This is the way. Eg. IME Gemini is really damn good at sql.

Razengan1mo ago

I have been using Codex AND Claude side by side for the same project*, with the same prompts.

Codex has been consistently better on almost every level.

* (an open source framework for 2D games in Godot 4.6 GDScript, mostly using AI to review existing code)

7thpower1mo ago

Not a scientist and use codex for anything complex.

I enjoy using CC more and use it for non coding tasks primarily, but for anything complex (honestly most of what I do is not that complex), I feel like I am trading future toil for a dopamine hit.

baq1mo ago

I’m one of those ‘us’, Claude’s outputs require significant review and iteration effort (to put it bluntly they get destroyed by gpt and Gemini). I’m basically using sonnet to do code search and write up since it is a better (more human-like) writer than gpt and faster and more reliable than gemini, but that’s about it.

bko1mo ago

I also find Codex much more generous in terms of what you get with a Pro ($20/mo) subscription. I use it pretty much non-stop and I have yet to hit a limit. Weekly reset is much better as well.

DeathArrow1mo ago

I prefer GLM 5.1 and MiniMax 2.7. With a better harness like Forge Code, I have better results for way less money than by using GPT and Opus.

jbergqvist1mo ago

Usage limits are more generous and GPT 5.4 is a good model, but yes, UI/UX lags behind Claude Code. Currently I'm especially missing /rewind with code restoration and proper support for plugin marketplaces

KaiserPro1mo ago

GPT/claude/gemini is pretty interchangeable at this point.

baq1mo ago

Absolutely not the case. They're complementary.

1 more reply

shevy-java1mo ago

Does this work for people? To me having a "better product" would be completely irrelevant if the use cases are evil.

thaoanh4041mo ago

i find myself being more productive with codex/copilot on coding tasks, but claude does seem to be better at planning

MrSkelter29d ago

Here’s a reality check.

There are two types of vaccine be coders. Those who review the code generated and those who don’t.

Either because they don’t understand code at all, or because they don’t have time and don’t care.

Code quality is only one factor. Naive vibe coders, who don’t code otherwise, rate performance based on output alone.

aaa_aaa1mo ago

Shill talk

j / k navigate · click thread line to collapse

0 comments

mliker1mo ago

Both codex and Claude code fail when it comes to extremely sophisticated programming for distributed systems

keldaris1mo ago

Scene_Cast21mo ago

Have you tried the latest (3.1 pro) Gemini? In my experience, it's notably better for a similar type of problems than Opus 4.6. However, I don't really use OpenAI products to compare.

1 more reply

physicsguy1mo ago

2 more replies

ricksunny1mo ago

>As a scientist (computational physicist,

Is there one that you prefer for, i dunno, physics?

zeroxfe1mo ago

Gemini seems to be the worst of the three, and some open-weight models are not too bad (like Kimi k2.5). Cursor is still pretty good, and copilot just really really sucks.

the__alchemist1mo ago

Claude Code, Codex, and Cursor are old news. If you're having problems, it's because you're not using the latest hotness: Cludge. Everyone is using it now - don't get left behind.

outside12341mo ago

Cludge has been left behind by Clanker, that’s the new hotness. 45B valuation!

1 more reply

unsupp0rtedOP1mo ago

sampullman1mo ago

3 more replies

rocketpastsix1mo ago

yea Im not in this "us" you speak of.

1 more reply

zem1mo ago

I've found claude startlingly good at debugging race conditions and other multithreading issues though.

josephg1mo ago

DeathArrow1mo ago

Many paying customers say that Anthropic degraded the capability of Opus and Claude Code in the last months and the outcomes are worse. There are even discussions on HN about this.

Last one is from yesterday: https://news.ycombinator.com/item?id=47660925

lhl1mo ago

As some other people mentioned, using both/multiple is the way to go if it's within your means.

kasey_junk1mo ago

Codex won me over with one simple thing. Reliability. It crashed less, had less load shedding and its configuration is well designed.

I do regular evaluation of both codex and Claude (though not to statistical significance) and I’m of the opinion there is more in group variance on outcome performance than between them.

baq1mo ago

This is the way. Eg. IME Gemini is really damn good at sql.

Razengan1mo ago

I have been using Codex AND Claude side by side for the same project*, with the same prompts.

Codex has been consistently better on almost every level.

* (an open source framework for 2D games in Godot 4.6 GDScript, mostly using AI to review existing code)

7thpower1mo ago

Not a scientist and use codex for anything complex.

I enjoy using CC more and use it for non coding tasks primarily, but for anything complex (honestly most of what I do is not that complex), I feel like I am trading future toil for a dopamine hit.

baq1mo ago

bko1mo ago

I also find Codex much more generous in terms of what you get with a Pro ($20/mo) subscription. I use it pretty much non-stop and I have yet to hit a limit. Weekly reset is much better as well.

DeathArrow1mo ago

I prefer GLM 5.1 and MiniMax 2.7. With a better harness like Forge Code, I have better results for way less money than by using GPT and Opus.

jbergqvist1mo ago

KaiserPro1mo ago

GPT/claude/gemini is pretty interchangeable at this point.

baq1mo ago

Absolutely not the case. They're complementary.

1 more reply

shevy-java1mo ago

Does this work for people? To me having a "better product" would be completely irrelevant if the use cases are evil.

thaoanh4041mo ago

i find myself being more productive with codex/copilot on coding tasks, but claude does seem to be better at planning

MrSkelter29d ago

Here’s a reality check.

There are two types of vaccine be coders. Those who review the code generated and those who don’t.

Either because they don’t understand code at all, or because they don’t have time and don’t care.

Code quality is only one factor. Naive vibe coders, who don’t code otherwise, rate performance based on output alone.

aaa_aaa1mo ago

Shill talk

j / k navigate · click thread line to collapse