undefined | Better HN

0 pointssampullman1mo ago0 comments

That's interesting, I actively use both and usually find it to be a toss up which one performs better at a given task. I generally find Claude to be better with complex tool calls and Codex to be better at reviewing code, but otherwise don't see a significant difference.

0 comments

SOLAR_FIELDS1mo ago

If you want to find an advocate for Codex that can give a pretty good answer as to why they think it's better, go ask Eric Provencher. He develops https://repoprompt.com/. He spends a lot of time thinking in this space and prefers Codex over Claude, though I haven't checked recently to see if he still has that opinion. He's pretty reachable on Discord if you poke around a bit.

hirako20001mo ago

Quite irrelevant what factions think. This or that model may be superior for these and those use cases today, and things will flip next week.

Also. RLHF mean that models spit out according to certain human preference, so it depends what set of humans and in what mood they've been when providing the feedback.

SOLAR_FIELDS1mo ago

On the contrary, I very much care about what the other factions think because I want to know if things have already flipped and the easiest way to do so is just ask someone who's been using the tool. Of course the correct thing to do is to set up some simple evals, but there is a subjective aspect to these tools that I think hearing boots on the ground anecdata helps with.

tharkun__1mo ago

Haven't done it in a while, but I've done some tasks with both Codex and Claude to compare. In all cases I asked both to put their analysis and plans for implementation into a .md file. Then I asked the other agent to analyze said file for comparison.

In general, Claude was impressed by what Codex produced and noted the parts where it (i.e. Claude) had missed something vs. Codex "thinking of it".

From a "daily driver" perspective I still use Claude all the time as it has plan mode, which means I can guarantee that it won't break out and just do stuff without me wanting it to. With Codex I have to always specify "Don't implement/change, just tell me" and even then it sometimes "breaks out" and just does stuff. Not usually when I start out and just ask it to plan. But after we've started implementation and I review, a simple question of "Why did you do X?" will turn into a huge refactoring instead of just answering my question.

To be fair, that's what most devs do too (at least at first), when you ask them "Why did you do X" questions. They just assume that you are trying to formulate a "Do Y instead of X" as a question, when really you just don't understand their reasoning but there really might be a good reason for doing X. But I guess LLMs aren't sure of themselves, so any questioning of their reasoning obliterates their ego and just turns them into submissive code monkeys (or rather: exposes them as such) vs. being software engineers that do things for actual reasons (whether you agree with them or not).

cher881mo ago

Codex has plan mode too - /plan

aswanson1mo ago

Any difference in performance on mobile development?

sampullmanOP1mo ago

For that I'm not so sure. I tried both early 2025 and was disappointed in their ability to deal with a TCA based app (iOS) and Jetpack compose stuff on Android, but I assume Opus 4.6 and GPT 5.4 are much better.

j / k navigate · click thread line to collapse

0 comments

SOLAR_FIELDS1mo ago

hirako20001mo ago

Quite irrelevant what factions think. This or that model may be superior for these and those use cases today, and things will flip next week.

Also. RLHF mean that models spit out according to certain human preference, so it depends what set of humans and in what mood they've been when providing the feedback.

SOLAR_FIELDS1mo ago

tharkun__1mo ago

In general, Claude was impressed by what Codex produced and noted the parts where it (i.e. Claude) had missed something vs. Codex "thinking of it".

cher881mo ago

Codex has plan mode too - /plan

aswanson1mo ago

Any difference in performance on mobile development?

sampullmanOP1mo ago

j / k navigate · click thread line to collapse