Codex logging bug may write TBs to local SSDs

(github.com)

502 points · vantareed · 4d ago · 269 comments

113 comments · 33 top-level

b--l 4d ago · 25 in thread

Codex is one of the most infamous examples of slopware. Just having the window unhidden on my Mac will cause it to use 100% of the GPU displaying the spinner message.

THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

So any time you're waiting on the model (which is 90% of the time), your fans will be blasting (careful, don't use it on battery).

The issue is on GitHub and close to 6 months old. Probably since the release of vibe-coded junk. I would literally fix it myself but it's closed source for whatever reason.

There are many discussions about which model is better, or if vibe coding is even possible. I point you to the extent of what one of the most well-funded, money-flush, well-staffed model-making companies can do with vibe coding.

To me a screwup this bad (where the CEO has already made it clear they're now "focussing on coding") indicates that there's something truly broken in the company. No one on Polymarket expects them to have a leading model any time soon, for example.

It's a tragedy. The world needs competition to Anthropic.

jofzar 4d ago

> Codex is one of the most infamous examples of slopware

Woah, let's not forget Claude Code is right there.

varjag 4d ago

Right, just yesterday I found my laptop kinda hot. And what do you think, it was good old Claude deciding to load a few cores with completely idling prompts.

kokada 4d ago

Not that Claude Code is much better. I just hit this issue[1] because it seems that setting DO_NOT_TRACK=1 is enough to trigger really strange behavior in the newest versions of CC.

[1]: https://github.com/anthropics/claude-code/issues/69238#issue...

Edit: I think I misunderstood OP, they're saying that CC is even worse and not better than Codex CLI.

mvATM99 4d ago

Yeah exactly.

I'm not exactly building TUIs every day, but even I felt pain when I read that "small game engine" post.

r_lee 4d ago

if we are at 10x with AI and near AGI or ASI, then how is it possible that these products (Codex, Claude Code CLI) are still such garbage?

shouldn't this "agentic AI revolution" have long solved this already?

no way they're over there saying "we are on it plz wait" or that "it's too much effort"?

hombre_fatal 4d ago

Even with AI, you still need attention to detail and TLC to polish software, something that's always in short supply.

igleria 4d ago

This is the biggest elephant in the room I have seen in my decade+ career. At the same time, look how bad Apple is in software compared to its hardware... It's not an AI only problem, it's almost like software in general gets a free pass on being very unsafe or low quality because no one wants to face the same "profit reducing red tape" that civil engineers or similar face.

CharlieDigital 4d ago

Anthropic were the progenitors of the Model Context Protocol. Claude Code does not fully implement the client end of the protocol. A protocol; a literal pre-defined spec that an agent should be able to one-shot. Neither does Codex. Codex does not implement MCP Prompts.

(I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).

The fact that neither platform can implement a protocol given what is functionally infinite frontier model tokens really says a lot. I do not care what kind of random project some influencer can ship with a swarm of 1000 agents. If you cannot make the basics work, it is a farce.

jeffybefffy519 4d ago

Because vibe coding is a toy… that's the secret.

You can use it to accelerate development certainly, but that requires careful change->review cycles. The developer still needs to be in heavy control, versus vibe coding having an agent own the code base.

fg137 4d ago

You are asking too many good questions.

user43928 4d ago

The products generally work just fine on my MacBook.

I have not encountered major issues in any of the Claude Code CLI, the Codex Desktop app, or the Claude Desktop app.

They generally get the job done. I don't measure disk writes or analyze the GPU usage.

Zababa 4d ago

A simple explanation is that they are "good enough" for most people and they have better things to do. Even if tomorrow I was 100 times as productive, I still wouldn't have time to do literally everything and I would have to prioritize.

nicce 4d ago

Not only Codex: I can't leave the ChatGPT app open on macOS for a few hours, because it will consume 60 gigabytes of RAM over time and crash all the other apps.

Mindboggling. Or can't use Google's AI Studio in browser because it takes 100% CPU.

Need to write own app for everything???

porridgeraisin 4d ago

the damn chat.openai.com webapp lags a lot as well on long chats, typing takes so long.

xpct 4d ago

Well thank you for your service. I thought about trying out Codex after the disaster that is Claude Code. I'll be fine without either one on my machine

jofzar 4d ago

Imo Codex is significantly better than Claude Code for me ATM.

comboy 4d ago

I mean, Codex CLI is really bad. But Claude's CLI is so much worse.

Welcome to the world of tomorrow!

l33tman 4d ago

This was fixed long ago, if I'm thinking of the same bug. It was stuck in an inf loop all the time the codex window was open.

cncjvu7 4d ago

Nah it's still doing weird shit. Uninstalled that crapware last week.

xenator 4d ago

I have exactly the same problem with the Time Machine spinner on macOS. It doesn't even rotate.

Somewhere there should be rare specialists with diplomas who are capable of fixing such problems, with waiting lists years ahead.

hokkos 4d ago

Is it closed source? I can see the Rust code in the repo, contrary to the JS in the Claude Code repo. Are you mixing them up?

nicce 4d ago

Codex CLI is the main Rust code. There is Codex Desktop separately, using Electron and the same Codex CLI.

seviu 4d ago

To be fair to Codex, you can use any harness you want with it. Access is not gatekept by a crappy, slop-filled Electron app.

So just move to Pi, or whatever.

Claude, on the contrary, forces all plan users to use their horrible app, which, if you ever dared to use Cowork even once, will run a 2GB VM on app start, no f's given. At all.

Not justifying it. But if you use the official Codex app, that's on you. If you use the official Claude app, it's because you are forced to.

Sidenote unrelated to the post: since the Fable thing, and after serious thinking, I moved to open source models. I still have the basic OpenAI sub, but then easy lifting is now done elsewhere.

energy123 4d ago

Let me guess, there's also a bug where they train on all our data?

varjag 4d ago

They don't need to. You pay them for the privilege to do black box reinforcement learning already.

indiv0 4d ago · 12 in thread

This thread will become a typical "haha slop company made slop" but I've been bitten by a bug exactly like this before in a (pre-AI, artisan) OSS project. The maintainer there didn't properly account for DST when calculating last backup time, so the app started and never stopped writing/re-writing backups continuously.

Perhaps the framing shouldn't be "haha slop" but rather why doesn't the AI write better quality software than we do? To which the answer is obvious IMO -- even emergent properties can't elevate AI intelligence too far above the training dataset. So how do we get to superintelligent (or at least "not-wreck-your-NVMe-endurance-telligent") AI, if we, as a whole, are not smart enough ourselves?

Judge not the slop-bot, lest ye be judged yourself, engineer.
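
The comment does not say how that project computed its "last backup time", so the following is only a minimal sketch of the general failure mode, not that project's code. It assumes a scheduler that subtracts local wall-clock readings; the function names and the fixed UTC offsets (stand-ins for US Eastern daylight and standard time) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets used as stand-ins for daylight and standard time (assumption).
EDT = timezone(timedelta(hours=-4))
EST = timezone(timedelta(hours=-5))


def wall_clock_elapsed(last: datetime, now: datetime) -> timedelta:
    """Buggy variant: subtracts local clock readings and ignores the UTC offset."""
    return now.replace(tzinfo=None) - last.replace(tzinfo=None)


def real_elapsed(last: datetime, now: datetime) -> timedelta:
    """Safer variant: compare both instants after converting them to UTC."""
    return now.astimezone(timezone.utc) - last.astimezone(timezone.utc)


# Last backup at 01:50 daylight time, shortly before clocks fall back an hour.
last_backup = datetime(2024, 11, 3, 1, 50, tzinfo=EDT)
# Twenty real minutes later the local clock reads 01:10 standard time.
now = datetime(2024, 11, 3, 1, 10, tzinfo=EST)

naive = wall_clock_elapsed(last_backup, now)   # minus 40 minutes
actual = real_elapsed(last_backup, now)        # plus 20 minutes
print(naive, actual)
```

A scheduler that treats a negative or nonsensical interval as "backup is overdue" can end up re-running on every check, which is one plausible way to get the endless rewrite loop described above. Storing and comparing timestamps in UTC sidesteps it.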

sleples 4d ago

We've gone from "you're holding it wrong" to "the training data was bad because humans suck too". Difference is, humans learn from their mistakes.

SilverSlash 4d ago

> Difference is, humans learn from their mistakes.

Great! So next time the human will prompt the agent to watch out for and avoid this bug.

ponector 4d ago

You are a senior developer. Please do no mistakes!

xpct 4d ago

Lack of accountability is the cause here. People don't think before hitting the 'Publish' button. Their managers let them off the hook because the culture still allows making egregious mistakes, as long as there's an LLM to blame.

applfanboysbgon 4d ago

1. I bet that developer only made that mistake one time in their life. Humans learn from their mistakes, LLMs don't. If you rely on LLMs to generate all of your code, you can expect to run into the same issues again and again.

2. "One developer somewhere in the world made a bad mistake one time, so this represents the quality of all software devs everywhere". Maybe they were just a bad developer? Bad developers exist. I have never written a bug that has destroyed my users' hardware, and I think that writing such a bug is completely inexcusable in an enterprise environment with software that will be shipped to millions of users, as Codex is.

matharmin 4d ago

LLMs do learn from mistakes. Not as directly from individual mistakes as humans do, but in aggregate the models have improved much more in the last year than most humans I know learn in the same time.

xpct 4d ago

I don't like the reframing of 'learning from mistakes' from a human-like, near instantaneous feedback loop, to a year-long process of retraining on many traces collected from user data. They're different concepts and we should refer to them using different phrasing.

Y-bar 4d ago

How many more times do I have to add variations of “do not run any commands for the application without first entering the running container at `docker compose …`” to my AGENTS.md before it learns that node and phpunit are not available outside these containers?

lifthrasiir 4d ago

> I have never written a bug that has destroyed my users' hardware, ...

Probably whoever (human or agent) originally decided to put TRACE logs into SQLite also thought (or reasoned) so. Maybe the decision was right at that time, but the volume of TRACE logs has since increased enormously. You will never know.

applfanboysbgon 4d ago

I love that we've moved the goalposts from "LLMs are better than artisanal software engineers" to "actually, shipping hardware-destroying bugs in production is literally unavoidable, nobody could possibly avoid doing it".

da_grift_shift 4d ago

What are your thoughts on the SNR of the linked GitHub issue threads? Consider the volume of comments posted and the substance of each comment.

fn-mote 4d ago

I read the first page and they were excellent. Each was clearly written by an experienced dev who knows how to substantiate their claims and propose an acceptable fix that could just be merged.

Your comment, on the other hand, would be improved by including your own opinion on the matter.

neuralkoi 4d ago · 10 in thread

Vibe coding takes "move fast and break things" to a whole nother level.

cryo32 4d ago

Yeah. Here I am sitting on a major incident at our company because someone’s vibe coded shit went seriously wrong.

Imustaskforhelp 4d ago

Can you go into more detail, if possible and if you're allowed to?

I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

They hadn't done anything to the database itself, but you betcha that there are some horror stories involving databases, lack of proper backups, and vibe coding gone insanely wrong.

cryo32 4d ago

I can say very little in detail, but basically Claude doesn’t have any conceptual idea of order of operations and transactional guarantees, which resulted in producing something that failed under normal load. There is an evidence chain to suggest it was asked to do this but did not, and that wasn’t verified.

Our engineers are accountable for what they produce regardless of how, so they are cleaning up the extensive mess this made. This will result in a very heated post-mortem meeting between the two factions in the company.

flir 4d ago

> "The code wasn't written by me. It was written by Claude/Chatgpt"

Culturally (across all LLM use, not just programming) we need to nip that in the bud. If we don't it's going to be the new "someone hacked my social media password" get out of jail free card for avoiding responsibility.

I don't care what tools you used, but if your name is on it, you're the author and the responsibility is yours. No "it wasn't me it was my typewriter" bullshit.

latexr 4d ago

> Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

It boggles the mind someone could think that is a valid justification, because ultimately what they’re saying is “I’m useless, what you get from me is the same thing as prompting the model” which still means they would lose their job.

comboy 4d ago

We are running out of things to break.

stavros 4d ago

Make more things to break.

GL26 4d ago

as long as you don't have technical debt, vibe coding is mostly useful for prototyping. For a real product, true SWE will never be replaced

Otek 4d ago

Already got replaced at world top tier tech jobs. „True SWE” will be niche / luxury soon, just like real woodworking vs IKEA

inigyou 4d ago

Software is freely duplicable, unlike wood. IKEA could be mass producing copies of the most beautiful chair in the world just as easily as it produces copies of something a 5-year-old drew in FreeCAD.

tgtweak 4d ago · 4 in thread

Slightly better than the Claude Code "feature" that deletes all your session context and transcripts older than 30 days.

Mistredo 3d ago

Codex has a similar bug and makes chats disappear randomly.

qup 4d ago

At least that's a decision vs a bug.

tgtweak 4d ago

Not a decision that users are aware of though. Nor is there a setting to disable/change it. It just showed up one day and erased your previous sessions.

pdantix 4d ago

> Nor is there a setting to disable/change it.

https://code.claude.com/docs/en/settings#available-settings

`cleanupPeriodDays` has always existed.

woadwarrior01 4d ago · 3 in thread

Someone posted a temporary workaround for this on X[1].

sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"

Also, I found that running VACUUM on the SQLite file on my laptop shrunk it from 27GB to a mere 73MB[2].

[1]: https://xcancel.com/bdsqlsz/status/2067964486615810369

[2]: https://xcancel.com/jeethu/status/2068087449469780434
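
For anyone who wants to see what the two steps above do before touching a real file, here is a minimal sketch against a throwaway database. The table name `logs` comes from the command in the comment; the column layout and row sizes are made up, and nothing here reflects the actual schema of `logs_2.sqlite`. Note that the SQLite statement is plain `VACUUM` (the `FULL` keyword belongs to PostgreSQL).

```python
import os
import sqlite3
import tempfile

# Throwaway database; the columns of `logs` are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "demo_logs.sqlite")
con = sqlite3.connect(path)
con.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, body TEXT)")
con.executemany("INSERT INTO logs (body) VALUES (?)", [("x" * 1000,)] * 5000)
con.commit()

# Deleting rows only puts pages on the free list; the file keeps its size.
con.execute("DELETE FROM logs")
con.commit()
size_before = os.path.getsize(path)

# VACUUM rebuilds the file without the free pages.
# It must run outside a transaction, hence the commit above.
con.execute("VACUUM")
size_after = os.path.getsize(path)

# The trigger from the comment: RAISE(IGNORE) silently skips each insert.
con.execute(
    "CREATE TRIGGER IF NOT EXISTS block_log_inserts "
    "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END"
)
con.execute("INSERT INTO logs (body) VALUES ('dropped')")
con.commit()
rows_after_trigger = con.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
con.close()

print(size_before, size_after, rows_after_trigger)
```

The trigger stops growth but does not reclaim space already used, which is why the two steps go together. Whether Codex tolerates silently dropped inserts is not something this sketch can show.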

sgarland 4d ago

DB-level rules saving the day once again.

NamlchakKhandro 4d ago

The real solution is to stop using it and switch to Pi

woadwarrior01 4d ago

I’ve been using oh-my-pi with GLM-5.2 xhigh as the main model and GPT-5.5 medium as its advisor model. IMO, the combo works better than either of those models alone.

i2km 4d ago · 3 in thread

Shocking. Been open a week and AFAICT just silence from OpenAI. I just find it baffling. You'd think that these vendors would be very sensitive to this sort of issue. I mean, surely they have multiple agents hooked up to GitHub monitoring potential issues and proposing fixes, right? ...right?

Surely it should be trivial for them to have their own tools spinning away trying to fix all the GitHub issues in real time...

drakythe 4d ago

They're pretty bad about fixing issues, it seems. My favorite is #2472, which they demonstrated "fixing" on stage at the release of GPT 5, but the ticket is still open and the "fix" hasn't been merged. The original blog post that flagged this: https://blog.tymscar.com/posts/openaiunmergeddemo/ and the issue: https://github.com/openai/openai-python/issues/2472

lelandfe 4d ago

Claude meanwhile just auto closes all issues because they simply don’t triage ~anything. There have been countless instances of this horrifying issue filed for over a year now: https://github.com/anthropics/claude-code/issues/16180

> Permission bypass when commands are chained with &&

At one point they fixed their auto stale bot closing bugs but, hey, guess that wasn’t long lived.
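
To make the "chained with &&" class of bypass concrete, here is a minimal sketch of a guard that checks every segment of a command line rather than only its first word. It is not how Claude Code or any other agent implements permissions; `is_allowed`, the separator set, and the allowlists are all hypothetical, and it is deliberately conservative rather than a complete shell parser.

```python
import shlex

SEPARATORS = {"&&", "||", ";", "|", "&"}
SHELL_PUNCTUATION = set("();<>|&")
# Substitution and multi-line input are rejected outright (conservative choice).
FORBIDDEN_CHARS = "`$\n\r"


def is_allowed(command: str, allowlist: set[str]) -> bool:
    """Return True only if every chained segment starts with an allowed program."""
    if any(ch in command for ch in FORBIDDEN_CHARS):
        return False
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    lexer.commenters = ""  # do not let '#' hide the rest of the line
    try:
        tokens = list(lexer)
    except ValueError:  # e.g. unbalanced quotes
        return False
    segments: list[list[str]] = [[]]
    for token in tokens:
        if token in SEPARATORS:
            segments.append([])
        elif token and set(token) <= SHELL_PUNCTUATION:
            return False  # redirections, subshells, and similar
        else:
            segments[-1].append(token)
    return all(seg and seg[0] in allowlist for seg in segments)


print(is_allowed("git status && ls -la", {"git", "ls"}))
print(is_allowed("git status && rm -rf /tmp/x", {"git", "ls"}))
```

A check that only inspects the first word would approve the second command because it starts with `git`. Splitting on the separators first is the whole point; anything the sketch cannot classify is refused.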

cl3misch 4d ago

There have been issues on GitHub about the same problem since April. I'm using Codex a lot and I'm very happy with its performance (UX and output), but it's baffling they haven't fixed this problem.

taspeotis 4d ago · 3 in thread

OpenAI really snatched defeat from the jaws of victory late last year when Claude Code was a laggy mess.

Nowadays Codex has typing latency out of the gate, whereas Claude Code has the odd pause but generally displays my key presses as … you know … I press them.

kasey_junk 4d ago

Fwiw I have the exact opposite experience.

christophilus 4d ago

I find Claude Code nearly unusable. I always have to type in neovim if I’m typing anything more than a few words.

aquariusDue 4d ago

It runs fine for me on an old ThinkPad X220 loaded with 8 GB, an i5 and a barely working SATA SSD. This is on Fedora and Claude Code is installed from Anthropic's dnf repo (the latest channel). Granted I'm on the Pro Plan and I'm not running lots of sub agents but the default terminal app from KDE (Konsole) renders and keeps Claude Code responsive enough.

I must be honestly missing some key piece of workflow otherwise I don't know why it would run so slow for other people on better hardware? Granted I'm taking care to tell Claude to not exhaust CPU cores and make sure to not trigger OOM errors, akin to "make no mistakes pls".

Imustaskforhelp 4d ago · 3 in thread

I don't understand how Codex can blunder so badly. I imagine that even if they are vibe coding, surely they must have some good engineers. So why are there such severe bugs?

One can argue that these products are the flagship products of their respective AI companies, aside from the AI models themselves of course.

I imagine that this story will be picked up by the news left and right. Some stories just feel this way and this one is like that (given 12 upvotes on HN in 7 minutes).

The only logical conclusions (from this incident) that I can draw are:

1. A (vibe-coded?) product is hard to maintain even for some of the best engineers and is bound to have severe bugs.

2. Proper testing and taking issues seriously are key if you still wish to do this, and there isn't much of either. This is a week-old issue which I can only classify as severe.

I wish to keep a nuanced opinion about it, but oh, this is bad for OpenAI (not as bad as them accepting autonomous AI within drones and mass surveillance though).

My point is: AI has both uphills and downward valleys and cliffs. It might as well just accelerate you, which could be towards your downfall as well. It's recommended to keep an eye on the road while driving and not drive too fast.

AI companies might be like car companies which don't offer a brake pedal.

dathinab 4d ago

> I don't understand how Codex can blunder so badly.

because they trust the AI too much (and seem to be fine with acting knowingly negligent)

the problem is

- AI tends to produce very convincing-looking code, even if it's fully wrong

- AI makes mistakes of kinds no human would make, at least no human who is also able to write convincing-looking code

- code reviews are hard; a lot of devs, including senior devs, put a lot of implicit trust in the co-worker behaving "sane and non-malicious". But AIs sometimes behave not so sanely, and in a convincing-looking way. In the worst case they behave in ways that, if it were a human, you might consider an attempt to maliciously sabotage the product

Like a "dumb" example from work:

- AI randomly removes an HTML element id while doing other changes in JSX/React

- the PR has a lot of changes, the id removal line looks innocent, like some on-the-fly cleanup

- human reviewers have the bad tendency to not look too closely at deleted lines, only when they need a reference for how a new line looked before (but here it's only a deleted line and no new line)

- you don't expect humans to randomly, without reason, delete important properties of components when changing other things

- you maybe would still have found it, but it's an emergency fix for a production issue

- it happens to be missed by integration tests, but still matters a lot for one specific important flow that, for complicated reasons, is not properly tested (similarly, people tend not to test logging much: at best the presence of needed info, but hardly ever the absence of noise)

PunchyHamster 4d ago

> I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

Because it was deemed not a Hard Enough task for a real engineer to look at, so AI was sent to do it with no supervision, just checking the effects.

Also, excessive logging is probably useful to them in chasing some of the edge cases; the cost to users doesn't matter in the slightest to them.

supriyo-biswas 4d ago

The truth of the matter is that any time that has been saved in writing the code must be spent on ensuring proper system design, reviewing the code, and most importantly of all, QA, which is an uncomfortable discussion for AI techbros who are peddling complete automation of the software profession.

purpleidea 4d ago · 3 in thread

I want to like codex, but the quality is just not very good, especially when compared to Claude.

It used to work okay, but a while back they landed a major regression for an entire team of folks I work with.

No response, no workaround.

https://github.com/openai/codex/issues/23762

christophilus 4d ago

I don’t trust any agent to respect any boundaries. They might today. But tomorrow’s vibe-coded slop update might break it in subtle ways.

My solution to this is to only run agents in a sandbox of my own making (a locked down Podman container).

drakythe 4d ago

They can't respect boundaries as long as those boundaries exist only in the LLM instruction set. If a human being follows rules long enough, the rules become second nature (usually), almost to the point where long-running companies are known for having rules no one understands (Chesterton's Fence is alive and well).

But an LLM has a limited "memory". The instructions might land in there and be of sufficient priority to be "respected", but it takes a single instance of that memory getting too full, or of the LLM autocompleting the workaround because that was the statistically "best" solution, and any barriers that exist only in LLM instructions and not in hardcoded guards will evaporate like so much morning fog.

matheusmoreira 4d ago

I went the full virtual machine route. Just finished hardening the setup and firewalling it off my local network. Not perfect but it does make me feel much safer.

ares623 4d ago · 2 in thread

i hope they find the smoking gun, the key insight, the kicker.

59nadir 4d ago

Then they can apply a clean solve, the cleanest solution.

It's fascinating how offensive some of this verbiage becomes to you when you see it attached to LLM output too much.

jofzar 4d ago

Ugh, this one gets me so bad. Same with "wire" and "wired": everything is wired to something.

altcognito 4d ago · 2 in thread

I think part of the question should be, why is there no QA or test that catches this? It's one thing to be slopware, but why didn't anything run a test that catches this?

theowaway213456 4d ago

Every time you write a test that handles some data, you write an assertion about how much data is handled?

Come on, this is such an easy thing to forget to test. Don't act like there is some magical testing strategy that would have caught this

altcognito 4d ago

I'll acknowledge that this is probably not likely to get caught.

Integration testing could/should catch this, especially for a client side app.

A simple constraint is a good thing: "Our app shouldn't use more than 50 MB of RAM or 3 GB of disk space."
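
A minimal sketch of what such a disk constraint could look like in an integration test, assuming the app keeps its state under one directory. The helper names and the temporary directory are hypothetical; a real test would run the app for a while and then point the check at its actual state directory.

```python
import os
import tempfile


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def within_budget(root: str, budget_bytes: int) -> bool:
    """True if everything under root fits inside the disk budget."""
    return dir_size_bytes(root) <= budget_bytes


# Stand-in for "run the app, then inspect what it left on disk".
state_dir = tempfile.mkdtemp()
with open(os.path.join(state_dir, "app.log"), "wb") as f:
    f.write(b"x" * 4096)

DISK_BUDGET = 3 * 1024**3  # the 3 GB figure from the comment
print(dir_size_bytes(state_dir), within_budget(state_dir, DISK_BUDGET))
```

This checks a total after the fact. Catching a runaway writer also needs a rate check (bytes written per idle minute), which the same helper can support by sampling twice.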

hun3 4d ago · 2 in thread

The operating system has historically trusted the applications not to do dumb things too much.

Only now we're witnessing the consequences much more frequently thanks to accelerated slop.

skydhash 4d ago

> The operating system has historically trusted the applications not to do dumb things too much.

The OS is a thin layer providing an abstract and consistent interface regardless of the hardware configuration. Policing applications is mostly related to security and resources utilization, not moronic errors.

hun3 4d ago

> The OS is a thin layer providing an abstract and consistent interface regardless of the hardware configuration.

This is called a hardware abstraction layer, not OS.

https://en.wikipedia.org/wiki/Hardware_abstraction

christophilus 4d ago · 1 in thread

Well, everyone's bashing on OpenAI as well they should, but just a reminder, unlike Claude Code, Codex is officially available to customize here: https://github.com/openai/codex

It's fairly easy to patch.

redox99 4d ago

That's the CLI, not the codex app which is proprietary.

jofzar 4d ago · 1 in thread

This is actually such a classic blunder (shipping trace/debug logging on for everything), but funnily the impact is not the normal one.

It's crazy that we've hit a point where memory, CPU speed, and disk speed aren't getting clapped when a dev ships logging at trace level. It used to make the application catastrophically slow, so it was fixed immediately in the next update.
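
Two cheap guards against this blunder are a conservative default level and a hard cap on log size. The sketch below uses Python's standard `logging` module purely as an illustration; it says nothing about how Codex's Rust logging is set up, and the logger name, environment variable, and limits are all hypothetical.

```python
import logging
import logging.handlers
import os
import tempfile

MAX_BYTES = 10_000   # per-file cap (tiny, for the demo)
BACKUPS = 2          # rotated files kept besides the active one

log_dir = tempfile.mkdtemp()
handler = logging.handlers.RotatingFileHandler(
    os.path.join(log_dir, "app.log"), maxBytes=MAX_BYTES, backupCount=BACKUPS
)

logger = logging.getLogger("demo_app")  # hypothetical logger name
# Default to INFO; verbose levels have to be requested explicitly.
logger.setLevel(os.environ.get("DEMO_LOG_LEVEL", "INFO"))
logger.addHandler(handler)
logger.propagate = False

for i in range(5000):
    logger.debug("trace detail %d", i)  # dropped at the default level
    logger.info("event %d", i)

handler.close()
logger.removeHandler(handler)

total_bytes = sum(
    os.path.getsize(os.path.join(log_dir, name)) for name in os.listdir(log_dir)
)
print(total_bytes, sorted(os.listdir(log_dir)))
```

With rotation in place, even an accidental trace-level build can only occupy a bounded amount of disk, although it would still rewrite that space over and over, so the default level matters too.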

kuekacang 4d ago

It helps too that agent work is done server side so you can hog all the local resources for your thin client.

rvz 4d ago · 1 in thread

The first of many bugs that are beyond the complexity of its authors, thanks to comprehension debt.

Even with tests, the more complex the code base is, the more risky it is to vibe-code on it without introducing more bugs [0] and increasing the debt. Does not matter if the CI is green or if all the tests pass.

It gets even worse if you can't explain the change / pull request or what the implications are after applying that "suggested" fix.

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

HPsquared 4d ago

There are going to be sooooo many consulting opportunities after this wave.

fc-oai 4d ago · 1 in thread

Hey everyone, I'm an engineer at OpenAI. Thanks for the discussion here. Just wanted to report that a fix for this issue has been published in an update to the CLI and Codex App.

tdehnke 4d ago

From the comments on GitHub, it looks like the fix doesn't work for all users. Please verify.

ramon156 4d ago · 1 in thread

Blegh, I puke every time I see obviously AI-generated comments in GH PRs. You cannot assume any of these people have done their research, other than telling Codex to do it for them.

b--l 4d ago

It's because they use gpt-5.5-xhigh (the money making* model) to build it.

(*for them)

dundercoder 4d ago · 1 in thread

If something like this is helpful or necessary, that’s what RAM-backed tmpfs is for.

mrweasel 4d ago

Using a RAM-backed tmpfs would be a workaround so as not to trash your SSD. It doesn't fix the underlying problem. It's incredibly poor design on OpenAI's part.

xfgong 4d ago · 1 in thread

Same issue with Claude Code btw — it writes massive debug logs to ~/.claude/logs. Had to symlink it to a tmpfs to stop wearing out my SSD.

eddyfromtheblok 4d ago

I don't see this. According to their docs, logs are no longer written: https://code.claude.com/docs/en/claude-directory

consp 4d ago · 1 in thread

Why didn't the review process spot this obvious error? Oh wait ... @codex review this

Forgeties79 4d ago

“Make no mistakes”

Damn I’m good!

ewsbr 4d ago

Looks like this was fixed[0], so it should land in the next release.

[0] https://github.com/openai/codex/commit/e98d43ac372ddf7f513c0...

collabs 4d ago

This is a little off topic but

These guys really need to stop polluting the root folder of repo with Claude dot MD and copilot dot MD. Get in a room together and decide on a well known folder structure like docs/llm/*

bravetraveler 4d ago

Somebody please donate some tokens to this plucky startup, they need our help.

robeym 3d ago

I'm on 0.139.0 on Linux and my visible ~/.codex/logs are only about 129 MB.

This makes me even more conservative on upgrading these tools every time they prompt me to. Better to let them get a few miles on them and see how the community responds.

In this case it sounds like 0.142.0 reduced the issue but didn't fully settle it. I'll wait for 0.143.0+ and see if that version is more acceptable.

collabs 4d ago

I feel vindicated in my admittedly seemingly masochistic ritual of copy pasting code from the web browser to visual studio code even though I don't always pore over every line.

joelthelion 4d ago

A good moment to switch to an open solution like opencode or pi.

sigbottle 4d ago

I have noticed absurd lag from the browser usage and sometimes complete bricking of my network too on my computer. I thought it was just my computer getting old, but possibly it's ChatGPT.

bob1029 4d ago

I'm struggling with how this much logging information could be generated at any level of verbosity. Is codex writing log entries while it's sitting idle? Why would someone want to look at these logs?

g42gregory 4d ago

I wonder if this falls into a "coding is solved" category?

I'm starting to get good results with Oh-My-Pi and Pi Coding Agents.

linzhangrun 4d ago

Considering the current storage prices and the SSDs whose lifespan you would thus exhaust...

taosu_la 4d ago

Can someone tell me if the current sub-agent of codex is available now? There used to always be a spinning issue.

whalesalad 4d ago

Yikes. I have a habit of leaving sessions open for a long time. I just ran `sudo iotop` to watch live disk activity and sure enough all my idle codex sessions were spinning away writing god knows what constantly to disk.

jackbucks 4d ago

HAHAHAHAHAHAHA

j / k navigate · click thread line to collapse

269 comments

113 comments · 33 top-level

b--l4d ago· 25 in thread

Codex is one of the most infamous examples of slopware. Just having the window unhidden on my mac will cause it to use 100% of the GPU displaying the spinner message.

THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!

So any time you're waiting on the model (which is 90% of the time), your fans will be blasting (careful, don't use it on battery).

The issue is on github and close to 6 months old. Probably since the release of vibe coded junk. I would literally fix it myself but it's closed source for whatever reason.

It's a tragedy. The world needs competition to anthropic.

jofzar4d ago

> Codex is one of the most infamous examples of slopware

Woah, let's not forget Claude code is right there

varjag4d ago

Right, just yesterday I found my laptop kinda hot. And what do you think, it was good old Claude deciding to load a few cores with completely idling prompts.

kokada4d ago

Not that Claude Code is much better, I just hit this issue[1] because it seems setting DO_NOT_TRACK=1 seems enough to get a really strange behavior in the newest versions of CC.

[1]: https://github.com/anthropics/claude-code/issues/69238#issue...

Edit: I think I misunderstood OP, they're saying that CC is even worse and not better than Codex CLI.

mvATM994d ago

Yeah exactly.

I'm not exactly building TUI's every day, but even i felt pain when i read that "small game engine" post

r_lee4d ago

if we are at 10x with AI and near AGI or ASI, then how is it possible that these products (Codex, Claude Code CLI) are still such garbage?

shouldn't this "agentic AI revolution" have long solved this already?

no way they're over there saying "we are on it plz wait" or that "it's too much effort"?

hombre_fatal4d ago

Even with AI, you still need attention to detail and TLC to polish software, something that's always in short supply.

igleria4d ago

CharlieDigital4d ago

(I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).

jeffybefffy5194d ago

Because vibe coding is a toy… thats the secret.

fg1374d ago

You are asking too many good questions.

user439284d ago

The products generally work just fine on my MacBook.

I have not encountered major issues in either the Claude Code CLI, the Codex Desktop app, or Claude Desktop app.

They generally get the job done. I don't measure disk writes or analyze the GPU usage.

Zababa4d ago

nicce4d ago

It's not only Codex: I can't leave the ChatGPT app open on macOS for a few hours, because it will consume 60 gigabytes of RAM over time and crash all the other apps.

Mind-boggling. And I can't use Google's AI Studio in the browser because it takes 100% CPU.

Do we need to write our own app for everything???

porridgeraisin4d ago

the damn chat.openai.com webapp lags a lot as well on long chats, typing takes so long.

xpct4d ago

Well thank you for your service. I thought about trying out Codex after the disaster that is Claude Code. I'll be fine without either one on my machine

jofzar4d ago

Imo Codex is significantly better than Claude Code for me ATM.

comboy4d ago

I mean, Codex CLI is really bad. But Claude's CLI is so much worse.

Welcome to the world of tomorrow!

l33tman4d ago

This was fixed long ago, if I'm thinking of the same bug. It was stuck in an infinite loop the whole time the Codex window was open.

cncjvu74d ago

Nah it's still doing weird shit. Uninstalled that crapware last week.

xenator4d ago

I have exactly the same problem with the Time Machine spinner on macOS. It doesn't even rotate.

Somewhere there must be rare, credentialed specialists capable of fixing such problems, with waiting lists years long.

hokkos4d ago

Is it closed source? I can see the Rust code in the repo, unlike the JS in the Claude Code repo. Are you mixing them up?

nicce4d ago

Codex CLI is the main Rust code. There is Codex Desktop separately, using Electron and the same Codex CLI.

seviu4d ago

To be fair to Codex, you can use any harness you want with it. Access is not gatekept by a crappy, slop-filled Electron app.

So just move to PI, or whatever.

Claude, on the contrary, forces all plan users to use their horrible app, which, if you ever dared to use cowork even once, will run a 2GB VM on app start, no f's given at all.

Not justifying it. But if you use the official Codex app, that's on you. If you use the official Claude app, it's because you are forced to.

Sidenote unrelated to the post: since the Fable thing, and after serious thinking, I moved to open source models. I still have the basic OpenAI sub, but the easy lifting is now done elsewhere.

energy1234d ago

Let me guess, there's also a bug where they train on all our data?

varjag4d ago

They don't need to. You pay them for the privilege to do black box reinforcement learning already.

indiv04d ago· 12 in thread

Judge not the slop-bot, lest ye be judged yourself, engineer.

sleples4d ago

We've gone from "you're holding it wrong" to "the training data was bad because humans suck too". Difference is, humans learn from their mistakes.

SilverSlash4d ago

> Difference is, humans learn from their mistakes.

Great! So next time the human will prompt the agent to watch out for and avoid this bug.

ponector4d ago

You are a senior developer. Please do no mistakes!

xpct4d ago

applfanboysbgon4d ago

matharmin4d ago

xpct4d ago

Y-bar4d ago

lifthrasiir4d ago

> I have never written a bug that has destroyed my users' hardware, ...

applfanboysbgon4d ago

da_grift_shift4d ago

What are your thoughts on the SNR of the linked GitHub issue threads? Consider the volume of comments posted and the substance of each comment.

fn-mote4d ago

I read the first page and they were excellent. Each was clearly written by an experienced dev who knows how to substantiate their claims and propose an acceptable fix that could just be merged.

Your comment, on the other hand, would be improved by including your own opinion on the matter.

neuralkoi4d ago· 10 in thread

Vibe coding takes "move fast and break things" to a whole nother level.

cryo324d ago

Yeah. Here I am sitting on a major incident at our company because someone’s vibe coded shit went seriously wrong.

Imustaskforhelp4d ago

Can you talk more in detail if possible and are allowed to do so?

I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

They hadn't done anything to the database itself but you betcha that there are some horror stories involving database, lack of proper backups and Vibe-coding gone insanely wrong.

cryo324d ago

flir4d ago

> "The code wasn't written by me. It was written by Claude/Chatgpt"

I don't care what tools you used, but if your name is on it, you're the author and the responsibility is yours. No "it wasn't me it was my typewriter" bullshit.

latexr4d ago

> Their response/justification was: "The code wasn't written by me. It was written by Claude/Chatgpt"

comboy4d ago

We are running out of things to break.

stavros4d ago

Make more things to break.

GL264d ago

as long as you don't have technical debt, vibe coding is mostly useful for prototyping. For a real product, true SWE will never be replaced

Otek4d ago

Already got replaced at world top tier tech jobs. „True SWE” will be niche / luxury soon, just like real woodworking vs IKEA

inigyou4d ago

Software is freely duplicable unlike wood. IKEA could be mass producing copies of the most beautiful chair in the world just as easily as it produces copies of something a 5-year-old drew in freecad.

tgtweak4d ago· 4 in thread

Slightly better than the Claude Code "feature" that deletes all your session context and transcripts older than 30 days.

Mistredo3d ago

Codex has a similar bug that makes chats disappear randomly.

qup4d ago

At least that's a decision vs a bug.

tgtweak4d ago

Not a decision that users are aware of though. Nor is there a setting to disable/change it. It just showed up one day and erased your previous sessions.

pdantix4d ago

> Nor is there a setting to disable/change it.

https://code.claude.com/docs/en/settings#available-settings

`cleanupPeriodDays` has always existed.

woadwarrior014d ago· 3 in thread

Someone posted a temporary workaround for this on X[1].

sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"

Also, I found that running VACUUM on the sqlite file on my laptop shrunk it from 27GB to a mere 73MB[2].

[1]: https://xcancel.com/bdsqlsz/status/2067964486615810369

[2]: https://xcancel.com/jeethu/status/2068087449469780434
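
For anyone who wants to see what the two steps above do before touching their real `~/.codex` file, here is a minimal sketch against a throwaway database. The one-column `logs` schema, the temp file name and the helper names are assumptions made up for the demo, not Codex's actual layout; the trigger text is the one quoted above, and the compaction step uses SQLite's plain `VACUUM` statement.

```python
import os
import sqlite3
import tempfile


def block_inserts(conn: sqlite3.Connection) -> None:
    """Install the trigger from the workaround: every INSERT into `logs` is skipped."""
    conn.execute(
        "CREATE TRIGGER IF NOT EXISTS block_log_inserts "
        "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
    )


def demo() -> tuple[int, int, int]:
    """Return (rows after a blocked insert, file size before VACUUM, size after)."""
    # Throwaway database file; the path and schema are hypothetical.
    path = os.path.join(tempfile.mkdtemp(), "logs_demo.sqlite")
    # isolation_level=None means autocommit, so VACUUM is never inside a transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE logs (body BLOB)")
    for _ in range(20):
        conn.execute("INSERT INTO logs VALUES (?)", (b"x" * 100_000,))

    block_inserts(conn)
    # RAISE(IGNORE) abandons this statement silently: no error, no new row.
    conn.execute("INSERT INTO logs VALUES (?)", (b"x" * 100_000,))
    rows = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

    # Deleting rows only marks pages as free; the file keeps its size.
    conn.execute("DELETE FROM logs")
    before = os.path.getsize(path)
    # VACUUM rebuilds the database file without the free pages.
    conn.execute("VACUUM")
    after = os.path.getsize(path)
    conn.close()
    return rows, before, after
```

Calling `demo()` should show the row count staying at 20 after the blocked insert and the file getting smaller after `VACUUM`. Note that the trigger only stops new rows; it does nothing about space already used, which is why the two steps go together.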

sgarland4d ago

DB-level rules saving the day once again.

NamlchakKhandro4d ago

The real solution is to stop using it and switch to Pi

woadwarrior014d ago

I’ve been using oh-my-pi with GLM-5.2 xhigh as the main model and GPT-5.5 medium as its advisor model. IMO, the combo works better than either of those models alone.

i2km4d ago· 3 in thread

Surely it should be trivial for them to have their own tools spinning away trying to fix all the github issues in real time...

drakythe4d ago

lelandfe4d ago

> Permission bypass when commands are chained with &&

At one point they fixed their auto stale bot closing bugs but, hey, guess that wasn’t long lived.

cl3misch4d ago

There have been issues on GitHub about the same problem since April. I'm using Codex a lot and I'm very happy with its performance (UX and output), but it's baffling they haven't fixed this problem.

taspeotis4d ago· 3 in thread

OpenAI really snatched defeat from the jaws of victory late last year when Claude Code was a laggy mess.

Nowadays Codex has typing latency out of the gate, whereas Claude Code has the odd pause but generally displays my key presses as … you know … I press them.

kasey_junk4d ago

Fwiw I have the exact opposite experience.

christophilus4d ago

I find Claude Code nearly unusable. I always have to type in neovim if I’m typing anything more than a few words.

aquariusDue4d ago

Imustaskforhelp4d ago· 3 in thread

I don't understand how Codex can blunder so badly. I imagine that even if they are vibe-coding, surely they must have some good engineers. So why are there such severe bugs?

One can argue that these products are the flagship products of their respective AI companies aside from the AI models themselves of course.

I imagine that this story will be picked up by the news left and right, some stories just feel this way and this one is like that (given 12 upvotes on HN in 7 minutes)

The only logical conclusions (from this incident) that I can draw are:

1. A (vibe-coded?) product is hard to maintain even for some of the best engineers and is bound to have severe bugs.

2. Proper testing and taking issues seriously are key if you still wish to do this, and there isn't much of either here. This is a week-old issue which I can only classify as severe.

I wish to keep a nuanced opinion about it but oh this is bad for OpenAI (not as bad as them accepting autonomous AI within drones and mass surveillance though)

AI companies might be like car companies which don't offer a brake pedal.

dathinab4d ago

> I don't understand how Codex can blunder so badly.

because they trust the AI too much (and seem to be fine with acting knowingly negligent)

the problem is

- AI tends to produce very convincing looking code, even if fully wrong

- AI makes mistakes of kinds no human would make, at least no human who is also able to write convincing looking code

Like a "dumb" example from work:

- AI randomly removes a HTML element id while doing other changes in jsx/react

- the PR has a lot of changes, the id removal line looks innocent, like some on the fly cleanup

- human reviewers have the bad tendency to often not look too much at deleted lines, only if they need reference to how a new line was before (but it's only a deleted line and no new line)

- you don't expect humans to randomly without reason delete important properties of components when changing other things

- you maybe would still have found it, but it's an emergency fix for a production issue

PunchyHamster4d ago

> I don't understand how Codex can blunder so badly. I imagine that even if they would be using vibe-coding, surely they must have some good engineers. So why is there such severe bugs?

Because it was deemed not a Hard Enough task for a real engineer to look at, so AI was sent to do it with no supervision, just checking the effects.

Also, excessive logging is probably useful to them in chasing some of the edge cases. The cost to users doesn't matter in the slightest to them.

supriyo-biswas4d ago

purpleidea4d ago· 3 in thread

I want to like codex, but the quality is just not very good, especially when compared to Claude.

It used to work okay, but a while back they landed a major regression for an entire team of folks I work with.

No response, no workaround.

https://github.com/openai/codex/issues/23762

christophilus4d ago

I don’t trust any agent to respect any boundaries. They might today. But tomorrow’s vibe coded slop update might break it in subtle ways.

My solution to this is to only run agents in a sandbox of my own making (a locked down Podman container).

drakythe4d ago

matheusmoreira4d ago

I went the full virtual machine route. Just finished hardening the setup and firewalling it off my local network. Not perfect but it does make me feel much safer.

ares6234d ago· 2 in thread

i hope they find the smoking gun, the key insight, the kicker.

59nadir4d ago

Then they can apply a clean solve, the cleanest solution.

It's fascinating how offensive some of this verbiage becomes to you when you see it attached to LLM output too much.

jofzar4d ago

Ugh, this one gets me so bad. Same with "wire" and "wired": everything is wired to something.

altcognito4d ago· 2 in thread

I think part of the question should be, why is there no QA or test that catches this? It's one thing to be slopware, but why didn't anything run a test that catches this?

theowaway2134564d ago

Every time you write a test that handles some data, you write an assertion about how much data is handled?

Come on, this is such an easy thing to forget to test. Don't act like there is some magical testing strategy that would have caught this

altcognito4d ago

I'll acknowledge that this is probably not likely to get caught.

Integration testing could/should catch this, especially for a client side app.

A simple constraint is a good thing. "Our app shouldn't use more than 50 MB of RAM, or use 3 GB of disk space."
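
A rough sketch of what such a constraint could look like as an integration check, assuming the app keeps everything under one data directory. The helper names, the stand-in "app" that writes a file, and the default budget are all hypothetical, chosen only to illustrate the idea; this is not how Codex is actually tested.

```python
import os
import tempfile

# The 3 GB disk figure from the comment, used as an assumed default.
DISK_BUDGET_BYTES = 3 * 1024**3


def dir_size_bytes(root: str) -> int:
    """Total size of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def assert_within_budget(root: str, max_bytes: int = DISK_BUDGET_BYTES) -> int:
    """Fail loudly if the directory has outgrown its budget; return bytes used."""
    used = dir_size_bytes(root)
    assert used <= max_bytes, f"{root} uses {used} bytes, budget is {max_bytes}"
    return used


def demo() -> int:
    """Stand-in for an integration test: 'run the app', then check its data dir."""
    data_dir = tempfile.mkdtemp()
    with open(os.path.join(data_dir, "app.log"), "wb") as fh:
        fh.write(b"x" * 1000)  # pretend this is what the app wrote during the run
    return assert_within_budget(data_dir)
```

The check only catches runaway writes if the test run is long enough to produce them, so in practice it would sit at the end of a soak-style run rather than a unit test.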

hun34d ago· 2 in thread

The operating system has historically trusted the applications not to do dumb things too much.

Only now we're witnessing the consequences much more frequently thanks to accelerated slop.

skydhash4d ago

> The operating system has historically trusted the applications not to do dumb things too much.

hun34d ago

> The OS is a thin layer providing an abstract and consistent interface regardless of the hardware configuration.

This is called a hardware abstraction layer, not OS.

https://en.wikipedia.org/wiki/Hardware_abstraction

christophilus4d ago· 1 in thread

Well, everyone's bashing on OpenAI as well they should, but just a reminder, unlike Claude Code, Codex is officially available to customize here: https://github.com/openai/codex

It's fairly easy to patch.

redox994d ago

That's the CLI, not the codex app which is proprietary.

jofzar4d ago· 1 in thread

This is actually such a classic blunder (shipping trace/debug logging on for everything), but funnily enough the impact isn't the usual kind.
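
For illustration, the usual guard against this class of bug is to keep debug output off by default and put a hard cap on log size. A minimal sketch with Python's standard `logging.handlers.RotatingFileHandler` follows; the logger name, file name and size limits are assumptions for the demo and have nothing to do with how Codex actually logs.

```python
import logging
import logging.handlers
import os
import tempfile


def make_bounded_logger(log_dir: str, max_bytes: int = 1_000_000,
                        backups: int = 2) -> logging.Logger:
    """Logger whose files on disk stay under roughly (backups + 1) * max_bytes."""
    logger = logging.getLogger("bounded_demo")  # hypothetical logger name
    logger.handlers.clear()         # keep the helper idempotent across calls
    logger.setLevel(logging.INFO)   # trace/debug output is dropped by default
    logger.propagate = False
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, "app.log"),
        maxBytes=max_bytes,         # roll over before a file would reach this size
        backupCount=backups,        # keep only this many rotated files
    )
    logger.addHandler(handler)
    return logger


def demo() -> tuple[int, int]:
    """Write many records and return (number of log files, total bytes on disk)."""
    log_dir = tempfile.mkdtemp()
    logger = make_bounded_logger(log_dir, max_bytes=1000, backups=2)
    for i in range(500):
        logger.debug("trace detail %d", i)  # below INFO, never written
        logger.info("event %d", i)
    for handler in logger.handlers:
        handler.close()
    files = os.listdir(log_dir)
    total = sum(os.path.getsize(os.path.join(log_dir, f)) for f in files)
    return len(files), total
```

However many records are written, the directory never holds more than the active file plus `backups` rotated ones, so the worst case is bounded instead of growing with uptime.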

kuekacang4d ago

It helps too that agent work is done server side so you can hog all the local resources for your thin client.

rvz4d ago· 1 in thread

The first of many bugs that are beyond the complexity of its authors, thanks to comprehension debt.

It gets even worse if you can't explain the change / pull request or what the implications are after applying that "suggested" fix.

[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...

HPsquared4d ago

There are going to be sooooo many consulting opportunities after this wave.

fc-oai4d ago· 1 in thread

Hey everyone, I'm an engineer at OpenAI. Thanks for the discussion here. Just wanted to report that a fix for this issue has been published in an update to the CLI and Codex App.

tdehnke4d ago

Looks like the fix doesn't work for all users from the comments on Github, please verify.

ramon1564d ago· 1 in thread

Blegh, I puke every time I see obviously AI-generated comments in GH PRs. You cannot assume any of these people have done their research, other than telling Codex to do it for them.

b--l4d ago

It's because they use gpt-5.5-xhigh (the money making* model) to build it.

(*for them)

dundercoder4d ago· 1 in thread

If something like this is helpful or necessary, that’s what ram backed tmpfs is for.

mrweasel4d ago

Using a RAM-backed tmpfs would be a workaround so as not to trash your SSD. It doesn't fix the underlying problem. It's incredibly poor design on OpenAI's part.

xfgong4d ago· 1 in thread

Same issue with Claude Code btw — it writes massive debug logs to ~/.claude/logs. Had to symlink it to a tmpfs to stop wearing out my SSD.

eddyfromtheblok4d ago

I don't see this. According to their docs, logs are no longer written: https://code.claude.com/docs/en/claude-directory

consp4d ago· 1 in thread

Why didn't the review process spot this obvious error? Oh wait ... @codex review this

Forgeties794d ago

“Make no mistakes”

Damn I’m good!

ewsbr4d ago

Looks like this was fixed[0], so it should land in the next release.

[0] https://github.com/openai/codex/commit/e98d43ac372ddf7f513c0...

collabs4d ago

This is a little off topic but

These guys really need to stop polluting the root folder of repo with Claude dot MD and copilot dot MD. Get in a room together and decide on a well known folder structure like docs/llm/*

bravetraveler4d ago

Somebody please donate some tokens to this plucky startup, they need our help.

robeym3d ago

I'm on 0.139.0 on Linux and my visible ~/.codex/logs are only about 129 MB.

This makes me even more conservative on upgrading these tools every time they prompt me to. Better to let them get a few miles on them and see how the community responds.

In this case it sounds like 0.142.0 reduced the issue but didn't fully settle it. I'll wait for 0.143.0+ and see if that version is more acceptable.

collabs4d ago

I feel vindicated in my admittedly seemingly masochistic ritual of copy pasting code from the web browser to visual studio code even though I don't always pore over every line.

joelthelion4d ago

A good moment to switch to an open solution like opencode or pi.

sigbottle4d ago

I have noticed absurd lag from the browser usage and sometimes complete bricking of my network too on my computer. I thought it was just my computer getting old, but possibly it's ChatGPT.

bob10294d ago

I'm struggling with how this much logging information could be generated at any level of verbosity. Is codex writing log entries while it's sitting idle? Why would someone want to look at these logs?

g42gregory4d ago

I wonder if this falls into a "coding is solved" category?

I've started getting good results with Oh-My-Pi and Pi Coding Agents.

linzhangrun4d ago

Considering the current storage prices and the SSDs whose lifespan you would thus exhaust...

taosu_la4d ago

Can someone tell me if the current sub-agent of codex is available now? There used to always be a spinning issue.

whalesalad4d ago

jackbucks4d ago

HAHAHAHAHAHAHA
