THE SPINNER MESSAGE CAUSES 100% GPU USAGE ON AN MBP M5!!
So any time you're waiting on the model (which is 90% of the time), your fans will be blasting (careful, don't use it on battery).
The issue is on GitHub and close to 6 months old. Probably since the release of vibe coded junk. I would literally fix it myself but it's closed source for whatever reason.
There are many discussions about which model is better, or if vibe coding is even possible. I point you to the extent of what one of the most well funded, money flush, well staffed model making companies can do with vibe coding.
To me a screwup this bad (where the CEO has already made it clear they're now "focussing on coding") indicates that there's something truly broken in the company. No one on polymarket expects them to have a leading model any time soon for example.
It's a tragedy. The world needs competition to Anthropic.
Woah, let's not forget Claude Code is right there
[1]: https://github.com/anthropics/claude-code/issues/69238#issue...
Edit: I think I misunderstood OP, they're saying that CC is even worse and not better than Codex CLI.
I'm not exactly building TUIs every day, but even I felt pain when I read that "small game engine" post
shouldn't this "agentic AI revolution" have long solved this already?
no way they're over there saying "we are on it plz wait" or that "it's too much effort"?
(I want Codex to implement MCP Prompts because then we have one central way to ship skills from a server).
The fact that neither platform can implement a protocol given what is functionally infinite frontier model tokens really says a lot. I do not care what kind of random project some influencer can ship with a swarm of 1000 agents. If you cannot make the basics work, it is a farce.
You can use it to accelerate development certainly, but that requires careful change->review cycles. The developer still needs to be in heavy control, versus vibe coding having an agent own the code base.
I have not encountered major issues in the Claude Code CLI, the Codex Desktop app, or the Claude Desktop app.
They generally get the job done. I don't measure disk writes or analyze the GPU usage.
Mindboggling. Or can't use Google's AI Studio in browser because it takes 100% CPU.
Need to write own app for everything???
Somewhere there should be rare specialists with diplomas who are capable of fixing such problems, with waiting lists booked for years ahead.
So just move to PI, or whatever.
Claude, on the contrary, forces all plan users to use their horrible app, which, if you ever dared to use cowork, only once, will run a 2GB VM on app start, no f's given. at all.
Not justifying it. But if you use the official Codex app, that's on you. If you use the official Claude app, it's because you are forced to.
Sidenote unrelated to the post: since the Fable thing, and after serious thinking, I moved to open source models. I still have the basic OpenAI sub, but the easy lifting is now done elsewhere.
Perhaps the framing shouldn't be "haha slop" but rather why doesn't the AI write better quality software than we do? To which the answer is obvious IMO -- even emergent properties can't elevate AI intelligence too far above the training dataset. So how do we get to superintelligent (or at least "not-wreck-your-NVMe-endurance-telligent") AI, if we, as a whole, are not smart enough ourselves?
Judge not the slop-bot, lest ye be judged yourself, engineer.
Great! So next time the human will prompt the agent to watch out for and avoid this bug.
2. "One developer somewhere in the world made a bad mistake one time, so this represents the quality of all software devs everywhere". Maybe they were just a bad developer? Bad developers exist. I have never written a bug that has destroyed my users' hardware, and I think that writing such a bug is completely inexcusable in an enterprise environment with software that will be shipped to millions of users, as Codex is.
Probably whoever (human or agent) originally decided to put TRACE logs into SQLite also thought---or reasoned---so. Maybe the decision was right at that time but the volume of TRACE logs has increased enormously. You will never know.
Your comment, on the other hand, would be improved by including your own opinion on the matter.
I do know one instance of someone literally losing a job because they vibe-coded their way to prod. Their response/justification was: "The code wasn't written by me. It was written by Claude/ChatGPT"
They hadn't done anything to the database itself but you betcha that there are some horror stories involving databases, lack of proper backups and vibe coding gone insanely wrong.
Our engineers are accountable for what they produce regardless of how, so they are cleaning up the extensive mess this made. This will result in a very heated post-mortem meeting between the two factions in the company.
Culturally (across all LLM use, not just programming) we need to nip that in the bud. If we don't it's going to be the new "someone hacked my social media password" get out of jail free card for avoiding responsibility.
I don't care what tools you used, but if your name is on it, you're the author and the responsibility is yours. No "it wasn't me it was my typewriter" bullshit.
It boggles the mind that someone could think that is a valid justification, because ultimately what they’re saying is “I’m useless, what you get from me is the same thing as prompting the model” which still means they would lose their job.
https://code.claude.com/docs/en/settings#available-settings
`cleanupPeriodDays` has always existed.
sqlite3 ~/.codex/logs_2.sqlite "CREATE TRIGGER IF NOT EXISTS block_log_inserts BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
Also, I found that running VACUUM on the SQLite file on my laptop shrank it from 27GB to a mere 73MB[2]. (SQLite's command is plain VACUUM; VACUUM FULL is PostgreSQL syntax.)
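For anyone who wants to see what those two commands do before touching a real file, here is a minimal sketch against a throwaway database. The `logs` table name and the trigger come from the one-liner above; the single `body` column and the row counts are made up for illustration and say nothing about the real Codex schema.

```python
import os
import sqlite3
import tempfile


def demo(path):
    """Return (rows_after_trigger, size_before_vacuum, size_after_vacuum)."""
    # isolation_level=None means autocommit; VACUUM cannot run inside a transaction.
    con = sqlite3.connect(path, isolation_level=None)
    # Assumed schema, for illustration only.
    con.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, body TEXT)")

    # Fill the table, then delete everything. The freed pages stay in the file.
    con.execute("BEGIN")
    con.executemany(
        "INSERT INTO logs (body) VALUES (?)",
        [("x" * 1000,) for _ in range(2000)],
    )
    con.execute("COMMIT")
    con.execute("DELETE FROM logs")
    size_before = os.path.getsize(path)

    # Same trigger as the one-liner: RAISE(IGNORE) silently skips the insert.
    con.execute(
        "CREATE TRIGGER IF NOT EXISTS block_log_inserts "
        "BEFORE INSERT ON logs BEGIN SELECT RAISE(IGNORE); END;"
    )
    con.execute("INSERT INTO logs (body) VALUES ('dropped')")
    rows = con.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

    # VACUUM rebuilds the file and returns the free pages to the filesystem.
    con.execute("VACUUM")
    con.close()
    return rows, size_before, os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "logs_demo.sqlite")))
```

The trigger only stops new rows; it is the VACUUM that actually gives the disk space back, so the two are complementary.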
Surely it should be trivial for them to have their own tools spinning away trying to fix all the github issues in real time...
> Permission bypass when commands are chained with &&
At one point they fixed their auto stale bot closing bugs but, hey, guess that wasn’t long lived.
Nowadays Codex has typing latency out of the gate, whereas Claude Code has the odd pause but generally displays my key presses as … you know … I press them.
I must be honestly missing some key piece of workflow otherwise I don't know why it would run so slow for other people on better hardware? Granted I'm taking care to tell Claude to not exhaust CPU cores and make sure to not trigger OOM errors, akin to "make no mistakes pls".
One can argue that these products are the flagship products of their respective AI companies aside from the AI models themselves of course.
I imagine that this story will be picked up by the news left and right, some stories just feel this way and this one is like that (given 12 upvotes on HN in 7 minutes)
The only logical conclusion (from this incident) that I can have is: A (vibe-coded?) product is hard to maintain even for some of the best engineers and is bound to have severe bugs.
2. Proper testing and taking issues seriously are key if you still wish to do this, and there isn't much of either. This is a week-old issue which I can only classify as severe.
I wish to keep a nuanced opinion about it but oh this is bad for OpenAI (not as bad as them accepting autonomous AI within drones and mass surveillance though)
My point is: AI has both uphills and downward valleys and cliffs. It might as well just accelerate you, which could be towards your downfall as well. It's recommended to keep your eyes on the road while driving and not drive too fast.
AI companies might be like car companies which don't offer a brake pedal.
because they trust the AI too much (and seem to be fine with acting knowingly negligent)
the problem is
- AI tends to produce very convincing-looking code, even if fully wrong
- AI makes mistakes of kinds no human would make, at least no human who is also able to write convincing-looking code
- code reviews are hard; a lot of devs, including senior devs, put a lot of implicit trust in the co-worker behaving "sane and non-malicious". But AIs sometimes behave not so sanely, and in a misleading way (wrt. trying to be convincing). In the worst case they behave in ways which, if it were a human, you might consider an attempt to maliciously sabotage the product
Like a "dumb" example from work:
- AI randomly removes an HTML element id while doing other changes in JSX/React
- the PR has a lot of changes, the id removal line looks innocent, like some on the fly cleanup
- human reviewers have the bad tendency to often not look too much at deleted lines, only if they need reference to how a new line was before (but it's only a deleted line and no new line)
- you don't expect humans to randomly without reason delete important properties of components when changing other things
- you maybe would still have found it, but it's an emergency fix for a production issue
- it happens to be missed by the integration tests, but it still matters a lot for one specific, important flow that, for complicated reasons, is not properly tested (similarly, people tend not to test logging much: at best the presence of needed info, but hardly ever the absence of noise)
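A cheap guard for exactly this kind of slip is a script that flags ids which appear only on deleted lines of a diff. The sketch below is hypothetical (the function name, the regex, and the idea of running it in CI are my assumptions, not something from the PR in question), and it only sees literal `id="..."` attributes, not JSX expressions like `id={foo}`.

```python
import re

# Literal id="..." or id='...' attributes; data-id, aria-id etc. are excluded.
ID_ATTR = re.compile(r"""(?<![\w-])id=(?:"([^"]+)"|'([^']+)')""")


def removed_ids(diff_text):
    """Return ids found on deleted lines but on no added line of a unified diff."""
    removed, added = set(), set()
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++")):
            # File headers. Limitation: a deleted line whose content itself
            # starts with "--" is skipped too.
            continue
        if line.startswith("-"):
            target = removed
        elif line.startswith("+"):
            target = added
        else:
            continue  # context lines and hunk headers
        for match in ID_ATTR.finditer(line):
            target.add(match.group(1) or match.group(2))
    return sorted(removed - added)


if __name__ == "__main__":
    sample = "\n".join([
        "--- a/Form.jsx",
        "+++ b/Form.jsx",
        "@@ -1,2 +1,2 @@",
        '-<form id="checkout" className="a">',
        '+<form className="b">',
    ])
    print(removed_ids(sample))
```

It does not replace a proper test for the affected flow, but it puts the deleted line in front of the reviewer instead of relying on them to notice it.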
Because it was deemed not a Hard Enough task for a real engineer to look at, so AI was sent to do it with no supervision, just checking the effects.
Also, excessive logging is probably useful to them in chasing some of the edge cases; the cost to users doesn't matter in the slightest to them
It used to work okay, but a while back they landed a major regression for an entire team of folks I work with.
No response, no workaround.
My solution to this is to only run agents in a sandbox of my own making (a locked down Podman container).
But an LLM has a limited "memory". The instructions might land in there and be of sufficient priority to be "respected", but it only takes a single instance of that memory getting too full, or of the LLM autocompleting the workaround because that was the statistical "best" solution, and any barriers that exist only in LLM instructions and not in hardcoded guards will evaporate like so much morning fog.
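To make "hardcoded guard" concrete, here is a rough sketch of a check that lives in code rather than in the prompt: it splits a command on shell operators and requires every piece to start with an allowlisted program, which is the case the `&&` chaining bug quoted above gets wrong. The allowlist and function name are hypothetical, and this is nowhere near a complete shell parser, so treat it as an illustration and keep the sandbox.

```python
import shlex

ALLOWED = {"ls", "cat", "pwd"}             # hypothetical allowlist
SEPARATORS = {"&&", "||", ";", "|", "&"}   # operators that start a new command
PUNCTUATION = set("();<>|&")


def is_allowed(command):
    """True only if every chained sub-command starts with an allowlisted program."""
    # Command substitution and multi-line input: refuse instead of parsing.
    if "`" in command or "$(" in command or "\n" in command:
        return False
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        return False  # unbalanced quotes
    segment = []
    segments = [segment]
    for token in tokens:
        if token in SEPARATORS:
            # Note: a quoted "&&" argument also lands here, so the check is
            # stricter than a real shell, never looser.
            segment = []
            segments.append(segment)
        elif token and set(token) <= PUNCTUATION:
            return False  # redirections and subshells are not handled
        else:
            segment.append(token)
    return all(seg and seg[0] in ALLOWED for seg in segments)


if __name__ == "__main__":
    for cmd in ["ls -la", "ls && rm -rf /", "ls && cat notes.txt"]:
        print(cmd, "->", is_allowed(cmd))
```

The point is only that this check gives the same answer no matter how full the context window is.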
Come on, this is such an easy thing to forget to test. Don't act like there is some magical testing strategy that would have caught this
Integration testing could/should catch this, especially for a client side app.
Simple constraints are a good thing. "Our app shouldn't use more than 50 MB of RAM, or use 3 GB of disk space."
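A constraint like that is easy to turn into a check an integration test can run after exercising the app. A minimal sketch, assuming the app keeps its state under a single directory (the helper names are hypothetical; the 3 GB figure is the one quoted above):

```python
import os


def dir_size_bytes(root):
    """Total size in bytes of regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def within_budget(root, max_bytes):
    """True if everything under root fits in the disk budget."""
    return dir_size_bytes(root) <= max_bytes


if __name__ == "__main__":
    budget = 3 * 1024 ** 3  # the "3 GB of disk space" constraint
    state_dir = os.path.expanduser("~/.codex")  # directory from the sqlite one-liner
    if os.path.isdir(state_dir):
        print(within_budget(state_dir, budget))
```

Run after a long scripted session, a check like this would have failed long before the log file reached tens of gigabytes.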
Only now we're witnessing the consequences much more frequently thanks to accelerated slop.
The OS is a thin layer providing an abstract and consistent interface regardless of the hardware configuration. Policing applications is mostly related to security and resource utilization, not moronic errors.
This is called a hardware abstraction layer, not OS.
It's fairly easy to patch.
It's crazy that we have hit a point where memory, CPU speed and disk speed are high enough that nothing gets visibly clapped when a dev ships logging at trace level. It used to be that the application became catastrophically slow, so it was immediately fixed in the next update.
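For comparison, this is roughly what a bounded setup looks like with Python's standard `logging` module: a default level above the noisy one, plus a size-capped rotating file. It is a generic sketch, not how Codex logs; Python has no TRACE level, so DEBUG stands in for it, and the logger name and size limits are arbitrary.

```python
import logging
import logging.handlers
import os


def make_logger(path, level=logging.INFO, max_bytes=64 * 1024, backups=2):
    """Logger that drops messages below `level` and caps total file size."""
    logger = logging.getLogger("demo." + path)  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups
    )
    logger.addHandler(handler)
    return logger


def total_log_bytes(path, backups=2):
    """Size of the live log file plus its rotated backups."""
    files = [path] + [f"{path}.{i}" for i in range(1, backups + 1)]
    return sum(os.path.getsize(f) for f in files if os.path.exists(f))
```

With these defaults the logs can never exceed roughly three times `max_bytes` on disk, however chatty the application gets.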
Even with tests, the more complex the code base is, the more risky it is to vibe-code on it without introducing more bugs [0] and increasing the debt. Does not matter if the CI is green or if all the tests pass.
It gets even worse if you can't explain the change / pull request or what the implications are after applying that "suggested" fix.
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
(*for them)
Damn I’m good!
[0] https://github.com/openai/codex/commit/e98d43ac372ddf7f513c0...
These guys really need to stop polluting the root folder of repo with Claude dot MD and copilot dot MD. Get in a room together and decide on a well known folder structure like docs/llm/*
This makes me even more conservative on upgrading these tools every time they prompt me to. Better to let them get a few miles on them and see how the community responds.
In this case it sounds like 0.142.0 reduced the issue but didn't fully settle it. I'll wait for 0.143.0+ and see if that version is more acceptable.
I've started getting good results with Oh-My-Pi and Pi Coding Agents.