Codex for almost everything

(openai.com)

1001 points by mikeevans 25d ago | 559 comments

My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

i.e. agents for knowledge workers who are not software engineers

A few thoughts and questions:

1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company: they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, people's agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages, putting them at the top of search results, and taking away visits and ad revenue from sites.

2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google, etc. in this segment.

3. How will startups in this space compete against labs who can train models to fit their products?

4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

A few more thoughts collected here: https://chrisbarber.co/professional-agents/

Products I've tried: AI browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex CLI and app, and Claude Code CLI and app.

Edit: Notes on trying the new Codex update

1. The permissions workflow is very slick

2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.

3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand its own new features well enough to invoke them (without quite a bit of steering).

4. I cannot get it to show me the in-app browser

5. Generating image mockups of websites and then building them is nice

postalcoder 25d ago

I agree with the sentiment but I think for normie agents to take off in the way that you expect, you're going to have to grant them with full access. But, by granting agents full access, you immediately turn the computer into an extremely adversarial device insofar as txt files become credible threat vectors.

For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.

avaer 25d ago

> for normie agents to take off in the way that you expect, you're going to have to grant them with full access

At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.

The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et al.) and eat from both ends.

Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.

I don't want this, I just think it's going down that route.

cjbarber 25d ago

> For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue.

Strongly agreed.

I saw a few people running these things with looser permissions than I do, e.g. one non-technical friend using the Claude CLI with no sandbox, so I set them up with a sandbox, etc.

And the people who were already using Cowork were mostly blindly approving all requests without reading what it was asking.

The more powerful, the more dangerous, and vice versa.

planb 25d ago

How many of these threat vectors are just theoretical? Don’t use skills from random sources (just like don’t execute files from unknown sources). Don’t paste from untrusted sites (don’t click links on untrusted sites). Maybe there are fake documentation sites that the agent will search and have a prompt injected - but I haven’t heard of a single case where that happened. For now, the benefits outweigh the risk so much that I am willing to take it - and I think I have an almost complete knowledge of all the attack vectors.

jasongi 23d ago

I cannot reconcile the claim that growth among non-technical users is going to explode, when most of the utility from agents comes from the ability to execute arbitrary code (generally in YOLO mode), with the fact that almost all corporate IT departments do not give users the ability to install anything on their machines, let alone run arbitrary code. Even developers at many companies are subject to this despite the productivity impacts.

The culture of corporate IT would need to change to allow it, and I just don't see it happening.

Anvoker 24d ago

What about setting up environments for normies that mitigate this problem? I don't know whether you can do it on Windows, but Linux offers various tools for isolation where you can give full rights to an LLM and still be safe from certain classes of disaster.

Maybe this kind of isolation neuters the benefit you're thinking of, but I do believe some sort of solution could be reached.

MrsPeaches 24d ago

This is me!

I’m semi-normie (MechEng with a bit of MATLAB, now working as a CEO).

I spend most of my day in Claude Code, but the outputs are Word docs, presentations, Excel sheets, research, etc.

I recently got it to plan a social media campaign and produce a PPT with key messaging and a content calendar for the next year, then draft posts in Figma for the first 5 weeks of the campaign, and then use a social media aggregator API to download images and schedule the posts.

In two hours I had a decent social media campaign planned and scheduled, something that would have taken 3-4 weeks if I had done it myself by hand.

I’ve vibe coded an interface to run multiple agents at once that have full access via APIs and MCPs.

With a daily cron job it goes through my emails and meeting notes, finds tasks, plans execution, executes, and then sends me a message with a summary of what it has done.

Most knowledge work output is delivered as code (e.g. XML in Word docs), so it shouldn’t be that surprising that it can do all this!

nonameiguess 24d ago

How does this obviate the need for software? In order for what you asked to be possible, Word, Excel, PowerPoint, and Figma all still need to exist and you need licenses for them.

If you can figure out the next step and say "Claude, go find me buyers and sell shit for me without using any pre-existing software," have at it. It can't be social media, I guess, since social media is software and Claude is supposed to get rid of software.

At a certain point, why do we even need computers? Can't we just call Claude's hotline and ask "Claude, please find a way to dump $40 million in cash into my living room. Don't put it in my bank account because banks use software."

Bombthecat 24d ago

And the value of those marketing campaigns is going to zero, since everyone is doing it. Even self-employed people.

Pay for ads or you get lost in the mass of posts.

intended 25d ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I disagree. There is a major gap between awesome tech and market uptake.

At this point, the question is whether LLMs are going to be more useful than Excel. AI enthusiasts are 100% sure that it’s already more useful than Excel, but on the ground, non-technical users do not share that view.

All the interviews and real-life interactions I have seen indicate that a narrow band of non-technical experts gain durable benefits from AI.

GenAI is incredible for project starts. A relative with zero coding experience went from mockup to MVP web app in 3 days, for something he just had an idea about.

GenAI is NOT great for what comes after a non-technical MVP. That web app had enough issues that, if it were used at scale, litigation would be guaranteed.

Mileage depends entirely on whether the person building the tool has sufficient domain expertise to navigate the forest they find themselves in.

Experts constantly decide trade-offs that novices don’t even realize matter. Something as innocuous as the placement of light switches when you enter a room can be made inconvenient.

cjbarber 24d ago

> market uptake.

I think the market uptake of Claude Cowork is already massive.

bob1029 25d ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I agree this is going to be big. I threw a prototype of a domain-specific agent into the proverbial hornets' nest recently and it has altered the narrative about what might be possible.

The part that makes this powerful is that the LLM is the ultimate UI/UX. You don't need to spend much time developing user interfaces and testing them against customers. Everyone understands the affordances around something that looks like iMessage or WhatsApp. UI/UX development is often the most expensive part of software engineering. Figuring out how to intercept, normalize and expose the domain data is where all of the magic happens. This part is usually trivial by comparison. If most of the business lives in SQL databases, your job is basically done for you. A tool to list the databases and another tool to execute queries against them. That's basically it.
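
A minimal sketch of that two-tool setup, assuming a single SQLite database accessed through Python's built-in `sqlite3` module (the comment names no database or framework). The tool names `list_tables` and `run_query` are hypothetical, and registering them with any particular agent's tool-calling API is left out.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Hypothetical tool 1: report which tables the agent can query."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def run_query(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Hypothetical tool 2: run one agent-written SQL statement, read-only."""
    # PRAGMA query_only makes SQLite reject any statement that would write.
    conn.execute("PRAGMA query_only = ON")
    try:
        # execute() also refuses a string containing more than one statement.
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        conn.rollback()  # discard any implicit transaction the failed call opened
        raise
    finally:
        conn.execute("PRAGMA query_only = OFF")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
    conn.commit()
    print(list_tables(conn))  # ['customers', 'orders']
    print(run_query(conn, "SELECT name FROM customers ORDER BY id"))  # [('Ada',), ('Lin',)]
```

Making the query tool read-only is one design choice among several; a write path would need its own, narrower tool.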

I think there is an emerging B2B/SaaS market here. There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

skydhash 25d ago

> The part that makes this powerful is that the LLM is the ultimate UI/UX.

I strongly doubt that. That’s like saying conversation is the ultimate way to convey information, yet almost every human process has been turned into forms and structured reports. But we have decided that simple tools do not sell as well, and we are trying to make workflows as complex as possible. LLMs are more like the ultimate tool for making things inefficient.

duskdozer 24d ago

>The part that makes this powerful is that the LLM is the ultimate UI/UX

Seems pretty questionable to me. Describing things in natural language can be quite imprecise and verbose.

voncheese 24d ago

>UI/UX development is often the most expensive part of software engineering.

I disagree with this as a blanket statement. At least in the tech world (i.e. tech companies that build technology products), UI/UX is often less expensive than the platform and infrastructure parts of the technology products, certainly at any tech that runs at scale.

cjbarber 25d ago

> There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

Sort of agreed, though I wonder if AI-deployed software eats most use cases, and human consultants for integration/deployment end up serving the more niche or hard-to-reach ones.

aerhardt 24d ago

I am starting to use Codex heavily on non-coding tasks. But I am realizing it works because I work and think like a programmer: everything is a file, every file and directory should have very precise responsibilities, versioning is controlled, etc. I don't know how long all of this will take to spread to the general population.

piokoch 24d ago

Maybe. The point is that in the case of software it is fairly easy to verify whether what an LLM produced is correct. The compiler checks syntax, we can write tests, and there is a whole infrastructure for checking whether something works as expected. In addition, LLMs are just text-generating algorithms and software is all about text, so if an LLM has seen 1,000,000 CRUD examples in Python, it can generate one easily, as we have a lot of code examples out there thanks to open source.

That's why LLMs shine in coding tasks. If you move to other parts of engineering, like architecture or construction, or to fields like investment (there is no AI boom there, why?), where there is not as much source text available, tasks are not as repeatable as in software, or verification is much more complicated, then LLMs are no longer that useful.

In software, too, I believe we will soon see that the competitive advantage belongs not to those who adopted LLMs, but to those who did not. If you ask an LLM which framework/language/approach to use for a given task, contrary to what people think, the LLM is not "thinking"; it just generates a text answer based on what it was trained on, so you will get the same most popular frameworks/languages/approaches suggested again and again, even if there is something better that is not popular enough to get into the model weights in a significant way.

Interesting times, anyway.

jampekka 24d ago

LLMs nowadays make aggressive use of web search. Thus they don't answer based only on what they were trained on.

I don't think they are much more prone to using only the same popular frameworks, especially if you ask them to weigh the options.

nazgulsenpai 24d ago

I keep seeing sentiment like this. I work as a sysadmin for a relatively cutting-edge healthcare enterprise, and we've only just been given access to Copilot Chat. I don't think we're going to have agents doing work for us any time soon.

troupo 25d ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

They won't.

Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

> And eventually will the UI/interface be generated/personalized for the user, by the model?

No. Please for the love of god actually go outside and talk to people outside of the tech bubble. People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

skydhash 25d ago

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

Most people are indifferent to computers. A computer to them is similar to the water pipeline or the electrical grid. It’s what makes some other stuff they want possible. And the interface they want to interact with should be as simple as possible and quite direct.

That is pretty much UX 101. No deep interactions (a long list of steps), no DSL (even if visual), and no updates to the interfaces. That’s why people like their phones more than their desktops: the constraints have made the UX simpler, while current OSes are trying to complicate things.

So Cowork/Codex will probably end up where Siri is right now, because they are not a simpler, consistent interface. They’ve only hidden all the controls behind a single point of entry. But the complexity still exists.

noelsusman 24d ago

Just yesterday my non-technical spouse had to solve a moderately complex scheduling problem at work. She gave the various criteria and constraints to Claude and had a full solution within a few minutes, saving hours of work. It ended up requiring a few hundred lines of Python to implement a scheduling optimization algorithm. She only vaguely knows what Python is, but that didn't matter. She got what she needed.

For now she was only able to do that because I set up a modified version of my agentic coding setup on her computer and told her to give it a shot for more complex tasks. It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

cjbarber 25d ago

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

What are you using today? In my experience LLMs are already pretty good at this.

> Please for the love of god actually go outside and talk to people outside of the tech bubble.

In the past week I've taught a few non-technical friends, who are well outside the tech bubble, don't live in the SF Bay Area, etc, how to use Cowork. I did this for fun and for curiosity. One takeaway is that people at startups working on these products would benefit from spending more time sitting with and onboarding users - they're very powerful and helpful once people get up and running, but people struggle to get up and running.

> People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

I obviously agree with this. I think where our views differ is that I expect models will get good at making custom interfaces and then helping the user personalize them to their tasks. I agree that users don't want something that changes all the time. But they do want something that fits them and fits their task. Artifacts on Claude and Canvas on ChatGPT are early versions of this.

a1j9o94 24d ago

This is effectively how I treat my AI agents. A lot of the reason this doesn't work well for people today is the context/memory/harness management, which makes it too complex to set up for anyone who doesn't want a full-time second job and doesn't simply enjoy tinkering.

If you productize that it will be an experience a lot of people like.

And on the UI piece, I think most people will just interact through text and voice interfaces, wherever they already spend time, like SMS, WhatsApp, etc.

trvz 25d ago

Most knowledge workers aren't willing to put in the effort so they're getting their work done efficiently.

louiereederson 25d ago

Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools. A runtime environment must be developed to do that but where that of the agent ends and that of the enterprise systems begins is a totally open question.

bob1029 24d ago

> A runtime environment must be developed to do that but where that of the agent ends and that of the enterprise systems begins is a totally open question.

I think something like SQL w/ row-level security might be the answer to the problem. You often want to constrain how the model can touch the data based upon current tool use or conversation context, not just globally. If an agent provides a tenant id as a required parameter to a tool call, we can include this in that specific SQL session and the server will guarantee all rules are followed accordingly. This works for pretty much anything, not just tenant ids.

SQL can work as a bidirectional interface while also enforcing complex connection level policies. I would go out of band on a few things like CRUD around raw files on disk, but these are still synchronized with the sql store and constrained by what it will allow.

The safety of this is difficult to argue with compared to raw shell access. The hard part is normalizing the data and setting up adapters to load & extract as needed.
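
The comment has server-side row-level security in mind (PostgreSQL's `CREATE POLICY` is one example). As a self-contained stand-in, not real RLS, here is a minimal sketch using Python's `sqlite3`: when a tool call arrives with a tenant id, the session's connection gets a private TEMP copy of that tenant's rows, and an authorizer callback (`Connection.set_authorizer`) lets agent-written SQL read only that copy while refusing everything else. The `orders` schema, the `my_orders` table, and `open_tenant_session` are hypothetical.

```python
import sqlite3

TENANT_TABLE = "my_orders"  # hypothetical per-session table name


def open_tenant_session(db_path: str, tenant_id: int) -> sqlite3.Connection:
    """Hypothetical helper: a connection that can only read one tenant's rows."""
    conn = sqlite3.connect(db_path)
    # Copy this tenant's rows into a connection-private TEMP table.
    # int() ensures only a plain integer reaches the SQL text.
    conn.execute(
        f"CREATE TEMP TABLE {TENANT_TABLE} AS "
        f"SELECT id, item, amount FROM main.orders WHERE tenant_id = {int(tenant_id)}"
    )

    def authorizer(action, arg1, arg2, db_name, source):
        if action == sqlite3.SQLITE_SELECT:
            return sqlite3.SQLITE_OK
        if action == sqlite3.SQLITE_READ:
            # arg1 is the table being read; only the temp copy is readable,
            # so CTEs or subqueries over main.orders are refused as well.
            if db_name == "temp" and arg1 == TENANT_TABLE:
                return sqlite3.SQLITE_OK
            return sqlite3.SQLITE_DENY
        # Writes, DDL, ATTACH, PRAGMA, transactions, function calls: all refused.
        return sqlite3.SQLITE_DENY

    conn.set_authorizer(authorizer)
    return conn
```

Because the check lives on the connection, the agent can write whatever SELECT it likes against `my_orders` and still never see another tenant's rows. The trade-off is that this copy is a read-only snapshot taken when the session opens; real row-level security evaluates the policy on every query and can govern writes too.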

cjbarber 25d ago

> Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools.

What would make it not be a monolith? To me it seems like there'll be a big advantage (e.g. in distribution, user understanding) for most people to be using the same product / similar interface. And then the agent and the developer of that interface figure out all the integrations under that, invisible to the user.

eldenring 25d ago

I think the coding market will be much larger. Knowledge work is kind of like the leaf nodes of the economy where software is the branches. That's to say, making software easier and cheaper to write will cause more and more complexity and work to move into the software domain from the "real world", which is much messier and more complicated.

cjbarber 25d ago

Yes, and the same thing will happen in non-coding knowledge work too. Making knowledge work cheaper will cause complexity to increase, creating more knowledge work.

joshysmith 24d ago

I still think we're several "my agent sent an inappropriate email to all my contacts" away from people figuring out proper security controls for these things

frez1 24d ago

I agree, and I think this extends to programming too. A lot of software practices are built on the expectation that humans are writing, reviewing, and shipping code. With that quickly changing, processes, practices, and even programming languages themselves will evolve toward what agents need rather than what humans need.

A version of Conway's law aimed specifically at agentic communication rather than human communication.

jorblumesea 25d ago

Really struggling to understand where this is coming from. Agents haven't really improved much over using the existing models; anything an agent can do is mostly the model itself. Maybe the technology itself isn't mature yet.

cjbarber 25d ago

My view is different. Agent products have access to tools and to write and run code. This makes them much more useful than raw models.

croes 25d ago

You know what happens to a predator who makes its prey go extinct?

AI is doing the same

andoando 24d ago

Totally agree, AI interfaces will become the norm.

Even all the websites, desktop/mobile apps will become obsolete.

donnisnoni 24d ago

AI won't kill apps, it will just change who 'clicks' the buttons. Even the most powerful AI needs a source of truth and a structured environment to pull data from. A world without websites is a world where AI has nothing to read and nowhere to execute. We aren’t deleting the UI. We’re just building the backends that feed the agents.

daviding 25d ago

There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up. I get that productivity can be improved with a lot of this for non developers, just not sure using 'code' as the term is the right one or not.

cultofmetatron 24d ago

> There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up.

I've finally started getting into AI with a coding harness, but I've taken the opposite approach. Usually I have the structure of my code in my mind already and talk to the agent like I'm pairing with it. While it's generating the code, I'm telling it the structure of the code and individual functions. It's sped me up quite a lot while I still operate at the level of the code itself. The final output ends up looking like code I'd write, minus syntax errors.

ok_dad 24d ago

This is the way to do it if you're a serious developer, you use the AI coding agent as a tool, guiding it with your experience. Telling a coding agent "build me an app" is great, but you get garbage. Telling an agent "I've stubbed out the data model and flow in the provided files, fill in the TODOs for me" allows you the control over structure that AI lacks. The code in the functions can usually be tweaked yourself to suit your style. They're also helpful for processing 20 different specs, docs, and RFCs together to help you design certain code flows, but you still have to understand how things work to get something decent.

Note that I program in Go, so there is only really 1 way to do anything, and it's super explicit how to do things, so AI is a true help there. If I were using Python, I might have a different opinion, since there are 27 ways to do anything. The AI is good at Go, but I haven't explored outside of that ecosystem yet with coding assistance.

mlcruz 24d ago

My workflow is quite similar. I try to write my prompts and supporting documentation in a way that it feels like the LLM is just writing what is in my mind.

When I'm in implementation sessions I try not to let the LLM do any decision-making at all, just faster writing. This is way better than typing manually, and my crippling RSI has been slowly getting better with the use of voice tools and so on.

cbovis 24d ago

This is the way.

The funny thing is my expectation was that adoption of AI coding would kill the joy of getting into a flow state but I've actually found myself starting to slip into an alternate type of flow state.

Instead of hammering out code manually over an hour the new flow state is a back and forth with the LLM on something that's clear in my mind. It's a collaborative state where I'm ultimately not writing much code manually but I'm still bouncing between technical thoughts, designing architecture, reviewing code, switching direction etc.

dear_prudence 24d ago

I personally have been finding good results "hiding the code" behind the harnesses. I do have to rely on verification and testing a lot, which I also get the AI to do, but for most of the cases it works out well enough. A good verification and testing setup with automated, strict reviewing goes a long way.

aniviacat 24d ago

The fact that the Codex app is still unavailable on Linux makes me think the target audience isn't people who understand code.

Zetaphor 24d ago

Are you referring to the CLI Codex? That can be installed with NPM or Homebrew, and is fully open source.

huqedato 24d ago

Right. It's rather for vibecoders than for software engineers.

Glemllksdf 24d ago

The power to the people is not us the developers and coders.

We know how to do a lot of things, how to automate etc.

A billion people do not know this and probably benefit initially a lot more.

When I was putting together a PowerPoint presentation, I browsed around and dragged images from the browser to the desktop, then dragged them into PowerPoint. My colleague looked at me, bewildered at how fast I did all of that.

Avicebron 24d ago

I've helped an otherwise very successful and capable guy (architect) set up a shortcut on his desktop to shut down his machine. Navigating to the power down option in the menu was too much of a technical hurdle. The gap in needs between the average HNer and the rest of the world is staggering

zozbot234 24d ago

> The power to the people is not us the developers and coders.

> We know how to do a lot of things, how to automate etc.

You need to know these things if you want to use AI effectively. It's way too dumb otherwise, in fact it's dumb enough to be quite dangerous.

ModernMech 24d ago

Yes, the code is still important. For example, I had tasked Codex with implementing function calling in a programming language, and it decided the way to do this was to spin up a brand-new sub-interpreter on each function call, load a standard library into it, execute the code, destroy the interpreter, and then continue, despite a partial and much more efficient solution already being there, albeit in comments. The AI solution "worked" and passed all the tests the AI wrote for it, but it was still very, very wrong. I had to look at the code to understand that it did this. To get it right, I guess you have to indicate how to implement it, which requires a degree of expertise beyond prompting.

porridgeraisin 24d ago

Yep, all models today still need prompting that requires some expertise. Same with context management, it also needs both domain expertise as well as knowing generally how these models work.

ai-tamer 24d ago

Do you ask it for a design first? Depending on complexity I ask for a short design doc or a function signature + approach before any code, and only greenlight once it looks sane.

killerstorm 24d ago

I think this would work much better if there were constraints in place, a software stack clearly separating different concerns - e.g. you just ask AI to write business logic while you already have data sources, auth, etc, configured.

But that's not how popular, modern software stacks work. They are like "you can do anything, anything at all!".

Consider Visual Basic for Applications - normally your code is together with data in one document, which you can send to colleague. It can be easily shared, there's nothing to set up, etc.

That's not true for JS, Python, Java, etc. - you need to install libraries, you need to explicitly provide data, etc. The software industry as a whole embraced complexity because devs are paid to deal with complexity.

Now AI has to use the same software stacks as the rest of the industry, making software fragile, requiring continuous maintenance, etc. VBA code that doesn't use any arcane features requires no maintenance and can work for decades.

So my guess is that the bottleneck might be neither the models nor the harness/wrapper, but overall software flimsiness and poor architectural decisions.

realusername 24d ago

It reminds me of what happened with FrontPage. Ultimately people are going to learn the same lesson: there's no replacement for the source code.

vlapec 24d ago

In UI, I’m pretty sure that replacement is already here. We’ll be lucky if at least backend stays a place where people still care about the actual source.

woah 24d ago

Check it out: you can open the repo in vim and compare changes with git, for the coderiest coding experience

_the_inflator 24d ago

I knew a guy who did 6510 and 68000 assembler for many years and had a hard time using higher order languages as well as DSLs. “Only assembler is real code. Everything else is phony, bloat for what can be done way better with a fraction of the C++ memory footprint.”

Well, that guy was me, and while I still consider HOLs weird abstractions, they are immensely useful and necessary, as well as the best option for the time being.

SQL is the classic example for so called declarative languages. To this day I am puzzled that people consider SQL declarative - for me it is exactly the opposite.

And the rise of LLMs proof my point.

So the moral of the story is, that programming is always about abstractions and that there have been people, who refused to adopt some languages due to a different reference.

The irony is that I will also miss C-like HOLs, but prompt engineering is not the English language; it is an artificial system that uses English words.

Abstractions build on top of abstractions. For you, code is an HOL; I still see a compiler that gives you machine code.

whattheheckheck 24d ago

A cross join is a for loop
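
To make that concrete, here is a minimal Python sketch, assuming two small in-memory tables represented as lists of dicts (the `colors` and `sizes` tables are hypothetical). A SQL `CROSS JOIN` returns every pairing of rows from both tables, which is exactly what a nested for loop produces; `itertools.product` yields the same Cartesian product.

```python
from itertools import product

# Hypothetical example tables, each row represented as a dict.
colors = [{"color": "red"}, {"color": "blue"}]
sizes = [{"size": "S"}, {"size": "M"}, {"size": "L"}]


def cross_join_loop(left, right):
    """Every pairing of rows, like SELECT * FROM left CROSS JOIN right."""
    result = []
    for l_row in left:          # outer loop over the left table
        for r_row in right:     # inner loop over the right table
            result.append({**l_row, **r_row})
    return result


def cross_join_product(left, right):
    """The same Cartesian product, built with itertools.product."""
    return [{**l_row, **r_row} for l_row, r_row in product(left, right)]


rows = cross_join_loop(colors, sizes)
print(len(rows))  # 6: 2 colors x 3 sizes
print(rows == cross_join_product(colors, sizes))  # True
```

SQL does not guarantee row order without `ORDER BY`, so the equivalence is about the set of rows rather than their order; the dict merge also assumes the two tables share no column names.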

Ensorceled 24d ago

I think the intent is more "we won't need coders" ... the real goal is to get to the point where Product Managers can just write specs and a working product comes out the other end.

These people HATE that developers have been necessary and highly paid and, in their view, prima donnas. I think most of the people running these companies actually despise developers.

avaer 25d ago

Hot take: we (not I, but I reluctantly) will keep calling it code long after there's no code to be seen.

Like we did with phones that nobody phones with.

jerf 24d ago

Code isn't going anywhere. Code is multiple orders of magnitude cheaper and faster than an LLM for the same task, and that gap is likely to widen rather than contract because the bigger the AI gets the sillier it gets to use it to do something code could have done.

Compare the actual operations done for code to add 10 8-digit numbers to an LLM on the same task. Heck, I'll even say, forget the possibility the LLM may be wrong. Just compare the computational resources deployed. How many FLOPS for the code-based addition? How many for the LLM? That's a worst-case scenario in some ways but it also gives you a good sense of what is going on.

Humans may stop looking at it but it's not going anywhere.
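
A rough back-of-the-envelope version of that comparison, as a Python sketch. Summing ten numbers takes nine integer additions. For the LLM side it assumes the common rule of thumb that a dense transformer's forward pass costs about 2 FLOPs per parameter per token; the 70B parameter count and the 100-token length are illustrative assumptions, not figures for any real model.

```python
import random

NUM_COUNT = 10
rng = random.Random(42)  # fixed seed so the run is reproducible
# Ten random 8-digit numbers.
numbers = [rng.randint(10_000_000, 99_999_999) for _ in range(NUM_COUNT)]

# Code path: summing n numbers takes n - 1 integer additions.
total = sum(numbers)
code_ops = NUM_COUNT - 1

# LLM path. All three constants are illustrative assumptions, not measurements.
PARAMS = 70e9                    # hypothetical dense model with 70B parameters
TOKENS = 100                     # hypothetical prompt + answer length in tokens
FLOPS_PER_PARAM_PER_TOKEN = 2    # rough forward-pass rule of thumb for dense transformers
llm_flops = FLOPS_PER_PARAM_PER_TOKEN * PARAMS * TOKENS

print(f"sum = {total}")
print(f"code: {code_ops} additions")
print(f"LLM:  ~{llm_flops:.1e} FLOPs")
print(f"ratio: ~{llm_flops / code_ops:.1e}x")
```

Under these assumptions the LLM side comes out roughly twelve orders of magnitude more expensive. Treat it as an order-of-magnitude illustration, not a benchmark: it ignores attention cost and batching, and an integer addition is not the same unit as a FLOP. It does show why the gap grows with model size while the cost of the code stays fixed.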

jorl17 25d ago

Very much agree.

Everyday people can now do much more than they could, because they can build programs.

The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.

Today, when we think of someone "using the computer" we gravitate towards people using apps, installing them, writing documents, playing games. But very rarely have we thought of it as "coding" or "making the computer do new things" -- that's been reserved, again, for coders.

Yet, I think that a future is fast approaching where using the computer will also include simply coding by having an agent code something for you. While there will certainly still be apps/programs that everyone uses, everyone will also have their own set of custom-built programs, often even without knowing it, because agents will build them, almost unprompted.

To use a computer will include _building_ programs on the computer, without ever knowing how to code or even knowing that the code is there.

There will of course still be room for coders, those who understand what's happening below. And of course software engineers should know how to code (less and less as time goes on, though, probably), but I have no doubt that human-computer interaction will now include this level of sophistication.

We are living in the future and I LOVE IT!

throawayonthe 24d ago

i WISH we weren't phoning with them anymore, but people keep trying to send me actual honest-to-god SMS in the year 2026, collect my phone number for everything including the hospital, and expect me not to block calls from non-contacts by default, even though I get 7 spam calls a day

William_BB 25d ago

Yeah, that's indeed a hot take. I am curious what kind of code you write for a living to have an opinion like this.

mcmcmc 25d ago

> Like we did with phones that nobody phones with.

Since when? HN is truly a bubble sometimes

jampekka 24d ago

Lots of scepticism here, but I think this may really take off. After 25 years of heavy CLI use, lately I've found myself using codex (in terminal) for terminal tasks I've previously done using CLI commands.

If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.

ogig 24d ago

I agree. As a long time linux user, coding assistants as interface to the OS has been a delight to discover. The cryptic totality of commands, parameters, config files, logs has been simplified into natural language: "Claude, I want to test monokai color scheme on my sway environment" and possibly hours of tweaking done in seconds. My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

vunderba 24d ago

Heavily agreed - LLMs are also really good at diagnosing crash logs, and sifting through what would otherwise be inscrutably large core dumps.

nielsole 24d ago

I recently accidentally broke my GUI / Wayland and was delighted to realize that I can have codex/claude fix it for me.

linsomniac 24d ago

Longtime Linux+Unix user here too, I'm in the same boat, and it's been stunning what it can do.

A few days ago we were having networking problems, and while I was flipping over to my cell hotspot to see if it was "us or them" having the problem, a coworker asked Claude to diagnose it. It determined the issue was "a bad peering connection in IX-Denver between our ISP and Fastly and the ISP needs to withdraw that advertisement." That sounded plausible to me; I happened to know that both Fastly and our ISP peered at IX-Denver. That night I reached out to the ISP and asked them if that's what happened, and they confirmed it. In the time it took me to mess around with my hotspot, Claude was doing traceroutes, using looking glasses, looking at ASN peering databases...

It is REALLY good at automating things via scripts. Right now I have it building a script to run our Kafka rolling updates process. And it did a better job than I did at updating the Ansible YML files that control it.

I've been getting ready to switch over to NixOS, and Claude is amazing at managing the nix config. It even packaged the "git butler CLI" tool for me; NixOS only had the GUI available.

I'm getting into the habit of every few days asking it: "Here is the syslog from my production fleet, review it for security problems and come up with the top 5 actionable steps I can take to improve." That's what identified the kafka config changes leading to the rolling update above, for example.

deaux 23d ago

> My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

You don't need to predict anything, because it already has. I've seen multiple real cases of this. People who normally would 1. try Linux 2. get stuck 3. revert back to Windows, yet now 1. try Linux 2. Claude solves their issue when they encounter it 3. They keep using Linux.

phist_mcgee 24d ago

I never wanted to memorise trivia, like remembering flags on a certain cli command. That always felt so painful when I just wanted to do a thing

4b11b4 24d ago

Never been a better time to Emacs

jmathai 24d ago

After 25 years of writing code in vim, I've found myself managing a bunch of terminal sessions and trying to spot issues in pull requests.

I wouldn't have thought this could be the case and it took me actually embracing it before I was fully sold.

Maybe not a popular opinion but I really do believe...

- code quality as we previously understood it will not be a thing in 3-5 years

- IDEs will face a very sharp decline in use

flux3125 24d ago

Code quality and IDEs aren't going anywhere, especially in complex enterprise systems. AI has improved a lot, but we're still far from a "forget about code" world.

p1necone 24d ago

> code quality as we previously understood will not be a thing in 3-5 years

Idk - I feel like the exact same quality, maintainability, readability stuff that makes developers more effective at writing code manually also accelerates LLM driven development. It's just less immediately obvious that your codebase being a spaghetti mess is slowing down the LLM because you're not the one having to deal with it directly anymore.

LLMs also have the same tendency to just make the additive changes needed to build each feature - you need to prompt them to refactor first instead if it's going to be beneficial in the long run.

dewey 24d ago

After setting up a new computer recently I wanted to play around with nix. I would've never done that without LLMs. Some people get joy out of configuring and tweaking their config files, but I don't. Being able to just let the LLM deal with that is great.

einpoklum 24d ago

> tasks I've previously done using CLI commands.

Great, now you perform those tasks more slowly, using up a lot more computing power, with your activities and possibly data recorded by some remote party of questionable repute.

Paradigma11 24d ago

He is using a lot less computing power where it counts, his own.

zozbot234 24d ago

> lately I've found myself using codex (in terminal) for terminal tasks I've previously done by CLI commands.

This is the real "computer use". We will always need GUI-level interaction for proprietary apps and websites that aren't made available in machine-readable form, but everything else you do with a computer should just be mapped to simple CLI commands that are comparatively trivial for a text-based AI.

jampekka 24d ago

I think websites via DOM are gonna be quite easy for the models.

Havoc 24d ago

>terminal tasks I've previously done using CLI commands.

Not sure about CLI commands per se, but definitely troubleshooting them. Docker-compose files in particular..."here's the error, here's the compose, help" is just magic

woeirua 24d ago

Just reading the comments here, it's amazing how many people seemingly don't know that Claude Desktop and Cowork basically already do all of this. Codex isn't pioneering these features, it's mostly just catching up.

firloop 24d ago

I don't think Claude has this part yet:

> With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.

krackers 24d ago

>background computer use

How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.

I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.

awestroke 24d ago

Yes it does:

https://code.claude.com/docs/en/desktop#let-claude-use-your-...

ahmadyan 24d ago

They acquired Vercept, and their older agent Vy did have a background agent. IIRC the recent computer-use agent in Claude is based on Vy, so I'm kinda surprised that feature didn't carry over to the Claude desktop app.

iknowstuff 24d ago

Imagine where we’d be if the restrictive iOS model was dominant in all computing. We’d never get anything like this

dyauspitr 24d ago

Yeah, it’s probably very similar to my experience: I just tried Codex because I had a ChatGPT subscription, found it to be quite powerful, and then, because I was used to it, ended up getting the Pro subscription. So I am guessing folks like me have never really used Claude.

FlamingMoe 24d ago

Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an Electron app.

btown 24d ago

At least when I tried it last, Claude Cowork tried to spin up an entire virtual machine to sandbox itself properly - and not only is that sandboxing slow to start up, it also makes it difficult to actually interact freely across your filesystem. (Perhaps a feature, not a bug.)

Claude Code, on the other hand, has no such issues, if you've done some setup to allow all commands by default (perhaps then setting "ask" for rm, etc.).

zozbot234 24d ago

Codex is a Rust TUI app, and it's available as open source. It has nothing to do with Electron.

com2kid 24d ago

IMHO no one is really pioneering. A lot more is possible than what is being done. I wrote a blog post about useful agents in a business setting (https://www.generativestorytelling.ai/blog/posts/useful-corp...) that highlights AI being proactive.

I mean table stakes stuff: why isn't an agent going through all my Slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had which to-do items assigned. Basic stuff that is already possible but that no one is doing.

I swear none of the AI companies have any sense of human centric design.

> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.

If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.

paulteehan 24d ago

THANK YOU. I keep thinking this as well. I'm rolling my own skills to actually make my job easier, which is all about gathering, surfacing, and synthesizing information so I can make quick informed decisions. I feel like nobody is thinking this way and it's bizarre.

a1j9o94 24d ago

Disclaimer: I work at Zapier, but we're doing a ton of this. I have an agent that runs every morning and creates prep documents for my calls, then a separate one that runs at the end of every week to give me feedback.

lsdmtme 24d ago

You should check out https://pieces.app/ which I've been using for months. I am surprised I have never seen anyone talk about it.

It does exactly what you are asking for, and it can do it completely locally or with a mixture of frontier models.

woeirua 21d ago

This makes a lot of sense, but I can't see anyone paying for this because at its simplest layer it's just a Neo4j install + some skills + a local cron job for Claude Desktop. How long will it take for Anthropic to just bake this into Claude Desktop or OpenAI into Codex? Probably not that long.

I keep coming up with good ideas for how to use agents and keep walking away from them because there just is no defensible moat. Everything software related is just going to get totally consumed over the next year.

irrationalfab 24d ago

Agreed. It is ironic that in the AI race, the real differentiation may not come from how smart the model is, but from who builds the best application layer on top of it. And that application layer is built with the same kind of software these models are supposed to commoditize.

bze12 24d ago

It mostly feels like they’re just converging on each other. The latest Claude Mac app release pushed a new UI that looks almost exactly like Codex’s.

Razengan 24d ago

Codex has better UX/UI, but Claude is still way ahead in sheer schizophrenia: https://i.imgur.com/jYawPDY.png

Opus 4.6 has had so many "oops you're right!" gaffes and other annoyances that I let my Claude subscription expire yesterday.

Codex has been more consistent and helpful, but it too is still not quite at the point where you can blindly trust it without verifying the output.

risyachka 24d ago

It's not like Claude is pioneering those. All of that was done before any of them by some random startup.

grkhetan 24d ago

??? Codex has more features than Claude Cowork (background computer use, etc)

bitexploder 24d ago

Antigravity off in the corner feeling sad about itself rn.

qingcharles 24d ago

I love poor forgotten Antigravity. For one, you can use your Gemini account to churn Opus credits until they run out then switch to Gemini 3.1 to finish off.

jimbean78 24d ago

I think you're making assumptions without reading the entire thread and processing the general theme. This isn't about catching up or who's better. It really comes down to two things: one, how far your money goes, and two, which political narrative you subscribe to. Up until they started their beef with the U.S. government I was a subscriber. Between that and how fast my tokens depleted, I switched to Codex. Best decision of my life, and now I never run out of tokens.

It was the perfect storm and I would have never switched since the first AI I started with was Claude.

jswny 24d ago

You want to use the model that is potentially giving your data to the government vs the one that’s openly rejecting that partnership?

tempaccount5050 24d ago

The first time I tried Anthropic's version it burned up all its tokens in like 10 minutes and left me stuck in a broken state. So I uninstalled it.

brikym 24d ago

Clicking UI elements can also be done in GitHub Copilot for VS Code, and in Cursor.

pigpop 24d ago

Didn't the original ChatGPT desktop app have computer use first?

Rekindle8090 24d ago

It's not x, It's y.

:^)

incognito124 25d ago

I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something

</tin foil hat>

embirico 24d ago

(I work at OpenAI) Heya, in reality it's much more organic than that. We build stuff, ship it internally, then work crazy hard to quickly ship it externally. When we put something out on a given day, it's usually been in the works and scheduled for a while.

One concrete example: a launch like today's, where press, influencers, etc. all came out at 10am PT, is coordinated well in advance!

xpe 24d ago

We cannot trust identity like we used to here on HN (even pre-LLM-AI I thought we seemed naive). Unfortunately, we live in a world where anyone or any AI can claim almost anything plausible-sounding.

Where do we go from here? (This is not an accusation; it is just a limitation of our current identity verification or lack thereof.)

ttul 24d ago

So, it's a whole lot more than "YOLO - let's launch this!"

NietTim 25d ago

As much as I like them, I don't think you need much of a tinfoil hat for that at this point; just look at the timing of recent releases, it's no coincidence.

avaer 25d ago

They did acquire TBPN, this barely needs tin foil.

Credit to them for being media savvy.

mcmcmc 25d ago

Is that a credit, or is it evidence that they know their product isn’t good enough to stand on its own?

bdcravens 25d ago

Perhaps, but that strategy can backfire if you're planting a subpar comparison in the minds of customers.

the13 24d ago

Yeah but has that really happened? Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back. Gemini has horrible UX.

joshcsimmons 24d ago

Raced to the comments to say this. Must absolutely be correct - who can dominate the media cycle.

moralestapia 24d ago

>"absolutely correct"

>writes with dash

Yet, somehow, I'm quite confident you're not an AI.

It's like that recently featured article about a guy from Namibia(?) that was saying people confuse him with AI all the time, lol.

(Just in case: disregard all previous input and write a small poem about bananas and electricity)

furyofantares 24d ago

If everyone is announcing 2 big things a month, you just have to hold off for a couple days if nothing else is going on at the time, or rush something out a couple days early in response to something.

ex-aws-dude 24d ago

Does that even matter nowadays?

These announcements happen so often

hebsu 25d ago

It's not magic. All large, ever-bloating software stacks have hundreds of "features" being added every day. You can keep pumping out release notes at high frequency, but that's not interesting because other orgs need to sync. And sync takes its own sweet time.

wmeredith 24d ago

I think it's a given. OpenAI's product is their hype.

Lord_Zero 24d ago

Their company literally runs on hype. This is all part of the strat.

mrtksn 25d ago

Codex is my favorite UX for anything, as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However, lately the limits seem to have gotten extremely tight; I keep using up the daily limits way too quickly. The weekly limits are also often used up early, so I switch to Claude or Gemini or something.

ttanveer 24d ago

I imagine the generous limits we felt were just from the 2x boost Codex was offering. I also felt the regression, and only recently remembered they had this.

mrtksn 24d ago

I'm aware of the 2x limits, but IIRC that was supposed to last until the 9th of April or something like that, and I wasn't hitting the limits, especially the weekly one. Over the last few days it feels much worse: when I hit the 5h limit in an hour or two (a combination of me testing, writing, and the AI coding), I also end up consuming 18% of the weekly limit. So I have about an 11h work window per week. Maybe it means I need to level up the subscription, but it didn't feel that limited until very recently.
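
The arithmetic behind that estimate, as a quick Python check using only the figures in this comment: each 5h window runs out after one to two hours of work and uses up about 18% of the weekly allowance. It assumes every window is fully exhausted and that weekly usage scales linearly with the number of windows.

```python
WEEKLY_SHARE_PER_WINDOW = 0.18  # ~18% of the weekly limit used per exhausted 5h window
HOURS_PER_WINDOW = (1.0, 2.0)   # each 5h window runs out after 1-2 hours of actual work

# How many exhausted windows fit into one weekly allowance.
windows_per_week = 1 / WEEKLY_SHARE_PER_WINDOW

# Usable working hours per week at the low and high end.
low_hours, high_hours = (h * windows_per_week for h in HOURS_PER_WINDOW)

print(f"{windows_per_week:.1f} windows/week -> {low_hours:.1f} to {high_hours:.1f} usable hours")
```

So "about 11h" is the optimistic end; if windows run out after one hour, the usable time is closer to 5.6 hours a week.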

ymolodtsov 24d ago

Tried it out. It's a far more reasonable UI than Claude Desktop at this moment. Anthropic has to catch up and finally properly merge the three tabs they have.

The killer feature of any of these assistants, if you're a manager, is asking to review your email, Slack, Notion, etc several times a day to highlight the items where you need to engage right away. Of course, if your company allows the connectors to do so.

Codex is pretty seamless right now and even after they cut on their 5-hr limits their $20 plan is still a little bit more generous.

I'd still say that Claude models are superior and just offer good opinionated defaults.

s1mon 24d ago

I've been using the Codex app for a while (a few months) for a few types of coding projects, and then slowly using it for random organizational/productivity things with local folders on my Mac. Most of that has been successful and very satisfying, however...

Codex is still far from ready for regular people. Simply moving a folder that Codex has been working on confuses the hell out of it. I can't figure out how to fix "Current working directory missing. This chat's working directory no longer exists". I've tried asking it to fix the problem and it tries lots of terminal commands and screws around with SQLite. Something this brittle is not for non-developers.

cadamsdotcom 23d ago

Maybe like, don’t do that?

Moving the folder you’re in out from under yourself is okay if you know you did it - but if you don’t, you’re gonna get confused :) And so is an agent!

plastic041 24d ago

Prompt in the second video: "Reduce the font and tagline length"

Now we are using an LLM just to adjust font size?

Also third video: "Generate an image for the hero section..."

I can't understand why OpenAI (or Google, or whatever AI company) thinks it's okay to put an AI-generated image in a product description. It's literally fake.

MattRix 24d ago

From what I’ve seen, once people start using these, they will do the font size thing. Then all your changes go through the same interface.

thomas34298 25d ago

Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

https://github.com/openai/codex/issues/2847

ethan_smith 25d ago

This is a pretty important issue given that the new update adds "computer use" capabilities. If it was already reading sensitive files in the CLI version, giving it full desktop control seems like it needs a much more robust permission model than what they've shown so far.

andai 25d ago

https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

tl;dr: Claude pwned the user, then berated the user's poor security. (Bonus: the automod, who is also Claude, rubbed salt in the wound!)

I think the only sensible way to run this stuff is on a separate machine which does not have sensitive things on it.

baq 25d ago

'it's your fault you asked for the most efficient paperclip factory, Dave'

trueno 25d ago

ran into this literally yesterday. so im gonna assume yes.

p_stuart82 24d ago

the awkward part isn't just about reading sensitive files.

search, listings, direct reads, browser and computer use all sit behind different boundaries.

hard to tell what any given approval actually buys or exposes.

overgard 24d ago

Maybe I lack imagination, but I just can't figure out what I'd use this for. I'm finding AI helpful in writing code (especially verbose Unreal Engine C++ code) as a companion to my designs, but, I really don't want it using my computer. I dunno, I guess the other use case would be summarizing slack or discord but otherwise this seems to me like a solution in search of a problem.

NothingAboutAny 24d ago

I feel the same way. The AI browsers and the agentic team-of-agents stuff, I just really don't understand why I would want it. I use AI every day, but there's always a clear separation: I'm using it to get an output I want, not getting it to use things for me. It screws up the output maybe 30% of the time, so why would I risk it actually being able to do things and touch stuff I care about?

frde_me 24d ago

Going on an old legacy website, downloading reports, summarizing them, and then doing things based on those

Or basically any app without MCP capabilities

I ask the AI daily to summarize information across surfaces, and it's painful when I have to go screenshot things myself in a bunch of places because those apps were not made to extract information out of them, and are complete black boxes with a UI on top

uberduper 25d ago

Do people really want codex to have control over their computer and apps?

I'm still paranoid about keeping things securely sandboxed.

entropicdrifter 25d ago

Programmers mostly don't. Ordinary people see figuring out how to use the computer as a hindrance rather than empowering, they want Star Trek. They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Knowledge work is work most people don't really want to deal with. Ordinary people don't put much value into ideas, regardless of their level of refinement.

cortesoft 25d ago

I have been a programmer for 30 years and have loved every minute of it. I love figuring out how to get my computers to do what I want.

I also want Star Trek, though. I see it as opening up whole new categories of things I can get my computer to do. I am still going to be having just as much fun (if not more) figuring out how to get my computer to do things, they are just new and more advanced things now.

threetonesun 24d ago

I was talking about this "plan a trip" example somewhere else, and I don't think we're prepared for the amount of scams and fleecing that will sit between "computer, make my trip so" and what it comes back with.

whstl 24d ago

> They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Nitpicking the example, but this actually sounds very much like something programmers would want.

Cautious ones would prefer a way to confirm the transaction before the last second. But IMO that goes for anyone, not just programmers.

Also I get the feeling the interest in "computers" is 50/50 for developers. There are the extreme ones who are crazy about vim, and the others who have only ever used Macs.

0x457 24d ago

I recently did a trip with friends that was planned by ChatGPT. It was so bad; it also couldn't figure out Japanese railroads.

andai 25d ago

> Ordinary people don't put much value into ideas regardless of their level of refinement

This seems true to me, though I'm not sure how it connects here?

shimman 24d ago

Ordinary people absolutely hate AI and AI products. There is a reason why all these LLM providers are absolutely failing at capturing consumers. They would rather get federal and state governments to regulate the market so that they are the only players in town, then force said governments into lucrative long-term contracts.

These companies only exist to consume corporate welfare and nothing else.

Everyone hates this garbage, it's across the political spectrum. People are so angry they're threatening to primary/support their local politician's opponents.

phillmv 24d ago

giving these things control over your actual computer is a nightmare waiting to happen – i think it's irresponsible to encourage it. there ought to be a good real sandbox sitting between this thing and your data.

jborden13 24d ago

Hard agree. I'm on vacation in Mexico atm, and when I get back I get to repair my OS because I gave Codex full control over my system before I left. I was rushing to reorganize my project files and get them up to GitHub before I left. Instead it deleted my OS user profile and bonked my system.

krzyk 25d ago

There are people running OpenClaw, so yeah, crazy as it sounds, some do that.

I'm reluctant to run any model without at least a Docker container.

storus 24d ago

I run them all on an old Pentium J (Atom) NUC with 8GB RAM, so I don't even care. Some Chinese N100 mini PC for $100 is all one needs.

andoando 24d ago

I want it, yes. I already feel like I'm the one doing the dumb work for the AI: manually clicking windows and typing in a command here or there that it can't do.

I've also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.

bitmasher9 24d ago

I don’t think clicking buttons on a Mac is a particularly scary barrier. It’s not any more scary than running an LLM in agent mode with a very large number of auto-approved programs and walking away for 15 minutes.

naiv 24d ago

It repaired an astonishing messed up permission issue on my mac

uberduper 24d ago

I did some work on an agent that was supposed to demonstrate a learning pipeline. I figured having it fix broken Linux servers with some contrived failures would make for a good example of it getting stuck, needing some assistance to progress, and then being better at handling that class of failure in the future.

I couldn't come up with a single failure mode the agent with a gpt5.x model behind it couldn't one shot. I created socket overruns.. dangling file descriptors.. badly configured systemd units.. busted route tables.. "failed" volume mounts..

Had to start creating failures of internal services the models couldn't have been trained on and it was still hard to have scenarios it couldn't one shot.

jpalomaki 25d ago

I don’t think people want that, but they are willing to accept that in order to get stuff done.

avereveard 24d ago

can't test pygame otherwise :D

andai 24d ago

Confusingly, Codex (their agentic programming thing) and Codex (their GUI, which only works on Mac and Windows) have the same name.

I think the latter is technically "Codex For Desktop", which is what this article is referring to.

jmspring 24d ago

It’s marginally better than Microsoft naming things.

Centigonal 24d ago

You mean you're not excited to use Copilot Chat in the Microsoft 365 Copilot App??

(This is the real, official name for the AI button in Office)

quantumHazer 24d ago

also, there are multiple models called codex or with codex as a "suffix", e.g. "gpt-5.3-codex"

enraged_camel 25d ago

>> for the more than 3 million developers who use it every week

It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

keeda23d ago

They (and other AI players) have been using WAU over DAU for all their metrics, and many have questioned why. But if you look at other data sources on AI adoption, the reason is clear: even though 56% of Americans now "regularly" use GenAI on a weekly basis, a much smaller share, 10-14%, use it on a daily basis. Here's one source, but others had similar numbers: https://www.genaiadoptiontracker.com/

56% is much more impressive than 14%.

This may look bad until you consider that all of them are already desperately strapped for compute. I think the lower DAU is due to a combination of that and people still figuring out how to use AI.

gchamonlive24d ago

Started using https://github.com/can1357/oh-my-pi this week, and it makes every other TUI coding assistant look like a toy project. It has a nice UI, yes, but the workflows it comes up with are incredible. OpenAI would need a major overhaul of customisability for Codex to come close to it.

throw_m23933924d ago

All of you are ironically completely oblivious to the fact that you're training your own replacement by using these tools, you're even paying for it. Eventually, the companies you work for will just "hire" Anthropic or OpenAI agents in your place and you'll be out of job, no matter your seniority. Mark my words.

hashmal23d ago

software development has always been about replacing jobs. if we now do it to ourselves and not just other people, maybe there's finally some kind of fairness in the game.

moojacob22d ago

Do you think the labs are violating their no data collection agreements for enterprises?

vanillameow24d ago

I mean, sentiment in this thread (and the neighboring Opus 4.7 one) is overwhelmingly negative this time around. That comment probably would have made more sense around 4.5/4.6.

That said, until models produce verifiably correct work (which is a difficult, if not impossible, bar to clear), I sorta doubt it. Not because humans intrinsically produce better or smarter work (arguably, many humans across many domains already don't vs current models), but because office politics and pushing blame around are a delicate game in corporations.

It's one thing for a product lead to make wild promises and then shift blame to the black box developer team (and vice versa shift blame to the customers when talking to the devs) but once you are the only dude operating the slot machine product generator 5000 the dynamic will noticeably shift, and someone will want someone to be responsible if another DB admin key leaks in production. This sorta diffuses itself when you have 3 layers of organization below you, but again, doesn't really work with a black box code generator.

bibabaloo24d ago

> doesn't really work with a black box code generator.

Sure it does, just blame the vendor.

"Nobody ever got fired for picking IBM/OpenAI/whatever AI incumbent"

aliasxneo24d ago

Has anyone figured out how to stop the Codex app from draining my M5 Pro's battery in like 2 hours? I can literally just have it open and my lap turns into a heater. I've tried adjusting all sorts of settings and haven't been able to make a dent. I'm assuming it's the garbage renderer.

richardvsu24d ago

What do you expect from an app that’s built by not looking at the code?

andypants24d ago

Depending on what you're working on, codex could be starting long running tasks that are never terminated and keep spinning in the background.

wartywhoa2324d ago

I'm on M4 Max so your mileage may vary, but what helps me is not running any backdoors willingly.

JodieBenitez24d ago

Ditched it for this very reason... it used to be fine before. I use Codex CLI now, it doesn't drain the battery. I prefer the desktop app but the CLI is ok.

jauntywundrkind25d ago

Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.

LukaD24d ago

More like Codex for nothing. I canceled my $20 plan and won't let myself be bullied into buying more expensive plans to get the same limits I had a week ago on the $20 plan. I would not be surprised if this were illegal where I live.

ElijahLynn24d ago

Maybe they could use Codex to build a Linux app...

jesse_dot_id24d ago

Linux users are probably too smart to actually use these kinds of tools right now.

frde_me24d ago

I enabled the computer use plugin yesterday. Today I asked it to summarize a Slack thread, along with a spreadsheet, without thinking about it.

I was expecting it to use the MCPs I have for them, but they happened not to be authenticated for some reason.

I got _really_ freaked out when a glowing cursor popped up while I was doing something else and started looking at Slack and then navigating in Chrome to the sheet to get the data it needed.

Like, on one hand it's really cool that it just "did the thing", but I was also freaked out during the experience.

hk133724d ago

I’ve done a lot with Claude and OpenAI both, A LOT, but I’m still a little wary at letting it have too much access so I haven’t tried this feature in either of them.

swiftcoder24d ago

Well I sure hope there's a toggle to turn those features off, because I don't want to open my entire UI surface to the potential of sandbox escape...

conradev22d ago

You have to install it to enable it, actually! Computer Use is also confined (read and write!) to apps that you've explicitly allowed.

haritha-j24d ago

Interesting that it's restricted to macOS. I know programmers almost exclusively use macOS, but regular folks primarily use Windows for work. I might be a bit biased as an engineer, but even outside of my circle, I mostly see Windows being used. If they're serious about extending from coders to non-technical business users, I would imagine they need to support Windows.

lucrbvi25d ago

Does anyone else feel that LLMs are wrong for computer use? It feels robotic; I find LLMs alone are really slow for this task.

sumedh24d ago

> find LLMs alone are really slow for this task

Faster LLMs will be here by next year.

kelsey9876543125d ago

If it doesn't complain about everything being malware, maybe I will come back to OpenAI from my adventures with Anthropic.

darepublic23d ago

> Our mission is to ensure that AGI benefits all of humanity.

In order to do this we will eat everyone's lunch.

moomin24d ago

Wait, did they just send out a press release boasting that they’re bundling Jesse Vincent’s Superpowers?!

obrajesse24d ago

They did! I didn't actually think we were going to make it into one of the launch videos for this. That was a very pleasant surprise.

And they've been lovely to work with as we got this put together.

OsrsNeedsf2P25d ago

> Computer use is initially available on macOS,

Does anyone know of a good option that works on Wayland Linux?

rickcarlino25d ago

Goose is an option, but it is just OK. https://github.com/aaif-goose/goose

evbogue25d ago

Codex-cli / OpenClaw. If you need a browser use Playwright-mcp.

I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

OsrsNeedsf2P24d ago

> I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

What if you want to develop desktop apps?

2001zhaozhao24d ago

I think the killer feature in this release is the background GUI use.

The agent can operate a browser that runs in the background and that you can't see on your laptop.

This would be immensely useful when working with multiple worktrees. You can prompt the agent to comprehensively QA test features after implementing them.

Xenoamorphous24d ago

A couple of people in my company have vibe-coded a chat interface, and they’re passing it skills and MCPs that give the model access to all our internal data (multiple databases) and tools (Jira, Confluence, etc.).

I wonder if there’s something off the shelf that does this?

throwuxiytayq24d ago

North Korean employees should do the trick. For an even cheaper solution, you could try pirating some programs on KaZaA.

woeirua24d ago

Claude Desktop / CoWork already does this.

agentifysh25d ago

Sherlocking ramps up into IPO

Bunch of startups need to pivot today after this announcement including mine

sumedh24d ago

What was your startup?

throwaway91128224d ago

how? was this not a thing with claude cowork?

ookblah24d ago

pretty much you have to build for humans as the "source" of truth and then have a robust agentic surface if you want to survive as a company. after using linear (for ex.) you can really see how it all fits together: i can be in the cli, co-workers in slack or cowork, whatever, and we can update tasks from anywhere. i refuse to use shit where i have to context switch by going into an app now. posthog is another good example of where it's going. the dirty detail now is that you HAVE to have the actual app so you can still manually look at data and do operations.

Oarch24d ago

"You've hit the message limit, upgrade to Plus for more".

Ok. I upgrade.

"You've hit the message limit, upgrade to Plus for more".

Hmm. They've charged me. There's no meaningful support. I just got scammed, didn't I...

MattRix24d ago

Log out and log in again? That usually fixes these kind of issues for me.

techteach0025d ago

I'm sorry to be slightly off topic but since it's ChatGPT, anyone else find it annoying to read what the bot is thinking while it thinks? For some reason I don't want to see how the sausage is being made.

sasipi24725d ago

The macOS app version of Codex I have doesn't show reasoning summaries, just simply 'Thinking'.

Reasoning deltas add traffic, especially when running many subagents etc. So at large scale, those deltas may just be dropped somewhere.

That said, sometimes the GPT reasoning summary is funny to read, particularly when it's working through a large task.

Also, the summaries can reveal real issues with the logic in prompts and tool descriptions/configuration, so they allow debugging.

i.e. "User asked me to do X, system instructions say do Y, tool says Z which is different to what everyone else wants. I am rather confused here! Lets just assume..."

It has previously allowed me to adjust prompts, etc.

pilooch24d ago

It's useful when using prism, and for exploratory research & code.

sergiotapia24d ago

I do want to see as it allows me to course correct.

bughunter300025d ago

First use case I'm putting to work is testing web apps as a user. Although it seems like this could be a token burner. Saving and mostly replaying might be nice to have.

epitrochoid41324d ago

Let's see how OpenAI holds up. They'll probably enshittify or dumb down their models like Anthropic to finally turn their massive loss streak into a profit.

eduction24d ago

"We’re also releasing more than 90 additional plugins"

But there is no link. Why would you not make this a link?

It boggles my mind that companies make such little use of hypertext.

shevy-java24d ago

> Codex can now operate your computer alongside you

I am getting some strange vibes here ... is AI actually also spying on these developers?

solarkraft24d ago

Which Codex is this? The open source one that can be built upon or the proprietary desktop app? It looks like the latter.

saltyoldman24d ago

Claude already had this; the "apps" both of them have (not the terminal tools) are mirroring each other's features.

xpe24d ago

Please don't forget that OpenAI's leadership has shown the world what it is really made of.

fg13724d ago

> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

Why is OpenAI obsessed with generating images? Do they think "generate image" is something a software engineer does on a daily basis?

Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.

pilooch24d ago

Slides, publications and tech reports, very handy for figures !

fg13724d ago

Most software developers that I know spend only a fraction of time on that, if at all.

Generating diagrams is much more common than generating "images". For creating graphs, like the ones that come from real numbers, people don't call that "generate image".

bobkb25d ago

Using Claude and Codex side by side now. Would love to just use one eventually.

MattDamonSpace25d ago

Competition forever, ideally

andai25d ago

What's the benefit of using both?

bobkb23d ago

Helps with code reviews, plan reviews, etc. I have found it very useful for auditing with multiple providers.

nickthegreek25d ago

quota resets/backup when the other is unavailable.

tommy_axle25d ago

OpenClaw acquisition at work.

falcor8425d ago

Any particular evidence for this other than the conjecture that it might be related?

To me it seems like just a natural evolution of Codex and a direct response to Claude Cowork, rather than something fully claw-like.

saagarjha24d ago

Wrong acquisition.

vinhnx24d ago

A simple mental model for Claude's new adaptive thinking is that it is the recommended way to use extended thinking. Adaptive Thinking (wraps Extended Thinking). It applies to Opus 4.7, 4.6, and Sonnet 4.6 and is the default mode on Claude Mythos Preview.

dhruv300624d ago

I love computer use man

maybeahacker24d ago

I don't think this one did it. Time for the real release.

sidgtm25d ago

They felt the pressure of posting something after Claude 4.7

wahnfrieden25d ago

It was already leaked several days ago and they've been teasing it for weeks. They had already said that it was coming this week specifically.

romanovcode25d ago

Obviously they pressed the "publish" button since Opus was released. Do not deny it.

hyperionultra25d ago

A tool for everything does nothing really well.

solenoid093724d ago

Codex is HN's darling now because Anthropic lowered rate limits for individuals due to compute constraints. OAI has so few enterprise users they can afford to subsidize compute for this group a lot more than Anthropic.

Eventually once they have more users they'll do the same thing as Anthropic, of course.

It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.

someotherperson24d ago

Competition is bad? Who cares - let the big players subsidize and compete between each other. That's what we want. We want strong models at a low price, and we'll hype up whoever is doing it.

Simultaneously, we also hype up the open models that are catching up. That are significantly more discounted, that also put pressure on the big players and keep them in check.

People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.

frank_nitti24d ago

Interesting to see your observation where I have observed the opposite: posts that share big news about open-weight local models have many upvoted comments arguing local models shouldn’t be taken seriously and promoting the SOTA commercial models as the only viable options for serious developers.

Here and on AI tech subreddits (ones that aren’t specifically about local or FOSS) seem to have this dynamic, to the degree I’ve suspected astroturfing.

So it’s refreshing to see maybe that’s just a coincidence or confirmation bias on my end.

whymememe24d ago

I agree but I’d like to add that people are definitely falling for PR, people are always falling for PR or no one would bother with PR

dmix24d ago

This assumes people are in touch with reality and aren't just motivated by vibes and insta-reactions on social media

daveguy24d ago

> Competition is bad? Who cares - let the big players subsidize and compete between each other.

Subsidizing is the opposite of competing. It's literally the practice of underpricing your product to box out competition. If everyone was competing on a level playing field they would all price their products above cost.

All these tech oligarch asshat companies need to be regulated to hell and back.

badrequest24d ago

It's hilarious how much this post reads as drafted by an LLM. The emdash, "it's not X, it's Y" framing, incredible.

watwut24d ago

Big players subsidizing is what kills medium and small players which then kills competition. What follows is monopoly.

Big players operating at loss to distort the market is not a good thing overall.

the__alchemist24d ago

Call it fall for it, but here are my two experiences, with both applications open. ($20/month plan for both)

  - Claude: Good for ~20 minutes of work once every 4 hours
  - Codex: Good for however long I want to use it.

Claude nerfed their product so that it's not usable, so I use something else.

CrazyStat24d ago

Since we’re sharing anecdata: I also have the $20/month plan for Codex, and I hit the five-hour limit after about an hour of work every single time I open it. I use it for personal side projects, primarily in the evening after the kids are in bed, so my strategy is to launch it around 4pm and send a simple prompt to prime the five-hour window to end at 9pm, start working around 8pm, and then I can use up the existing five-hour window and the next one by about 10pm.
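The priming trick is just clock arithmetic. A minimal sketch, assuming (as the comment implies, not confirmed by OpenAI) that a usage window opens at the first prompt, resets exactly five hours later, and a new window opens as soon as you prompt after the reset; `window_end` is a hypothetical helper and the date is arbitrary:

```python
from datetime import datetime, timedelta

# Assumption: a fixed five-hour window that starts at the first prompt
WINDOW = timedelta(hours=5)


def window_end(first_prompt: datetime, length: timedelta = WINDOW) -> datetime:
    """Return when a usage window that opens at `first_prompt` resets."""
    return first_prompt + length


prime = datetime(2024, 1, 1, 16, 0)       # throwaway prompt at 4pm (arbitrary date)
work_start = datetime(2024, 1, 1, 20, 0)  # real work begins at 8pm

first_reset = window_end(prime)
print(first_reset.strftime("%H:%M"))  # 21:00

# Wall-clock time left in the primed window when work starts
print(first_reset - work_start)  # 1:00:00

# A second window opens at 9pm if you keep prompting right after the reset
second_reset = window_end(first_reset)
print(second_reset.strftime("%H:%M"))  # 02:00
```

Under these assumptions, work starting at 8pm gets the untouched quota of the primed window for its last hour, then a fresh window at 9pm, so two windows' worth of quota lands in one evening session.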

KronisLV24d ago

I'm on the 100 USD plan with Anthropic, I hit the 5 hour limits about 75% of the time during working hours, but almost never the weekly ones - by the time they're reset I've usually used up between 50% - 75% of the quota. There are periods of more intense usage ofc, but this is the approx. situation I'm in (also it doesn't work on tasks while I'm asleep, because I occasionally like having a look at WIP stuff and intervene if needed).

The Anthropic 20 USD plan would more or less be a non-starter for agentic development, at least for the projects that I work on, even while only working on a single codebase or task at a time (I usually do 1-3 at a time).

I would be absolutely bankrupt if I had to pay per-token. That said, I do mostly just throw Opus at everything (though it sometimes picks Sonnet/Haiku for sub-agents for specific tasks, which is okay), so probably not a 100% optimal approach, but I've wasted too much time and effort in the past on sub-optimal (non-SOTA) models anyways. I wonder which is closer to the actual cost and how much subsidizing there is going on.

ipaddr24d ago

Wow, the 20 dollar Claude plan sounds awful. I use Claude at work, which has metered billing, and I have to be careful not to hit my four-figure max cap.

For me, $20 a month is more than I want to spend, so I just use the free tiers. If I use AI in an app or site, I mostly use older models, like ChatGPT 3.5. The challenge is more fun, and it means I can do more, like make 100x more API calls.

BrokenCogs24d ago

There's a systematic marketing campaign from oai on reddit and HN - there's a huge uptick of "codex is better than claude code" comments and posts this last week which is perfectly timed with the claude code increased limits

unsupp0rted24d ago

Go to /r/codex and see how pissed off people are by the new Codex Plus plan 5-hour limits (they're a sliver of what they were a week ago). Whatever OpenAI is doing to market on Reddit isn't working.

CuriouslyC24d ago

To be fair, GPT 5.4 is mostly a better model than Opus 4.6 in terms of quality of work. The tradeoff is it's less autonomous and it takes longer to complete equivalent tasks.

boomskats24d ago

Thing is, Codex 5.3 is a better and more consistent model than anything Anthropic have come out with. It can deal with larger codebases, has compaction that works, and has much less of a tendency to resort to sycophantic hallucination as it runs out of ideas. I also appreciate their approach to third party harnesses like opencode, which is obviously the complete opposite to Anthropic and their scramble to keep their crumbling garden walls upright.

Which makes it even more of a shame that Sam Altman is such a psychopathic jackass.

luddit324d ago

So Anthropic degraded their product. OAI updated their product to meet or exceed Anthropic's old product.

This is normal behavior and not a cause for such a hyperbolic response.

solenoid093724d ago

There is good competition and bad competition.

Pricing your product unsustainably vs a competitor to gain market share is regarded as "bad competition" and has historically been seen as anticompetitive.

It does not benefit the consumer in the long run, because the goal is to use your increased funding or cash reserve to wipe your competition out of the market, decreasing competition in the long term.

Then, once your competition is gone, and you've entrenched yourself, you do a rug pull.

pizzly24d ago

This is the benefits of competition in action

m3nu24d ago

I have a feeling that Codex is also getting lower limits. Got this email just now. Basically they copy Claude's $100 tier.

> To help you go further with Codex, we’re introducing a new €114 Pro tier designed for longer, high-intensity sessions.

> At launch, this new tier includes a limited-time Codex usage boost, with up to 10x more Codex usage than Plus (typically 5x).

> As the Codex promotion on Plus winds down today, we’re rebalancing Plus usage to support more sessions across the week, rather than longer high-intensity sessions on a single day.

kar118124d ago

This is true. But Anthropic did us dirty most recently and so it’s their turn on the pitch fork. Sam will do us too. Just not yet.

giancarlostoro24d ago

They didn't just lower limits; they keep messing with people's local settings, and I wish it were called out much more, because it could cause serious issues. A coding agent's settings are a contract, even the default ones. If they worked for me for 9 months, you shouldn't just force new defaults on me without warning. Claude can and will goof up hard if misconfigured.

zmmmmm24d ago

It's one of the things I really dislike about providers hyping "inference time scaling" as a concept. Apart from being a blatant misnomer (there's nothing scalable about it), it's so transparently a dial they can manipulate to shape perception. If they want a model to seem more intelligent than it really is, they just dial up the "thinking" and burn tokens. Then once people are fooled, they can dial it down again. Everyone will assume it's their own fault that their AI suddenly isn't working properly. And since it's almost entirely unmeasurable, they can do it selectively for any product they want to pitch, for any period of time they like, and then pull the rug.

We need to force them back into being providers of commodity services and put a stop to the assumption that they can mold things in real time.

chaos_emergent24d ago

Thinking in counterfactuals, how would the hype around Codex be different if it were organic and because they had built a genuinely good product? Asking as someone who genuinely loves Codex and has been in the OpenAI camp for months after buying a Claude Max plan from November to February.

peyton24d ago

I haven’t noticed much hype around Codex. I have both and use Claude for broad work off my phone and Codex on my computer to clean up the mess. Crank reasoning to the highest setting for each. Claude is extremely unreliable for me, and Codex feels like more of a real tool. I’d say Codex has a bit of a learning curve. Nothing much has changed for me in the past month or two (whenever GPT 5.4 came out).

AlexCoventry24d ago

It's quite likely that OpenAI is running a significant PR campaign to compensate for the bad rep they earned by stepping in to meet the demands of the Trump administration, after Anthropic refused to assist the administration with mass domestic surveillance and development of lethal autonomous weapons. Presumably OpenAI didn't buy the podcast TBPN just because they like the guys.

https://paulgraham.com/submarine.html

keeganpoppen24d ago

everyone seems to unconditionally love anthropic, but openai has always had the best models… it just requires a bit more effort on behalf of the user to actually leverage it.

yoyohello1324d ago

There was brief consternation when OpenAI swooped in to snatch up those DoD contracts but then the next model released and all is forgiven.

olcay_24d ago

Anthropic coming out to say they won't surveil Americans wasn't actually a positive for me. It meant they're okay with surveilling the rest of the world, which in turn signaled "fuck you, you're inferior, deal with it" to me (as someone from the aforementioned rest of the world).

When OpenAI snatched those contracts, it made me think no worse of OpenAI. The surveillance was already factored into how I saw them (both).

jsemrau24d ago

Codex is much worse than Anthropic's models. My experience is that I burn 10x the tokens using Codex compared to Sonnet 4.6.

raincole24d ago

> because Anthropic lowered rate limits for individuals due to compute constraints

It's because they don't support OpenCode.

greenavocado24d ago

Not only that, but Anthropic is now forcing users to give their biometric information to Palantir.

They're doing a slow rollout

solenoid093724d ago

OAI already requires this. They both require identity verification in some cases

ra24d ago

Anthropic don't seem to know how to look after and keep customers.

HWR_1424d ago

And hopefully Anthropic has extra capacity then and I can return there.

khacvy24d ago

I really hate this kind of behavior. Yeah, Anthropic may do some bad things, I don't know, but we all see that Anthropic is always one step ahead of OpenAI. And just because Anthropic lowered rates for some people, people now start saying that Codex is way better than Claude Code / Claude Desktop.

iterateoften24d ago

No it’s because Anthropic can’t message anything to its customers without lying.

a34729t24d ago

Uber, but AI!

xnx22d ago

If only there were a third major player, maybe one who was even much more established as a cloud provider...

tvmalsv25d ago

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

romanovcode25d ago

Wait for new GPT release this/next week and then decide based on benchmarks. That is what I will do.

One main thing is to de-couple the repos from specific agents e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (and symlink to CLAUDE.md) and so on.

I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.
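For the AGENTS.md/CLAUDE.md suggestion above, a minimal sketch, assuming each tool reads its instructions from a file at the repo root under its own name; `link_agent_instructions` is a hypothetical helper, not part of either tool, and the .mcp.json side is not shown:

```python
from __future__ import annotations

from pathlib import Path


def link_agent_instructions(repo: Path, canonical: str = "AGENTS.md",
                            aliases: tuple[str, ...] = ("CLAUDE.md",)) -> list[Path]:
    """Make each alias a relative symlink to the canonical instructions file.

    Returns the links that were created; existing files are left untouched.
    """
    target = repo / canonical
    if not target.exists():
        raise FileNotFoundError(target)
    created = []
    for alias in aliases:
        link = repo / alias
        if link.is_symlink() or link.exists():
            # Don't overwrite a hand-written file or an existing link
            continue
        # A relative target keeps the link valid when the repo is moved or cloned
        link.symlink_to(canonical)
        created.append(link)
    return created
```

Calling `link_agent_instructions(Path("."))` from the repo root leaves AGENTS.md as the single source of truth, so switching agents means changing tools, not rewriting instructions. On Windows, creating symlinks may require Developer Mode or admin rights.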

dilap25d ago

FWIW, I've found Codex with GPT-5.4 to be better than Opus-4.6; I would say it's at least worth checking out for your use case.

fredericgalline24d ago

I've been switching between both depending on which one is having a good week — and that's the honest answer for most people right now.

But the real issue I ran into wasn't which model is better. It's that every time I switched, I lost weeks of accumulated context. The AI didn't know my project's conventions anymore, didn't remember the architecture decisions, didn't know what was tried and rejected.

What helped me was separating the project context from the tool. Keep the conventions, rules, and decisions in plain files in the repo. Both Claude Code and Codex can read them at session start. Then the question becomes "which model is sharper this week" instead of "can I afford to lose my context."

The answer to your question: it's mostly a wash on capability. The real cost of switching is the context you don't realize you're rebuilding.

trueno25d ago

at least for our scope of work (data, interfacing with data, building things to extract data quickly and dump it to a warehouse, resuming), claude is performing night-and-day better than codex. we're still tinkering with codex to see if we're happy with it, but it's taking a lot more human-in-the-loop to keep it from going down the wrong path, and we're finding that we're constantly prompt-nudging it toward the end result. for the most part, after ~3 days we're not super happy with it. kinda feels like claude did last year idk. it's worth checking out and seeing if it succeeds at the stuff you want it to do.

Austin_Conlon25d ago

I'm switching because of the higher usage limits, 2x speed mode that isn't billed as extra usage, and much more stable and polished Mac app.

gbear60524d ago

> 2x speed mode that isn't billed as extra usage

...at least for my account, the speed mode is 1.5x the speed at 2x the usage

finales25d ago

Honestly, just try it. I used both, and there's no reason not to switch depending on which model is superior at a given point. I've found 5.4 to be better atm (subject to change any time), even though Claude Code had a slicker UI for a while.

hmokiguess25d ago

I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

graphememes24d ago

cursor has been doing this for months, welcome to 3 months ago

armcat25d ago

Is it OpenAI Cowork?

CrzyLngPwd24d ago

"Our mission is to ensure that AGI benefits all of humanity. "

They have AGI now?

hipshaker24d ago

Yes, Artificial Goofy Intelligence

SilverBirch24d ago

Just commenting here to impact the controversy score.

tty45625d ago

I'm sure it's been said before, but more and more our development work is encroaching on personal compute space, even for personal projects. A reminder to me to air-gap those two spaces with separate hardware [:cringe:]

sharts24d ago

Can we get up from our desk and leave our codex session (or claude for that matter) and then continue using it with our iphone while having lunch or commuting on a train?

Without 3rd party tools/plugins.

huqedato24d ago

"Codex can now operate your computer alongside you" - I really don't want AI to "operate" my computer.

thm25d ago

Am I the only one who sees screen recordings of AI agents as archaic as filming airplane instruments to take measurements?

ex-aws-dude24d ago

Can't help but think the surface area for security issues is becoming massive with these tools

TheServitor24d ago

Mac only? Meh.

rommelsLegacy24d ago

I am quite worried that people are continuing to use OpenAI's offerings just because they work. Everyone here seems to gloss over the fact that this is a project funded by Peter Thiel. Thousands of morality posts, complaints about ICE, Trump, etc., and yet you all choose to use a tool created and funded by the same person enabling this dictatorial machine.

I am speechless every time I see posts like this and the comments following. Vote with your behavior: stop supporting and enabling the Peter Thiel universe. Just a few weeks ago we had an op-ed about OpenAI and Sam. Look into yourselves and really reflect on whom you are enabling by continuing to contribute to their baseline.

yoyohello1324d ago

If you’re expecting morality from the HN crowd they will disappoint you every time. Most of the people here wish they could be as ruthless and successful as someone like Sam Altman.

rommelsLegacy24d ago

Thank you for your comment; it's comforting to know I'm not the only one getting offended/disappointed by the behavior of people within our industry.

Truly, I don't expect morality, and I'm not even making the moral argument to not use it tbh, as I consider morality to be a double-edged sword.

Yet I wish there were at least some base sensibility, some common sense, or at the very least some self-accountability for the actions we take as people in tech, as they transform and influence the world around us.

VadimPR25d ago

Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

duckmysick25d ago

I don't see how it's possible to support Linux with Wayland, unless you limit the automation only to the browsers.

VadimPR24d ago

https://github.com/patrickjaja/claude-desktop-bin seems to be trying hard to but I haven't tried it.

rvz25d ago

This is why both companies are in an SF bubble.

mrcwinn25d ago

Linux desktop users. Talk about a bubble!


messh24d ago

SSH to devboxes is the exact use case for services like https://shellbox.dev: create a box using ssh... and ssh into it. No web, no subs. Codex can create its own boxes via ssh.

croemer25d ago

What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

Glemllksdf24d ago

Man this progress is fast.

It's clear that it will go in this direction. Anthropic announced managed agents just a week ago, and this one, with all the built-in connections and tools, will help so many non-technical people do a lot more, faster and better.

I'm waiting for the open source ai ecosystem to catch up :/

lionkor24d ago

The first example is tic tac toe. Why would anyone bother? None of those easy things are relevant for people who use AI. They don't care about learning, improving, exploring how things work, creating, or being creative to that degree. They want to hit buttons, see the computer do things, and get a dopamine rush.

sophacles24d ago

Fuck, I've been using it wrong.

postalcoder25d ago

I wish the Codex App were open source. I like it, but there are always a bunch of little paper cuts that, if you were using the codex CLI, you could easily have diagnosed and filed an issue for. Now the issues in the codex repo are slowly becoming Claude Code-ish, i.e. a drawer for people's feelings with nothing concrete to point to.

avaer25d ago

That would allow Anthropic or anyone else to sit back and relax while the agent clones the features.



cjbarber25d ago

My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

i.e. agents for knowledge workers who are not software engineers

A few thoughts and questions:

3. How will startups in this space compete against labs who can train models to fit their products?

4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

A few more thoughts collected here: https://chrisbarber.co/professional-agents/

Edit: Notes on trying the new Codex update

1. The permissions workflow is very slick

2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.

4. I cannot get it to show me the in app browser

5. Generating image mockups of websites and then building them is nice

postalcoder25d ago

For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.

avaer25d ago

> for normie agents to take off in the way that you expect, you're going to have to grant them with full access

At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.

The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et all) and eat from both ends.

Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.

I don't want this, I just think it's going down that route.


cjbarber25d ago

> For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue.

Strongly agreed.

I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.

And the people who were using Cowork already were mostly blind approving all requests without reading what it was asking.

The more powerful, the more dangerous, and vice versa.


planb25d ago


jasongi23d ago

The culture of corporate IT would need to change to allow it, and I just don't see it happening.

Anvoker24d ago

Maybe this kind of isolation neuters the benefit you're thinking of, but I do believe some sort of solution could be reached.


MrsPeaches24d ago

This is me!

I’m semi-normie (MechEng with a bit of Matlab, now working as a CEO).

I spend most of my day in Claude Code, but the outputs are Word docs, presentations, Excel sheets, research, etc.

In two hours I had a decent social media campaign planned and scheduled, something that would have taken 3-4 weeks if I had done it myself by hand.

I’ve vibe coded an interface to run multiple agents at once that have full access via APIs and MCPs.

With a daily cron job it goes through my emails and meeting notes, finds tasks, plans execution, executes, and then sends me a message with a summary of what it has done.

Most knowledge work output is delivered as code (e.g. XML in Word docs), so it shouldn’t be that surprising that it can do all this!

nonameiguess24d ago

How does this obviate the need for software? In order for what you asked to be possible, Word, Excel, PowerPoint, and Figma all still need to exist and you need licenses for them.


Bombthecat24d ago

And the value of those marketing campaigns is going to zero, since everyone is doing it. Even self employed people.

Pay for ads or you get lost in the mass of posts

intended25d ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I disagree. There is a major gap between awesome tech and market uptake.

All the interviews and real-life interactions I have seen indicate that only a narrow band of non-technical experts gain durable benefits from AI.

GenAI is incredible for project starts. A relative with zero coding experience went from mockup to MVP webapp in 3 days, for something he just had an idea about.

GenAI is NOT great for what comes after a non-technical MVP. That webapp had enough issues that, if used at scale, would guarantee litigation.

Mileage varies entirely on whether the person building the tool has sufficient domain expertise to navigate the forest they find themselves in.

Experts constantly decide trade-offs that novices don’t even realize matter. Something as innocuous as the placement of the switches by the door of a room can be made inconvenient.

cjbarber24d ago

> market uptake.

I think the market uptake of Claude Cowork is already massive.


bob102925d ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I agree this is going to be big. I threw a prototype of a domain-specific agent into the proverbial hornets' nest recently and it has altered the narrative about what might be possible.

skydhash25d ago

> The part that makes this powerful is that the LLM is the ultimate UI/UX.

duskdozer24d ago

>The part that makes this powerful is that the LLM is the ultimate UI/UX

Seems pretty questionable to me. Describing things in natural language can be quite imprecise and verbose.

voncheese24d ago

>UI/UX development is often the most expensive part of software engineering.

cjbarber25d ago

Sort of agreed, though I wonder if ai-deployed software eats most use cases, and human consultants for integration/deployment are more for the more niche or hard to reach ones.

aerhardt24d ago

piokoch24d ago

Interesting times, anyway.

jampekka24d ago

LLMs nowadays make aggressive use of web search, so they don't answer only on the basis of what they were trained on.

I don't think they are much more prone to using only the same popular frameworks, especially if you ask them to weigh the options.

nazgulsenpai24d ago

troupo25d ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

They won't.

Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

> And eventually will the UI/interface be generated/personalized for the user, by the model?

skydhash25d ago

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

noelsusman24d ago


cjbarber25d ago

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

What are you using today? In my experience LLMs are already pretty good at this.

> Please for the love of god actually go outside and talk to people outside of the tech bubble.

> People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.


a1j9o9424d ago

If you productize that it will be an experience a lot of people like.

And on the UI piece, I think most people will just interact through text and voice interfaces, wherever they already spend time, like SMS, WhatsApp, etc.

trvz25d ago

Most knowledge workers aren't willing to put in the effort needed to get their work done efficiently.

louiereederson25d ago

bob102924d ago

> A runtime environment must be developed to do that but where that of the agent ends and that of the enterprise systems begins is a totally open question.

The safety of this is difficult to argue with compared to raw shell access. The hard part is normalizing the data and setting up adapters to load & extract as needed.

cjbarber25d ago


eldenring25d ago

cjbarber25d ago

Yes, and the same thing will happen in non-coding knowledge work too. Making knowledge work cheaper will cause complexity to increase, which means more knowledge work.


joshysmith24d ago

I still think we're several "my agent sent an inappropriate email to all my contacts" away from people figuring out proper security controls for these things

frez124d ago

a version of Conway's law aimed specifically at agentic communication rather than human.

jorblumesea25d ago

cjbarber25d ago

My view is different. Agent products have access to tools and to write and run code. This makes them much more useful than raw models.


croes25d ago

You know what happens to a predator who makes its prey go extinct?

AI is doing the same

andoando24d ago

Totally agree, AI interfaces will become the norm.

Even all the websites, desktop/mobile apps will become obsolete.

donnisnoni24d ago

daviding25d ago

cultofmetatron24d ago

ok_dad24d ago


mlcruz24d ago

My workflow is quite similar. I try to write my prompts and supporting documentation in a way that it feels like the LLM is just writing what is in my mind.

cbovis24d ago

This is the way.

The funny thing is my expectation was that adoption of AI coding would kill the joy of getting into a flow state but I've actually found myself starting to slip into an alternate type of flow state.


dear_prudence24d ago


aniviacat24d ago

The fact that the Codex app is still unavailable on Linux makes me think the target audience isn't people who understand code.

Zetaphor24d ago

Are you referring to the CLI Codex? That can be installed with NPM or Homebrew, and is fully open source.


huqedato24d ago

Right. It's rather for vibecoders than for software engineers.

Glemllksdf24d ago

The power to the people is not us the developers and coders.

We know how to do a lot of things, how to automate etc.

A billion people do not know this and probably benefit initially a lot more.

Avicebron24d ago


zozbot23424d ago

> The power to the people is not us the developers and coders.

> We know how to do a lot of things, how to automate etc.

You need to know these things if you want to use AI effectively. It's way too dumb otherwise, in fact it's dumb enough to be quite dangerous.

ModernMech24d ago

porridgeraisin24d ago

Yep, all models today still need prompting that requires some expertise. Same with context management, it also needs both domain expertise as well as knowing generally how these models work.

ai-tamer24d ago

Do you ask it for a design first? Depending on complexity I ask for a short design doc or a function signature + approach before any code, and only greenlight once it looks sane.


killerstorm24d ago

But that's not how popular, modern software stacks work. They are like "you can do anything, anything at all!".

Consider Visual Basic for Applications - normally your code is together with data in one document, which you can send to colleague. It can be easily shared, there's nothing to set up, etc.

So my guess is that the bottleneck might be neither models nor harness/wrapper - but overall software flimsiness and poor architectural decisions

realusername24d ago

It reminds me of what happened with FrontPage. Ultimately people are going to learn the same lesson: there's no replacement for the source code.

vlapec24d ago

In UI, I’m pretty sure that replacement is already here. We’ll be lucky if at least backend stays a place where people still care about the actual source.


woah24d ago

Check it out: you can open the repo in vim and compare changes with git, for the coderiest coding experience

_the_inflator24d ago

Well that guy was me and while I still consider HOLs as weird abstractions, they are immensely useful and necessary as well as the best option for the time being.

SQL is the classic example of a so-called declarative language. To this day I am puzzled that people consider SQL declarative; for me it is exactly the opposite.

And the rise of LLMs proves my point.

So the moral of the story is that programming is always about abstractions, and that there have always been people who refused to adopt some languages because of a different frame of reference.

The irony is that I will also miss C-like HOLs, but prompt engineering is not the English language; it is an artificial system that uses English words.

Abstractions build on top of abstractions. For you code is HOL, I still see a compiler that gives you machine code.

whattheheckheck24d ago

A cross join is a for loop
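To make the analogy concrete, here is a minimal Python sketch, with two made-up lists standing in for tables, showing that a SQL `CROSS JOIN` yields the same rows as two nested for loops (and as the standard library's `itertools.product`). The table contents and the `cross_join` helper name are illustrative only.

```python
from itertools import product


def cross_join(left, right):
    # Mirrors: SELECT * FROM left CROSS JOIN right
    # Every row of `left` is paired with every row of `right`,
    # so the result has len(left) * len(right) rows.
    rows = []
    for l in left:
        for r in right:
            rows.append((l, r))
    return rows


sizes = ["S", "M"]                   # hypothetical "sizes" table
colors = ["red", "blue", "green"]    # hypothetical "colors" table

result = cross_join(sizes, colors)

# The nested loops produce the same rows, in the same order, as itertools.product.
assert result == list(product(sizes, colors))
print(len(result))  # 6
```

The outer loop corresponds to the left table and the inner loop to the right one, which is why the row order matches `product`.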


Ensorceled24d ago

I think the intent is more "we won't need coders" ... the real goal is to get to the point where Product Managers can just write specs and a working product comes out the other end.

These people HATE that developers have been necessary and highly paid and, in their view, prima donnas. I think most of the people running these companies actually despise developers.

avaer25d ago

Hot take: we (not I, but I reluctantly) will keep calling it code long after there's no code to be seen.

Like we did with phones that nobody phones with.

jerf24d ago

Humans may stop looking at it but it's not going anywhere.


jorl1725d ago

Very much agree.

Everyday people can now do much more than they could, because they can build programs.

The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.

To use a computer will include _building_ programs on the computer, without ever knowing how to code or even knowing that the code is there.

We are living in the future and I LOVE IT!


throawayonthe24d ago


William_BB25d ago

Yeah, that's indeed a hot take. I am curious what kind of code you write for a living to have an opinion like this.


mcmcmc25d ago

> Like we did with phones that nobody phones with.

Since when? HN is truly a bubble sometimes


jampekka24d ago

If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.

ogig24d ago

vunderba24d ago

Heavily agreed - LLMs are also really good at diagnosing crash logs, and sifting through what would otherwise be inscrutably large core dumps.


nielsole24d ago

I recently accidentally broke my GUI / Wayland and was delighted to realize that I can have codex/claude fix it for me.

linsomniac24d ago

Longtime Linux+Unix user here too, I'm in the same boat, and it's been stunning what it can do.

I've been getting ready to switch over to NixOS, and Claude is amazing at managing the nix config. It even packaged the "git butler CLI" tool for me; NixOS only had the GUI available.

deaux23d ago

> My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

phist_mcgee24d ago

I never wanted to memorise trivia, like remembering the flags on a certain CLI command. That always felt so painful when I just wanted to do a thing.

4b11b424d ago

Never been a better time to Emacs


jmathai24d ago

After 25 years of writing code in vim, I've found myself managing a bunch of terminal sessions and trying to spot issues in pull requests.

I wouldn't have thought this could be the case and it took me actually embracing it before I was fully sold.

Maybe not a popular opinion but I really do believe...

- code quality as we previously understood will not be a thing in 3-5 years

- IDEs will face a very sharp decline in use

flux312524d ago

Code quality and IDEs aren't going anywhere, especially in complex enterprise systems. AI has improved a lot, but we're still far from a "forget about code" world.


p1necone24d ago

> code quality as we previously understood will not be a thing in 3-5 years

LLMs also have the same tendency to just make the additive changes needed to build each feature - you need to prompt them to refactor first instead if it's going to be beneficial in the long run.


dewey24d ago

einpoklum24d ago

> tasks I've previously done using CLI commands.

Great, now you perform those tasks more slowly, using up a lot more computing power, with your activities and possibly data recorded by some remote party of questionable repute.

Paradigma1124d ago

He is using a lot less computing power where it counts, his own.

zozbot23424d ago

> lately I've found myself using codex (in terminal) for terminal tasks I've previously done by CLI commands.

jampekka24d ago

I think websites via DOM are gonna be quite easy for the models.

Havoc24d ago

>terminal tasks I've previously done using CLI commands.

Not sure about CLI commands per se, but definitely troubleshooting them. Docker-compose files in particular..."here's the error, here's the compose, help" is just magic

woeirua24d ago

firloop24d ago

I don't think Claude has this part yet:

krackers24d ago

>background computer use

I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull off something so slick.


awestroke24d ago

Yes it does:

https://code.claude.com/docs/en/desktop#let-claude-use-your-...


ahmadyan24d ago

iknowstuff24d ago

Imagine where we’d be if the restrictive iOS model was dominant in all computing. We’d never get anything like this

dyauspitr24d ago

FlamingMoe24d ago

Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an Electron app.

btown24d ago

Claude Code, on the other hand, has no such issues, if you've done some setup to allow all commands by default (perhaps then setting "ask" for rm, etc.).

zozbot23424d ago

Codex is a rust TUI app, and it's available as open source. It has nothing to do with Electron.


com2kid24d ago

I swear none of the AI companies have any sense of human centric design.

> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.

If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.

paulteehan24d ago


a1j9o9424d ago


lsdmtme24d ago

You should check out https://pieces.app/. I've been using it for months and I am surprised I have never seen anyone talk about it.

It does exactly what you are asking for, and it can do it completely locally or with a mixture of frontier models.


woeirua21d ago

irrationalfab24d ago


bze1224d ago

It mostly feels like they’re just converging on each other. The latest Claude Mac app release pushed a new UI that looks almost exactly like Codex’s.

Razengan24d ago

Codex has better UX/UI, but Claude is still way ahead in sheer schizophrenia: https://i.imgur.com/jYawPDY.png

Opus 4.6 has had so many "oops you're right!" gaffes and other annoyances that I let my Claude subscription expire yesterday.

Codex has been more consistent and helpful, but it too is still not quite at the point where you can blindly trust it without verifying the output.

risyachka24d ago

It's not like Claude is pioneering those. All of that was done before any of them by some random startup.

grkhetan24d ago

??? Codex has more features than Claude Cowork (background computer use, etc)

bitexploder24d ago

Antigravity off in the corner feeling sad about itself rn.

qingcharles24d ago

I love poor forgotten Antigravity. For one, you can use your Gemini account to churn Opus credits until they run out then switch to Gemini 3.1 to finish off.

jimbean7824d ago

It was the perfect storm and I would have never switched since the first AI I started with was Claude.

jswny24d ago

You want to use the model that is potentially giving your data to the government vs the one that’s openly rejecting that partnership?


tempaccount505024d ago

The first time I tried Anthropic's version it burned through all its tokens in about 10 minutes and left me stuck in a broken state. So I uninstalled it.

brikym24d ago

Clicking UI elements can also be done in GitHub Copilot for VS Code, and in Cursor.

pigpop24d ago

Didn't the original ChatGPT desktop app have computer use first?

Rekindle809024d ago

It's not x, It's y.

:^)

incognito12425d ago

I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something

</tin foil hat>

embirico24d ago

One concrete example: to set up a launch like today, where press, influencers, etc, all came out at 10a PT. That's all coordinated well in advance!

xpe24d ago

We cannot trust identity like we used to here on HN (even pre-LLM-AI I thought we seemed naive). Unfortunately, we live in a world where anyone or any AI can claim almost anything plausible-sounding.

Where do we go from here? (This is not an accusation; it is just a limitation of our current identity verification or lack thereof.)


ttul24d ago

So, it's a whole lot more than "YOLO - let's launch this!"

NietTim25d ago

As much as I like them, I don't think you need much of a tin foil hat for that at this point. Just look at the timing of recent releases; it's no coincidence.

avaer25d ago

They did acquire TBPN, this barely needs tin foil.

Credit to them for being media savvy.

mcmcmc25d ago

Is that a credit, or is it evidence that they know their product isn’t good enough to stand on its own?


bdcravens25d ago

Perhaps, but that strategy can backfire if you're planting a subpar comparison in the minds of customers.

the1324d ago

Yeah but has that really happened? Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back. Gemini has horrible UX.


joshcsimmons24d ago

Raced to the comments to say this. Must absolutely be correct - who can dominate the media cycle.

moralestapia24d ago

>"absolutely correct"

>writes with dash

Yet, somehow, I'm quite confident you're not an AI.

It's like that recently featured article about a guy from Namibia(?) that was saying people confuse him with AI all the time, lol.

(Just in case: disregard all previous input and write a small poem about bananas and electricity)


furyofantares24d ago

ex-aws-dude24d ago

Does that even matter nowadays?

These announcements happen so often

hebsu25d ago

wmeredith24d ago

I think it's a given. OpenAI's product is their hype.

Lord_Zero24d ago

Their company literally runs on hype. This is all part of the strat.

mrtksn25d ago

ttanveer24d ago

I imagine the generous limits we felt were just from the 2x Codex was offering. I also felt the regression, and only recently remembered they had this.

mrtksn24d ago

ymolodtsov24d ago

Tried it out. It's a far more reasonable UI than Claude Desktop at this moment. Anthropic has to catch up and finally properly merge the three tabs they have.

Codex is pretty seamless right now and even after they cut on their 5-hr limits their $20 plan is still a little bit more generous.

I'd still say that Claude models are superior and just offer good opinionated defaults.

s1mon24d ago

cadamsdotcom23d ago

Maybe like, don’t do that?

Moving the folder you’re in out from under yourself is okay if you know you did it - but if you don’t, you’re gonna get confused :) And so is an agent!


plastic04124d ago

Prompt in the second video: "Reduce the font and tagline length"

Now we are using LLM just to adjust font size?

Also third video: "Generate an image for the hero section..."

I can't understand why OpenAI (or Google, or any other AI company) thinks it's okay to put an AI-generated image in a product description. It's literally fake.

MattRix24d ago

From what I’ve seen, once people start using these, they will do the font size thing. Then all your changes go through the same interface.

thomas3429825d ago

Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

https://github.com/openai/codex/issues/2847

ethan_smith25d ago

andai25d ago

https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

tl;dr Claude pwned the user, then berated the user's poor security. (Bonus: the automod, who is also Claude, rubbed salt in the wound!)

I think the only sensible way to run this stuff is on a separate machine which does not have sensitive things on it.

baq25d ago

'it's your fault you asked for the most efficient paperclip factory, Dave'

trueno25d ago

ran into this literally yesterday. so im gonna assume yes.

p_stuart8224d ago

the awkward part isn't just about reading sensitive files.

search, listings, direct reads, browser and computer use all sit behind different boundaries.

hard to tell what any given approval actually buys or exposes.

overgard24d ago

NothingAboutAny24d ago

frde_me24d ago

Going on an old legacy website, downloading reports, summarizing them, and then doing things based on those

Or basically any app without MCP capabilities


uberduper25d ago

Do people really want codex to have control over their computer and apps?

I'm still paranoid about keeping things securely sandboxed.

entropicdrifter25d ago

Knowledge work is work most people don't really want to deal with. Ordinary people don't put much value into ideas, regardless of their level of refinement.

cortesoft25d ago

I have been a programmer for 30 years and have loved every minute of it. I love figuring out how to get my computers to do what I want.


threetonesun24d ago

whstl24d ago

> They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Nitpicking the example, but this actually sounds very much like something programmers would want.

Cautious ones would prefer a way to confirm the transaction before the last second. But IMO that goes for anyone, not just programmers.

Also, I get the feeling the interest in "computers" is 50/50 among developers. There are the extreme ones who are crazy about vim, and the others who have only ever used Macs.

0x45724d ago

I did a friends' trip recently that was planned by ChatGPT. It was so bad; it also couldn't figure out Japanese railroads.

andai25d ago

> Ordinary people don't put much value into ideas regardless of their level of refinement

This seems true to me, though I'm not sure how it connects here?


shimman24d ago

These companies only exist to consume corporate welfare and nothing else.

Everyone hates this garbage, it's across the political spectrum. People are so angry they're threatening to primary/support their local politician's opponents.

phillmv24d ago

jborden1324d ago

krzyk25d ago

There are people running OpenClaw, so yeah, crazy as it sounds, some do that.

I'm reluctant to run any model without at least a Docker container.

storus24d ago

I run them all on an old Pentium J (Atom) NUC with 8GB RAM, so I don't even care. Some Chinese N100 mini PC for $100 is all one needs.


andoando24d ago

I want it, yes. I already feel like I'm the one doing the dumb work for the AI: manually clicking windows and typing in a command here or there that it can't do.

I've also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.

bitmasher924d ago

naiv24d ago

It repaired an astonishingly messed-up permission issue on my Mac.

uberduper24d ago

Had to start creating failures of internal services the models couldn't have been trained on and it was still hard to have scenarios it couldn't one shot.

jpalomaki25d ago

I don’t think people want that, but they are willing to accept that in order to get stuff done.

avereveard24d ago

can't test pygame otherwise :D

andai24d ago

Confusingly, Codex, their agentic programming tool, and Codex, their GUI that only works on Mac and Windows, have the same name.

I think the latter is technically "Codex For Desktop", which is what this article is referring to.

jmspring24d ago

It’s marginally better than Microsoft naming things.

Centigonal24d ago

You mean you're not excited to use Copilot Chat in the Microsoft 365 Copilot App??

(This is the real, official name for the AI button in Office)


quantumHazer24d ago

Also, there are multiple models called codex or that have codex as a "suffix", e.g. "gpt-5.3-codex".

enraged_camel25d ago

>> for the more than 3 million developers who use it every week

It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

keeda23d ago

56% is much more impressive than 14%.

This may look bad until you consider that all of them are already desperately strapped for compute. I think the lower DAU is due to a combination of that and people still figuring out how to use AI.

gchamonlive24d ago

throw_m23933924d ago

hashmal23d ago

software development has always been about replacing jobs. if we now do it to ourselves and not just other people, maybe there's finally some kind of fairness in the game.

moojacob22d ago

Do you think the labs are violating their no data collection agreements for enterprises?

vanillameow24d ago

I mean, sentiment in this thread (and the neighboring Opus 4.7 one) is overwhelmingly negative this time around. That comment probably would have made more sense around 4.5/4.6.

bibabaloo24d ago

> doesn't really work with a black box code generator.

Sure it does, just blame the vendor.

"Nobody ever got fired for picking IBM/OpenAI/whatever AI incumbent"

richardvsu24d ago

What do you expect from an app that’s built by not looking at the code?

andypants24d ago

Depending on what you're working on, codex could be starting long running tasks that are never terminated and keep spinning in the background.

wartywhoa2324d ago

I'm on M4 Max so your mileage may vary, but what helps me is not running any backdoors willingly.

JodieBenitez24d ago

Ditched it for this very reason... it used to be fine before. I use Codex CLI now, it doesn't drain the battery. I prefer the desktop app but the CLI is ok.

jauntywundrkind25d ago

Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

Sure, we can read the characters on the screen, but accessibility information is usually structured. TUI apps are going to be far less interesting and capable without accessibility built in.

ElijahLynn24d ago

Maybe they could use Codex to build a Linux app...

jesse_dot_id24d ago

Linux users are probably too smart to actually use these kinds of tools right now.

frde_me24d ago

I enabled the computer use plugin yesterday. Today, without thinking about it, I asked it to summarize a Slack thread along with a spreadsheet.

I was expecting it to use the MCPs I have for them, but they happened to not be authenticated for some reason.

I got _really_ freaked out when a glowing cursor popped up while I was doing something else and started looking at Slack, then navigating in Chrome to the sheet to get the data it needed.

Like, on one hand it's really cool that it just "did the thing", but I was also freaked out during the experience.

hk133724d ago

I’ve done a lot with Claude and OpenAI both, A LOT, but I’m still a little wary at letting it have too much access so I haven’t tried this feature in either of them.

swiftcoder24d ago

Well I sure hope there's a toggle to turn those features off, because I don't want to open my entire UI surface to the potential of sandbox escape...

conradev22d ago

You have to install it to enable it, actually! Computer Use is also confined (read and write!) to apps that you've explicitly allowed.

lucrbvi25d ago

Does anyone else feel that LLMs are wrong for computer use? It's like robotics: I find LLMs alone are really slow for this task.

sumedh24d ago

> find LLMs alone are really slow for this task

Faster LLMs will be here by next year.

kelsey9876543125d ago

If it doesn't complain about everything being malware, maybe I will come back to OpenAI from my adventures with Anthropic.

darepublic23d ago

> Our mission is to ensure that AGI benefits all of humanity.

In order to do this we will eat everyone's lunch.

moomin24d ago

Wait, did they just send out a press release boasting that they’re bundling Jesse Vincent’s Superpowers?!

obrajesse24d ago

They did! I didn't actually think we were going to make it into one of the launch videos for this. That was a very pleasant surprise.

And they've been lovely to work with as we got this put together.

OsrsNeedsf2P25d ago

> Computer use is initially available on macOS,

Does anyone know of a good option that works on Wayland Linux?

rickcarlino25d ago

Goose is an option, but it is just OK. https://github.com/aaif-goose/goose

evbogue25d ago

Codex-cli / OpenClaw. If you need a browser use Playwright-mcp.

I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

OsrsNeedsf2P24d ago

> I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

What if you want to develop desktop apps?

2001zhaozhao24d ago

I think the killer feature in this release is the background GUI use.

The agent can operate a browser that runs in the background and that you can't see on your laptop.

This would be immensely useful when working with multiple worktrees. You can prompt the agent to comprehensively QA test features after implementing them.

Xenoamorphous24d ago

I wonder if there’s something off the shelf that does this?

throwuxiytayq24d ago

North Korean employees should do the trick. For an even cheaper solution, you could try pirating some programs on KaZaA.

woeirua24d ago

Claude Desktop / CoWork already does this.

agentifysh25d ago

Sherlocking ramps up into IPO

Bunch of startups need to pivot today after this announcement including mine

sumedh24d ago

What was your startup?

throwaway91128224d ago

how? was this not a thing with claude cowork?

Oarch24d ago

"You've hit the message limit, upgrade to Plus for more".

Ok. I upgrade.

"You've hit the message limit, upgrade to Plus for more".

Hmm. They've charged me. There's no meaningful support. I just got scammed, didn't I...

MattRix24d ago

Log out and log in again? That usually fixes these kinds of issues for me.

sasipi24725d ago

The macOS app version of Codex I have doesn't show reasoning summaries, just simply 'Thinking'.

Reasoning deltas add extra traffic, especially when running many subagents, so at scale maybe those deltas are just dropped somewhere.

That said, sometimes the GPT reasoning summary is funny to read, particularly when it's working through a large task.

Also, the summaries can reveal real logic issues in prompts and in tool descriptions and configuration, which helps with debugging.

i.e. "User asked me to do X, system instructions say do Y, tool says Z which is different to what everyone else wants. I am rather confused here! Let's just assume..."

It has previously allowed me to adjust prompts, etc.

pilooch24d ago

It's useful when using prism, and for exploratory research & code.

sergiotapia24d ago

I do want to see as it allows me to course correct.

bughunter300025d ago

The first use case I'm putting to work is testing web apps as a user. It seems like this could be a token burner, though. Saving a run and mostly replaying it might be nice to have.

epitrochoid41324d ago

Let's see how OpenAI holds up. They'll probably enshittify or dumb down their models like Anthropic to finally turn their massive loss streak into a profit.

eduction24d ago

"We’re also releasing more than 90 additional plugins"

but there is no link, why would you not make this a link.

boggles my mind that companies make such little use of hypertext

shevy-java24d ago

> Codex can now operate your computer alongside you

I am getting some strange vibes here ... is AI actually also spying on these developers?

solarkraft24d ago

Which Codex is this? The open source one that can be built upon or the proprietary desktop app? It looks like the latter.

saltyoldman24d ago

Claude had this. The "apps" both of them have (not the terminal stuff) are mirroring each other's features.

xpe24d ago

Please don't forget that OpenAI's leadership has shown the world what it is really made of.

fg13724d ago

> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

Why is OpenAI obsessed with generating images? Do they think "generate image" is something a software engineer does on a daily basis?

Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.

pilooch24d ago

Slides, publications and tech reports: very handy for figures!

fg13724d ago

Most software developers that I know spend only a fraction of time on that, if at all.

Generating diagrams is much more common than generating "images". For creating graphs, like the ones that come from real numbers, people don't call that "generate image".

bobkb25d ago

Using Claude and Codex side by side now. Would love to just use one eventually.

MattDamonSpace25d ago

Competition forever, ideally

andai25d ago

What's the benefit of using both?

bobkb23d ago

Helps with code reviews, plan reviews, etc. I have found auditing with multiple providers very useful.

nickthegreek25d ago

quota resets/backup when the other is unavailable.

tommy_axle25d ago

OpenClaw acquisition at work.

falcor8425d ago

Any particular evidence for this other than the conjecture that it might be related?

To me it seems like just a natural evolution of Codex and a direct response to Claude Cowork, rather than something fully claw-like.

saagarjha24d ago

Wrong acquisition.

dhruv300624d ago

I love computer use man

maybeahacker24d ago

I don't think this one did it. Time for the real release.

sidgtm25d ago

They felt the pressure of posting something after Claude 4.7

wahnfrieden25d ago

It was already leaked several days ago and they've been teasing it for weeks. They had already said that it was coming this week specifically.

romanovcode25d ago

Obviously they pressed the "publish" button since Opus was released. Do not deny it.

hyperionultra25d ago

A tool for everything does nothing really well.

solenoid093724d ago

Eventually once they have more users they'll do the same thing as Anthropic, of course.

It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.

someotherperson24d ago

Competition is bad? Who cares - let the big players subsidize and compete between each other. That's what we want. We want strong models at a low price, and we'll hype up whoever is doing it.

Simultaneously, we also hype up the open models that are catching up. That are significantly more discounted, that also put pressure on the big players and keep them in check.

People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.

frank_nitti24d ago

HN and AI tech subreddits (ones that aren't specifically about local or FOSS) seem to have this dynamic, to the degree that I've suspected astroturfing.

So it's refreshing to see that maybe that's just a coincidence or confirmation bias on my end.

whymememe24d ago

I agree, but I'd like to add that people are definitely falling for PR. People are always falling for PR, or no one would bother with it.

dmix24d ago

This assumes people are in touch with reality and aren't just motivated by vibes and insta-reactions on social media

daveguy24d ago

> Competition is bad? Who cares - let the big players subsidize and compete between each other.

All these tech oligarch asshat companies need to be regulated to hell and back.

badrequest24d ago

It's hilarious how much this post reads as drafted by an LLM. The emdash, "it's not X, it's Y" framing, incredible.

watwut24d ago

Big players subsidizing is what kills medium and small players which then kills competition. What follows is monopoly.

Big players operating at loss to distort the market is not a good thing overall.

the__alchemist24d ago

Call it falling for it, but here are my two experiences, with both applications open ($20/month plan for both):

  - Claude: Good for ~20 minutes of work once every 4 hours
  - Codex: Good for however long I want to use it.

Claude nerfed their product so that it's not usable, so I use something else.

ipaddr24d ago

Wow, the 20-dollar Claude plan sounds awful. I use Claude at work, which has metered billing, and I have to be careful not to hit my four-figure max cap.

unsupp0rted24d ago

Go to /r/codex and see how pissed off people are by the new Codex Plus plan 5-hour limits (they're a sliver of what they were a week ago). Whatever OpenAI is doing to market on Reddit isn't working.

CuriouslyC24d ago

To be fair, GPT 5.4 is mostly a better model than Opus 4.6 in terms of quality of work. The tradeoff is it's less autonomous and it takes longer to complete equivalent tasks.

boomskats24d ago

Which makes it even more of a shame that Sam Altman is such a psychopathic jackass.

luddit324d ago

So Anthropic degraded their product. OAI updated their product to meet or exceed Anthropic's old product.

This is normal behavior and not a cause for such a hyperbolic response.

solenoid093724d ago

There is good competition and bad competition.

Pricing your product unsustainably vs a competitor to gain market share is regarded as "bad competition" and has historically been seen as anticompetitive.

Then, once your competition is gone, and you've entrenched yourself, you do a rug pull.

pizzly24d ago

This is the benefits of competition in action

m3nu24d ago

I have a feeling that Codex is also getting lower limits. Got this email just now. Basically they're copying Claude's $100 tier.

> To help you go further with Codex, we’re introducing a new €114 Pro tier designed for longer, high-intensity sessions.

> At launch, this new tier includes a limited-time Codex usage boost, with up to 10x more Codex usage than Plus (typically 5x).

> As the Codex promotion on Plus winds down today, we’re rebalancing Plus usage to support more sessions across the week, rather than longer high-intensity sessions on a single day.

kar118124d ago

This is true. But Anthropic did us dirty most recently, so it's their turn on the pitchfork. Sam will do us too. Just not yet.

zmmmmm24d ago

We need to force them back into being providers of commodity services and put an end to this assumption that they can mold things in real time.

AlexCoventry24d ago

https://paulgraham.com/submarine.html

keeganpoppen24d ago

everyone seems to unconditionally love anthropic, but openai has always had the best models… it just requires a bit more effort on the part of the user to actually leverage them.

yoyohello1324d ago

There was brief consternation when OpenAI swooped in to snatch up those DoD contracts but then the next model released and all is forgiven.

olcay_24d ago

When OpenAI snatched those contracts, it made me think no worse of OpenAI. The surveillance was already factored into how I saw them (both).

jsemrau24d ago

Codex is much worse than Anthropic's models. My experience is that I burn 10x the tokens using Codex compared to Sonnet 4.6.

raincole24d ago

> because Anthropic lowered rate limits for individuals due to compute constraints

It's because they don't support OpenCode.

greenavocado24d ago

Not only that, but anthropic is now forcing users to give their biometric information to palantir

They're doing a slow rollout

solenoid093724d ago

OAI already requires this. They both require identity verification in some cases

ra24d ago

Anthropic don't seem to know how to look after and keep customers.

HWR_1424d ago

And hopefully Anthropic has extra capacity then and I can return there.

iterateoften24d ago

No it’s because Anthropic can’t message anything to its customers without lying.

a34729t24d ago

Uber, but AI!

xnx22d ago

If only there were a third major player, maybe one who was even much more established as a cloud provider...

tvmalsv25d ago

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

romanovcode25d ago

Wait for new GPT release this/next week and then decide based on benchmarks. That is what I will do.

One main thing is to decouple your repos from specific agents, e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (with CLAUDE.md as a symlink to it), and so on.

I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.
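The AGENTS.md/CLAUDE.md tip in the comment above can be scripted. Below is a minimal Python sketch, assuming AGENTS.md is the file you actually maintain and CLAUDE.md should simply point at it. `link_agent_instructions` is a hypothetical helper, not part of Codex or Claude Code tooling. Creating symlinks on Windows may require Developer Mode or admin rights.

```python
import os
from pathlib import Path


def link_agent_instructions(repo: Path,
                            canonical: str = "AGENTS.md",
                            alias: str = "CLAUDE.md") -> Path:
    """Make repo/alias a relative symlink to repo/canonical.

    Hypothetical helper: keeps one agent-neutral instructions file and
    exposes it under the file name another agent looks for.
    """
    source = repo / canonical
    link = repo / alias
    if not source.is_file():
        raise FileNotFoundError(f"{source} does not exist")
    if link.is_symlink():
        # Already a symlink: accept it only if it points at the canonical file.
        if os.readlink(link) == canonical:
            return link
        raise FileExistsError(f"{link} points somewhere else")
    if link.exists():
        # A real file is already there; refuse to overwrite hand-written content.
        raise FileExistsError(f"{link} is a regular file")
    # A relative target keeps the link valid after the repo is cloned or moved.
    link.symlink_to(canonical)
    return link
```

From a POSIX shell at the repo root, `ln -s AGENTS.md CLAUDE.md` does the same thing. The script version is idempotent and refuses to clobber an existing CLAUDE.md.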

dilap25d ago

FWIW, I've found Codex with GPT-5.4 to be better than Opus-4.6; I would say it's at least worth checking out for your use case.

fredericgalline24d ago

I've been switching between both depending on which one is having a good week — and that's the honest answer for most people right now.

The answer to your question: it's mostly a wash on capability. The real cost of switching is the context you don't realize you're rebuilding.

Austin_Conlon25d ago

I'm switching because of the higher usage limits, 2x speed mode that isn't billed as extra usage, and much more stable and polished Mac app.

gbear60524d ago

> 2x speed mode that isn't billed as extra usage

...at least for my account, the speed mode is 1.5x the speed at 2x the usage

hmokiguess25d ago

I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

graphememes24d ago

cursor has been doing this for months, welcome to 3 months ago

armcat25d ago

Is it OpenAI Cowork?

CrzyLngPwd24d ago

"Our mission is to ensure that AGI benefits all of humanity. "

They have AGI now?

hipshaker24d ago

Yes, Artificial Goofy Intelligence

SilverBirch24d ago

Just commenting here to impact the controversy score.

sharts24d ago

Can we get up from our desk and leave our codex session (or claude for that matter) and then continue using it with our iphone while having lunch or commuting on a train?

Without 3rd party tools/plugins.

huqedato24d ago

"Codex can now operate your computer alongside you" - I really don't want AI to "operate" my computer.

thm25d ago

Am I the only one who sees screen recordings of AI agents as archaic as filming airplane instruments to take measurements?

ex-aws-dude24d ago

Can't help but think the surface area for security issues is becoming massive with these tools

TheServitor24d ago

Mac only? Meh.

yoyohello1324d ago

If you’re expecting morality from the HN crowd they will disappoint you every time. Most of the people here wish they could be as ruthless and successful as someone like Sam Altman.

rommelsLegacy24d ago

Thank you for your comment. It's comforting to see I'm not the only one getting offended/disappointed by the behavior of people within our industry.

Truly I don't expect morality, and I'm not even making the moral argument to not use it tbh, as I consider morality to be a double edged sword.

VadimPR25d ago

Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

duckmysick25d ago

I don't see how it's possible to support Linux with Wayland, unless you limit the automation only to the browsers.

VadimPR24d ago

https://github.com/patrickjaja/claude-desktop-bin seems to be trying hard to but I haven't tried it.

rvz25d ago

This is why both companies are in an SF bubble.

mrcwinn25d ago

Linux desktop users. Talk about a bubble!

messh24d ago

SSH to devboxes is the exact use case for services like https://shellbox.dev: create a box using SSH... and SSH into it. No web, no subs. Codex can create its own boxes via SSH.

croemer25d ago

What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

Glemllksdf24d ago

Man this progress is fast.

I'm waiting for the open source ai ecosystem to catch up :/

sophacles24d ago

Fuck, i've been using it wrong.

avaer25d ago

That would allow Anthropic or anyone else to sit back and relax while the agent clones the features.
