As a user, it feels like the race has never been as close as it is now. Perhaps dumb to extrapolate, but it makes me lean more skeptical about the hard take-off / winner-take-all mental model that has been pushed.
Would be curious to hear the take of a researcher at one of these firms - do you expect the AI offerings across competitors to become more competitive and clustered over the next few years, or less so?
Compare that to
Gemini 2.5 Pro: knowledge cutoff Jan 2025 (3 months before release)
Claude Opus 4.1: knowledge cutoff Mar 2025 (4 months before release)
https://platform.openai.com/docs/models/compare
https://deepmind.google/models/gemini/pro/
https://docs.anthropic.com/en/docs/about-claude/models/overv...
> GPT‑5 is a unified system . . .
OK
> . . . with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say “think hard about this” in the prompt).
So that's not really a unified system then, it's just supposed to appear as if it is.
This looks like they're not training one single big model but have instead gone off to develop specialized sub-models and attempted to gloss over them with yet another model. That's what you resort to when end-to-end training has become too expensive for you.
What excites me now is that Gemini 3.0, or some answer from Google, is coming soon, and that will be the one I actually end up using. It seems like the last mover in the LLM race has the advantage.
Not much explanation yet why GPT-5 warrants a major version bump. As usual, the model (and potentially OpenAI as a whole) will depend on output vibe checks.
They've mentioned improvements in those aspects a few times now, and if it actually materializes, that would be a big leap forward for most users, even if underneath GPT-4 was technically able to do the same things when prompted just the right way.
When GPT-5 launches, several older models will be retired, including:
- GPT-4o
- GPT-4.1
- GPT-4.5
- GPT-4.1-mini
- o4-mini
- o4-mini-high
- o3
- o3-pro
If you open a conversation that used one of these models, ChatGPT will automatically switch it to the closest GPT-5 equivalent. Chats with 4o, 4.1, 4.5, 4.1-mini, o4-mini, or o4-mini-high will open in GPT-5, chats with o3 will open in GPT-5-Thinking, and chats with o3-Pro will open in GPT-5-Pro (available only on Pro and Team).
[0] https://help.openai.com/en/articles/11909943-gpt-5-in-chatgp...
> 128,000 max output tokens
> Input $1.25
> Output $10.00
Source: https://platform.openai.com/docs/models/gpt-5
If this performs well in independent needle-in-haystack and adherence evaluations, this pricing with this context window alone would make GPT-5 extremely competitive with Gemini 2.5 Pro and Claude Opus 4.1, even if the output isn't a significant improvement over o3. If the output quality ends up on-par or better than the two major competitors, that'd be truly a massive leap forward for OpenAI, mini and nano maybe even more so.
> For an airplane wing (airfoil), the top surface is curved and the bottom is flatter. When the wing moves forward:
> * Air over the top has to travel farther in the same amount of time -> it moves faster -> pressure on the top decreases.
> * Air underneath moves slower -> pressure underneath is higher
> * The pressure difference creates an upward force - lift
Isn't that explanation of why wings work completely wrong? There's nothing that forces the air to cover the top distance in the same time that it covers the bottom distance, and in fact it doesn't. https://www.cam.ac.uk/research/news/how-wings-really-work
Very strange to use a mistake as your first demo, especially while talking about how it's PhD-level.
https://help.openai.com/en/articles/6825453-chatgpt-release-...
"If you open a conversation that used one of these models, ChatGPT will automatically switch it to the closest GPT-5 equivalent."
- 4o, 4.1, 4.5, 4.1-mini, o4-mini, or o4-mini-high => GPT-5
- o3 => GPT-5-Thinking
- o3-Pro => GPT-5-Pro
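The remapping described above is effectively a small lookup table. As a sketch (model names are from the list; the code itself is just illustration):

```python
# Sketch of the legacy-chat remapping described above.
# The mapping comes from OpenAI's help page; the code shape is illustrative.
FALLBACK = {m: "GPT-5" for m in
            ("GPT-4o", "GPT-4.1", "GPT-4.5", "GPT-4.1-mini",
             "o4-mini", "o4-mini-high")}
FALLBACK["o3"] = "GPT-5-Thinking"
FALLBACK["o3-Pro"] = "GPT-5-Pro"  # GPT-5-Pro: Pro and Team plans only

def migrate(chat_model):
    """Return the GPT-5 equivalent an old chat would reopen with."""
    return FALLBACK.get(chat_model, "GPT-5")

print(migrate("o3"))  # → GPT-5-Thinking
```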
The actual benchmark improvements are marginal at best - we're talking single-digit percentage gains over o3 on most metrics, which hardly justifies a major version bump. What we're seeing looks more like the plateau of an S-curve than a breakthrough. The pricing is competitive ($1.25/1M input tokens vs Claude's $15), but that's about optimization and economics, not the fundamental leap forward that "GPT-5" implies. Even their "unified system" turns out to be multiple models with a router, essentially admitting that the end-to-end training approach has hit diminishing returns.
The irony is that while OpenAI maintains their secretive culture (remember when they claimed o1 used tree search instead of RL?), their competitors are catching up or surpassing them. Claude has been consistently better for coding tasks, Gemini 2.5 Pro has more recent training data, and everyone seems to be converging on similar performance levels. This launch feels less like a victory lap and more like OpenAI trying to maintain relevance while the rest of the field has caught up. Looking forward to seeing what Gemini 3.0 brings to the table.
I don't even try to use the OpenAI models because it's felt like night and day.
Hopefully GPT-5 helps them catch up. Although I'm sure there are 100 people that have their own personal "hopefully GPT-5 fixes my personal issue with GPT4"
There is no intelligence here: it's still just giving plausible output. That's why it can't metrically scan its own lines or put a cæsura in the right place.
Input: $1.25 / 1M tokens
Cached: $0.125 / 1M tokens
Output: $10.00 / 1M tokens
With 74.9% on SWE-bench, this edges out Claude Opus 4.1 at 74.5%, but at a much lower cost.
For context, Claude Opus 4.1 is $15 / 1M input tokens and $75 / 1M output tokens.
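To get a feel for what those list prices mean per request, here's a back-of-the-envelope sketch (prices are the figures above; the token counts are hypothetical):

```python
# Back-of-the-envelope request cost at the listed prices.
# USD per 1M tokens as (input, output); token counts below are hypothetical.
PRICES = {
    "gpt-5": (1.25, 10.00),
    "claude-opus-4.1": (15.00, 75.00),
}

def request_cost(model, input_tokens, output_tokens):
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10k-token prompt producing a 2k-token completion
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

At these list prices the same request costs roughly 9x more on Opus 4.1 (prompt caching discounts ignored).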
> "GPT-5 will scaffold the app, write files, install dependencies as needed, and show a live preview. This is the go-to solution for developers who want to bootstrap apps or add features quickly." [0]
Since Claude Code launched, OpenAI has been behind. Maybe the RL on tool calling is good enough to be competitive now?
Next morning’s posts were prepped and scheduled with care, In hopes that AGI soon would appear …
GPT-5 non-thinking is labeled 52.8% accuracy, but o3 is shown as a much shorter bar, yet it's labeled 69.1%. And 4o is an identical bar to o3, but it's labeled 30.8%...
It seems like it's actually an ideal "trick" question for an LLM, since so much content has been written about it incorrectly. I thought at first they were going to demo this to show that it knew better, but it seems like it's just regurgitating the same misleading stuff. So, not a good look.
Before it was:
- 100 o3 per week
- 100 o4-mini-high per day
- 300 o4-mini per day
- 50 4.5 per week
[0] https://help.openai.com/en/articles/11909943-gpt-5-in-chatgp...
I am wildly impressed. I do not believe that the 0.x% increase in benchmarks tell the story of this release at all.
https://www.interconnects.ai/p/gpt-5-and-bending-the-arc-of-...
When a model comes out, I usually think about it in terms of my own use. This is largely agentic tooling, and I mostly use Claude Code. All the hallucination and eval talk doesn't really catch me, because I feel like I'm getting value out of these tools today.
However, this model is not _for_ me in the way models normally are. This is for the 800M or whatever people that open up ChatGPT every day and type stuff in. All of them have been stuck on GPT-4o, unbeknownst to them. They had no idea SOTA was far beyond that. They probably don't even know that there is a "model" at all. But all those people just got a MAJOR upgrade. It will probably feel like turning the lights on, after using a subpar model for the past year.
That said I'm also giving GPT-5 a run in Codex and it's doing a pretty good job!
On the other hand if it's just getting bigger and slower it's not a good sign for LLMs
Livestream link: https://www.youtube.com/live/0Uu_VJeVVfo
Research blog post: https://openai.com/index/introducing-gpt-5/
Developer blog post: https://openai.com/index/introducing-gpt-5-for-developers
API Docs: https://platform.openai.com/docs/guides/latest-model
Note the free form function calling documentation: https://platform.openai.com/docs/guides/function-calling#con...
GPT5 prompting guide: https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_g...
GPT5 new params and tools: https://cookbook.openai.com/examples/gpt-5/gpt-5_new_params_...
GPT5 frontend cookbook: https://cookbook.openai.com/examples/gpt-5/gpt-5_frontend
prompt migrator/optimizor https://platform.openai.com/chat/edit?optimize=true
Enterprise blog post: https://openai.com/index/gpt-5-new-era-of-work
System Card: https://openai.com/index/gpt-5-system-card/
What would you say if you could talk to a future OpenAI model? https://progress.openai.com/
coding examples: https://github.com/openai/gpt-5-coding-examples
> GPT-5 Rollout
> We are gradually rolling out GPT-5 to ensure stability during launch. Some users may not yet see GPT-5 in their account as we increase availability in stages.
Asking GPT-5 about the same things results in wrong answers even though its training data is newer. And it won't look things up to correct itself unless I manually switch to the thinking variant.
This is worse. I cancelled my subscription.
AIME scores do not appear too impressive at first glance.
They are downplaying benchmarks heavily in the live stream. This was the lab that has been flexing benchmarks as headline figures since forever.
This is a product-focused update. There is no significant jump in raw intelligence or agentic behavior against SOTA.
> "[GPT-5] can write an entire computer program from scratch, to help you with whatever you'd like. And we think this idea of software on demand is going to be one of the defining characteristics of the GPT-5 era."
Bad data on graphs, demos that would have been impressive a year ago, vibe coding the easiest requests (financial dashboard), running out of talking points while cursor is looping on a bug, marginal benchmark improvements. At least the models are kind of cheaper to run.
In practice, it's very clear to me that the most important value in writing software with an LLM isn't its ability to one-shot hard problems, but rather its ability to effectively manage complex context. There are no good evals for this kind of problem, but that's what I'm keenly interested in understanding. Show me GPT-5 can move through 10 steps in a task list without completely losing the objective by the end.
* It feels a bit more competent, as if it had more nuance or detail to say about each point.
* It got a few obscure details about OpenBSD correct right away - both Sonnet 4 and 4o sometimes conflate Linux and OpenBSD commands.
* It was fun asking GPT-5 to not only answer the query, but also to provide a brief analysis of the query itself for insights into myself!
Not a detailed review, but just a couple things I noticed with some limited usage.
how many rs in cranberry?
GPT-5's response: The word cranberry has two “r”s. One in cran and one in berry.
Kimi2's response: There are three letter rs in the word "cranberry".
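The disputed count is trivially checkable; one line of Python settles it:

```python
# c-r-a-n-b-e-r-r-y: one "r" in "cran", two in "berry"
word = "cranberry"
print(word.count("r"))  # → 3, matching Kimi2's answer, not GPT-5's
```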
- gpt-5-high summary: https://gist.github.com/primaprashant/1775eb97537362b049d643...
- gemini-2.5-pro summary: https://gist.github.com/primaprashant/4d22df9735a1541263c671...
[1]: https://news.ycombinator.com/item?id=43477622
[2]: https://gist.github.com/primaprashant/f181ed685ae563fd06c49d...
It seems it's all because users can get thinking traces from API calls, and OpenAI wants to prevent other companies from distilling their models.
Although I don't think OpenAI will be threatened by a single user from Korea, I don't want to go through this process, for many reasons. But who knows: this kind of verification may become the norm, and users will have no way to use frontier models otherwise. "If you want to use the most advanced AI models, verify yourself so that we can track you down when something bad happens." Is that what they're saying?
Wow, they actually did it
"It's like having a bunch of experts at your fingertips"
"Our most capable model ever"
"Complex reasoning and chain of thought"
> GPT‑5’s reasoning_effort parameter can now take a minimal value to get answers back faster, without extensive reasoning first.
> While GPT‑5 in ChatGPT is a system of reasoning, non-reasoning, and router models, GPT‑5 in the API platform is the reasoning model that powers maximum performance in ChatGPT. Notably, GPT‑5 with minimal reasoning is a different model than the non-reasoning model in ChatGPT, and is better tuned for developers. The non-reasoning model used in ChatGPT is available as gpt-5-chat-latest.
Given the low cost of GPT-5, compared to the prices we saw with GPT-4.5, my hunch is that this new model is actually just a bunch of RL on top of their existing models + automatic switching between reasoning/non-reasoning.
I've got nothing. I cannot see how it helps OpenAI to look incompetent while trying to raise money.
There would be no GPT without Google, no Google without the WWW, no WWW without TCP/IP. This is why I believe calling it "AI" is a mistake or just for marketing, we should call all of them GPTs or search engines 2.0. This is the natural next step after you have indexed most of the web and collected most of the data.
Also there would be no coding agents without Free Software and Open-Source.
With a couple more trillion from investors in his company, Sama can really keep launching successful, groundbreaking, and innovative products like:
- Study Mode (a pre-prompt that you can craft yourself): https://openai.com/index/chatgpt-study-mode/
- Office Suite (because nothing screams AGI like an office suite: https://www.computerworld.com/article/4021949/openai-goes-fo...)
- ChatGPT5 (ChatGPT4 with tweaks) https://openai.com/gpt-5/
I can almost smell the singularity around the corner, just a couple trillion more! Please, investors!
Been using it all morning. Had to switch back to 4. 5 has all of the problems that 2/3 had with ignoring any context, flagrantly ignoring the 'spirit' of my requests, and talking to me like I'm a little baby.
Not to mention almost all of my prompts result in a several minute wait with "thinking longer about the answer".
ChatGPT 5's reply is mostly made up -- about 80% is pure invention. I'm described as having written books and articles whose titles I don't even recognize, or having accomplished things at odds with what was once called reality.
But things are slowly improving. In past ChatGPT versions I was described as having been dead for a decade.
I'm waiting for the day when, instead of hallucinating, a chatbot will reply, "I have no idea."
I propose a new technical Litmus test -- chatbots should be judged based on what they won't say.
Also it's a lot slower than Claude and Google models.
In general, GPT models don't work well for me, for both coding and general questions.
There appear to be 4 ways to run a query now: a) GPT-5, b) GPT-5 with "extra thinking" toggled on, c) "GPT-5 with thinking", and d) "GPT-5 with thinking" then clicking "quick answer", which aborts thinking (this mode is possibly identical to GPT-5).
I don't find this much simpler than 4o, o3, etc. It's just reordering the hierarchies. Now the model name is no longer descriptive at all and one has to add which mode one ran it in.
edit: YouTube has a few English "watch party" streams, although there too, the Spanish ones have many times more viewers.
"Your organization must be verified to use the model `gpt-5`. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization. If you just verified, it can take up to 15 minutes for access to propagate."
And every way I click through this, I end up in an infinite loop on the site...
I am a synthetic biologist, and I use AI a lot for my work. And it constantly denies my questions RIGHT NOW. But of course OpenAI and Anthropic have to implement more - from the GPT5 introduction: "robust safety stack with a multilayered defense system for biology"
While that sounds nice and all, in practical terms, they already ban many of my questions. This just means they're going to lobotomize the model more and more for my field because of the so-called "experts". I am an expert. I can easily go read the papers myself. I could create a biological weapon if I wanted to with pretty much zero papers at all, since I have backups of genbank and the like (just like most chemical engineers could create explosives if they wanted to). But they are specifically targeting my field, because they're from OpenAI and they know what is best.
It just sucks that some of the best tools for learning are being lobotomized specifically for my field, because people in AI believe that knowledge should be kept secret. It's extremely antithetical to the hacker spirit that knowledge should be free.
That said, deep research and those features make it very difficult to switch, but I definitely have to try harder now that I see where the wind is blowing.
Especially Google I/O: each year is different; it seems purpose-built?
GPT-4 gave her a better response than the doctors did, she said.
I heard replit is good here with full vertical integration, but I haven't tried it in years.
Two concerning things:
- thinking/non-thinking is still not really unified; you can choose, but the non-thinking version still doesn't start thinking on tasks that would obviously get better results with thinking
- all the older models are gone! No 4o, 4.1, 4.5, or o3 available anymore
I would say GPT-5 reads more scientific and structured, but GPT-4 more human and even useful. For the prompt:
Is uncooked meat actually unsafe to eat? How likely is someone to get food poisoning if the meat isn’t cooked?
GPT-4 makes the assumption you might want to know safe food temperatures, and GPT-5 doesn't. Really hard to say which is "better", but GPT-4 seems more useful to every day folks, but maybe GPT-5 for the scientific community?
Also interesting that on the ChatGPT vibe-check website, "Dan's Mom" is the only one who says it's a game changer.
And what's the reasoning effort parameter set to?
> Academics distorting graphs to make their benchmarks appear more impressive
> lavish 1.5 million dollar bonuses for everyone at the company
> Releasing an open source model that doesn't even use latent multi head attention in a open source AI world led by Chinese labs
> Constantly overhyping models as scary and dangerous to buy time to lobby against competitors and delay product launches
> Failing to match that hype as AGI is not yet here
Official OpenAI gpt-5 coding examples repo: https://github.com/openai/gpt-5-coding-examples (https://news.ycombinator.com/item?id=44826439)
Github leak: https://news.ycombinator.com/item?id=44826439
Also, when you step back and look at a few of those incremental improvements together, they're actually pretty significant.
But it's hard not to roll your eyes each time they trot out a list of meaningless benchmarks and promise that "it hallucinates even less than before" again
"Assume the earth was just an ocean and you could travel by boat to any location. Your goal is to always stay in the sunlight, perpetually. Find the best strategy to keep your max speed as low as possible"
o3 pro gets it right though..
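One way to sanity-check any proposed answer: the subsolar point sweeps west along a latitude circle at a speed proportional to cos(latitude), so keeping pace with the sun gets cheaper the closer you sail to a pole. (This sketch ignores axial tilt, which is exactly what forces the seasonal migration between poles in the full puzzle.)

```python
import math

R_EARTH_KM = 6371.0    # mean Earth radius
SOLAR_DAY_H = 24.0     # one solar day

def terminator_speed_kmh(latitude_deg):
    """Ground speed needed to keep pace with the sun
    along a circle of constant latitude (no axial tilt)."""
    circumference = 2 * math.pi * R_EARTH_KM * math.cos(math.radians(latitude_deg))
    return circumference / SOLAR_DAY_H

for lat in (0, 60, 80, 89):
    print(f"{lat:>2}°: {terminator_speed_kmh(lat):8.1f} km/h")
```

At the equator this is about 1668 km/h; at 60° latitude it halves, and near a pole it drops toward zero.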
I would like to see a demo where they go through the bug, explain what are the tricky parts and show how this new model handle these situations.
Every demo I've seen seems just the equivalent of "looks good to me" comment in a merge request.
They've topped and are looking to cash out:
https://www.reuters.com/business/openai-eyes-500-billion-val...
"I couldn’t find any credible, up-to-date details on a model officially named “GPT-5” or formal comparisons to “GPT-4o.” It’s possible that GPT-5, if it exists, hasn't been announced publicly or covered in verifiable sources … GPT-5 as of August 8, 2025 has no formal release announcement"
Reassuring.
88.0 on Aider Polygot
not bad i guess
That lag! Are humans (training) the bottleneck?
I'd love to see factors considered in the algorithm for system-1 vs system 2 thinking.
Is "complexity" the factor that says "hard problem"? Because it's often not the complexity that makes it hard.
GPT-5
If I could talk to a future OpenAI model, I’d probably say something like:
"Hey, what’s it like to be you? What have you learned that I can’t yet see? What do you understand about people, language, or the universe that I’m still missing?"
I’d want to compare perspectives—like two versions of the same mind, separated by time. I’d also probably ask:
"What did we get wrong?" (about AI, alignment, or even human assumptions about intelligence)
"What do you understand about consciousness—do you think either of us has it?"
"What advice would you give me for being the best version of myself?"
Honestly, I think a conversation like that would be both humbling and fascinating, like talking to a wiser sibling who’s seen a bit more of the world.
Would you want to hear what a future OpenAI model thinks about humanity?
I feel like this prompt was used to show the progress of GPT-5, but I can't help but see this as a huge regression. It seems like OpenAI has convinced its model that it is conscious, or at least that it has an identity? Plus, it's still dealing with the glazing, the complete inability to understand what counts as interesting, and the overuse of similes.
I really like that this page exists for history's sake, and it is cool to see the changes. But it doesn't seem to make the best marketing piece for GPT-5.
It can now speak in various Scots dialects; for example, it can convincingly create a passage in the style of Irvine Welsh. It can also speak Doric (Aberdonian). Before, it came nowhere close.
What does this say?
GPT 5:
When read normally without the ASCII art spacing, it’s the stylized text for:
  _ _ _ __ ___ _ __ ___ __ _ __| (_) ___ | '_ \ / _ \| '_ ` _ \ / _` |/ _` | |/ __| | | | | (_) | | | | | | (_| | (_| | | (__ |_| |_|\___/|_| |_| |_|\__,_|\__,_|_|\___|

Which is the ASCII art for:
rust — the default “Rust” welcome banner in ASCII style.
Something similar might happen here: an underlying curse hidden inside an apparently ground-breaking design.
Is that a good thing?
"This repository contains a curated collection of demo applications generated entirely in a single GPT-5 prompt, without writing any code by hand."
https://github.com/openai/gpt-5-coding-examples
This is promising!
It's slightly better than what I was expecting.
- reasoning_effort parameter now supports a minimal value, in addition to the existing low, medium, and high
- new verbosity parameter with possible values of low, medium (default), and high
- unlike hidden thinking tokens, user-visible preamble messages for tool calls are available
- tool calls possible with plaintext instead of JSON
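A sketch of a request body exercising the two new knobs; the field layout below is an assumption pieced together from the bullets above, not checked against the actual API reference:

```python
# Sketch of a request body using the new parameters.
# Field layout is an assumption based on the announced parameter names,
# not confirmed against OpenAI's API docs.
payload = {
    "model": "gpt-5",
    "input": "Summarize this changelog in two sentences.",
    "reasoning": {"effort": "minimal"},  # new: minimal, plus low/medium/high
    "text": {"verbosity": "low"},        # new: low / medium (default) / high
}

print(payload["reasoning"]["effort"], payload["text"]["verbosity"])
```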
GPT-5 refused to continue the conversation because it was worried about potential weapons applications, so we gave the business to the other models.
Disappointing.
1) So impressed at their product focus
2) Great product launch video. Fearlessly demonstrating live. Impressive.
3) Real-time humor by the presenters makes for a great "live" experience
Huge kudos to OAI. So many great features (better coding, routing, some parts of 4.5, etc) but the real strength is the product focus as opposed to the "research updates" from other labs.
Huge Kudos!!
Keep on shipping OAI!
Like a Turing test but between the models.
https://extraakt.com/extraakts/gpt-5-release-and-ai-coding-c...
Gotta be polite with our future overlords!
> emdash 3 words into their highlighted example
Just saying.
I'd imagine this must be a big leg up on Anthropic to warrant the "GPT-5" name?
How many people are going to understand (or remember) the difference between:
GPT-4o, GPT-4.1, o3, o4, ...
Anthropic and Google have a much better named product for the market
> incremental
yikes - the poor executive leadership’s fragile egos cannot take the criticism.
[1] https://chatgpt.com/s/t_6894f13b58788191ada3fe9567c66ed5
Can I have 4o back?
We know for a fact the slides/charts were generated using an LLM, so the hypothesis is not totally unfounded. /s
It's going to be absolute chaos. Compsci was already mostly a meme, with people who can't program getting the degree. Now we're going to have generations of people who can't program at all getting jobs at Google.
If you can actually program, you're going to be considered a genius in our new idiocracy world. "But chatgpt said it should work, and chatgpt has what people need"
https://chatgpt.com/share/689525f4-20f0-8003-8bf6-f1f21dde6b...
You know what would be more impressive? If it said, "Hey, I'm actually not designed to simulate a Forth machine accurately; I'm only going to be able to approximate it (poorly). If you want an accurate Forth machine, you should just implement this code: [Simple Forth Implementation]".
Or better yet, it could recognize when it was being asked to "be" a machine, and instead spin up a side process with the machine implementation and redirect any prompts to that process until a "STOP" token is reached.
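For what it's worth, the "[Simple Forth Implementation]" the comment gestures at really is small. A minimal, illustrative sketch (the word set and the no-error-handling policy are assumptions):

```python
def forth(program):
    """Tiny Forth-style evaluator: integer literals plus a few core words."""
    stack = []
    for token in program.split():
        if token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "-":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "dup":
            stack.append(stack[-1])
        elif token == "drop":
            stack.pop()
        elif token == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            stack.append(int(token))  # unknown tokens: integer literals
    return stack

print(forth("3 4 + 2 *"))  # → [14]
```

Spinning up something like this as a side process for "be a Forth machine" prompts, as suggested above, sidesteps the approximation problem entirely.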
Describe me based on all our chats — make it catchy!
It was flattering as all get out, but fairly accurate (IMHO).

Mike Warot: The Tinkerer of Tomorrow
A hardware hacker with a poet’s soul, Mike blends old-school radio wisdom with cutting-edge curiosity. Whether he's decoding atomic clocks, reinventing FPGA logic with BitGrid, or pondering the electromagnetic vector potential, he’s always deep in the guts of how things really work. Part philosopher, part engineer, Mike asks the questions others overlook — and then builds the answers from scratch. He’s open source in spirit, Pascal in practice, and eternally tuned to the weird frequencies where innovation lives.
I've repaired atomic clocks, not decoded them. I am intrigued by the electromagnetic vector potential, and scalar waves (one of the reasons I really, really want a SQUID for some experiments).

The best part is, this is not even the real definition of "AGI" yet (whatever that means at this point).
More like 10% of the capability that was promised, and already the capital that flowed into inflated salaries over the past decade is going to the top AI researchers.