(I work at OpenAI.)
I literally wasn’t able to convince the model to WORK on a quick, safe, and benign subtask that GLM, Kimi and Minimax later succeeded on without issues. Unfortunately I had to kick OpenAI out immediately.
And that backdoor API has GPT-5.5.
So here's a pelican: https://simonwillison.net/2026/Apr/23/gpt-5-5/#and-some-peli...
I used this new plugin for LLM: https://github.com/simonw/llm-openai-via-codex
UPDATE: I got a much better pelican by setting the reasoning effort to xhigh: https://gist.github.com/simonw/a6168e4165a258e4d664aeae8e602...
Edit: this one has crossed legs lol
https://hcker.news/pelican-low.svg
https://hcker.news/pelican-medium.svg
https://hcker.news/pelican-high.svg
https://hcker.news/pelican-xhigh.svg
Someone needs to make a pelican arena, I have no idea if these are considered good or not.
It continues to amaze me that these models, which definitely know what bicycle geometry actually looks like somewhere in their weights, produce such implausibly bad geometry.
Also mildly interesting, and generally consistent with my experience with LLMs, that it produced the same obvious geometry issue both times.
I recommend that anybody in offensive/defensive cybersecurity experiment with this. This is the real data point we needed - without the hype!
Never thought I'd say this but OpenAI is the 'open' option again.
Compared to Anthropic, they always have been. Anthropic has never released any open models. Never willingly released Claude Code's source (unlike Codex). Never released their tokenizer.
> Developers and security professionals doing cybersecurity-related work or similar activity that could be mistaken by automated detection systems may have requests rerouted to GPT-5.2 as a fallback.
> OpenAI's terms and policies restrict the use of our services in a number of areas. We have identified activity in your OpenAI account that is not permitted under our policies for: - Cyber Abuse
I raised an appeal, which got denied. To be fair, I think it's close to impossible for someone looking at the chat history to differentiate between legitimate research and malicious intent. I have also applied for the security research program that OpenAI is offering but didn't get any reply on that.
Anthropic is the embodiment of bullshitting to me.
I read Cialdini many decades ago and I am bored by Anthropic.
OpenAI is very clever. With the advent of Claude, OpenAI disappeared from the headlines. Who or what was this Sam again that everyone was talking about a year ago?
OpenAI has a massive user advantage so that they can simply follow Anthropic’s release cycle to ridicule them.
I think it is really brutal for Anthropic how easily they are getting passed by OpenAI, and it is getting worse for Anthropic with every new GPT version.
OpenAI owns them.
https://developers.openai.com/codex/pricing?codex-usage-limi...
Note the local message limits between 5.3, 5.4, and 5.5. And yes, I did read the linked article and know they're claiming that 5.5's new efficiency should make it break even with 5.4, but the point stands: tighter limits, higher prices.
Unfortunately, I think the lesson they took from Anthropic is that devs get really reliant on, and even addicted to, coding agents, and they'll happily pay any amount for even small benefits.
> To better utilize GPUs, Codex analyzed weeks’ worth of production traffic patterns and wrote custom heuristic algorithms to optimally partition and balance work. The effort had an outsized impact, increasing token generation speeds by over 20%.
The ability of agentic LLMs to improve computational efficiency/speed is a highly impactful domain that I wish were tested more broadly than just with benchmarks. From my experience Opus is still much better than GPT/Codex in this respect, but given that OpenAI is getting material gains out of this type of performancemaxxing, and has an increasing incentive to keep doing so given cost/capacity issues, I wonder if OpenAI will continue optimizing for it.
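The announcement doesn't say what those heuristics actually look like, but a classic baseline for this kind of partition-and-balance problem is greedy longest-processing-time (LPT) scheduling. A rough sketch (the load estimates and all names here are invented for illustration, not anything from OpenAI):

```python
import heapq

def balance(load_estimates: list[float], n_gpus: int) -> list[list[int]]:
    """Greedy LPT: assign each task (heaviest first) to the least-loaded GPU."""
    heap = [(0.0, g) for g in range(n_gpus)]        # (current load, gpu index)
    partitions = [[] for _ in range(n_gpus)]
    for task in sorted(range(len(load_estimates)),  # heaviest tasks first
                       key=lambda t: -load_estimates[t]):
        load, gpu = heapq.heappop(heap)             # least-loaded GPU so far
        partitions[gpu].append(task)
        heapq.heappush(heap, (load + load_estimates[task], gpu))
    return partitions

# Toy usage: per-shard load estimated from traffic logs (numbers invented).
print(balance([9.0, 7.5, 6.0, 4.0, 3.5, 1.0], n_gpus=3))
```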
On the other hand, all companies know that optimizing their own infrastructure/models is the critical path for "winning" against the competition, so you can bet they are serious about it.
I remembered the famous FizzBuzz Intel code-golf optimizations and gave them to Gemini Pro, along with my code and instructions to "suggest optimizations similar to those, maybe not so low level, but clever", and its suggestions were veerry cool.
LLMs never stop amazing me, every day.
The game that this prompt generated looks pretty decent visually. A big part of this is likely due to the fact that the meshes were created using a separate tool (probably Meshy, Tripo.ai, or similar) and not generated by 5.5 itself.
It really seems like we could be at the dawn of a new era similar to Flash, where any gamer or hobbyist can generate game concepts quickly and instantly publish them to the web. Three.js in particular is really picking up as the primary way to design games with AI, in spite of the fact that it's not even a game engine, just a web rendering library.
The point is, if we can prompt an LLM to reason about three dimensions, we will likely be able to apply that to math problems it isn't currently able to solve.
I should release my Rubik's Cube MCP server with the challenge to see if someone can write a prompt to solve a Rubik's Cube.
It still struggles to create shaders from scratch, but is now pretty adequate at editing existing shaders.
In 5.2 and below, GPT really struggled with "one canvas, multiple page" experiences, where a single background canvas is kept rendered over routes. In 5.4, it still takes a bit of hand-holding and frequent refactor/optimisation prompts, but is a lot more capable.
Excited to test 5.5 and see how it is in practice.
It might not be a game engine, but it’s the de facto standard for doing WebGL 3D. And since it’s been around forever, there’s a massive amount of training data available for it.
Before LLMs were a thing, I relied more on Babylon.js, since it’s a bit higher level and gives you more batteries included for game development.
What's strange is that this Pietro Schirano dude seems to write incredibly cargo cult prompts.
Game created by Pietro Schirano, CEO of MagicPath
Prompt: Create a 3D game using three.js. It should be a UFO shooter where I control a tank and shoot down UFOs flying overhead.
- Think step by step, take a deep breath. Repeat the question back before answering.
- Imagine you're writing an instruction message for a junior developer who's going to go build this. Can you write something extremely clear and specific for them, including which files they should look at for the change and which ones need to be fixed?
- Then write all the code. Make the game low-poly but beautiful.
- Remember, you are an agent: please keep going until the user's query is completely resolved before ending your turn and yielding back to the user. Decompose the user's query into all required sub-requests and confirm that each one is completed. Do not stop after completing only part of the request. Only terminate your turn when you are sure the problem is solved. You must be prepared to answer multiple queries and only finish the call once the user has confirmed they're done.
- You must plan extensively in accordance with the workflow steps before making subsequent function calls, and reflect extensively on the outcomes of each function call, ensuring the user's query and related sub-requests are completely resolved.

I think people are starting to catch on to where we really are right now. Future models will be better, but we are entering a trough of disillusionment, and this attitude will be widespread in a few months.
We've been there for a while.... creativity has been the primary bottleneck
[1] https://apps.apple.com/uz/app/jamboree-game-maker/id67473110...
Benchmark                      Mythos     GPT-5.5
SWE-bench Pro                  77.8%*     58.6%
Terminal-bench 2.0             82.0%      82.7%*
GPQA Diamond                   94.6%*     93.6%
Humanity's Last Exam           56.8%*     41.4%
Humanity's Last Exam (tools)   64.7%*     52.2%
BrowseComp                     86.9%      84.4% (90.1% Pro)*
OSWorld-Verified               79.6%*     78.7%
(* = higher score)
Still far from Mythos on SWE-bench but quite comparable otherwise.
Source for Mythos values: https://www.anthropic.com/glasswing

Here: https://www.anthropic.com/news/claude-opus-4-7#:~:text=memor...
If you look at the official SWE-bench submissions (https://github.com/SWE-bench/experiments/tree/main/evaluatio...), filter to all models after Sonnet 4, and aggregate ALL models' submissions across the 500 problems, what I found is that the aggregated resolution rate is 93% sharp.
Mythos gets 93.7%, meaning it solves problems that no other model has ever solved. I took a look at those problems and became even more suspicious: for the remaining 7% of problems, it is almost impossible to resolve the issues without looking at the test patch ahead of time, because the solution deviates so drastically from the problem statement that it almost feels like it is trying to solve a different problem.
Not that I am saying Mythos is cheating, but it might be capable enough to remember all states of said repos that it can reverse-engineer the TRUE problem statement by diffing within its own internal memory. I think it could be a unique phenomenon of evaluation awareness. Otherwise I genuinely can't think of how it could be this precise in deciphering such unspecific problem statements.
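For anyone who wants to reproduce the aggregation, this is roughly the computation, as a sketch (the directory layout and the "resolved" key are my assumptions about the experiments repo; adjust to the actual file structure):

```python
import json
from pathlib import Path

# Union the sets of resolved problem IDs across every model submission.
resolved: set[str] = set()
for report in Path("evaluation/verified").glob("*/results/results.json"):
    data = json.loads(report.read_text())
    resolved.update(data.get("resolved", []))

TOTAL = 500  # SWE-bench Verified problem count
print(f"aggregate resolution rate: {len(resolved) / TOTAL:.1%}")
```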
Source: https://artificialanalysis.ai/models?omniscience=omniscience...
While hallucination is probably closer to 100% depending on the question, this benchmark makes no sense.
LLMs will ruin your product; have fun trusting a billionaire's thinking machine they swear is capable of replacing your employees if you just pay them 75% of your labor budget.
The hope is to get a big userbase who eventually become dependent on it for their workflow, then crank up the price until it finally becomes profitable.
The price for all models by all companies will continue to go up, and quickly.
That's a big if, though. I wish Meta were still releasing top of the line, expensively produced open-weights models. Or if Anthropic, Google, or X would release an open mini version.
Fewer people will use it.
*I work at OAI.
Like I will get Opus to make me an app but it will stop in between because I need to setup the db and plug in the API keys and Opus really can't do that on its own yet
You can replace pretty much everything - skills system, subagents, etc. - with just tmux and a simple CLI tool that the official clients can call.
Oh and definitely disable any form of "memory" system.
Essentially, treat all tooling that wraps the models as dumb gateways to inference. Then provider switch is basically a one line config change.
MCPs aren't as smooth, but I just set them up in each environment.
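To make the "one line config change" concrete: most of these providers expose OpenAI-compatible endpoints, so a dumb gateway can look like this sketch (base URLs and model names are illustrative; check each provider's docs):

```python
from openai import OpenAI

# Illustrative provider table -- the switch is the single cfg line below.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "model": "gpt-5.5"},
    "other":  {"base_url": "https://example-provider.com/v1", "model": "some-model"},
}
cfg = PROVIDERS["openai"]  # <- the one-line switch

client = OpenAI(base_url=cfg["base_url"])  # API key read from OPENAI_API_KEY
resp = client.chat.completions.create(
    model=cfg["model"],
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```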
The actual harness is great, very hackable, very extendable.
I get openai team plan at work.
Claude enterprise too.
I have openrouter for myself.
I use Minimax 2.7, Kimi 2.6, GPT 5.5, and Opus 4.7. I can toggle between them in an open source interface; that's how I avoid being trapped.
Minimax is so cheap, and for personal stuff it works fine. So I'm always toggling between the new releases.
AGENTS.md / skills / etc
The APIs are pretty interchangeable too. Just ask to convert from one to the other if you need to.
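For what it's worth, the request shapes really are close. A rough sketch of converting an OpenAI-style chat request body into Anthropic's Messages shape (simplified: text-only, ignores tools and images):

```python
def openai_to_anthropic(req: dict) -> dict:
    """Map an OpenAI chat.completions body to an Anthropic /v1/messages body."""
    messages = req["messages"]
    # Anthropic takes the system prompt as a top-level field, not a message.
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    out = {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # required by Anthropic
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system:
        out["system"] = system
    return out
```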
This quote is more sinister than I think was intended; it likely applies to all frontier coding models. As they get better, we quickly come to rely on them for coding. It's like playing a game on God Mode. Engineers become dependent; it's truly addictive.
This matches my own experience and unease with these tools. I don't really have the patience to write code anymore because I can one shot it with frontier models 10x faster. My role has shifted, and while it's awesome to get so much working so quickly, the fact is, when the tokens run out, I'm basically done working.
It's literally higher leverage for me to go for a walk when Claude goes down than to write code: if I come back refreshed and Claude is working an hour later, I'll make more progress than by mentally wearing myself out reading a bunch of LLM-generated code trying to figure out how to solve the problem manually.
Anyway, it continues to make me uneasy, is all I'm saying.
The current market is predicated on the assumption that labor is atomic and has little bargaining power (minus unions). While capital has huge bargaining power and can effectively put whatever price it wants on labor (in markets where labor is plentiful, which is most of them).
What happens to a company used to extracting surplus value from labor when the labor is provided by another company which is not only bigger but, unlike traditional labor, can withhold its labor indefinitely (because labor is now just another form of capital, and capital doesn't need to eat)?
Anyone not using in house models is signing up to find out.
I found my pocket empty, and the specific pain I felt in that moment was the feeling of not being able to remember something.
I thought it was interesting, because in this case, I was trying to "remember" something I had never learned before -- by fetching it from my second brain (hypertext).
L1 cache miss, L2 missing.
Would one be uneasy about calling a library to do stuff rather than manually messing around with pointers and malloc()? For some, yes. For others, it’s a bit freeing, as you can do more high-level architecture without getting mired in and context-switched by low-level nuances.
Note that neither of these assumptions are obviously true, at least to me. But I can hope!
Also, I honestly can’t believe the 10x mantra is still being repeated.
What's the worst potential outcome, assuming that all models get better, more efficient and more abundant (which seems to be the current trend)? The goal of engineering has always been to build better things, not to make it harder.
- I often don't ask the LLM for precompiled answers; I ask for a standalone CLI/tool
- I often ask how it reached its conclusions, so I can extend my own perspective
- I often ask it to describe its own metadata-level categorization too
I'm trying to use it to pivot and improve my own problem-solving skills, especially for large codebases where the difficulty is not conceptual but more about reference-graph size.

And I'm being very cautious. I'm not vibecoding entire startups from scratch; I'm manually reviewing and editing everything the AI outputs. I still got completely hooked on building things with Claude.
Of course they aren't alternatives to the current frontier models, and as such you cannot easily jump from the latter to the former, but they aren't that far behind either; for coding, Qwen3.5-122B is comparable to what Sonnet was less than a year ago.
So assuming the trend continues, if you can stop following the latest release and stick with what you're already using for 6 or 9 months, you'll be able to liberate yourself from the dependency on a cloud provider.
Personally I think the freedom is worth it.
1. I only have ONE SOTA model integrated into the IDE (I am mostly on Elixir, so I use Gemini). I make sure I use it sparingly, for issues I don't really have time to invest in or that are basically rabbit holes (e.g. anything to do with JavaScript or its ecosystem). My job is mostly on the backend anyway.
2. For actual backend architecture, I always do the high-level design myself, e.g. DDD. Then I literally open up gemini.google.com or claude.ai in the browser, copy-paste the existing code base into the chat, and physically leave my chair to go make coffee or a quick snack. This forces me to mentally process that using AI is a chore.
Previously, I was on a tight Codex integration, and leaving the licensing fears aside, it became too good at writing Elixir code, which really stopped me from "thinking", aka using my brain. It felt good for the first few weeks, but I later realised the dependence it created. So I said fuck it and completely cancelled my subscription, because it was too good at my job.

I believe this is the only way we won't end up like in Wall-E, sitting in front of giant screens, becoming mere blobs of flesh.
Fwiw, I haven't spoken with any management-level colleague in the past 9 months who hasn't noted that asking about AI-comfort & usage is a key interview topic. For any role type, business or technical.
Touching grass while you're outside might yield highest leverage.
I haven’t really thought about this before, but you’re right, it feels a bit uneasy for me too.
https://driverlesscrocodile.com/technology/neal-stephenson-o...
I feel sorry for whoever has to work on that codebase. This is the literal definition of tech debt.
That's probably a bad sign. Skills will atrophy, but we should be building systems that are still easy to understand.
Turning tokens into a well-groomed and maintainable codebase is what you want to do, not "one shot prompt every new problem I come across".
Did we feel uneasy that a new generation of builders didn't have to solve equations by hand because a calculator could do them?
I'm not sure it's the same analogy, but in some ways it holds.
Taking more breaks and "not working" during the work day sounds like something we should probably be striving to work towards more as a society.
At the end of the day, all these closed models are being built by companies that pumped all the knowledge from the internet without giving much back. But competition and open source will make sure most of the value return to the most of the people.
Oh stop the drama. Open source models can handle 99% of your questions.
F5
As long as tokens count roughly equally towards subscription plan usage between 5.5 & 5.4, you can look at this as effectively a 5x increase in usage limits.
Seems so to me - see GPT-5.4[1] and 5.2[2] announcements.
Might be a tacit admission of being behind.
[1] https://openai.com/index/introducing-gpt-5-4/ [2] https://openai.com/index/introducing-gpt-5-2/
https://cdn.openai.com/pdf/18a02b5d-6b67-4cec-ab64-68cdfbdde...
So much bench-maxxing is just giving the model a ton of tokens so it can inefficiently explore the solution space.
The efficiency gap is enormous. Maybe it's the difference between a GB200 NVL72 and an Amazon Trainium chip?
Like Chinese versus English - you need fewer Chinese characters to say something than if you write that in English.
So this model internally could be thinking in much more expressive embeddings.
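You can eyeball the analogy with a stock tokenizer. A quick sketch with tiktoken (the encoding choice is arbitrary, and note that BPE token counts don't necessarily track the character-density difference):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
english = "The weather is very nice today."
chinese = "今天天气很好。"  # rough translation of the same sentence

for text in (english, chinese):
    print(f"{len(text):3d} chars -> {len(enc.encode(text)):3d} tokens: {text}")
```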
That's a wild statement to put into your announcement. Are LLM providers now openly bragging about our collective dependency on their models?
Tried GPT-5.5 and so far so good. Zapier also shared an automation benchmark where 5.5 came out on top of the leaderboard: https://zapier.com/benchmarks
Because software and "information technology" generally didn't increase productivity over the past 30 years.
This has been long known as Solow's productivity paradox. There's lots of theories as to why this is observed, one of them being "mismeasurement" of productivity data.
But my favorite theory is that information technology is mostly entertainment, and rather than making you more productive, it distracts you and makes you more lazy.
AI's main application has been information space so far. If that continues, I doubt you will get more productivity from it.
If you give AI a body... well, maybe that changes.
But less effort exertion also conditions you to be weaker, and less able to connect deeply with the brain and grind as hard as you once did. This is bad.
Which effect dominates? Difficult to say.
Of course this is absolutely possible. Ultimately, there was a time when physical exertion was a given and nobody was overweight. That isn't the case anymore, is it?
AI feels the same. I'm shipping indie apps solo now that would have needed a small team five years ago. But in bigger orgs I see people spending 20 minutes verifying 15-minute AI output that used to be a 30-minute task they'd just do. Depends where you sit.
Do you think it'd be viable to run most businesses on pen and paper? I'll give you email and being able to consume informational websites - rest is pen and paper.
How does this work exactly? Is there like a "search online" tool that the harness is expected to provide? Or does the OpenAI infra do that as part of serving the response?
I've been working on building my own agent, just for fun, and I conceptually get using a command line, listing files, reading them, etc, but am sort of stumped how I'm supposed to do the web search piece of it.
Given that they're calling out that this model is great at online research - to what extent is that a property of the model itself? I would have thought that was a harness concern.
It definitely seems like it does all the searching first, with a separate model, loads that in, then does the actual writing.
The harness provides the search tool, but the model provides the keywords to search for, etc.
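Concretely, in the standard function-calling setup the harness declares a search tool, the model emits a tool call with the query it wants, and the harness executes the search and feeds results back. A minimal sketch of that loop (the search backend is whatever you wire in yourself; the model name is illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web and return top result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def web_search(query: str) -> str:
    # Your backend here: a search API, a SearXNG instance, a headless browser...
    raise NotImplementedError

messages = [{"role": "user", "content": "What changed in GPT-5.5?"}]
while True:
    resp = client.chat.completions.create(model="gpt-5.5",
                                          messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:        # no more searches: final answer
        print(msg.content)
        break
    messages.append(msg)
    for call in msg.tool_calls:   # the model chose the keywords; harness runs them
        query = json.loads(call.function.arguments)["query"]
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": web_search(query)})
```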
(same input price and 20% more output price than Opus 4.7)
However, I do want to emphasize that this is per token, not per task.
If we look at Opus 4.7, it uses smaller tokens (1-1.35x more tokens than Opus 4.6) and it was also trained to think longer. https://www.anthropic.com/news/claude-opus-4-7
On the Artificial Analysis Intelligence Index eval for example, in order to hit a score of 57%, Opus 4.7 takes ~5x as many output tokens as GPT-5.5, which dwarfs the difference in per-token pricing.
The token differential varies a lot by task, so it's hard to give a reliable rule of thumb (I'm guessing it's usually going to be well below ~5x), but hope this shows that price per task is not a linear function of price per token, as different models use different token vocabularies and different amounts of tokens.
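To make the arithmetic concrete, using the ~5x token ratio and the ~20% output-price difference from this thread (normalized, hypothetical numbers; real tasks vary a lot):

```python
# Normalized output pricing: Opus 4.7 = 1.0 per token.
opus_price, gpt_price = 1.0, 1.2    # GPT-5.5: ~20% more per output token
opus_tokens, gpt_tokens = 5.0, 1.0  # ~5x the tokens for the same AA Index score

print("GPT-5.5 cost/task :", gpt_price * gpt_tokens)    # 1.2
print("Opus 4.7 cost/task:", opus_price * opus_tokens)  # 5.0
```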
We have raised per-token prices for our last couple models, but we've also made them a lot more efficient for the same capability level.
(I work at OpenAI.)
I'd not be surprised if this is the year where some models simply stop being available as a plain API, while foundation model companies succeed at capturing more use cases in their own software.
It's kind of starting to make sense that they doubled the usage on Pro plans - if the usage drains twice as fast on 5.5 after that promo is over a lot of people on the $100 plan might have to upgrade.
Anyway - these benchmarks look really good; I’m hopeful on the qualitative stuff.
Yeah, this was the next step. Have RLVR make the model good. Next iteration, start penalising long + correct and rewarding short + correct.
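i.e. fold a length penalty into the verifier reward. A toy sketch of the shaping (the λ knob and its value are made up for illustration, not anything published):

```python
def shaped_reward(correct: bool, n_tokens: int, lam: float = 2e-5) -> float:
    """Wrong answers get nothing; correct answers earn more the fewer
    tokens they spend getting there."""
    return max(0.0, 1.0 - lam * n_tokens) if correct else 0.0

# A correct 2k-token answer now beats a correct 20k-token one.
print(shaped_reward(True, 2_000), shaped_reward(True, 20_000))  # 0.96 0.6
```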
> CyberGym 81.8%
Mythos was self-reported at 83.1% ... so not far off. Also, it seems they're going the same route with verification. We're entering the era where SotA will only be available after KYC, it seems.
https://openai.com/index/scaling-trusted-access-for-cyber-de...
> We are expanding access to accelerate cyber defense at every level. We are making our cyber-permissive models available through Trusted Access for Cyber, starting with Codex, which includes expanded access to the advanced cybersecurity capabilities of GPT‑5.5 with fewer restrictions for verified users meeting certain trust signals at launch.
> Broad access is made possible through our investments in model safety, authenticated usage, and monitoring for impermissible use. We have been working with external experts for months to develop, test and iterate on the robustness of these safeguards. With GPT‑5.5, we are ensuring developers can secure their code with ease, while putting stronger controls around the cyber workflows most likely to cause harm by malicious actors.
> Organizations who are responsible for defending critical infrastructure can apply to access cyber-permissive models like GPT‑5.4‑Cyber, while meeting strict security requirements to use these models for securing their internal systems.
"GPT‑5.4‑Cyber" is something else and apparently needs some kind of special access, but that CyberGym benchmark result seems to apply to the more or less open GPT-5.5 model that was just released.> *Anthropic reported signs of memorization on a subset of problems
And from the Anthropic's Opus 4.7 release page, it also states:
> SWE-bench Verified, Pro, and Multilingual: Our memorization screens flag a subset of problems in these SWE-bench evals. Excluding any problems that show signs of memorization, Opus 4.7’s margin of improvement over Opus 4.6 holds.
Also notice how they state just for SWE-Bench Pro: "*Anthropic reported signs of memorization on a subset of problems"
After migrating because of the token and harness issues, I was pleasantly surprised that Codex seems to perform as well or better too!
Things change so often in this field, but I prefer Codex now, even though Anthropic seems to have so much more hype for coding.
Will be interesting to try.
However the language of ChatGPT is still the same slop as years ago, so many headings, so many emojis, so many "the important thing nobody mentions". 10 paragraphs of text for what should be a two paragraph response. Even with custom instructions (keep answers short and succinct) and using their settings (less list, less emoji, less fluff) it's still NOTICEABLY worse than Claude on base settings.
I've yet to test Codex, will get to that this weekend, but in terms of research or general Q&A I have no idea how anyone could prefer this to Claude. Unfortunately Claude has seemingly stopped giving a fuck about researching entirely.
You can kind of use connectors like MCP, but having to use ngrok every time just to expose a local filesystem for file editing is more cumbersome than expected.
I thought it was weird that for almost the entire 5.3 generation we only had a -codex model, I presume in that case they were seeing the massive AI coding wave this winter and were laser focused on just that for a couple months. Maybe someday someone will actually explain all of this.
This might be great if it translates to agentic engineering and not just benchmarks.
It seems some of the gains from Opus 4.6 to 4.7 required more tokens, not fewer.
Maybe more interesting is that they’ve used codex to improve model inference latency. iirc this is a new (expectedly larger) pretrain, so it’s presumably slower to serve.
Particularly in areas outside straight coding tasks. So analysis, planning, etc. Better and more thorough output. Better use of formatting options (tables, diagrams, etc.).
I'm hoping to see improvements in this area with 5.5.
The large price bump might indicate the latter.
I prescribe 20 hours of KSP to everyone involved, that'll set them right.
https://github.com/williamcotton/space-trader/commit/0859c65...
Once upon a time humans had to manually advance the spark ignition as their car's engine revved faster.
Once upon a time humans had to know the architecture of a CPU to code for it.
History is full of instances of humans meeting technology where it was, accommodating for its limitations. We are approaching a point where machines accommodate to our limitations -- it's not a point, really, but a spectrum that we've been on.
It's going to be a bumpy ride.
(I work at OpenAI.)
How much capability is lost, by hobbling models with a zillion protections against idiots?
Every prompt gets evaluated, to ensure you are not a hacker, you are not suicidal, you are not a racist, you are not...
Maybe just...leave that all off? I know, I know, individual responsibility no longer exists, but I can dream.
I hope GPT 5.5 Pro is not cutting corners and isn't neutered from the start; you've got the compute for it not to be.
Anthropic is slightly better, but where is a 4.6 or 4.7 Haiku, or a 4.7 Sonnet, etc.?
Since Feb when we got Gemini 3.1, Opus 4.6, and GPT-5.3-Codex we have seen GPT-5.4 and GPT-5.5 but only Opus 4.7 and no new Gemini model.
Both of these are pretty decent improvements.
The big question is: does it still just write slop, or not?
Fool me once, fool me twice, fool me for the 32nd time, it’s probably still just slop.
https://www.nytimes.com/2026/04/23/technology/openai-new-model.html
I can see how some model releases would meet the NY Times news-worthy threshold if they demonstrated significance to users - i.e., if most users were astir and competitors were re-thinking their situation.

However, this same-day article came out before people really looked at it. It seems largely intended to contrast OpenAI with Anthropic's caution, before there has been any evidence that the new model has cyber-security implications.
It's not at all clear that the broader discourse is helping, if even the NY Times is itself producing slop just to stoke questions.
Where's the demo link?
Surely it doesn't still have the same ancient data cutoff as 5.4 did?
Are the tests getting harder and harder so the older AIs look worse and the new ones look like they are "almost there"?
Anyways, still exciting to see more improvements.
Soo many unconvincing "I've had access for three weeks and omg it's amazing" takes, it actually primes me for it to be a "meh".
I prefer to see for myself, but the gradual rollout, combined with full-on marketing campaign, is annoying.
With the Pro plan it was available in both Codex and ChatGPT already when I first checked, which was within an hour of the release.
...
> we’re deploying stricter classifiers for potential cyber risk which some users may find annoying initially
So we should be expecting not to be able to check our own code for vulnerabilities, because inherently the model cannot know whether I'm feeding it my code or someone else's.
I hope it’s just limits on pentesting and stuff, and not for code analysis and review.
I'm not trying to make any kind of moral statement, but the company just feels toxic to me.
The LinkedIn/X influencers who hyped this as a Mythos-class model should be ashamed of themselves, but they’ll be too busy posting slop content about how “GPT-5.5 changes everything”.
The battle has just begun
I am still using Codex 5.3 and haven't switched to GPT 5.4, as I don't like the "it's automatic bro, trust us" approach, so I'm wondering whether Codex is going to get these specific releases at all in the future.
I think Anthropic's fearmongering and "leaks" of Mythos were them testing the ground for 5.x, which seems to have backfired.
The fact that GPT-5.5 is apparently even better at long-running tasks is very exciting. I don’t have access to it yet, but I’m really looking forward to trying it.
Maybe this is a crazy theory, but I sometimes feel like they gimp their existing models before a big release so you'll notice more of a "step".
I have to imagine they'll go to Gemini 3.5 if only for marketing reasons.
Numbers look too good, wondering if it is benchmaxxed or not
Now, after all this time, this must surely be the release that does all software developers out of a job?
Or has Dirty Sam been caught lying, again?
Cos I've still got a programming job, and GPT can't do it for shit.
< 5 years until humans are buffered out of existence tbh
may the light of potentia spread forth beyond us
Imagine spending 100m on some of these AI “geniuses” and this is the best they can do.