Starting ChatGPT Plus web users off with the Pro model, then later swapping it for the Standard model -- that would technically meet the claim of model-behavior consistency while still qualifying as shenanigans.
In this particular case, I'm happy to report that the speedup is time per token, so it's not a gimmick from outputting fewer tokens at lower reasoning effort. Model weights and quality remain the same.
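To see the distinction being made here, one way to check that a speedup is genuinely per-token (rather than an artifact of emitting fewer tokens) is to compare inter-token latency at equal output length. A minimal sketch with made-up arrival timestamps, assuming you log the wall-clock time of each streamed token:

```python
import statistics

# Hypothetical timing data: wall-clock arrival times (seconds) of each
# streamed token for the same prompt, before and after the update.
before = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25]  # 6 tokens, ~50 ms/token
after  = [0.00, 0.03, 0.06, 0.09, 0.12, 0.15]  # 6 tokens, ~30 ms/token

def ms_per_token(timestamps):
    """Median gap between consecutive token arrivals, in milliseconds."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.median(gaps) * 1000

# A real per-token speedup shows up in the gap even when token counts
# are equal; a "speedup" from lower reasoning effort would instead show
# the same per-token gap but fewer tokens overall.
assert len(before) == len(after)  # same output length in this scenario
print(ms_per_token(before), ms_per_token(after))
```

The numbers are illustrative, not measurements; the point is only that per-token latency and token count are separately observable from the stream.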
Happy to retract if you can state [0] is false.
I won't BS you that costs are never part of our decision making. If costs didn't matter, we'd have unlimited rate limits and 10M token context windows and subscription pricing of $0. But as someone in the room where these decisions are made, I can honestly report that our goal is almost always trying to figure out how to make people happier, not trick them. We're trying to fairly earn subscriptions, not scam anyone. In the cases where we have accidentally misled people (e.g., saying voice mode was weeks away), it was optimistic planning, not nefarious intent.
API model behavior is guaranteed to stay nearly the same (modulo standard non-determinism, bugs, etc.). ChatGPT is harder to promise, not because we pull more shenanigans there, but just because we might tweak system prompts, add/remove tools, run A/B tests, etc. that vary performance a bit. But we definitely don't do things like quantize during busy parts of the day or nerf models after publishing evals - that would feel pretty shady.
I.e.: "keep API model behavior constant" says nothing about the consumer ChatGPT web app, mobile apps, third-party integrations, etc.
Similarly, it might mean very specifically that a "certain model timestamp" remains constant but the generic "-latest" or whatever model name auto-updates "for your convenience" to the new faster performance achieved through quantisation or reduced thinking time.
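The pinned-snapshot vs. auto-updating-alias distinction can be made concrete. A toy sketch (all model names here are made up, and the alias table is a stand-in for whatever routing the provider does server-side):

```python
# Hypothetical alias table: the provider can silently repoint an alias
# like "-latest" at a new snapshot "for your convenience".
ALIASES = {"gpt-x-latest": "gpt-x-2025-06-01"}

def resolve(model_name):
    """Return the concrete snapshot a request would actually hit."""
    return ALIASES.get(model_name, model_name)

# Provider repoints the alias to a faster (e.g. quantized) snapshot:
ALIASES["gpt-x-latest"] = "gpt-x-2025-09-15"

print(resolve("gpt-x-latest"))      # alias users silently moved
print(resolve("gpt-x-2025-06-01"))  # pinned users keep the old snapshot
```

So "a certain model timestamp stays constant" and "the model I call stays constant" are only the same claim if you pin the dated snapshot rather than the alias.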
You might be telling the full, unvarnished truth, but after many similar claims from OpenAI that turned out to be only technically true, I remain sceptical.
ChatGPT model behavior can definitely change over time. We share release notes here (https://help.openai.com/en/articles/6825453-chatgpt-release-...), and we also make changes or run A/B tests that aren't reported there. Plus, ChatGPT has memory, so as you use it, its behavior can technically change even with no changes on our end.
That said, I do my best to be honest and communicate the way that I would want someone to communicate with me.
It's worth you guys doing some analysis on your end of why customers are getting worse results a week or two later, and putting out some guidelines about what context is poisonous and the like.
I don't think they perceive it as a con game, on the contrary. They say below: "we also recently reduced the thinking effort in ChatGPT. Our intent here was purely user experience, not cost savings."
They are not the only ones playing this game. Google did the same with Gemini Pro.
Anthropic:
In the past month, OpenAI has released for codex users:
- subagents support
- a better multi-agent interface (codex app)
- 40% faster inference
No joke, with the first two my productivity is already up like 3x. I am so stoked to try this out.
[features]
collab = true

Then from that they realized they could just run API calls more like staff, fast, not at capacity.
Then they leave the billion other people's calls at remaining capacity.
https://thezvi.substack.com/i/185423735/choose-your-fighter
> Ohqay: Do you get faster speeds on your work account?
> roon: yea it’s super fast bc im sure we’re not running internal deployment at full load