0 points by simonw 7mo ago | 0 comments

I had preview access for a couple of weeks. I've written up my initial notes so far, focusing on core model characteristics, pricing (extremely competitive) and lessons from the model card (aka as little hype as possible): https://simonwillison.net/2025/Aug/7/gpt-5/

dang 7mo ago

Related ongoing thread:

GPT-5: Key characteristics, pricing and model card - https://news.ycombinator.com/item?id=44827794

nilsherzig 7mo ago

> In my own usage I’ve not spotted a single hallucination yet

Did you ask it to format the table a couple of paragraphs above this claim, after writing about hallucinations? Because I would classify the sorting mistake as one.

simonw (OP) 7mo ago

That wasn't a hallucination, that was it failing to sort things correctly.

nilsherzig 7mo ago

So a hallucination would have been if it made up a new row?

What about the "9.9 / 9.11" example?

It's unclear to me where to draw the line between a skill issue and a hallucination. I imagine that one influences the other?

jaccola 7mo ago

Out of interest, how much did the model change (if at all) over those 2 weeks? Does OpenAI guarantee that if you do testing from date X, that is the model (and accompaniments) that will actually be released?

I know these companies do "shadow" updates continuously anyway so maybe it is meaningless but would be super interesting to know, nonetheless!

simonw (OP) 7mo ago

It changed quite a bit - we got new model IDs to test every few days. They did tell us when the model was "frozen", and I ran my final tests against those IDs.

OpenAI and Anthropic don't update models without changing their IDs, at least for model IDs with a date in them.

OpenAI do provide some aliases, and their gpt-5-chat-latest and chatgpt-4o-latest model IDs can change without warning, but anything with a date in (like gpt-5-2025-08-07) stays stable.
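As a rough illustration of the distinction above, here is a minimal sketch that tells date-pinned IDs apart from aliases. It assumes dated snapshot IDs always end in a `-YYYY-MM-DD` suffix, which is inferred only from the `gpt-5-2025-08-07` example; the `is_pinned` helper is hypothetical and not part of any OpenAI or Anthropic library.

```python
import re

# Assumption: a dated snapshot ID ends in -YYYY-MM-DD, as in gpt-5-2025-08-07.
_DATED_SUFFIX = re.compile(r"-\d{4}-\d{2}-\d{2}$")


def is_pinned(model_id: str) -> bool:
    """Return True if the model ID ends in a date suffix (hypothetical helper)."""
    return bool(_DATED_SUFFIX.search(model_id))


# The three IDs mentioned in the comment above.
for model_id in ["gpt-5-2025-08-07", "gpt-5-chat-latest", "chatgpt-4o-latest"]:
    label = "dated snapshot" if is_pinned(model_id) else "alias, may change"
    print(f"{model_id}: {label}")
```

A check like this could guard an evaluation script so that results are only recorded against IDs expected to stay stable.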

BryantD 7mo ago

In the interests of gathering these pre-release impressions, here's Ethan Mollick's writeup: https://www.oneusefulthing.org/p/gpt-5-it-just-does-stuff

Thank you to Simon; your notes are exactly what I was hoping for.

candiddevmike 7mo ago

This post seems far more marketing-y than your previous posts, which are a bit more critical (such as your Gemini 2.5 blog post here: https://simonwillison.net/2025/Jun/17/gemini-2-5/). You seem to gloss over a lot of GPT-5's shortcomings and spend more time hyping it than in other posts. Is there some kind of conflict of interest happening?

simonw (OP) 7mo ago

You really think so? My goal with this post was to provide the non-hype commentary - hence my focus on model characteristics, pricing and interesting notes from the system card.

I called out the prompt injection section as "pretty weak sauce in my opinion".

I did actually have a negative piece of commentary in there about how you couldn't see the thinking traces in the API... but then I found out I had made a mistake about that and had to mostly remove that section! Here's the original (incorrect) text from that: https://gist.github.com/simonw/eedbee724cb2e66f0cddd2728686f... - and the corrected update: https://simonwillison.net/2025/Aug/7/gpt-5/#thinking-traces-...

The reason there's not much negative commentary in the post is that I genuinely think this model is really good. It's my favorite model right now. The moment that changes (I have high hopes for Claude 5 and Gemini 3) I'll write about it.

drewbitt 7mo ago

I am seeing the conflict from other tech influencers who were given early access or even invited to OpenAI events pre-release.

simonw (OP) 7mo ago

I was invited to the OpenAI event pre-release too - here's my post about that: https://simonwillison.net/2025/Aug/7/previewing-gpt-5/

yahoozoo 7mo ago

As in many other industries, you probably lose preview access if you are negative.

yahoozoo 7mo ago

Also, when most people have already dismissed OpenAI’s open weight models as trash, there’s this: https://simonwillison.net/2025/Aug/5/gpt-oss/

Suspicious.

camgunz 7mo ago

From the guidelines: Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data.

mhh__ 7mo ago

I don't think that this applies to commenting on someone's blog.

HAL3000 7mo ago

Maybe there is a misconception about what his blog is about. You should treat it like a YouTuber's reporting rather than an expert evaluation: an enthusiast testing different models and reiterating some points about them, not the opinions of an expert or ML professional. His comment history on this topic in this forum clearly shows this.

It’s reasonable that he might be a little hyped about things because of his feelings about them and the methodology he uses to evaluate models. I assume good faith, as the HN guidelines propose, and this is the strongest plausible interpretation of what I see in his blog.

simonw (OP) 7mo ago

I consider myself an expert in the field of LLMs, and I try to write in a way that supports that.

blackhaj7 7mo ago

If Simon isn't an expert, then I am not sure who is.

dcreater 7mo ago

Yes, I noticed the same. This is very concerning.
