It's interesting that OpenAI is highlighting the Elo score [1] instead of showing results for the many benchmarks on which all models are stuck at 50-70% success.
[1] https://twitter.com/LiamFedus/status/1790064963966370209
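For context, Arena-style Elo is just the standard logistic rating over pairwise human votes. A minimal sketch in Python (the ratings below are made-up illustrative numbers, not OpenAI's):

    # Standard Elo expected score: P(A beats B) = 1 / (1 + 10^((Rb - Ra) / 400))
    def expected_score(ra: float, rb: float) -> float:
        return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

    # A 100-point Elo gap corresponds to a ~64% expected win rate.
    print(expected_score(1300, 1200))  # ~0.64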
I don't really care whether it's stronger than gpt-4-turbo or not. The direct real-time video and audio capabilities are absolutely magical and stunning. The responses in voice mode are now instantaneous, you can interrupt the model, you can talk to it while showing it a video, and it understands (and uses) intonation and emotion.
Really, just watch the live demo. I linked directly to where it starts.
Importantly, this makes the interaction a lot more "human-like".
I predict there will be a zoo (more precisely a tree, as in "family tree") of models and derived models for particular application purposes, and there will be continued development of enhanced "universal"/foundational models as well. Some will focus on minimizing memory, others on minimizing pre-training or fine-tuning energy consumption; some need high accuracy, others hard real-time speed, yet others multimodality like GPT-4o, some multilinguality, and so on.
Previous language models that encoded dictionaries for spellcheckers etc. never got standardized (for instance, compare aspell dictionaries to the ones from LibreOffice, or to the language model inside CMU PocketSphinx), so you could never use them across applications or operating systems. As these models become more common, it would be interesting to see this aspect improve this time around.
https://www.rev.com/blog/resources/the-5-best-open-source-sp...
That said, given the price tag: when AI becomes genuinely expert, I'm probably not going to have a job, and neither will anyone else (modulo how much electrical power those humanoid robots need, as the global electricity supply is currently only about 250 W per capita).
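Back-of-envelope on that figure, assuming roughly 8 billion people:

    # Convert a per-capita continuous power figure into annual generation.
    watts_per_capita = 250        # the figure claimed above
    population = 8e9              # approximate world population
    hours_per_year = 8766         # 365.25 * 24
    twh_per_year = watts_per_capita * population * hours_per_year / 1e12
    print(twh_per_year)           # ~17,500 TWh/yr of electricity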
In the meantime, making it a properly real-time conversational partner… wow. Also, that's kinda what you need for real-time translation, because: «be this, that different languages the word order totally alter and important words at entirely different places in the sentence put», and real-time "translation" (even when done by a human) therefore requires having a good idea what the speaker was going to say before they get there, and being able to back-track when (as is inevitable) the anticipated topic was actually something completely different and so the "translation" wasn't.
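A toy sketch of that anticipate-and-backtrack loop; predict_rest and render are hypothetical stand-ins for the hard parts (guessing where the sentence is going, and rendering it in the target language):

    # Speculative "translation": commit a provisional rendering early,
    # retract and restate when the speaker turns out to be going
    # somewhere else entirely.
    def live_translate(incoming_words, predict_rest, render):
        heard = []
        committed = ""                       # what the audience has already heard
        for word in incoming_words:
            heard.append(word)
            anticipated = heard + predict_rest(heard)   # guess the rest of the sentence
            rendering = render(anticipated)
            if not rendering.startswith(committed):
                print("(correction:)", end=" ")         # back-track and restate
                committed = ""
            print(rendering[len(committed):], end=" ")  # emit only the new part
            committed = rendering
        return committed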
GPT-4o got slightly better overall; its ability to reason improved more than the rest.
This model isn't about benchmark chasing or being a better code generator; it's explicitly focused on pushing prior capabilities into the frame of multimodal interaction.
It's still a WIP; most of the videos show awkwardness where its capacity to understand the "flow" of human speech is still vestigial. It doesn't yet understand how humans pause and give one another space in those pauses.
But it does have some genuinely magical ability to share a deictic frame of reference.
I have been waiting for this specific advance, because it is going to significantly quiet the "stochastic parrot" line of wilfully-myopic criticism.
It is very hard to make blustery claims about "glorified Markov token generation" when the system is using language in a way that requires both a shared world model and an understanding of interlocutor intent, focus, etc.
This is edging closer to the moment when it becomes very hard to argue that the system does not have some form of self-model, and a world model within which self, other, and other objects and environments exist with inferred and explicit relationships.
This is just the beginning. It will be very interesting to see how strong its current abilities are in this domain; it's one thing to have object classification, another thing entirely to infer "scripts, plans, goals..." and things like intent and deixis. E.g., how well does it now understand "us" and "them", or "this" vs. "that"?
Exciting times. Scary times. Yee hawwwww.
> using language in a way that requires both a shared world model
Where? What example of GPT-4o requires a shared world model? The customer support example?
The reason GPT-4 does not have any meaningful world model (in the sense that rats have meaningful world models) is that it freely believes contradictory facts without being confused, freely confabulates without having brain damage, and it has no real understanding of quantity or causality. Nothing in GPT-4o fixes that, and gpt2-chatbot certainly had the same problems with hallucinations and failing the same pigeon-level math problems that all other GPTs fail.
They really Put That There!
https://www.youtube.com/watch?v=RyBEUyEtxQo
Oh, shit.
So local modelling (completely offline, but per-speaker aware and responsive), with a really flexible application API. Sort of the GTK or Qt equivalent for voice interactions. Also custom naming, so instead of "Hey Siri" or "Hey Google" I could say, "Hey idiot" :-)
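Entirely hypothetical API, but the shape I'd want is something like:

    # Hypothetical local voice-interaction toolkit: offline,
    # per-speaker aware, custom wake word. Not a real library.
    class VoiceAgent:
        def __init__(self, wake_word: str):
            self.wake_word = wake_word.lower()
            self.handlers = {}

        def on(self, intent: str, handler):
            self.handlers[intent] = handler          # app registers callbacks

        def hear(self, speaker: str, utterance: str):
            text = utterance.lower()
            if not text.startswith(self.wake_word):
                return                               # ignore non-addressed speech
            for intent, handler in self.handlers.items():
                if intent in text:
                    handler(speaker, utterance)

    agent = VoiceAgent(wake_word="hey idiot")
    agent.on("lights", lambda who, what: print(f"{who} wants the lights changed"))
    agent.hear("alice", "Hey idiot, turn the lights off")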
Definitely some interesting tech here.
with a GPT you can modify the system prompt
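Right, and via the API the system prompt is just the first message. For example, with the Python SDK (the model name here is illustrative):

    # Setting a custom system prompt via the OpenAI Python SDK.
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",                     # illustrative model name
        messages=[
            {"role": "system", "content": "You are a terse assistant named Idiot."},
            {"role": "user", "content": "Hey idiot, what's the weather like?"},
        ],
    )
    print(response.choices[0].message.content)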
We'll have to see when end users actually get access to the voice features "in the coming weeks".
Probably some kinks there that they're still working out.
Thanks for this.
Skinner: "Yes."
Chalmers: "May I see it?"
Skinner: "No."
Perhaps it's worth taking a beat and looking at the incredible progress in that year, and acknowledge that whatever's next is probably "still cooking".
Even Meta is still baking their 400B parameter model.
Skinner (looking up): "No, mother, it's just the Nvidia GPUs."
"No, mother, that's just the H100s."
But I am not convinced it will be another GPT-4 moment. It seems like a big focus on tacking together clever multimodal tricks rather than straightforwardly more intelligent AI.
Hope they prove me wrong!
Multimodality is another way to stave off the inevitable, because these AI companies are already training multiple models on different piles of information. If you have to train a text model and an image model, why split your training data in half when you could train a combined model on a combined dataset? (See the sketch after the footnote.)
[0] For starters, most YouTube videos aren't manually captioned, so you're feeding GPT the output of Google's autocaptioning model, and it's going to start learning the artifacts of whatever that model can't process.
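That combined-dataset idea, as a toy sketch; encode_text and encode_image are hypothetical stand-ins:

    # Interleave image patches and text tokens into one training
    # sequence, so a single model sees the combined dataset.
    def build_example(caption, image, encode_text, encode_image):
        tokens = ["<image>"]
        tokens.extend(encode_image(image))   # e.g. patch embeddings as tokens
        tokens.append("</image>")
        tokens.extend(encode_text(caption))  # ordinary text tokens
        return tokens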
Improving the instruction tuning, the RLHF step, increasing the training set size, working on multilingual capabilities, etc. all make sense as ways to improve quality, but I don't think increasing model size does. Being able to advertise a big breakthrough may make sense in terms of marketing, but I don't believe it's going to happen, for two reasons:
- you don't release intermediate steps when you want to be able to advertise big gains, because doing so raises the baseline and reduces the marketing impact of your "big gains".
- I don't think they would benefit from an arms race with Meta, trying to keep a significant edge. Meta is likely to catch up on performance eventually, but they are not much of a threat in terms of business. Focusing on keeping a performance edge instead of making their business viable would be a strategic blunder.
Seems to me that performance is converging and we might not see a significant jump until we have another breakthrough.
It doesn't seem that way to me. But even if it did, video generation also seemed kind of stagnant before Sora.
In general, I think The Bitter Lesson is the biggest factor at play here, and compute power is not stagnating.
I'll reserve judgment until we see GPT-5, but if it becomes just a matter of who can best monetize existing capabilities, OAI isn't the best positioned.
This model has been tested under the code name 'gpt2-chatbot', but it is very much a new GPT-4+-level model with new multimodal capabilities, plus apparently some impressive work on inference speed.
Highlighting this so people don't get the impression it's just OpenAI slapping a new label on something a generation out of date.
(text input in web version)
Maybe it's programmed to completely ignore swearing, but how could I not swear after it repeatedly gave me info about you.com when I tried to address it in the second person?
The improvements they seem to be hyping are in multimodality and speed (also price, at half that of GPT-4 Turbo, though that's their choice and could be promotional; I expect it's at least in part, like the speed, a consequence of greater efficiency), not so much in producing better output for the same pure-text inputs.
And the prompt wasn't a monstrosity; it wasn't even that good. It was just one line, "I need help to categorize these expenses", and off it went. I hope it won't get enshittified like Turbo, because this finally feels as great as 3.5 was for goal-seeking.
The "gpt2-chatbot" was the worst of the three.