It's really how it works.
Winner of the 'understatement of the week' award (and it's only Monday).
Also top contender in the 'technically correct' category.
Seems like these would be similar.
As a language learner, this would be tremendously useful.
The web page implies you can try it immediately. Initially it wasn't available.
A few hours later it was in both the web UI and the mobile app - I got a popup telling me that GPT-4o was available. However, nothing seems to be any different. I'm not given any option to use video as an input, and the app can't seem to pick up any new info from my voice.
I'm left a bit confused as to what I can do that I couldn't do before. I certainly can't seem to recreate much of the stuff from the announcement demos.
I imagine there's a lot of usage at the HQ. Human + AI karaoke?
Sorry to hijack, but how the hell can I solve this? I have the EXACT SAME error on two iOS devices (native app only — web is fine), but not on Android, Mac, or Windows.
Will it be fully available in the EU, with GDPR compliance?
(not that this is the most important thing about the announcement at all. Just an aside)
Yeah, it's cringe. I had to stop listening.
Why did they make the woman sound like she's permanently on the brink of giggling? It's nauseating how overstated her pretentious banter is. Somewhere between condescending nanny and preschool teacher. Like how you might talk to a child who's at risk of crying so you dial up the positive reinforcement.
I believe it can be toned down using system prompts, which they'll expose in future iterations.
Consequences of audio-to-audio (rather than audio→text, then text→audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning, among other things. And you can interrupt it freely now!
I could be wrong but I haven't seen any non-speech demos.
Magic.
Based on the casual production of these videos, the product must be this good.
The new voice mode sounds better, but the current voice mode also has inflection that makes it feel much more natural than most computer voices I've heard before.
Link in case other readers are curious: https://llm.datasette.io
We've already seen how much damage dishonest actors can do by manipulating our text communications with words they don't mean, plans they don't intend to follow through on, and feelings they don't experience. The social media disinfo age has been bad enough.
Are you sure you want a machine which is able to manipulate our emotions on an even more granular and targeted level?
LLMs are still machines, designed and deployed by humans to perform a task. What will we miss if we anthropomorphize the product itself?
The only slightly annoying thing at the moment is that they seem hard to interrupt, which is an important mechanism in conversations. But that seems like a solvable problem. They'd need to be able to interpret body language a bit to spot when the speaker is about to interrupt.