0 points · famouswaffles · 2y ago · 0 comments

Speech is a lot more than just the words being conveyed.

Tone, emphasis, speed, and accent are all very important parts of how humans communicate verbally.

Before today, voice mode was strictly audio > text, then text > audio. All that information was destroyed.

Now the same model takes in audio tokens and spits back out audio tokens directly.

Watch this demo; it's the best example of the kind of thing that would be flat-out impossible with the previous setup.

https://www.youtube.com/live/DQacCB9tDaw?si=2LzQwlS8FHfot7Jy

7 comments · 2 top-level

scarface_74 · 2y ago · 3 in thread

The ability to have an interactive voice conversation has been available in the iOS app for the longest time.

kaibee · 2y ago

Kinda stretching the definition of interactive there.

scarface_74 · 2y ago

How so? You don’t have to press the mic button after every sentence. You press the headphone button and speak like you normally would and it speaks back once you stop talking.

How much more “interactive” could it be?

famouswaffles (OP) · 2y ago

Right, but this works differently.

barrell · 2y ago · 2 in thread

Flat out impossible? If you mean “without clicking anything”, sure, but you could interrupt with your thumb, exit chat to send images and go back (maybe video too, I’ve never had any need), and honestly the 2-3 second response time never once bothered me.

I’m very excited about all these updates and it’s really cool tech, but all I’m seeing is quality of life improvements and some cool engineering.

That’s not necessarily a bad thing. Not everything has to be magic or revolutionary to be a cool update.

famouswaffles (OP) · 2y ago

Did you even watch the video? It's just baffling how I have to spell this out.

Skip to 11:50 or watch the very first demo with the breathing. None of that is possible with TTS and STT. You can't ask old voice mode to slow down or modulate tone or anything like that because it's just working with text.

barrell · 2y ago

Yes, I watched the demo. True, those things were not possible, so if that’s what’s blowing you away, then fair enough, I guess. For me, that doesn’t affect anything I have ever used voice for or probably ever will.

I’ve voice chatted with ChatGPT for hundreds of hours and never once thought “can you modulate your tone please?”, so those improvements are a far cry from magic or revolutionary imho. Again, that’s not to say they aren’t cool tech, forward advancements, or impressive, but magic or revolutionary are pretty high bars.

To each their own though.
