Should (will?) Apple have to open up and give other agents the same direct access that Siri has? Either by allowing user control over the backend that Siri uses for everything, or alternatively by allowing other agents system access so that “hey Alexa” or “hey ChatGPT” are monitored and actioned the same as “hey Siri”?
All that to say that currently the market for voice assistants is nascent. No one has paid (directly) for Siri or OK Google, and Amazon can also make the case that you don't pay for Alexa, you pay for the device.
Now that OpenAI has a better service that is also unbundled from hardware, people might understandably want to replace Siri with it, and iOS offering only Siri can start to be considered anticompetitive.
I also wonder if the way that Apple is integrating OpenAI into Siri now is a gambit against such a thing.
If Siri acts as a frontend to other competing products, then it likely keeps the wolves at bay so to speak.
For example:
1. Siri can run Shazam/Music Recognition on a locked phone, but has no Shortcut/Automation ability to copy over the song info to a native note, which is as simple as: 'Run Recognize Music; delay; Create note with $Title - $Artist in $Notebook; dismiss Siri; Stop.' If I try telling Siri to run this Shortcut, it asks me to unlock the phone. The Shortcut has 'Allow Running When Locked' set to ON. I guess Siri is limited in one of two ways: it is either unable to access Shortcuts or it can't run Music Recognition. The latter would be bonkers, since it can do that natively by me saying 'Shazam', and the former simply disrespects my settings and gives me no way of making it work.
There used to be a workaround for this: you'd first enable Voice Control (an accessibility feature that's different from Siri, I guess), tell Siri to turn Voice Control on, call it out and tell it to run the Shortcut. It runs, but asks for a passcode as soon as it has to open the Notes app. Useless.
2. I can't start a voice recording without unlocking my phone, while I am able to launch the Camera app and start a video recording, which makes no sense to me.
3. I can't turn Location Services off without unlocking, which might be one of those instances where Apple thinks it knows better (i.e. the phone's stolen or I get kidnapped and the perpetrators can't turn location off as easily), but my default is having it off and using location only when I need it (e.g., navigating in an unfamiliar location, being lost, and... that's about it).
ChatGPT already ignores instructions such as 'questions are NEVER rhetorical, answer every question I ask directly', 'NEVER apologise or say sorry' and 'NEVER pretend to be human, and NEVER imply you have emotion or personality'.
I worry that OpenAI will make it worse by leaning into this Her stuff.
I just don't want to talk to a pseudo-human. I want to talk to the machine.
From that video: I would have wanted to complete that conversation within ten seconds; any more and it is wasting my time with its personality.
Human: "French - Pronounce croissant"
Bot: "Croissant. Notice emphasis on nasal `ant` syllable. Croissant."
Human: "Pronunciation of baguette"
Bot: "Baguette. Notice emphasis on second syllable. Baguette."
Any less density than that, and I feel like OpenAI doesn't respect me or my time.
"We’re starting to roll out advanced Voice Mode to a small group of ChatGPT Plus users. Advanced Voice Mode offers more natural, real-time conversations, allows you to interrupt anytime, and senses and responds to your emotions.
Users in this alpha will receive an email with instructions and a message in their mobile app. We'll continue to add more people on a rolling basis and plan for everyone on Plus to have access in the fall. As previously mentioned, video and screen sharing capabilities will launch at a later date.
Since we first demoed advanced Voice Mode, we’ve been working to reinforce the safety and quality of voice conversations as we prepare to bring this frontier technology to millions of people.
We tested GPT-4o's voice capabilities with 100+ external red teamers across 45 languages. To protect people's privacy, we've trained the model to only speak in the four preset voices, and we built systems to block outputs that differ from those voices. We've also implemented guardrails to block requests for violent or copyrighted content.
Learnings from this alpha will help us make the Advanced Voice experience safer and more enjoyable for everyone. We plan to share a detailed report on GPT-4o’s capabilities, limitations, and safety evaluations in early August."