These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.
I'm also incredibly excited about the possibility of this as an always available coding rubber duck. The multimodal demos they showed really drove this home, how collaboration with the model can basically be as seamless as screensharing with someone else. Incredible.
I don't want to chat with computers to do basic things. I only want to chat with computers when the goal is to iterate on something. If the computer is too dumb to understand the request and forces me to iterate just to be understood, I want no part.
(See also 'The Expanse' for how sci-fi imagined this properly.)
Is that because you're not used to it? Honestly asking.
This is probably the first time it feels natural, whereas all our previous experiences with "chat bots", "automated phone systems", and "automated assistants" were absolutely terrible.
Naturally, we dislike it because "it's not human". But this is true of pretty much anything that approaches the "uncanny valley". If the not-human thing solves your problem 100% better/faster than the human counterpart, though, we tend to accept it a lot faster.
This is the first real contender. Siri was the "glimpse" and ChatGPT is probably the reality.
[EDIT]
https://vimeo.com/945587328 the Khan Academy demo is nuts. The inflections are so good. It's pretty much right there in the uncanny valley, because it does still feel like you're talking to a robot, but you're also directly interacting with it. Crazy stuff.
But of course this was the age-old debate with our favorite golden-eyed android; and unsurprisingly, he too received the same sort of animosity:
Bones was deeply skeptical when he first met Data: "I don't see no points on your ears, boy, but you sound like a Vulcan." And we all know how much he loved those green-blooded fools.
Likewise, Dr. Pulaski has since been criticized for her rude and dismissive attitudes towards Data that had flavors of what might even be considered "racism," or so goes the Trekverse discussion on the topic.
And let's of course not forget when he was on trial, essentially for his "humanity," or whether he was indeed just the property of Starfleet, and nothing more.
More recent incarnations, like Star Trek: Picard, illustrated the outright ban on "synthetics" and indeed their effective banishment; non-synthetic life -- from human to Romulan -- simply wasn't OK with them.
Yes, this is all science fiction silliness -- or adoration, depending on your point of view -- but I think it very much reflects the myriad directions our real-life world is going to scatter (shatter?) in the coming years.
We get the upside of conversation, and avoid the downside of falling asleep at the wheel (as Ethan Mollick mentions in "Co-Intelligence".)
I was literally just thinking about this a few days ago... that we need a multi-modal language model with speech training built-in.
As soon as this thing rolls out, we'll be talking to language models like we talk to each other. Previously it was like dictating a letter and waiting for the responding letter to be read to you. Communication is possible, but not really in the way that we do it with humans.
This is MUCH more human-like, with the ability to interrupt each other and glean context clues from the full richness of the audio.
The model's ability to sing is really fascinating. Its ability to change the sound of its voice -- its pacing, its pitch, its tonality. I don't know how they're controlling all of that via GPT-4o tokens, but this is much more interesting stuff than what we had before.
I honestly don't fully understand the implications here.
Amazon, Google, and Apple have sunk literally billions of dollars into this idea only to find out that, no, we aren't.
We are with other humans, yes. When socialization is part of the conversation. When I'm talking to my local barista I'm not just ordering a coffee, I'm also maintaining a relationship with someone in my community.
But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.
And when it comes to inputs/outputs, typing is more precise and more efficient.
Don't get me wrong, this is a revolutionary piece of technology. But I don't think the benefits of talking you're describing (timing, subtext, inexplicit knowledge) are achievable here either (for now), since even humans need HOURS of interaction, over days/weeks/months of shared experience, to achieve that with each other.
>>> But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.
A lot of people think of their colleagues as part of a professional community as well, though.
Is it so?
Most of the time, speaking is for short exchanges of information (from pleasantries to essential facts).
I prefer writing for long, in-depth exchanges of ideas (whether by email, blog, etc.)
In many cultures - European or Asian, people are not very loquacious in everyday life.
I’m 100% a text-everything, never-call person, but I can’t live without Alexa these days; every time I’m in a hotel or on vacation I nearly ask it a question out loud.
I also hate how much Alexa sucks, so this is a big deal. I spent years weeding out what it could and couldn’t do, so it will be nice to have one that I don’t have to treat like a toddler.
(We mostly use it in car trips -- great for keeping the kids (ages 8, 12) occupied with endless Harry Potter trivia questions, answers to science questions, etc.)
Besides - not sure if I want this level of immersion/fake when talking to a computer...
"Her" comes to mind pretty quickly…
If you don’t complete your thought in one go, you have to insert filler words to keep it listening.
I've long felt that embracing the concept of the 'prompt' was a terrible idea for Siri and all the other crappy voice assistants. They built ecosystems on top of this dumb reduction, which only engineers could have made: that _talking to someone_ is basically taking turns to compose a series of verbal audio snippets in a certain order.
The previous ChatGPT app was getting pretty good once you learned to avoid run-on sentences and to break your speech up enough.
The tonality and inflections in the voice are a little too good.
Most people, put on a spectrum/averaged out, aren't that good at speaking and communicating, and that gap stands out as an uncanny-valley effect. It is mindbogglingly good at it, though.
I don't think that's generally true, other than for socializing with other humans.
Note how people, now having a choice, prefer to text each other most of the time rather than voice call.
I don't think people sitting at work in their cube farm want to be talking to their computer either. The main use for voice would seem to be for occasional use talking to an assistant on a smartphone.
Maybe things will change in the future when we get to full human AGI level, treating the AGI as an equal, more as a person.
More on the IBM Personal Speech Assistant for which I am on a patent (since expired) by Liam Comerford: http://liamcomerford.com/alphamodels3.html "The Personal Speech Assistant was a project aimed at bringing the spoken language user interface into the capabilities of hand held devices. David Nahamoo called a meeting among interested Research professionals, who decided that a PDA was the best existing target. I asked David to give me the Project Leader position, and he did. On this project I designed and wrote the Conversational Interface Manager and the initial set of user interface behaviors. I led the User Interface Design work, set specifications and approved the Industrial Design effort and managed the team of local and offsite hardware and software contractors. With the support of David Frank I interfaced it to a PC based Palm Pilot emulator. David wrote the Palm Pilot applications and the PPOS extensions and tools needed to support input from an external process. Later, I worked with IBM Vimercati (Italy) to build several generations of processor cards for attachment to Palm Pilots. Paul Fernhout translated (and improved) my Python based interface manager into C and ported it to the Vimercati coprocessor cards. Jan Sedivy's group in the Czech Republic ported the IBM speech recognizer to the coprocessor card. Paul, David and I collaborated on tools and refining the device operation. I worked with the IBM Design Center (under Bob Steinbugler) to produce an industrial design. I ran acoustic performance tests on the candidate speakers and microphones using the initial plastic models they produced, and then farmed the design out to Insync Designs to reduce it to a manufacturable form. Insync had never made a functioning prototype so I worked closely with them on Physical UI and assemblability issues. Their work was outstanding. By the end of the project I had assembled and distributed nearly 100 of these devices.
These were given to senior management and to sales personnel."
Thanks for the fun/educational/interesting times, Liam!
As a bonus for that work, I had been offered one of the chessboards that had been used when IBM Deep Blue defeated Garry Kasparov, but I turned it down, as I did not want a symbol around of AI defeating humanity.
Twenty-five years later, how far that aspiration towards conversational speech with computers has come. Some ideas I've put together to help deal with the fallout: https://pdfernhout.net/beyond-a-jobless-recovery-knol.html "This article explores the issue of a "Jobless Recovery" mainly from a heterodox economic perspective. It emphasizes the implications of ideas by Marshall Brain and others that improvements in robotics, automation, design, and voluntary social networks are fundamentally changing the structure of the economic landscape. It outlines towards the end four major alternatives to mainstream economic practice (a basic income, a gift economy, stronger local subsistence economies, and resource-based planning). These alternatives could be used in combination to address what, even as far back as 1964, has been described as a breaking "income-through-jobs link". This link between jobs and income is breaking because of the declining value of most paid human labor relative to capital investments in automation and better design. Or, as is now the case, the value of paid human labor like at some newspapers or universities is also declining relative to the output of voluntary social networks such as for digital content production (like represented by this document). It is suggested that we will need to fundamentally reevaluate our economic theories and practices to adjust to these new realities emerging from exponential trends in technology and society."
Another idea for dealing with the consequences is using AI to facilitate Dialogue Mapping with IBIS for meetings to help small groups of people collaborate better on "wicked problems" like dealing with AI's pros and cons (like in this 2019 talk I gave at IBM's Cognitive Systems Institute Group). https://twitter.com/sumalaika/status/1153279423938007040
Talk outline here: https://cognitive-science.info/wp-content/uploads/2019/07/CS...
A video of the presentation: https://cognitive-science.info/wp-content/uploads/2019/07/zo...
Yeah, and it's only the beginning.