Talking to a computer still sucks as a user interface - not because a computer can't communicate on multiple channels the way people do; it can do that now too. It sucks for the same reason talking to people sucks as a user interface: the kinds of tasks we use computers for (the ones that aren't just talking with/to/at other people via electronic means) are better handled by doing than by talking about them. We need an interface to operate a tool, not an interface to an agent that operates a tool for us.
As an example, consider driving (as in, realtime control - not just "getting from point A to B"): a chat interface to driving would suck just as badly as being a backseat driver sucks for both people in the car. In contrast, a steering wheel, instead of being a bandwidth-limiting indirection, is an anti-indirection - not only does it let you control the machine with your body, the control is direct enough that over time your brain learns to abstract it away, and the car becomes an extension of your body. We need more tangible interfaces like that with computers.
The steering wheel case, of course, would fail with "AI-level smarts" - but that still doesn't mean we should embrace talking to computers. A good analogy is dance - it's an interaction between two independently smart agents exploring an activity together, and as they do it enough, it becomes fluid.
So dance, IMO, is the steering wheel analogy for AI-powered interfaces, and that is the space we need to explore more.
Excellent comment, and it gets to the heart of something I've had trouble articulating clearly: we've slowly lost the concept that a computer is a tool that the user wields and commands to do things. Now, a computer has its own mind and agency, and we "request" that it do things and "communicate" with it, and ask it to run this and not run that.
Now, we're negotiating and pleading with the man inside of the computer, Mr. Computer, who has its own goals and ambitions that don't necessarily align with your own as a user. It runs what it wants to run, and if that upsets you, user, well tough shit! Instead of waiting for a command and then faithfully executing it, Mr. Computer is off doing whatever the hell he wants, running system applications in the background, updating this and that, sending you notifications, and occasionally asking you for permission to do even more. And here you are as the user, hobbled and increasingly forced to "chat" with it to get it to do what you want.
Even turning your computer off! You used to throw a hardware switch that interrupts the power to the main board, and _sayonara_ Mr. Computer! Now, the switch does nothing but send an impassioned plea to the operating system to pretty please, with sugar on top, when you're not busy could you possibly power off the computer (or mostly power it off, because off doesn't even mean off anymore).
The modern OS values the system's theoretical 'system health' metrics far above things like "whether the user can use it to do some user task."
Another great example is how you can't boot a modern Mac laptop, on AC power, until it has decided its battery is sufficiently charged. Why? None of your business.
Anyway, to get back on topic: this is an interesting connection you've made. The software vendor will perhaps delegate decisions like "is the user allowed to log into the computer at this time" or "is a reboot mandatory" to an "agent" running on the computer. If we're lucky, we'll get to talk to that agent to plead our case, but my guess is Apple and Microsoft will decide we aren't qualified to have input into the decision.
FWIW, this is what happens with modern steering wheels as well. Power steering is its own complicated subsystem that isn't just about user input, and it has many more failure modes than an old-fashioned, analog steering wheel. The reason folks feel like "Mr. Computer" has a mind of its own is the mismatch between user intent and effect. This is a UX problem.
I also think chat and RAG are the two biggest UX paradigms we've explored so far when it comes to LLMs. It's probably worth folks exploring other UXes for LLMs that are enabling for the user. Suggestions in documents and code seem to be a UX that more people enjoy using, but even then there's a mismatch.
EDIT: love your analogy to dance!
But, as you say, a chat interface would be a terrible way to actively drive a car. That is a different thing, but I'm growing convinced many will focus on the first idea while staving off complaints about the latter.
In another thread, I assert that chat is probably a fine way to order up something that fits a repertoire that trained a bot. But, I don't think sticking to the chat window is the best way to interface with what it delivers. You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.
Yes, this is what I also tried to hint at in my comment, but only got part-way there. In most of the cases where I can imagine a chat interface being fine (or even ideal), it's really only good as a starting point. Take two examples based on your reply:
1) Getting a car ride. "Computer, order me a cab home" is a good start. It's even OK if I then get asked to narrow it down between several different services/fares (next time I'll remember to specify that up front). But if I want to inspect the route (or perhaps adjust it, in a hypothetical service that supports it), I'd already prefer an interactive map I can scroll and zoom, with PoIs I can tap on to get their details, over continuing a verbal chat.
2) Ordering food in a fast food restaurant. I'm fine starting it with a conversation if I know what I want. However, getting back the order summary in prose (or worse, read out loud) would already be taxing, and if I wanted to make final adjustments, I'd beg for buttons and numeric input boxes. And in case I don't know what I want, or what is available (and at what prices), a chat interface is a non-starter. An interactive menu is a must.
You sum this up perfectly:
> You almost certainly want to be much more actively "hands on" in very domain specific ways with the artifacts produced.
Chat may be great to get that first artifact, but afterwards, there's almost always a more hands-on interface that would be much better.
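To make the "chat produces the artifact, UI takes over" split concrete, here's a minimal sketch. Everything in it is hypothetical - the JSON shape and the `render_order` helper are inventions for illustration - but the idea is: prompt the model to return the order as structured data rather than prose, then render that data as interactive controls instead of reading a summary back at the user.

```python
import json

# Hypothetical stand-in for the chat model's reply. The assumption is
# that the model was instructed to emit structured JSON, not prose.
model_reply = """
{
  "items": [
    {"name": "Cheeseburger", "qty": 1, "price": 5.49},
    {"name": "Fries", "qty": 2, "price": 2.99}
  ]
}
"""

def render_order(reply: str) -> list[str]:
    """Turn the structured reply into UI rows (sketched here as text
    with [-]/[+] adjustment controls) rather than a prose summary."""
    order = json.loads(reply)
    rows = []
    for item in order["items"]:
        total = item["qty"] * item["price"]
        rows.append(f"[-] {item['qty']} [+]  {item['name']}  ${total:.2f}")
    return rows

for row in render_order(model_reply):
    print(row)
```

The chat step ends the moment the structured artifact exists; from there the user adjusts quantities with buttons, not with another round of conversation.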
It's a CLI without the integrity. When you bought a 386, it came with a big book that said "MS-DOS 4.01" and enumerated the 75 commands you could type at the C:\> prompt to actually make something useful happen.
When you argue with ChatGPT, its whole business is to not tell you what those 75 commands are. Maybe your prompt fits its core competency and you'll get exactly what you wanted. Maybe it's hammering what you said into a shape it can parse and producing marginal garbage. Maybe it's going to hallucinate from nothing. But it's going to hide that behind a bunch of cute language and hopefully you'll just keep pulling the gacha and blaming yourself if it's not right.
You have some of the same problems with email, of course. Losing threading, in particular, made things worse. It was a "chatification of email" that caused people to lean in to email being bad. Amusing that we are now seeing chat applications rise to replace email.