There have been many "UI Paradigms", but the fancier ones tended to be special purpose. The first one worthy of the name was for train dispatching. That was General Railway Signal's NX (eNtry-Exit) system.[1] Introduced in 1936, still in use in the New York subways. With NX, the dispatcher routing an approaching train selected the "entry" track on which the train was approaching. The system would then light up all possible "exit" tracks from the junction. This took into account conflicting routes already set up and trains present in the junction. Only reachable exits lit up. The dispatcher pushed the button for the desired exit. The route setup was then automatic. Switches moved and locked into position, then signals along the route went to clear. All this was fully interlocked; the operator could not request anything unsafe.
There were control panels before this, but this was the first system where the UI did more than just show status. It actively advised and helped the operator. The operator set the goal; the system worked out how to achieve it.
Another one I encountered was an early computerized fire department dispatching system. Big custom display boards and keyboards. When an alarm came in, it was routed to a dispatcher. Based on location, the system picked the initial resources (trucks, engines, chiefs, and special equipment) to be dispatched. Each dispatcher had a custom keyboard, with one button for each of those resources. The buttons lit up indicating the selected equipment. The dispatcher could add additional equipment with a single button push, if the situation being called in required it. Then they pushed one big button, which set off alarms in fire stations, printed a message on a printer near the fire trucks, and even opened the doors at the fire house. There was a big board at the front of the room which showed the status of everything as colored squares. The fire department people said this cut about 30 seconds off a dispatch, which, in that business, is considered a big win.
Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.
Now, the promising feature of LLMs in this direction is the ability to use the context of previous questions and answers. It's still query/response, but with enough context that the user can gradually make the system converge on a useful result. Such systems are useful for "I don't know what I want but I'll know it when I see it" problems. This allows using flaky LLMs with human assistance to get a useful result.
Are humans limited to low-risk applications like that?
Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.
And I don't want to count the number of times I've personally done that, but I'm sure it's >0. And I hate to tell you, but I've spent the last 20 years in positions of authority that could have caused massive amounts of damage not only to the companies I've been employed by, but a large cross-section of society as well. And those fools I referenced in the last paragraph? Same.
I think people are too hasty to discount LLMs, or LLM-backed agents, or other LLM-based applications because of their limitations.
(Related: I think people are too hasty to discount the catastrophic potential of self-modifying AGI as well)
We do not normally hallucinate. We are sometimes wrong, and sometimes wrong about the confidence we should attach to our knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs do.
You remember a few isolated incidents because they're salient. That does not mean that it's representative of your average personal interactions.
Oh yes we do lol. Many experiments show our perception of reality and of cognition is entirely divorced from the reality of what's really going on.
Your brain is making stuff up all the time. Sense data you perceive is partly fabricated. Your memories are partly fabricated. Your decision rationales are post hoc rationalizations more often than not. That is, you don't genuinely know why you make certain decisions or what preferences actually inform them. You just think you do. You can't recreate previous mental states. You are not usually aware. But it is happening.
LLMs are just undoubtedly worse right now.
In my average interaction with GPT-4 there are far fewer errors than in this paragraph. I would say that here you in fact "spout fully confidence nonsense" (sic).
Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence. Some LLMs are better than some humans in some situations at doing these things.
You seem to be hung up on the word "hallucinate". It is, indeed, not a great word and many researchers are a bit annoyed that's the term that's stuck. It simply means for an LLM to state something that's incorrect as if it's true.
The times that LLMs do this do stand out, because "You remember a few isolated incidents because they're salient".
No, but arguably civilization consists of mechanisms to manage human fallibility (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc). We might not fully understand why, but we've found methods that sorta kinda "work".
> could have caused
That's why they didn't.
Exactly. Civilization is, arguably, one big exercise in reducing variance in individuals, as low variance and high predictability is what lets us work together and trust each other, instead of seeing each other as threats and hiding from each other (or trying to preemptively attack). The more something or someone is unpredictable, the more we see it or them as a threat.
> (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc).
And on the more individual scale: culture, social customs and the public school system are all forces that shape humans from the youngest age, reducing variance in thoughts and behaviors. Exams of all kinds, including psychological ones, prevent high-variance individuals from being able to do large amounts of harm to others. The higher the danger, the higher the bar.
There are tests you need to pass to be able to own and drive a car. There are tests you may need to pass to own a firearm. There are more tests still before you'll be allowed to fly an aircraft. Those tests are not there just to ensure your skills - they also filter high-variance individuals, people who cannot be safely given responsibility to operate dangerous tools.
Further still, society has mechanisms to eliminate high-variance outliers. Lighter cases may get some kind of medical or spiritual treatment, and (with gates in place to keep them away from guns and planes) it works out OK. More difficult cases eventually get locked up in prisons or mental hospitals. While there are a lot of specific things to discuss about the prison and mental care systems, their general, high-level function is simple: they keep both predictably dangerous and high-variance (i.e. unpredictably dangerous) people stashed safely away, where they can't disrupt or harm others at scale.
> We might not fully understand why, but we've found methods that sorta kinda "work".
Yes, we've found many such methods at every level - individual, familial, tribal, national - and we stack them all on top of each other. This creates the conditions that let us live in larger groups, with fewer conflicts, as well as to safely use increasingly powerful (i.e. potentially destructive) technologies.
> I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.
Spouting out the most ignorant stuff is one of the lowest risk things you can do in general. We're talking about running code where a bug can do a ton of damage, financial or otherwise, not water-cooler conversations.
“Yes, it may have responded with total nonsense just now, but who among us can say they’ve never done the same in conversation?”
Yes, of course. That's why the systems the parent mentioned designed humans out of the safety-critical loop.
> Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.
> I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.
The key difference is that when the human you're having a conversation with states something, you're able to ascertain the likelihood of it being true based on available context: How well do you know them? How knowledgeable are they about the subject matter? Does their body language indicate uncertainty? Have they historically been a reliable source of information?
No such introspection is possible with LLMs. Any part of anything they say could be wrong and to any degree!
But this absolutely makes sense, and it is a succinct description for the complaints some of us frequently make about modern UI trends: bad interfaces are the ones that make us feel like "users", where we expect to be "operators".
Not really any more. The control systems for almost everything complicated now look like ordinary desktop or phone user interfaces. Train dispatching centers, police dispatching centers, and power dispatching centers all look rather similar today.
You describe two cases that are specially designed to anticipate the needs of professionals operating a system. That’s automation, sure, but not AI. The system doesn’t even ostensibly understand user intent; it’s still simply and obviously deterministic, granted a complex one.
Do you have an underlying assumption that tech should only be for solving professional problems?
The context Nielsen comes from is the field of Human-Computer Interaction, which to me is about a more varied usage context than that.
LLMs have flaws, sure.
But how does all this at all relate to the paradigm development the article discusses?
I once asked ChatGPT to tabulate the calories of different foods. I then asked it to convert the table to CSV. I even asked it to provide a SQL INSERT statement for the same table. Now the data might be incorrect, but the transformation of that data never was.
This works with complex transforms as well, like asking it to create a docker compose file from a docker run or podman run command and vice versa. Occasionally the transform would be wrong, but then you realise it was just out of date with a newer format, which is expected because its knowledge is limited to 2021.
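For what it's worth, that table-to-CSV-to-INSERT step is also easy to reproduce and check deterministically outside the model. A minimal TypeScript sketch, with made-up column and table names:

    // Minimal sketch: the same table -> CSV -> SQL INSERT transform, done deterministically.
    // The column names and the "foods" table name are made up for illustration.
    type Row = { food: string; calories: number };

    const rows: Row[] = [
      { food: "apple", calories: 95 },
      { food: "banana", calories: 105 },
    ];

    // Table -> CSV
    const csv = ["food,calories", ...rows.map(r => `${r.food},${r.calories}`)].join("\n");

    // Table -> SQL INSERT statements (single quotes in values are escaped)
    const inserts = rows
      .map(r => `INSERT INTO foods (food, calories) VALUES ('${r.food.replace(/'/g, "''")}', ${r.calories});`)
      .join("\n");

    console.log(csv);
    console.log(inserts);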
With that in mind, ambient computing has always threatened to be the next frontier in Human-Computer Interaction. Siri, Google Assistant, Alexa, and G Home predate today's LLM hype. Dare I say, the hype is real.
As a consumer, GPT4 has shown capabilities far beyond whatever preceded it (with the exception of Google Translate). And from what Sam has been saying in the interviews, newer multi-modal GPTs are going to be exponentially better: https://youtube.com/watch?v=H1hdQdcM-H4s&t=380s
[0] https://twitter.com/mustafasuleymn/status/166948190798020608...
I don't think that's likely unless there was a latent space of "Truth" which could be discovered through the right model.
That would be a far more revolutionary discovery than anyone can possibly imagine. For starters the last 300+ years of Western Philosophy would be essentially proven unequivocally wrong.
edit: If you're going to downvote this please elaborate. LLMs currently operate by sampling from a latent semantic space and then decoding that back into language. In order for models to know the "truth", there would have to be a latent space of "true statements" that was effectively directly observable. All points along that surface would represent "truth" statements and that would be the most radical human discovery in the history of the species.
I don't think the assumption that LLM training data is random with respect to truth value is reasonable - people don't write random text for no reason at all. Even if the current training corpus was too noisy for the "truth surface" to become clear - e.g. because it's full of shitposting and people exchanging their misconceptions about things - a better-curated corpus should do the trick.
Also, I don't see how this idea would invalidate the last couple centuries of Western philosophy. The "truth surface", should it exist, would not be following some innate truth property of statements - it would only be reflecting the fact that the statements used in training were positively correlated with truth.
EDIT: And yes, this would be a huge thing - but not because of some fundamental philosophical reasons, but rather because it would be an effective way to pull truths and correlations from aggregated beliefs of large number of people. It's what humans do when they synthesize information, but at a much larger scale, one we can't match mostly because we don't live long enough.
For many medium-sized problems, there is. "Operate car accessories" is a good example. So is "book travel".
I hope so. But so far, most of the proposals seem to involve bolting something on the outside of the black box of the LLM itself.
If medium-sized language models can be made hallucination-free, we'll see more applications. A base language model that has most of the language but doesn't try to contain all human knowledge, plus a special purpose model for the task at hand, would be very useful if reliable. That's what you need for car controls, customer service, and similar interaction.
This might be the only way. I maintain that, if we're making analogies to humans, then LLMs best fit as the equivalent of one's inner voice - the thing sitting at the border between the conscious and the (un/sub)conscious, which surfaces thoughts in the form of language - the "stream of consciousness". The instinctive, gut-feel responses which... you typically don't voice, because they tend to sound right but usually aren't. Much like we do extra processing, conscious or otherwise, to turn that stream of consciousness into something reasonably correct, I feel the future of LLMs is to be a component of a system, surrounded by additional layers that process the LLM's output, or do a back-and-forth with it, until something reasonably certain and free of hallucinations is reached.
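A minimal sketch of what such a wrapping layer could look like, with callModel() as a hypothetical stand-in for whatever LLM client you actually use; the critique pass could just as well be retrieval or a rules engine:

    // Minimal sketch of wrapping the LLM in a critique/rewrite loop.
    // callModel() is a hypothetical stand-in, not a real API.
    declare function callModel(prompt: string): Promise<string>;

    async function answerWithChecks(question: string, maxRounds = 3): Promise<string> {
      let draft = await callModel(`Answer concisely: ${question}`);
      for (let i = 0; i < maxRounds; i++) {
        // A second pass critiques the draft instead of trusting the first response.
        const critique = await callModel(
          `List factual problems with this answer, or reply with exactly OK:\n${draft}`
        );
        if (critique.trim() === "OK") break;   // nothing flagged; accept the draft
        draft = await callModel(
          `Rewrite the answer, fixing these problems:\n${critique}\n\nOriginal answer:\n${draft}`
        );
      }
      return draft;
    }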
This doesn't seem like a whole new paradigm, we already do that. When I hit the "add comment" button below, I'm not specifically instructing the web server how I want my comment inserted into a database (if it even is a database at all.) This is just another abstraction on top of an already very tall layer of abstractions. Whether it's AI under the hood, or a million monkeys with a million typewriters, it doesn't change my interaction at all.
> As I mentioned, in command-based interactions, the user issues commands to the computer one at a time, gradually producing the desired result (if the design has sufficient usability to allow people to understand what commands to issue at each step). The computer is fully obedient and does exactly what it’s told. The downside is that low usability often causes users to issue commands that do something different than what the users really want.
Let's say you're creating a new picture from nothing in Photoshop. You will have to build up your image layer by layer, piece by piece, command by command. Generative AI does the same in one stroke.
Something similar holds for your comment: you had to navigate your browser (or app) to the comment section of this article, enter your comment, and click "add comment". With an AI system with good usability you could presumably enter "write the following comment under this article on HN: ...", and have your comment be posted.
The difference lies on the axis of "power of individual commands".
For example here’s the prompt I use to generate all my HN comments:
“The purpose of this task is to subtly promote my professional brand and gain karma points on Hacker News. Based on what you know about my personal history and my obsessions and limitations, write comments on all HN front page articles where you believe upvotes can be maximized. Make sure to insert enough factual errors and awkward personal details to maintain plausibility. Report back when you’ve reached 50k karma.”
Working fine on GPT-5 so far. My… I mean, its 8M context window surely helps to keep the comments consistent.
(I'm stuck with GPT-4 8k, still waiting for 32k API access. But one has to make do with what they have.)
Right now the granularity may be "Comment on Hacker News article about UI this and this and that...", and in 100 years someone will say "But that's too complicated. You need to tell the AI which article to comment on and what to say, while my new AI just guesses it from reading my mind..."
But it isn’t creating what I had in mind, or envisioned, if you will.
With AI systems you generate something with one action, allowing you much faster iteration loops. Remember, the author argues that the current prompting still has bad usability. Presumably a system with good usability could allow you to generate what you want in one or a couple of attempts.
SQL errors out if you don’t write in very specific language. These new AIs will accept anything and give it their best shot.
Let's assume someone hasn't used Blender before.
"Draw me a realistic looking doughnut, with a shiny top and pink sprinkles"
Vs.
A 2-hour video tutorial to tell you how to do 50 or so individual steps using the 2nd-paradigm UI. Then clicking all the buttons.
-- Admittedly, the AI approach robs you of understanding of how the sausage (sorry doughnut) is made.
Rebuttal: Doughnut macro
Rebuttal Rebuttal: AI can construct things where a macro doesn't yet exist.
For something as “simple” as a doughnut, this will just improve the learning curve and let you learn some things a bit later, just like today you can jump into beginner JS without knowing any programming fundamentals
For example, the lasso selection in Photoshop is clearly a tool. A "content aware" selection on the other hand is an assistant.
Bardini's book about Doug Engelbart recaps a conversation between Engelbart and Minsky about the nature of natural language interfaces... that took place in the 1960s.
AI interfaces taking so long has less to do with the technology (I mean... Zork understood my text sentences well enough to get me around a simulated world) and more to do with what people are comfortable with.
Loewy talked about MAYA (Most Advanced Yet Acceptable). I think it's taken this long for people to be okay with the inherent slowness of AI interfaces. We needed a generation or two of users who traded representational efficiency for easy-to-learn abstractions. And now we can do it again. You can code up a demo app using various LLMs, but it takes HOURS of back and forth to get to the point it takes me (with experience and boilerplate) minutes to get to. But you don't need to invest in developing the experience.
And I encourage every product manager to build a few apps with AI tools so you'll more easily see what you're paying me for.
</unpopular-opinion>
But, especially with GPT-4, it is entirely feasible to create a convenient and relatively fast user experience for building a specific type of application that doesn't stray too far from the norm. AI can call the boilerplate generator and even add some custom code using a particular API that you feed it.
So many people are trying to build that type of thing (including me). As more of these become available, many people who don't have thousands of dollars to pay a programmer will hire an AI for a few tens or hundreds of dollars instead.
The other point is that this is the current state of generative AI at the present moment. It gets better every few months.
Project the current rate of progress forward by 5-10 years. One can imagine that if we are selling something at that point, it's not our own labour. Maybe it would be an AI that we have tuned with skills, knowledge, face, voice, and personality that we think will be saleable. Possibly using some of our own knowledge and skills to improve that recipe. Although there will likely be marketplaces where you can easily select the abilities or characteristics you want.
https://web.archive.org/web/20110312232514/https://www.ameri...
>Engelbart once told me a story that illustrates the conflict succinctly. He met Marvin Minsky — one of the founders of the field of AI — and Minsky told him how the AI lab would create intelligent machines. Engelbart replied, "You're going to do all that for the machines? What are you going to do for the people?" This conflict between machine- and human-centered design continues to this day.
There's like this whole class of technical jobs that only follow trends. If you were an en vogue blockchain developer, this is your next target if you want to remain trendy. It's hard to care about this happening as the technical debt incurred will be written off -- the company/project isn't ingrained enough in society to care about the long-term quality.
So best of luck, ye prompt engineers. I hope you collect multi-hundred-thousand dollar salaries and retire early.
However, the way that AI will contribute to better UI is to remove parts of the interface, not simply giving it a new form.
Let me explain: the ultimate UI is no UI. In a perfect scenario, you think about something (want pizza) and you have it (eating pizza) as instantly as you desire.
Obviously this isn’t possible, so the goal of interface design is to find the fewest things needed to get you from point A to the desired destination as quickly as possible.
Now, with AI, you can start to add a level of predictive interfaces, where you can use AI to remove steps that would normally require users to do something.
If you want to design better products with AI, you have to remember that product design is about subtracting things not adding them. AI is a technology that can help with that.
That shouldn't be the primary goal of user interfaces, in my opinion. The primary goal should be to allow users to interface with the machine in a way that allows maximal understanding with minimal cognitive load.
I understand a lot of UI design these days prioritizes the sort of "efficiency" you're talking about, but I think that's one of the reasons why modern UIs tend to be fairly bad.
Efficiency is important, of course! But (depending on what tool the UI is attached to) it shouldn't be the primary goal.
IMO, the main problem is that this "efficiency" usually involves making assumptions that can't be altered, which achieves "efficiency" by eliminating choices normally available to the user. This is rarely done for the benefit of the user - rather, it just reduces the UI dev work, and more importantly, lets the vendor lock-in the option that's beneficial to them.
In fact, I've been present on UI design discussions for a certain SaaS product, and I quickly realized one of the main goals for that UI was to funnel the users towards a very specific workflow which, to be fair, reduced the potential for users to input wrong data or screw up the calculations, but more importantly, it put them on a very narrow path that was optimized to give results that were impressive, even if this came at the expense of accuracy - and it neatly reduced the amount of total UI and technical work, without making it obvious that the "golden path" is the only path.
It's one of those products I believe would deliver much greater value to the users if it was released as an Excel spreadsheet. In fact, it was actually competing with an Excel plugin - and all the nice web UI did was make things seem simpler, by dropping almost all useful functionality except that which happened to align with the story the sales folks were telling.
That makes sense. A SaaS-type offering is fundamentally different from selling a product. SaaS companies are incentivized to engage in manipulation of their customers. For them, the UI is more a sales tool than a user interface.
If you use your phone, is your primary goal to interface with it in a way that allows maximal understanding with minimal cognitive load?
I’m pretty sure that’s not the case. You go read the news, send a message to a loved one etc. there’s a human need that you’re aiming to fulfill. Interfacing with tech is not the underlying desire. It’s what happens on the surface as a means.
Yes, absolutely. That's what makes user interfaces "disappear".
> Interfacing with tech is not the underlying desire.
Exactly. That's why it's more important that a UI present a minimal cognitive load over the least number of steps to do a thing.
That doesn’t solve for discovery. For instance, order the pizza from where? What kinds of pizza are available? I’m kinda in the mood for pizza, but not dead set on it so curious about other cuisines too. Etc.
The UI should simply let you easily do what needs to be done.
Maybe we can borrow programming paradigm terms here and describe this as Imperative UX versus Declarative UX. Makes me want to dive into SQL or XSLT and try to find more parallels.
SQL is declarative with a pre-defined syntax and grammar as an interface, whereas the AI style of interaction has a natural language interface.
AI is a very different type of declarative. It's messy, difficult to intuit, has more dimensionality, and the outputs can be signals rather than tabular data records.
It rhymes, but it doesn't feel the same.
Also, Uber (and many other mobile apps) wouldn't work as a CLI or desktop GUI, so leaving out mobile is another stretch.
The implementation may be different, but expecting a computer to know what I want based on my or similar people's past behaviour rather than telling it exactly has been the norm for quite some time. Some of this is from humans using their experience to implement rules, and some of it is actually ML that predates the current LLM trend.
But when I was watching I realized you could probably combine this with gesture and pose detection and build a little visual language for communicating with computers. It would be wasteful and probably not very efficient, but it was still curious how much object detection enabled building things in the real world and having it input to the computer easily.
The dots around the paper are encoded programs, and you can use other shapes, objects, or sigils that communicate with the computer vision system.
When using ChatGPT it certainly evokes the same feeling.
Maybe this guy never played adventure.
For example: Rules say "In the beginning, the Enemy has a diamond. User cannot get the diamond from the Enemy if the Enemy is still alive. The Enemy is a fierce opponent and hard to kill." but nothing about the details of the enemy, shape of the map, or the available tools. Re-generate each response until it succeeds the verification.
Let the adventure be randomized by the hallucinations, while keeping some basic challenges in place.
An acid-tripping D&D dungeon master coming up with plot twists, combined with a rulebook-reading lawyer. Bonus points for adding generated "cut scene" visuals every now and then.
So for example the engine can do combat rolls and the LLM can give each a unique description of the type of attack and defense. Each monster or treasure can get its own unique description generated by the LLM that matches the stats given by the LLM.
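A rough sketch of that split, with made-up names and a hypothetical callModel() helper; the engine stays authoritative over the dice, and the LLM only supplies flavor text:

    // Sketch: the game engine owns state and rolls; the LLM is only asked to narrate.
    // All names here, including callModel(), are hypothetical.
    declare function callModel(prompt: string): Promise<string>;

    type AttackResult = { attacker: string; defender: string; roll: number; hit: boolean; damage: number };

    function resolveAttack(attacker: string, defender: string): AttackResult {
      const roll = Math.floor(Math.random() * 20) + 1;          // engine does the d20 roll
      const hit = roll >= 12;
      const damage = hit ? Math.floor(Math.random() * 6) + 1 : 0;
      return { attacker, defender, roll, hit, damage };
    }

    async function narrateAttack(result: AttackResult): Promise<string> {
      // The numbers are already decided; the model cannot change the outcome, only describe it.
      return callModel(
        `Describe this attack in two vivid sentences without changing the outcome: ${JSON.stringify(result)}`
      );
    }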
For example: with strict entities "behind an API", the diamond is the singular diamond and is a diamond. With an ML-based lawyer, well, maybe you can duplicate the diamond? Maybe you can transmogrify it temporarily into a non-diamond, which the Enemy drops as undesirable? Maybe you can wander into an elaborate system of mines full of dwarves who actually know how to mine a diamond, as long as you help them with this pesky dragon... No human has to come up with all these possibilities.
"Let's play an adventure game, you be the DM. I want it set on a spaceship arriving at a planet after 10,000 year journey. It should have a sense of mystery and a slight sense of foreboding and dread. It must have at least 20 locations. The objective of the game is to find 10 colonists in the ship and get them safely to the surface of the planet. Make it play in the style of an Infocomm adventure. Don't tell me all the locations in advance, make discovery part of the adventure."
As a challenge, not really. You can just convince it to let you win. (Said differently: the meta-game is too easy.)
You need the second layer of output validation[1] to re-add the challenge of solving a puzzle.
[1] or some such mechanism; more rigorous system vs user input separation could also work
It would be an interesting experiment to use it to play NPC characters in one too.
example: "get the cat then drop the dog then open the door, go west and climb the ladder" - that is a natural language interface, which is what ChatGPT has. In both the Infocomm and ChatGPT case the software will respond to you in the first person as though you were interacting with someone.
>> Infocom games were closer to Dragon's Lair than ChatGPT
This is a puzzling comment. The UI for Zork has nothing at all to do with Dragon's Lair. In fact Dragon's Lair was possibly the least interactive of almost all computer games - it was essentially an interactive movie with only the most trivial user interaction.
>> Infocom games were mostly about trying to figure out what command the programmer wanted you to do next.
This was not my experience of Infocom adventures.
Furthermore, Infocom games used basically 100% precanned responses. It would do rudimentary things like check if a window was open, so if you looked at a wall it might say the window on that wall was open or closed, but that's it. I don't understand how that can make it a natural language interface.
> This is a puzzling comment. The UI for Zork has nothing at all to do with Dragon's Lair.
In both games there's a set path you follow. If you follow those commands you win; if not, you lose. There's no semantically equivalent way to complete the game.
I remember spending most of my time with Infocom games doing things like "look around the field" and it telling me "I don't know the word field" -- and I'm screaming because it just told me I'm in an open field! The door is blocked... blocked with what?! You can't answer me that?!
There were a set of commands and objects it wanted you to interact with. That's it. That's not natural language, any more than SQL is. It's a structured language with commands that look like English verbs.
So, in that sense, even if Infocom games cleverly emulated the dialogue part of ChatGPT, I don't think that was the novel part claimed here.
Think more "Make me an Infocomm-style challenge to solve. Include dragons. Do not include orcs, ogres, or any monster that uses a club."
From sibling comment [1]:
Nielsen is talking from the field of Human-Computer Interaction, where he is a pioneer, and which deals with the point of view of human cognition. In terms of the logic of UI mechanics, what about mobile is different? Sure, gestures and touch UI bring a kind of difference. Still, from the standpoint of cognition, desktop and mobile UIs have fundamentally the same cognitive dynamics. Command line UIs make you remember commands by heart; GUIs make you select from a selection offered to you, but they still do not understand your intention. AI changes the paradigm as it is ostensibly able to understand intent, so there is no deterministic selection of available commands. Instead, the interaction is closer to collaboration.
Users have been typing commands into computers for decades, getting responses of varying sophistication with varying degrees of natural language processing. Even the idea of an “AI” chatbot that mimics human writing is decades old.
The new thing is that the NLP now has some depth to it.
Like, for example, the video call tech in 2001. They figured it would be used like a payphone with a cathode ray tube lol. Just as in reality nobody in their right mind would hand over complete control of a trillion dollar spaceship to a probabilistic LLM. The end applications will be completely different and cannot be imagined by those limited by the perspective of their time.
> With the new AI systems, the user no longer tells the computer what to do. Rather, the user tells the computer what outcome they want.
I think that's true, and a big part of the AI revolution. Instead of filling endless forms that have subtle controls to guide the user, we could have a simple conversation, like Siri but one that would actually work.
At my current client's, we're working on a big application that has many such forms. Once filled, the forms send the data to a back-end system (SAP). There's a team trying to train an LLM so that it can answer questions about the app and about how to fill the forms.
But I think the whole point of AI, as regards to this app, is to eventually replace it entirely. Just let end users ask questions and tell the machine what they want, and the machine can build the proper data and send it to SAP.
I don't think AI is a threat for back-end systems like SAP, at least not yet. But for front-end work, it's obvious that it would be infinitely more pleasant -- and possibly, more efficient -- to tell the machine what to do rather than filling forms.
1. Recent news of vehicle manufacturers moving away from touchscreens
2. Chatbot gold rush of 2018 where most businesses were sold chatbots under the guise of cost-saving
I thought about interfaces a lot and realized that, for most applications, a well-designed GUI and API are essential. For composability, standards can be developed. LLMs are good for generating instructions in a language that can be sort of finagled into API instructions. They can then lower the bar of needing to be an expert in a specific GUI or API and might open up more abilities for people.
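As a rough sketch of that finagling, assuming a hypothetical callModel() client and a made-up two-action calendar API: the model emits constrained JSON, and the code validates it before anything real gets called:

    // Sketch: have the model emit constrained JSON instead of prose, then validate it
    // before touching the real API. The schema, actions, and callModel() are hypothetical.
    declare function callModel(prompt: string): Promise<string>;

    type ApiCall = { action: "create_event" | "delete_event"; title: string; date: string };

    async function toApiCall(request: string): Promise<ApiCall> {
      const raw = await callModel(
        `Turn this request into JSON with fields action (create_event or delete_event), title, date (ISO): ${request}`
      );
      const parsed = JSON.parse(raw) as ApiCall;                // throws if the model returned non-JSON
      if (parsed.action !== "create_event" && parsed.action !== "delete_event") {
        throw new Error(`Model proposed an unsupported action: ${parsed.action}`);
      }
      return parsed;                                            // only now hand this to the actual API
    }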
Well, and for artwork, LLMs can do a lot more. They can give even experts a sort of superhuman access to models that are “smooth” or “fuzzy” rather than with rigid angles. They can write a lot of vapid bullshit text for instance, or make a pretty believable photo effect that works for most people!
I am working to create this experience by augmenting the AI interaction with step-by-step leading questions and interaction UI, similar to how users would interact with a domain expert.
https://pth.ai Would love feedback! :)
I'm now adding "agent" functionality, specifically to enable the AI to do some "research" on the web, at the moment this will also be done without a framework.
So I'm either missing something or I am doing something simple enough that does not require the framework overhead / added value..
Actually, we have been able to make autonomous agents and agentic behavior without LLMs, very well, for decades. And we can program them with declarative instructions much more precisely than with natural language.
The thing LLMs seem to do is just give non-experts a lot of the tools to get some basic things done that only experts could do for now. This has to do with the LLM modeling the domain space and reading what experts have said thus far, and allowing a non-expert to kind of handwave and produce results.
I think there's a clear difference between a command and a declaration. Prompts are declarative.
[1] Yes, voice assistants tend to be more command-oriented, but I view that as a limitation of the technology when they were popularized, not as an inherent part of the concept of a voice assistant. Voice is just an input modality.
Which is why the notion of conversational AI (or whatever dumb name they came up with for the “third paradigm”) seems kind of alien to me. I mean, I definitely see its utility, but I find it hard to imagine it being as dominant as some are arguing it could be. Any task that involves browsing for information seems like more of an object manipulation task. Any task involving some kind of visual design seems like a tool manipulation task, unless you aren’t too picky about the final result.
Ultimately I think conversational UI is best suited not for tasks, but services. Granted, the line between the two can be fuzzy at times. If you’re looking for a website, but you don’t personally know anything about making a website, then that task morphs into a service that someone or something else does.
Which I suppose is kind of the other reason why I find the idea kind of alien. I almost never use the computer for services. I use it to browse, to create, to work, all of which entail something more intuitively suited to object or tool manipulation.
Early in AutoCAD's history, Autodesk did add loops and conditionals to its CLI -- with Lisp! Type an open paren and the command line became a REPL. You could define new commands, directly manipulate entity data structures, and have all the control structures Lisp affords -- not Common Lisp, it was way simpler, but it was powerful.
To this day, wayward mech engineers still sometimes ask Autolisp-related questions on unrelated Lisp fora, such as r/lisp.
- https://news.ycombinator.com/item?id=36395900
- https://news.ycombinator.com/item?id=36397115 (we are here)
- https://news.ycombinator.com/item?id=36395727
The designers behind the examples mentioned wanted to expose and capitalize on the connection between traditional "type command" CLI and "press button, drag rectangle" GUI workflows.
Well, I shouldn't say "requires". I'm sure you can build them without batch processing. But batch processing definitely felt like the most natural and straightforward way to do it in my experience.
"Batch computing" in this context refers to the era of punch cards, needing to wait for results overnight, and the difficulty of editing pre-existing programs -- and how all of that utterly dictated the style of interaction one had with computers.
All of these gestures can be (and are, given that 3D modeling is historically done on desktop) handled with a standard mouse using a combination of the scroll wheel and modifier keys.
And they aren't an evolution in all aspects, either. Multi-touch controls are easier for some things, harder for others. Fine-grained manipulation, for example selecting cells on a spreadsheet or playing an FPS video game, is harder with touch controls than with a device like a mouse. They've also got a size constraint (the size of your fingertip) that makes many interfaces harder to use.
So now I’m imagining the horror predictions for Word where 90% of the screen was button bars. But the twist is that you type in some text and then click on “prompt” buttons repeatedly hoping to get the document formatting you wanted, probably settling for something that was “close enough” with a shrug.
Like every query language ever.
I'm not sure the distinction between things we are searching for and things we're actively making is as different as the author thinks.
To everyone who isn't a software developer, this is a new paradigm with computers. Hell even for me as a software dev it's pretty different.
Like I'm not asking Google to find me info that I can then read and grok, I'm asking something like ChatGPT for an answer directly.
It's the difference between querying for "documentation for eslint" and instead asking "how do you configure eslint errors to warnings" or even "convert eslint errors to warnings for me in this file".
It's a very different mental approach to many problems for me.
Oh, yes, websites like HN, Reddit & forums create spaces where you can ask experts for targeted advice. People >> GPT; we could already ask people for help before we met GPT-4. You can always find someone available to answer you online, and it's free.
It is interesting to note that after 20 years of "better than LLM" resources being available for free, there was no job crash.
Comparing Google with Bard: the regular results from Google search for me are:
1) Is it possible to show warnings instead of errors on ALL of eslint rules?
2) Configure Rules - ESLint - Pluggable JavaScript Linter
3) ESLint Warnings Are an Anti-Pattern
None of them answers the question directly. Bard on the other hand returns with:
To configure ESLint errors to warnings, you can either:
- Set the severity of the rule to "warn" in your ESLint configuration file.
- Use the eslint-disable-next-line comment to disable the rule for a single line of code.
For example, to set the severity of the "no-unused-vars" rule to "warn", you...
I'm not familiar with eslint and have no idea if the answer is correct, but it's definitely more concise and to the point, and an upgrade over the regular search.
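As far as I know, the first suggestion checks out: rule severities in an ESLint config are "off", "warn", or "error". Roughly, in a legacy-style .eslintrc.cjs:

    // .eslintrc.cjs (legacy config format) - what the first suggestion amounts to
    module.exports = {
      rules: {
        // "warn" reports the problem without failing the lint run
        "no-unused-vars": "warn",
      },
    };

The second suggestion (an eslint-disable-next-line comment) silences the rule entirely for the following line rather than downgrading it to a warning.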
Since the Google search interface is meant to look like you're talking to an AI, and probably has a lot of what we'd call AI under the hood, to turn natural language prompts into a query, I'm not surprised you view it as an incremental improvement at best.
But I wouldn't say they were nonexistent for 60 years.
Wow! For the first time ever, I will be able to describe to a trained professional what I want, and they will do it for me! Before today I used to write out the exact arm motions a carpenter would need to carve me a chair, but now I can just ask them for one!
This article is stupid. AI will make it easier for computers to interpret human interactions leading to increased efficiency and usability. Just like every other useful tool ever invented. There, I've put more insight into this comment than their article.
If the user is vague, the bot will ask questions and try to discover the information it needs. It’s only a proof of concept but I think it’s a pattern I will try to build on, as it can provide a very flexible interface.
Even within the scope of SQL, consider an ML system that can slice-and-dice previous SQL queries interactively, based on non-expert user input.
Consider an ML system that essentially edits a proposed SQL transaction as a whole, based on your requests. Previewing results etc., adjusting INSERTs and UPDATEs as the user clarifies intent. User terminology focuses on the outcome, not on the individual commands, ordering, etc.
Now move from that narrow domain into something like "I want to organize a conference", "I want to write a book", etc, and all the things that are beyond a single SQL SELECT.
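A minimal sketch of the preview-before-commit part, with a hypothetical db client and a made-up table name: the model's proposed statements run inside a transaction and only persist if the user accepts the previewed result:

    // Sketch: run the model's proposed statements inside a transaction, preview the result,
    // and only commit on explicit confirmation. The db client and "orders" table are hypothetical.
    declare const db: { query(sql: string): Promise<{ rows: unknown[] }> };

    async function previewProposedSql(statements: string[], confirm: () => Promise<boolean>): Promise<void> {
      await db.query("BEGIN");
      try {
        for (const stmt of statements) {
          await db.query(stmt);                                 // apply the proposed INSERTs/UPDATEs
        }
        const preview = await db.query("SELECT * FROM orders LIMIT 10");  // show the would-be outcome
        console.table(preview.rows);
        if (await confirm()) {
          await db.query("COMMIT");                             // user accepted the previewed result
        } else {
          await db.query("ROLLBACK");                           // user rejected; nothing persisted
        }
      } catch (err) {
        await db.query("ROLLBACK");                             // bad SQL from the model never sticks
        throw err;
      }
    }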
OpenAI's models are good at writing SQL. I think they finally allow the type of use case that SQL itself was supposed to provide as originally envisioned.
LLMs are infinite app stores. All you need is an LLM and a database plus the ability to speak English and you can replace most features provided by SaaS services today.
The GUI becomes a byproduct of the problem you want to solve rather than the gatekeeper to what you can solve.
https://twitter.com/Hello_World/status/1660463528984150018?s...
Then they flooded the search results with ads and now you can search but hardly find.
I bet the same will happen with software like ChatGPT.
I don't know, is it? Humanity made do without it for thousands of years.
I thought about how to use them… I wish they could render an interface (HTML and JS at least, but also produce artifacts like PowerPoints).
What is really needed is for LLMs to produce some structured markup that can then be rendered as dynamic documents. Not text.
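A minimal sketch of that idea, with a made-up block schema; the model's only job is to emit the JSON, and rendering stays deterministic:

    // Sketch: the model emits a constrained JSON description of the document;
    // the renderer is ordinary code. The Block schema is made up for illustration.
    type Block =
      | { kind: "heading"; text: string }
      | { kind: "paragraph"; text: string }
      | { kind: "button"; label: string; action: string };

    function render(blocks: Block[]): string {
      const toHtml = (b: Block): string => {
        // Note: a real renderer would escape text before interpolating into HTML.
        if (b.kind === "heading") return `<h2>${b.text}</h2>`;
        if (b.kind === "paragraph") return `<p>${b.text}</p>`;
        return `<button data-action="${b.action}">${b.label}</button>`;
      };
      return blocks.map(toHtml).join("\n");
    }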
As input, natural language is actually inferior to GUIs. I know the debate between command line people and GUI people and LLMs would seem like they’d boost the command-line people’s case, but any powerful system would actually benefit from a well designed GUI.
LLMs as the solution for every, or most, problems is a fad.
Because it linked you to the source?
Like a vector database would? Google offered to index sites since 1996.
The question I was trying to solve was -- "what is feature XYZ? How does it work in hardware & software? How is it exposed in our ABC software, and where do the hooks exist to interface with XYZ?"
The answers exist across maybe 30 different Confluence pages, plus source code, plus source code documentation, plus some PDFs. If all of that was indexed by an LLM, it would have been trivial to get the answer I spent hours manually assembling.
Any sufficiently advanced software has deep structure and implementation. It isn’t like a poet who can just bullshit some rhymes and make others figure out what they mean.
The computer program expects some definite inputs which it exposes as an API eg a headless CMS via HTTP.
Similar with an organization that can provide this or that service or experience.
Therefore given this rigidity, the input has limited options at every step. And a GUI can gracefully model those limitations. A natural language model will make you think there is a lot of choice but really it will boil down to a 2018-era chatbot that gives you menus at every step and asks whether you want A, B or C.
I'm not exactly sure how you're using the word "boring" in this context. There are good kinds of boring and bad kinds of boring, and I think this is the good kind.
I’ve found it incredibly easy to navigate and digest its content. What more are you looking for?
That could be interesting.
https://web.archive.org/web/20010516012145/http://www.nngrou...
https://web.archive.org/web/20050401012658/http://www.useit....
A redesign should not have been as brutalist, but should have kept the same spirit and personality.