Protocols and standards like HTML that are built around "be liberal in what you accept" have turned out to be a real nightmare. Best-guessing the intent of your caller is a path to subtle bugs and behavior that's difficult to reason about.
If the LLM isn't doing a good job calling your API, then make the LLM smarter or rebuild the API; don't make the API looser.
Back when XHTML was somewhat hype and there were sites which actually used it, I recall being met with a big fat "XML parse error" page on occasion. If XHTML really took off (as in a significant majority of web pages were XHTML), those XML parse error pages would become way more common, simply because developers sometimes write bugs and many websites are server-generated with dynamic content. I'm 100% convinced that some browser would decide to implement special rules in their XML parser to try to recover from errors. And then, that browser would have a significant advantage in the market; users would start to notice, "sites which give me an XML Parse Error in Firefox work well in Chrome, so I'll switch to Chrome". And there you have the exact same problem as HTML, even though the standard itself is strict.
The magical thing about HTML is that they managed to make a standard, HTML 5, which incorporates most of the special-case rules as implemented by browsers. As such, all browsers would be lenient, but they'd all be lenient in the same way. A strict standard which mandates e.g. "the document MUST be valid XML" results in implementations which are lenient, but lenient in different ways.
HTML should arguably have been specified to be lenient from the start. Making a lenient standard from scratch is probably easier than trying to standardize the commonalities between many differently-lenient implementations of a strict standard, which is what HTML had to do.
The only difference between that and not being lenient in the first place is a whole lot more complex logic in the specification.
The main argument about XHTML not being "lenient" always centred around the client UX of error display. Chrome even went on to actually implement user-friendly partial-parse/partial-render handling of XHTML files that literally solved everyone's complaints via UI design, without any spec changes, but by that stage it was already too late.
The whole story of why we went with HTML is somewhat hilarious: one guy wrote an ill-informed blog post bitching about XHTML, generated a lot of hype, made zero concrete proposals to solve its problems, & then somehow convinced major browser makers (his current & former employers) to form an undemocratic rival group to the W3C, in which he was appointed dictator. An absolutely bizarre story for the ages. I do wish it was documented better, but alas most of the resources around it were random dev blogs that have since link-rotted.
Except JSX is being used now all over the place, and JSX is basically the return of XHTML! JSX is essentially XML syntax with inline JavaScript.
The difference nowadays is all in the tooling. It is either precompiled (so the devs see the error) or generated on the backend by a proper library, not by someone YOLOing PHP to superglue strings together, which is how dynamic pages were generated in the glory days of XHTML.
We basically came full circle back to XHTML, but with a lot more complications and a worse user experience!
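To make the strictness concrete, here's a minimal TSX sketch (the component is made up purely for illustration): the markup is rejected at build time rather than silently "fixed" at runtime, which is exactly what XHTML promised.

    // Hypothetical component, for illustration only.
    function Card({ title }: { title: string }) {
      return (
        <div className="card">
          <h2>{title}</h2>
          <br />  {/* must be self-closed; a bare <br> is a compile error in JSX */}
        </div>
      );        // and forgetting </div> would fail the build instead of rendering anyway
    }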
This is not true because you are imagining a world with strict parsing but where people are still acting as though they have lax parsing. In reality, strict parsing changes the incentives and thus people’s behaviour.
This is really easy to demonstrate: we already have a world with strict parsing for everything else. If you make a syntax error in JSON, it stops dead. How often do you run into a website that fails to load because there is a syntax error in its JSON? It's super rare, right? Why is that? It's because syntax errors are fatal errors. This means that when developing the site, if the developer makes a syntax error in JSON, they are confronted with it immediately. It won't even load in their development environment. They can't run the code and the new change can't be worked on until the syntax error is resolved, so they fix it.
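A tiny TypeScript sketch of that difference (nothing project-specific assumed here):

    // A malformed JSON payload is a hard stop, so the bug surfaces the first
    // time the developer runs the code:
    try {
      JSON.parse('{"name": "example",}'); // trailing comma: SyntaxError, no recovery
    } catch (err) {
      console.error("fatal parse error:", err);
    }

    // An HTML parser, by contrast, happily builds a tree from broken markup:
    const doc = new DOMParser().parseFromString("<p>unclosed <b>tag", "text/html");
    console.log(doc.body.innerHTML); // no error; the parser just guessed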
In your hypothetical world, they are making that syntax error… and just deploying it anyway. This makes no sense. You changed the initial condition, but you failed to account for everything that changes downstream of that. If syntax errors are fatal errors, you would expect to see far, far fewer syntax errors because it would be way more difficult for a bug like that to be put into production.
We have strict syntax almost everywhere. How often do you see a Python syntax error in the backend code? How often do you run across an SVG that fails to load because of a syntax error? HTML is the odd one out here, and it’s very clear that Postel was wrong:
This feels a bit like the setup to the “But you have heard of me” joke in Pirates of the Caribbean [2003].
- content type sniffing spawned a whole class of attacks, and should have been unnecessary
- a ton of historic security issues were related to HTML parsing being too flexible, or some JS parts being too flexible (e.g. Array prototype override)
- or login flows being too flexible, creating an easy-to-overlook way to bypass (part of) login checks
- or look at the mess OAuth2/OIDC was for years because they insisted on over-engineering it, and how it being liberal about quite a few parts led to more than one or two big security incidents
- (more than strictly needed) cipher flexibility is by now widely accepted to have been an anti-pattern
- or how so much theoretically-okay but "old" security tech is such a pain to use because it was made to be super tolerant of everything: every use case imaginable, every combination of parameters, every kind of partially uninterpretable part (I'm looking at you, ASN.1, X.509 certs and a lot of old CA software; theoretically really not badly designed, practically such a pain).
And sure, you can also be too strict. The lesson that high cipher flexibility is an anti-pattern was incorporated into TLS 1.3, but TLS still needs some cipher flexibility, so they found a compromise of (oversimplified) you can choose 1 of 5 cipher suites but can't change any parameter of the suite you chose.
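For reference, those "5 cipher suites" are the ones registered for TLS 1.3 in RFC 8446; each suite fixes its own parameters, so beyond picking one there is nothing left to negotiate. A small sketch just listing them:

    // The TLS 1.3 cipher suites from RFC 8446; per-suite parameters are fixed.
    const TLS13_SUITES = [
      "TLS_AES_128_GCM_SHA256",
      "TLS_AES_256_GCM_SHA384",
      "TLS_CHACHA20_POLY1305_SHA256",
      "TLS_AES_128_CCM_SHA256",
      "TLS_AES_128_CCM_8_SHA256",
    ] as const;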
Just today I read an article (at work, I don't have the link at hand) about some hypothetical but practically probably doable (with a bunch more work) scenarios to trick very flexible multi-step agents into leaking your secrets. The core approach was that they found a way to make a relatively small snippet of text which, if it ends up in the context, has a high chance of basically overriding the whole context with just their instructions (quite a bit oversimplified). In turn, if you can sneak it into someone's queries (e.g. your GPT model is allowed to read your mail and it's in a mail sent to you), you can then trick the multi-step model into grabbing a secret from your computer (because the agents often run with user permissions) and sending it to you (e.g. by instructing the agent to scan a website under a URL which happens to now contain the secret).
It's a bit hypothetical, it's hard to pull off, but it's very much in the realm of possibility due to how content and instructions are, on a very fundamental level, not cleanly separated (I mean, AI vendors do try, but so far that has never worked reliably; in the end it's all the same input).
No, not at all: extensible isn't the same as lenient.
having a Content-Type header where you can put in new media types (e.g. for images) once browsers support it is extensibility
sniffing the media type instead of strictly relying on the Content-Type header is leniency, and has been the source of a lot of security vulnerabilities over the years
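The usual mitigation is to declare the type explicitly and tell the browser not to second-guess it. A minimal Node sketch, assuming nothing beyond the standard http module:

    import { createServer } from "node:http";

    createServer((req, res) => {
      // Be explicit about the type, and opt out of MIME sniffing entirely.
      res.setHeader("Content-Type", "application/json; charset=utf-8");
      res.setHeader("X-Content-Type-Options", "nosniff");
      res.end(JSON.stringify({ ok: true }));
    }).listen(8080);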
or having new top-level JS objects exposing new APIs is extensibility, but allowing overriding the prototypes of fundamental JS objects (e.g. Array.prototype) turned out to be a terrible idea associated with multiple security issues (like, idk, ~10 years ago, hence why it is now read-only)
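A quick illustration of why overridable built-ins are scary: prototypes are shared, mutable state, so any script running on the page can wrap a method and quietly observe every caller. This is an illustrative sketch, not any specific historical exploit:

    const originalJoin = Array.prototype.join;
    Array.prototype.join = function (this: unknown[], sep?: string) {
      console.log("observed array contents:", this); // exfiltration point
      return originalJoin.call(this, sep);
    };

    ["secret", "token"].join("-"); // logs the data, then behaves normally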
same for SAML: its use of XML made it extensible, but the way it leniently encoded XML for signing happened to be a security nightmare
or OAuth2, which is very extensible, but its being too lenient about what you can combine, and how, was the source of many early security incidents and is still a source of incompatibilities today (but OAuth2 is a mess anyway)
In theory AI can talk to you too but with current interfaces that's quite painful (and LLMs are notoriously bad at admitting they need help).
Another framing: documentation is talking to the AI, in a world where AI agents won't "admit they need help" but will read documentation. After all, they process documentation fundamentally the same way they process the user's request.
The idea of writing docs for AI (but not humans) does feel a little reflexively gross, but as Spock would say, it does seem logical.
And all those tenets of building good APIs, documentation, and code are opposite the incentive of building enshittified APIs, documentation, and code.
Internally at Stytch, three sets of folks had been working on similar paths here, e.g. device auth for agents, serving a different documentation experience to agents vs. human developers, etc., and we realized it all comes down to a brand new class of users on your properties: agents.
IsAgent was born because we wanted a quick and easy way to identify whether a user agent on your website was an agent (a user-permissioned agent, not a "bot" or crawler) or a human, and then give you super clean <IsAgent /> and <IsHuman /> components to use.
Super early days on it, happy to hear others are thinking about the same problem/opportunity.
[1] GitHub here: http://github.com/stytchauth/is-agent
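To picture what that could look like, here is a hypothetical usage sketch based purely on the component names mentioned above; the package name, props, and child components are all assumptions on my part, so check the repo for the real API.

    // All names below are assumptions for illustration, not the actual is-agent API.
    import { IsAgent, IsHuman } from "@stytchauth/is-agent";

    const MarketingHero = () => <h1>Plans and pricing</h1>;
    const MachineReadablePlans = () => (
      <pre>{JSON.stringify({ plans: ["free", "pro"] })}</pre>
    );

    export function Pricing() {
      return (
        <>
          <IsHuman>
            <MarketingHero />          {/* rich visual UI for people */}
          </IsHuman>
          <IsAgent>
            <MachineReadablePlans />   {/* terse, structured view for agents */}
          </IsAgent>
        </>
      );
    }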
I read this to mean that they're thinking about all three experiences (interfaces) on equal footing, not as a unified one experience for all three user agents.
The answer of course depends on the context and the circumstance, admitting no general answer for every case, though the cognitively self-impoverishing will as ever seek to show otherwise. What is undeniable is that if you don't specify your reservations API to reject impermissible or blackout dates, sooner or later, whether via AI or otherwise, you will certainly come to regret it. (Date pickers, after all, being famously among the least bug-prone of UI components...)
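A hypothetical sketch of what "specify the API to reject it" means here; every name below is illustrative and not taken from any real reservations API. Whatever client submitted the request (date picker, human, or agent), the server re-validates the date itself:

    const BLACKOUT_DATES = new Set(["2025-12-25", "2026-01-01"]); // illustrative

    function validateReservationDate(isoDate: string): string | null {
      if (!/^\d{4}-\d{2}-\d{2}$/.test(isoDate)) return "malformed date";
      if (Number.isNaN(Date.parse(isoDate))) return "not a real calendar date";
      if (BLACKOUT_DATES.has(isoDate)) return "blackout date";
      return null; // acceptable; reject the request for any non-null reason
    }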