You're missing his point. He's saying that if you write a program, you expect it to do X reliably. X may include "send an email, kick off this workflow, add this to the log, or crash," but you don't expect it to, for example, "delete system32 and shut down the computer." An LLM's outputs are essentially unconstrained, whereas the program above couldn't possibly delete anything or shut down your computer, because nothing even close to that is in the code.
Please don't confuse this example with agentic AI losing the plot; that's not what I'm trying to say.
Edit: a better example is that when you build an autocomplete plugin for your email client, you don't expect it to also be able to play chess. But look what happened.