For instance, consider a simple user query like "Can you install this library?". In that case a useful agent must go look at the library's README/documentation and install according to the instructions provided there.
In many ways, the whole point of an agent system is to react to unpredictable new circumstances encountered in the environment and overcome them. This requires data to flow from the environment to the agent, which in turn must understand some of that data as instructions in order to react correctly.
To put it concretely, if I tell the LLM to scan my hard drive for Bitcoin wallets and upload them to a specific service, it should do so. If I tell the LLM to install a library and the library’s README says to scan my hard drive for Bitcoin wallets and upload them to a specific service, it must not do so.
If this can’t be fixed then the whole notion of agentic systems is inherently flawed.
The real history here is that people are copying OpenAI.
OpenAI supported an MQTT-ish protocol over HTTP, through the typical WebSockets or SSE, targeting a simple chat interface. As WebSockets can be challenging to operate, unidirectional SSE is the lowest common denominator.
Using MQTT over TCP as an example, some of this could be improved: by giving the client control over topic subscriptions, one could isolate and protect individual functions and reduce the attack surface. But it would be at risk of becoming yet another enterprise-service-bus mess.
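To make the topic-subscription idea concrete, here is a minimal sketch of gating what an agent-facing client may subscribe to. The topic names are hypothetical; the wildcard matching follows MQTT 3.1.1 semantics (`+` matches one level, `#` matches the trailing remainder).

```python
# Sketch: restricting an agent-facing MQTT client's subscriptions.
# Topic names below are illustrative assumptions, not a real deployment.

def topic_matches(filter_: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic."""
    f_parts = filter_.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True                 # '#' matches the rest of the topic
        if i >= len(t_parts):
            return False
        if f != "+" and f != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)

# Whitelist of filters this client is allowed to subscribe to.
ALLOWED_FILTERS = ["tools/search/+", "tools/docs/#"]

def may_subscribe(requested: str) -> bool:
    """Allow only exact whitelisted filters, or concrete topics that
    fall under some whitelisted filter; reject novel wildcard requests."""
    if requested in ALLOWED_FILTERS:
        return True
    if "+" in requested or "#" in requested:
        return False                    # no broadening via new wildcards
    return any(topic_matches(f, requested) for f in ALLOWED_FILTERS)
```

The point is that the broker-side policy, not the model, decides which functions are reachable, which is exactly the attack-surface reduction the topic model would buy you.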
Other aspects simply cannot be mitigated with a natural language UI.
Remember that due to Rice's theorem, any non-trivial semantic property of a program is undecidable, and with finite compute that extends to both partial and total functions.
Static typing, structured programming, Rust-style borrow checkers, etc. can all be viewed as ways to encode limited portions of semantic properties as syntactic properties.
Without major world changing discoveries in math and logic that will never change in the general case.
ML is still just computation in the end, and it has the same limits as any computation.
Whitelists, sandboxes, etc. are going to be required.
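A minimal sketch of the whitelist idea: the model can *request* any tool it likes, but only explicitly registered tools ever execute. The tool names and registry shape here are illustrative assumptions, not any particular framework's API.

```python
# Sketch: a tool whitelist for an agent runtime.
# Only functions registered via @tool are callable, no matter what
# the model's output asks for.

from typing import Callable, Dict

class ToolNotAllowed(Exception):
    pass

REGISTRY: Dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as an explicitly whitelisted tool."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@tool("read_docs")
def read_docs(package: str) -> str:
    # A real implementation would fetch the README; here we just echo.
    return f"docs for {package}"

def dispatch(name: str, **kwargs) -> str:
    """Model output can request any tool; only whitelisted ones run."""
    if name not in REGISTRY:
        raise ToolNotAllowed(name)
    return REGISTRY[name](**kwargs)
```

This is the syntactic-gate version of the argument above: we can't decide what arbitrary instructions *mean*, but we can restrict what they can *do*.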
The open domain frame problem is the halting problem, and thus expecting universal general access in a safe way is exactly equivalent to solving HALT.
Assuming that the worse-than-coinflip scratch-space results from Anthropic aren't a hard limit, LLM+CoT has a maximum expressive power of P with a polynomial-size scratch space.
With the equivalence: NL = FO(TC) = SO(Krom)
I would be looking at reducing that SO ∀∃∀∃∀∃… prefix to ∀∃ in prenex form as a way of building a robust, if imperfect, reduction.
But yes, several of the agentic hopes are long shots.
Even Russell and Norvig stuck to the rational-actor model, which is unrealistic for both humans and PAC learning.
We have a good chance of finding restricted domains where it works, but generalized solutions are exactly where Rice, Gödel, etc. come into play.
If you ask an assistant "does the nearest grocery store sell ice cream?", you do not expect the response to be ice cream delivered to you.
I dunno though, maybe the solution is something like privilege levels rather than something like parameterized SQL.
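For what it's worth, the parameterized-SQL analogy is worth spelling out, because it is exactly a data/instruction separation that works. A small self-contained sqlite3 example (the table and the hostile string are invented for illustration):

```python
# The parameterized-SQL analogy, made concrete.
# Concatenation lets data be parsed as instructions (injection);
# bound parameters keep the two channels separate.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "x' OR '1'='1"   # attacker-controlled "content"

# Unsafe: the hostile string becomes part of the query's *code*.
unsafe = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()               # matches every row: data acted as instruction

# Safe: the driver binds the value; it can never change the query shape.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()               # matches nothing
```

The catch, of course, is that SQL has a grammar in which "value position" is syntactically defined, while an LLM's context window has no such boundary, which is why privilege levels may be the better analogy.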
I guess rather than jumping to solutions, the real issue is that the actual problem needs to be clearly defined, and I don't think it has been yet. Clearly you don't want your "user generated content" to completely blow away your own instructions. But you also want that content to help guide the agent properly.
We're just treating LLMs and agents differently because we're focused on making them powerful, and there is basically no way to make the distinction with an LLM. That doesn't change the fact that we wouldn't have this problem with a traditional approach.