undefined | Better HN

story

0 pointssimonw6mo ago0 comments

Sure, you're not going to get anything close to a Claude Code style agent from a local model (unless you shell out $10,000+ for a 512GB Mac Studio or similar).

This post isn't about building Claude Code - it's about hooking up an LLM to one or two tool calls in order to run something like ping. For an educational exercise like that a model like Qwen 4B should still be sufficient.

0 comments

robot-wrangler6mo ago

The expectation that reasonable people have isn't fully local claude code, that's a strawman. But it's also not ping tools or the simple weather agent that tutorials like to use. It's somewhere in between, isn't that obvious? If you're into evangelism, acknowledging this and actually taking a measured stance would help prevent light skeptics from turning into complete AI-deniers. If you mislead people about one thing, they will assume they are being misled about everything

simonwOP6mo ago

I don't think I was being misleading here.

https://fly.io/blog/everyone-write-an-agent/ is a tutorial about writing a simple "agent" - aka a thing that uses an LLM to call tools in a loop - that can make a simple tool call. The complaint I was responding to here was that there's no point trying this if you don't want to be hooked on expensive APIs. I think this is one of the areas where the existence of tiny but capable local models is relevant - especially for AI skeptics who refuse to engage with this technology at all if it means spending money with companies they don't like.

robot-wrangler6mo ago

I think it is misleading to suggest today that tool-calling for nontrivial stuff really works with local models. It just works in demos because those tools always accept one or two arguments, usually string literals or numbers. In the real world functions take more complex arguments, many arguments, or take a single argument that's an object with multiple attributes, etc. You can begin to work around this stuff by passing function signatures, typing details, and JSON-schemas to set expectations in context, but local models tend to fail at handling this kind of stuff long before you ever hit limits in the context window. There's a reason demos are always using 1 string literal like hostname, or 2 floats like lat/long. It's normal that passing a dictionary with a few strict requirements might need 300 retries instead of 3 to get a tool call that's syntactically correct and properly passed arguments. Actually `ping --help` for me shows like 20 options, and for any attempt to 1:1 map things with more args I think you'd start to see breakdown pretty quickly.

Zooming in on the details is fun but doesn't change the shape of what I was saying before. No need to muddy the water; very very simple stuff still requires very big local hardware or a SOTA model.

1 more reply

j / k navigate · click thread line to collapse

0 comments

robot-wrangler6mo ago

simonwOP6mo ago

I don't think I was being misleading here.

robot-wrangler6mo ago

Zooming in on the details is fun but doesn't change the shape of what I was saying before. No need to muddy the water; very very simple stuff still requires very big local hardware or a SOTA model.

1 more reply

j / k navigate · click thread line to collapse