I think it
is misleading to suggest today that tool-calling for nontrivial stuff really works with local models. It just works in demos because those tools always accept one or two arguments, usually string literals or numbers.
In the real world functions take more complex arguments, many arguments, or take a single argument that's an object with multiple attributes, etc. You can begin to work around this stuff by passing function signatures, typing details, and JSON-schemas to set expectations in context, but local models tend to fail at handling this kind of stuff long before you ever hit limits in the context window. There's a reason demos are always using 1 string literal like hostname, or 2 floats like lat/long. It's normal that passing a dictionary with a few strict requirements might need 300 retries instead of 3 to get a tool call that's syntactically correct and properly passed arguments. Actually `ping --help` for me shows like 20 options, and for any attempt to 1:1 map things with more args I think you'd start to see breakdown pretty quickly.
Zooming in on the details is fun but doesn't change the shape of what I was saying before. No need to muddy the water; very very simple stuff still requires very big local hardware or a SOTA model.