Apideck CLI – An AI-agent interface with much lower context consumption than MCP (opens in new tab)

(apideck.com)

137 pointsgertjandewilde3mo ago123 comments

123 comments

100 comments · 30 top-level

caust1c3mo ago· 27 in thread

I'm getting tired of everyone saying "MCP is dead, use CLIs!".

Yes, MCP eats up context windows, but agents can also be smarter about how they load the MCP context in the first place, using similar strategy to skills.

The problem with tossing it out entirely is that it leaves a lot more questions for handling security.

When using skills, there's no implicit way to be able to apply policies in the sane way across many different servers.

MCP gives us a registry such that we can enforce MCP chain policies, i.e. no doing web search after viewing financials.

Doing the same with skills is not possible in a programatic and deterministic way.

There needs to be a middle ground instead of throwing out MCP entirely.

ewild3mo ago

I feel like I don't fully understand mcp. I've done research on it but I definitely couldn't explain it. I get lost on the fact that to my knowledge it's a server with API endpoints that are well defined into a json schema then sent the to LLM and the LLM parses that and decides which endpoints to hit (I'm aware some llms use smart calling now so they load the tool name and description but nothing else until it's called). How exactly are you doing the process of stopping the LLM from using web search after it hits a certain endpoint in your MCP server? Or is this referring strictly to when you own the whole workflow where you can then deny websearch capabilities on the next LLM step?

Are there any good docs youve liked to learn about it, or good open source projects you used to get familiar? I would like to learn more

brabel3mo ago

You need to go back to LLM tools. Before MCP, you could write tools for your LLM to use by normally using Python, something like this:

    @tool def do_great_thing(arg: string) -> string:
        // todo

The LLM now understands that to do the great thing, it can just call this function and get some result back that - which it will use to answer some query from the user.

Notice that the tool uses structured inputs/outputs (the types - they can also be "dictionaries", or objects in most languages - giving the LLM powerful capabilities).

Now, imagine you want to write this in any language. What do you do?

Normally, you create some sort of API for that. Something like good old RPC. Which is essentially what MCP does: it defines a JSON-RPC API for tools, but it also adds some useful stuff, like access to static resources, elicitation (ask user for input outside of the LLM's chat) and since the MCP auth spec, an unified authorization system based on OAuth. This gives you a lot of advantages over a CLI, as well as some disadvantages. Both make sense to use. For example, for web usage, you just want the LLM to call Curl! No point making that a MCP server (except perhaps if you want to authorize access to URLs?). However, if you have an API that exposes a lot of stuff (e.g. JIRA) you definitely want a MCP for that. Not only does it get only the access you want to give the LLM instead of using your own credentials directly, now you can have a company wide policy for what can be done by agents when accessing your JIRA (or whatever) system.

A big disadvantage of MCP is that all the metadata to declare the RPC API take a lot of context, but recently agents are smart about that and load that partially and lazily as required, which should fix the problem.

In summary: whatever you do, you'll end up with something like MCP once you introduce "enterprise" users and not just yolo kids giving the LLM access to their browsers with their real credentials and unfiltered access to all their passwords.

user39393823mo ago

For my requirements, over 90% of the LLM integrations and rollouts have it exactly backwards. The only thing you want these agents doing is building modular, testable traditional CLI tools which can then be scripted as easily by a human or agent with almost no context/learning required. Humans must distill the probabalism of agent output into composable deterministic functions.

Pushing opaque probabalistic black boxes into the execution of your day to day operations, communications, whatever it is, is horrible even if it works. At best it’s a pyrrhic victory. I see startups using these agents to mitigate healthcare disputes.

There’s no such thing as a domain that resists modeling but for which you could accept a probabilistic result. Probabilistic must also mean probabilistically acceptable. We have words for the only counter examples: drafting, brainstorming, maybe triage.

1 more reply

thamer3mo ago

There is not a lot to learn to understand the basics, but maybe one step that's not necessarily documented is the overall workflow and why it's arranged this way. You mentioned the LLM "using web search" and it's a related idea: LLMs don't run web searches themselves when you're using an MCP client, they ask the client to do it.

You can think of an MCP server as a process exposing some tools. It runs on your machine communicating via stdin/stdout, or on a server over HTTP. It exposes a list of tools, each tool has a name and named+typed parameters, just like a list of functions in a program. When you "add" an MCP server to Claude Code or any other client, you simply tell this client app on your machine about this list of tools and it will include this list in its requests to the LLM alongside your prompt.

When the LLM receives your prompt and decides that one of the tools listed alongside would be helpful to answer you, it doesn't return a regular response to your client but a "tool call" message saying: "call <this tool> with <these parameters>". Your client does this, and sends back the tool call result to the LLM, which will take this into account to respond to your prompt.

That's pretty much all there is to it: LLMs can't connect to your email or your GitHub account or anything else; your local apps can. MCP is just a way for LLMs to ask clients to call tools and provide the response.

1. You: {message: "hey Claude, how many PRs are open on my GitHub repo foo/bar?", tools: [... github__pr_list(org:string, repo:string) -> [PullRequest], ...] } 2. Anthropic API: {tool_use: {id: 123, name: github__pr_list, input:{org: foo, repo: bar}}} 3. You: {tool_result: {id: 123, content: [list of PRs in JSON]} } 4. Anthropic API: {message: "I see 3 PRs in your repo foo/bar"}

that's it.

If you want to go deeper the MCP website[1] is relatively accessible, although you definitely don't need to know all the details of the protocol to use MCP. If all you need is to use MCP servers and not blow up your context with a massive list of tools that are included with each prompt, I don't think you need to know much more than what I described above.

[1] https://modelcontextprotocol.io/docs/learn/architecture

pmontra3mo ago

Maybe it's because of the example, but if the LLM knows the GitHub CLI and I bet it knows it, shouldn't it be able to run the commands (or type them for us) to count the open PRs on foo/bar?

However I see the potential problem of the LLM not knowing an obscure proprietary API. The traditional solution has been writing documentation, maybe on a popular platform like Postman. In that case the URL of the documentation could be enough, or an export in JSON. It usually contains examples too. I dread having to write and maintain both the documentation for humans and the MCP server for bots.

1 more reply

Toby113mo ago

LLM is not doing the work.. your code is doing the work, LLM is just telling you which of the functions (aka tools) you should run.

web search is also another tool and you can gate it with logic so LLMs don’t go rogue.

that’s kinda simplest explanation i guess

ewild3mo ago

Ok so in a situation like regular orchestration you would essentially layout all possible steps the LLM can take in your code in a big orchestration layer, and if it hits the sensitive endpoint the orchestration that can occur past that will block off web search. In the design that is. But for something like a manus style agent where you're outsourcing all the work but allowing it to hit your MCP it just becomes a regular API the LLM can call

yoyohello133mo ago

It is a weird trend. I see the appeal of Skills over MCP when you are just a solo dev doing your work. MCP is incredibly useful in an organization context when you need to add controls and process. Both are useful. I feel like the anti-MCP push is coming from people who don't need to work in a large org.

9rx3mo ago

> I feel like the anti-MCP push is coming from people who don't need to work in a large org.

Any kind of social push like that is always understood to be something to ignore if you understand why you need to ignore it. Do you agree that a typical solo dev caught in the MCP hype should run the other way, even if it is beneficial to your unique situation?

yoyohello133mo ago

Id agree solo devs can lean toward skills. I liken skills to a sort of bash scripts directory. And for personal stuff I generally use skills only.

krzyk3mo ago

Not sure. Our big org, banned MCPs because they are unsafe, and they have no way to enforce only certain MCPs (in github copilot).

thenewnewguy3mo ago

But skills where you tell the LLM to shell out to some random command are safe? I'm not sure I understand the logic.

3 more replies

mbreese3mo ago

Isn’t it possible to proxy LLM communication and strip out unwanted MCP tool calls from conversations? I mean if you’re going to ban MCPs, you’re probably banning any CLI tooling too, right?

2 more replies

thecopy3mo ago

Shameless plug: im working on a product that aims to solve this: https://www.gatana.ai/

1 more reply

yoyohello133mo ago

We only allow custom MCP servers.

CuriouslyC3mo ago

Skills are just prompts, so policy doesn't apply there. MCP isn't giving you any special policy control there, it's just a capability border. You could do the same thing with a service mesh or any other capability compartmentalization technique.

The only value in MCP is that it's intended "for agents" and it has traction.

consumer4513mo ago

> Yes, MCP eats up context windows, but agents can also be smarter about how they load the MCP context in the first place, using similar strategy to skills.

I have been keeping an eye on MCP context usage with Claude Code's /context command.

When I ran it a couple months ago, supabase used 13.2k tokens all the time, with the search_docs tool using 8k! So, I disabled that tool in my config.

I just ran /context now, and when not being used it uses only ~300 tokens.

I have a question. Does anyone know a good way to benchmark actual MCP context usage in Claude Code now? I just tried a few different things and none of them worked.

robot-wrangler3mo ago

> I'm getting tired of everyone saying "MCP is dead, use CLIs!".

The people saying this and attacking it should first agree about the question.

Are you combining a few tools in the training set into a logical unit to make a cohesive tool-suite, say for reverse engineering or network-debugging? Low stakes for errors, not much on-going development? Great, you just need a thin layer of intelligence on top of stack-overflow and blog-posts, and CLI will probably do it.

Are you trying to weld together basically an AI front-end for an existing internal library or service? Is it something complex enough that you need to scale out and have modular access to? Is it already something you need to deploy/develop/test independently? Oops, there's nothing quite like that in the training set, and you probably want some guarantees. You need a schema, obviously. You can sort of jam that into prompts and prayers, hope for the best with skills, skip validation and risk annotations being ignored, trust that future opaque model-change will be backwards compatible with how skills are even selected/dispatched. Or.. you can use MCP.

Advocating really hard for one or the other in general is just kind of naive.

1 more reply

novok3mo ago

IMO if you want a metadata registry of how actions work so you can make complicated, fragile, ACL rule systems of actions, then make that. That doesn't need to be loaded into a context window to make that work and can be expanded to general API usage, tool usage, cli usage, and so on. You can load a gh cli metadata description system and so on.

MCPs are clunky, difficult to work with and token inefficient and security orgs often have bad incentive design to mostly ignore what the business and devs need to actually do their job, leading to "endpoint management" systems that eat half the system resources and a lot of fig leaf security theatre to systematically disable whatever those systems are doing so people can do their job in an IT equivalent that feels like the TSA.

Thank god we moving away from giving security orgs these fragile tools to attach ball and chains to everyone.

polynomial3mo ago

This is the right framing. The chain policy problem is what happens when you ask the registry to be the entitlement layer.

Here's a longer piece on why the trust boundary has to live at the runtime level, not the interface level, and what that means for MCP's actual job: https://forestmars.substack.com/p/twilight-of-the-mcp-idols

0x0083mo ago

> MCP gives us a registry such that we can enforce MCP chain policies

Do you have some more info on it?

looking up "registry" in the mcp spec will just describe a centrally hosted, npm-like package registry[^1]

[^1]: The MCP Registry is the official centralized metadata repository for publicly accessible MCP servers, backed by major trusted contributors to the MCP ecosystem such as Anthropic, GitHub, PulseMCP, and Microsoft.

skybrian3mo ago

Towards the end of the article, they do write about some things that MCP does better.

il3mo ago

Tool search pretty much completely negates the MCP context window argument.

siva73mo ago

Evidence?

mvrckhckr3mo ago

I agree, and it's context-dependent when to use what (the author mentions use cases for other solutions). I'm glad there are multiple solutions to choose from.

j453mo ago

MCPs are handy in their place. Agents calling CLI locally is much more efficient.

golergka3mo ago

it's not about the tokens. agents can script clis.

hparadiz3mo ago· 13 in thread

10 years from now: "Can you believe they did anything with such a small context window?"

this_user3mo ago

More likely: "Can you believe they were actually trying to use LLMs for this?"

rib3ye3mo ago

OSes and software engs did not end up using less RAM.

1 more reply

lionkor3mo ago

10 years from now: "The next big thing: HENG - Human Engineers! These make mistakes, but when they do, they can just learn from it and move on and never make it again! It's like magic! Almost as smart as GPT-63.3-Fast-Xtra-Ultra-Google23-v2-Mem-Quantum"

agoodusername633mo ago

I would love to live in a world where my coworkers learn from their mistakes

is this Human 2.0? I only have 1.0a beta in the office.

I get the joke but it really does highlight how flimsy the argument is for humans. IME humans frequently make simple errors everywhere they don’t learn from and get things right the first time very rarely. Damn. Sounds like LLMs. And those are only getting better. Humans aren’t.

strbean3mo ago

> Did you know if you ask <X> a question and it doesn't know the answer, sometimes it just makes something up?!

I think maybe a lot of us live in a bubble where the above statement is less frequently true of our peers than average.

cheevly3mo ago

Imagine believing humans don’t make the same mistakes. You live in a different universe than me buddy.

2 more replies

MattGaiser3mo ago

I am kind of already at that point. For all the complaining about context windows being stuffed with MCPs, I am curious what they are up to and how many MCPs they have that this is a problem.

mbreese3mo ago

10 years from now: “what’s a context window?”

sghiassy3mo ago

10 years from now: “come with me if you want to live”

Terminator 2 Clip: https://youtu.be/XTzTkRU6mRY?t=72&si=dmfLNDqpDZosSP4M

berziunas3mo ago

“640K ought to be enough for anybody”

hparadiz3mo ago

I dunno why you're getting down voted. This is funny.

whoisstan3mo ago

Very!

smrtinsert3mo ago

"That was back when models were so slow and weighty they had to use cloud based versions. Now the same LLM power is available in my microwave"

dend3mo ago· 8 in thread

One of the MCP Core Maintainers here, so take this with a boulder of salt if you're skeptical of my biases.

The debate around "MCP vs. CLI" is somewhat pointless to me personally. Use whatever gets the job done. MCP is much more than just tool calling - it also happens to provide a set of consistent rails for an agent to follow. Besides, we as developers often forget that the things we build are also consumed by non-technical folks - I have no desire to teach my parents to install random CLIs to get things done instead of plugging a URI to a hosted MCP server with a well-defined impact radius. The entire security posture of "Install this CLI with access to everything on your box" terrifies me.

The context window argument is also an agent harness challenge more than anything else - modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation. At this point it's just a trope that is repeated from blog post to blog post. This blog post too alludes to this and talks about the need for infrastructure to make it work, but it just isn't the case. It's a pattern that's being adopted broadly as we speak.

JyB3mo ago

> modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation

This has always surprised me as this always comes up in MCP discussions. To me, it just seem like a matter of updating the protocol to not have that context hungry behaviour. Doesn't seem like an insurmountable problem technically.

Glad you say it has already been addressed. Was the protocol itself updated to reflect that? Or are you just referring to off-spec implementations?

tylerburnam3mo ago

Anthropic solved this problem like 3 AI years ago: https://www.anthropic.com/engineering/code-execution-with-mc...

stavros3mo ago

How come there isn't an mcp://add?url=https://username:password@mcpserver.com/path URL so my browser can open my client to auto-install the MCP yet? I shouldn't have to mess with config files to install an MCP, I should be able to just click a button on the site and have my client pop up and ask if I want to install it.

o_____________o3mo ago

> modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation

How, "Dynamic Tool Discovery"? Has this been codified anywhere? I've only see somewhat hacky implementations of this idea

https://github.com/modelcontextprotocol/modelcontextprotocol...

Or are you talking about the pressure being on the client/harnesses as in,

https://platform.claude.com/docs/en/agents-and-tools/tool-us...

dend3mo ago

More of the latter than the former. The protocol itself is constrained to a set of well-defined primitives, but clients can do a bunch of pre-processing before invoking any of them.

polynomial3mo ago

Fully agree.

If you don't change your approach but just use CLI "intead of" MCP, you'll end up with a new spin on the same problems. The guardrails MCP provides (identity, entitlement, multi-principal trust boundaries) still need to exist somewhere.

https://forestmars.substack.com/p/twilight-of-the-mcp-idols

kordlessagain3mo ago

Screw MCP. It’s not even a protocol. It’s a strongly worded suggestion at best.

amzil3mo ago

The post isn't MCP vs CLI. It covers where MCP wins.

> The entire security posture of "Install this CLI with access to everything on your box" terrifies me This is fair for hosted MCPs, However I'm not claiming the CLI is universally more secure. users needs to know what they're doing.

Honestly though, after 20 years of this, the whole thread is debating the wrong layer. A well-designed API works through CLI, MCP, whatever. A bad one won't be saved by typed schemas.

> At this point it's just a trope that is repeated from blog post to blog post

Well, "Use whatever gets the job done" and "it's just a trope" can't both be true. If the CLI gets the job done for some use cases, it's not a trope. It's an option. And I'd argue what's happening is the opposite of a trope. Nobody's hyping CLIs because they're exciting. There's no protocol foundation, no spec committee, no ecosystem to sell into. CLIs are 40-year-old boring technology. When multiple teams independently reach for the boring tool, that's a signal, not a meme.

> This blog post too alludes to this and talks about the need for infrastructure to make it work

When tool search is baked into Claude Code, that's Anthropic building and maintaining the infrastructure for you. The search index, ranking, retrieval pipeline, caching. It didn't disappear. It moved.

And it only works in clients that support it. Try using tool search from a custom Python agent, a bash script, or a CI/CD pipeline. You're back to loading everything.

A CLI doesn't need the client to do anything special. `--help` works everywhere. That's the difference between infrastructure that's been abstracted away for some users and infrastructure that's genuinely not needed.

zamalek3mo ago· 5 in thread

What I've done with my MCPs is turning them into a CLI, except there's still an MCP server that only has the instructions to tell the the agent about the CLI.[1] Claude and GLM-5 seem to have no problems with it.

As a bonus, the entire thing now works as a plain old CLI too - which it honestly should have from the beginning.

[1]: https://github.com/jcdickinson/ferrisfetch/blob/main/cmd/mcp...

patates3mo ago

You don't need a whole server to tell agents that, I think you can just write a skill file or two and be done with it.

zamalek3mo ago

That would work, and is still an option. However I think this makes deployment/installation simpler (which is also why I write my MCPs in go).

antihero3mo ago

Can the MCP tell them how to use the CLI? Surely that would mean less time wasted on discovering it each time.

Going to try this with fastmail-cli and see what happens.

zamalek3mo ago

Yes, MCP instructions are a blob that is injected at the start of the context. That file is prefixed to the agent specific help[1] for the instructions (that is returned if --help is invoked with CLAUDE=1 or AGENT=1).

[1]: https://github.com/jcdickinson/ferrisfetch/blob/main/cmd/age...

antihero3mo ago

Ah cool, that said: If your tool is primarily for Claude Desktop (or mobile if hosted), that surely needs the MCP to actually do anything right?

1 more reply

gertjandewildeOP3mo ago· 5 in thread

We built a unified API with a large surface area and ran into a problem when building our MCP server: tool definitions alone burned 50,000+ tokens before the agent touched a single user message.

The fix that worked for us was giving agents a CLI instead. ~80 tokens in the system prompt, progressive discovery through --help, and permission enforcement baked into the binary rather than prompts.

The post covers the benchmarks (Scalekit's 75-run comparison showed 4-32x token overhead for MCP vs CLI), the architecture, and an honest section on where CLIs fall short (streaming, delegated auth, distribution).

OsrsNeedsf2P3mo ago

How is progressive discovery not more expensive due to the increased number of steps?

BeefySwain3mo ago

I assume because the discovery is branching. If the an agent using the CLI for for GitHub needs to make an issue, it can check the help message for the issue sub-command and go from there, doesn't need to know anything about pull requests, or pipelines, or account configuration, etc, so it doesn't query those subcommands.

Compare this to an MCP, where my understanding is that the entire API usage is injected into the context.

lelanthran3mo ago

> How is progressive discovery not more expensive due to the increased number of steps?

Why not run the discovery (whether MCP or CLI) in a subagent that returns only the relevant tools. I mean, discovery can be done on a local model, right?

zamalek3mo ago

In short: JSON. Plan prose or markdown is way more token efficient than JSON. I think that responding in JSON was always a mistake in the spec; it should have been free-form text (which could then be JSON if required).

iamjackg3mo ago

It depends on what your "currency" is: inference cost vs. models getting dumber/slower with a fuller context.

nicoritschel3mo ago· 3 in thread

While I generally prefer CLI over MCP locally, this is bad outdated information.

The major harnesses like Claude Code + Codex have had tool search for months now.

injidup3mo ago

Can you explain how to take advantage. Is there any specific info from anthropic with regards to context window size and not having to care about MCP?

amzil3mo ago

Fair point on tool search. Claude Code and Codex do have it.

But tool search is solving the symptom, not the cause. You still pay the per-tool token cost for every tool the search returns. And you've added a search step (with its own latency and token cost) before every tool call.

With a CLI, the agent runs `--help` and gets 50-200 tokens of exactly what it needs. No search index, no ranking, no middleware. The binary is the registry.

Tool search makes MCP workable. CLIs make the search unnecessary.

cruffle_duffle3mo ago

Let me guess the command: [error]

Wait, better check help. is it -h? [error]

Nope? Lemme try —-help. [error]

Nope.

How about just “help” [error]

Let me search the web [tons of context and tool calls]

nzoschke3mo ago· 2 in thread

The industry is talking in circles here. All you need is "composability".

UNIX solved this with files and pipes for data, and processes for compute.

AI agents are solving this this with sub-agents for data, and "code execution" for compute.

The UNIX approach is both technically correct and elegant, and what I strongly favor too.

The agent + MCP approach is getting there. But not every harness has sub-agents, or their invocation is non-deterministic, which is where "MCP context bloat" happens.

Source: building an small business agent at https://housecat.com/.

We do have APIs wrapped in MCP. But we only give the agent BASH, an CLI wrapper for the MCPs, and the ability to write code, and works great.

"It's a UNIX system! I know this!"

dirk940183mo ago

Unix approach can be surprisingly powerful.

https://linuxtoaster.com/blog/gradientdescentforcode.html

ycombiredd3mo ago

What's interesting to me is that while it was obvious to all of us who came to think in the Unix Way, that insofar as composability, usage discoverability, and gobs of documentation in posts and man pages that are hugely represented in training corpora for LLMs, that the CLI is a great fit for LLM tool use, it seems only a recent trend to acknowledge this (and also the next hype wave, perhaps.)

Also interesting that while the big vendors are following this trend and are now trying to take a lead in it, they still suggest things like "but use a JSON schema" (the linked article does a bit of the same - acknowledging that incremental learning via `--help` is useful AND can be token-conserving (exception being that if they already "know" the correct pattern, they wouldn't need to use tokens to learn it, so there is a potential trade-off), they are also suggesting that LLMs would prefer to receive argument knowledge in json rather than in plain language, even though the entire point of an LLM is for understand and create plain language. Seemed dubious to me, and a part of me wondered if that advice may be nonsense motivated by desire to sell more token use. I'm only partially kidding and I'm still dubious of the efficacy.

* Here's a TL;DR for anyone who wants to skip the rest of this long message: I ran an LLM CLI eval in the form of a constructed CTF. Results and methodology are in the two links in the section linked: https://github.com/scottvr/jelp?tab=readme-ov-file#what-else

Anyhow... I had been experimenting with the idea of having --help output json when used by a machine, and came up with a simple module that exposes `--help` content as json, simply by adding a `--jelp` argument to any tool that already uses argparse.

In the process, I started testing, to see if all this extra machine-readable content actually improved performance, what it did to token use, etc. While I was building out test, trying to settle on legitimate and fair ways to come to valid conclusions, I learned of the OpenCLI schema draft, so I altered my `jelp` output to fit that schema, and set about documenting the things I found lacking from the schema draft, meanwhile settling to include these arg-related items as metadata in the output.

I'll get to the point. I just finished cleaning the output up enough to put it in a public repo, because my intent is to share my findings with the OpemCLI folks, in hopes that they'll consider the gaps in their schema compared to what's commonly in use, but at the same time, what came as a secondary thought in service of this little tool I called "jelp", is a benchmarking harness (and the first publishable results from it), the to me, are quite interesting and I would be happy if others found it to be and added to the existing test results with additional runs, models, or ideas for the harness, or criticism about the validity of the method, etc.

The evaluation harness uses constructed CLI fixtures arranged as little CLI CTF's, where the LLMs demonstrate their ability to use an unknown CLI be capturing a "flag" that they'll need to discover by using the usage help, and a trail of learned arguments.

My findings at first confirmed my intuitions, which was disappointing but unsurprising. When testing with GPT-4.1-mini, no manner of forcing them to receive info about the CLI via json was more effective than just letting them use the human-friendly plain English output of --help, and in all cases the JSON versions burned more tokens. I was able to elicit better performance by some measurements from 5.1-mini, but again the tradeoff was higher token burn.

I'll link straight to the part of the README that shows one table of results, and contains links to the LLM CLI CTF part of the repo, as well as the generated report after the phase-1 runs; all the code to reproduce or run your own variation is there (as well as the code for the jelp module, if there is any interest, but it's the CLI CTF eval that I expect is more interesting to most.)

https://github.com/scottvr/jelp?tab=readme-ov-file#what-else

binarymax3mo ago· 2 in thread

Why not use skills? They follow a three-tier loading approach, and you can stick an MCP as part of the toolset for the skills, so it will only load it when the skill is selected.

See the progressive disclosure section in the skills docs: https://agentskills.io/what-are-skills

forrestthewoods3mo ago

Three tiers? I thought it was two?

binarymax3mo ago

Discovery, Activation, Execution as per the linked doc

machinecontrol3mo ago· 2 in thread

The trend is obviously towards larger and larger context windows. We moved from 200K to 1M tokens being standard just this year.

This might be a complete non issue in 6 months.

amzil3mo ago

Context windows getting bigger doesn't make the economics go away. Tokens still cost money. 50K tokens of schemas at 1M context is the same dollar cost as 50K tokens at 200K context, you just have more room left over.

The pattern with every resource expansion is the same: usage scales to fill it. Bigger windows mean more integrations connected, not leaner ones. Progressive disclosure is cheaper at any window size.

magospietato3mo ago

Context caching deals with a lot of the cost argument here.

1 more reply

kristjansson3mo ago· 1 in thread

CLIs are great for some applications! But 'progressive disclosure' means more mistakes to be corrected and more round trips to the model - every time[1] you use the tool in a new thread. You're trading latency for lower cost/more free context. That might be great! But it might not be, and the opposite trade (more money/less context for lower latency) makes a lot of sense for some applications. esp. if the 'more money' part can be amortized over lots of users by keeping the tool definitions block cached.

[1]: one might say 'of course you can just add details about the CLI to the prompt' ... which reinvents MCP in an ad hoc underspecified non-portable mode in your prompt.

amzil3mo ago

This is a fair trade-off and the post should probably be more explicit about it. You're right that progressive disclosure trades latency for cost and context space. For some workloads that's the wrong trade.

The amortization point is interesting too. If you're running a support agent that calls the same 5 tools thousands of times a day, paying the schema cost once and caching it makes total sense. The post covers this in the "tightly scoped, high-frequency tools" section but your framing of it as a caching problem is cleaner.

On the footnote: guilty as charged, partially. The ~80 token prompt is a minimal bootstrap, not a full schema. It tells the agent how to discover, not what to call. But yeah, the moment you start expanding that prompt with specific flags and patterns, you're drifting toward a hand-rolled tool definition. The difference is where you stop. 80 tokens of "here's how to explore" is different from 10,000 tokens of "here's everything you might ever need." But the line between the two is blurrier than the post implies. Fair point.

bkummel3mo ago· 1 in thread

There's already an open source tool that does exactly the same thing: https://github.com/knowsuchagency/mcp2cli

amzil3mo ago

Great tool, however we went to a dedicated CLI client (think gh, aws, stripe) in Go.

ekropotin3mo ago· 1 in thread

Let me guess - another article about how CLI s are superior to MCP?

kayig3mo ago

I k know

agentictrustkit3mo ago

I like that everyone keeps separating "capability" from "authority" because they get conflated in a lot of agent-centered tooling.

CLI vs MCP choice mostly changes the HOW as a side effect. It doesn't answer the bigger question and probably harder one: who delegated the rigtht to cause that effect, for how long, and with what scope? Just like with people, you need a policy decision that's independent. It should be revocable and auditable.

One way that I look at it is with these long-running agents should look less like a script and more like an employee. You wouldn't give them the master key hoping they behave well. You'd give specific access and in stages probably. That's what I think we're missing with our agents is giving them appropriate authority, delegated by an owner with a audit trail

robot-wrangler3mo ago

> Limit integrations → agent can only talk to a few services

The idea that people see this as one horn of a trilemma instead of just good practice is a bit strange. Who would complain that every import isn't a star-import? Bring in what you need at first, then load new things dynamically with good semantics for cascade / drill-down. Let's maybe abandon simple classics like namespacing and the unix philsophy for the kitchen-sink approach after the kitchen-sink thing is shown to work.

dtraub3mo ago

"You'll end up with something like MCP once you introduce enterprise users" - yeah. The token efficiency debate is a single-developer optimization. The moment you introduce teams or compliance requirements, the question shifts to who manages the credentials.

With CLI, it's your machine, your keys. With direct API calls, keys live wherever the agent runs. Both work until a contractor leaves and their laptop still has active keys for your repos, your internal docs, and your CRM.

Remote MCP over streamable HTTP gives you a centralized auth layer. One SSO integration, one revocation point, one audit trail.

I wrote about this angle here: https://dev.to/dennistraub/missing-from-the-mcp-debate-who-h...

1 more reply

TheTaytay3mo ago

I’m a huge fan of CLIs over MCP for many things, and I love asking Claude Code to take an API, wrap it in a CLI, and make a skill for me. The ergonomics for the agent and the human are fantastic, and you get all of the nice composability of command line Unix build in.

However, MCPs have some really nice properties that CLIs generally don’t, or that are harder to solve for. Most notably, making API secrets available to the CLI, but not to the agent, is quite tricky. Even in this example, the options are env variables (which are a prompt injection away from dumping), or a credentials file (better, but still very much accessible to the agent if it were asked).

MCPs give you a “standard” way of loading and configuring a set of tools/capabilities into a running MCP server (locally or remotely), outside of the agent’s process tree. This allows you to embed your secrets in the MCP server, via any method you choose, in a way that is difficult or impossible for the agent to dump even if it goes rogue.

My efforts to replicate that secure setup for a CLI have either made things more complicated (using a different user for running CLIs so that you can rely upon Linux file permissions to hide secrets), or start to rhyme with MCP (a memory-resident socket server started before the CLI that the CLI can talk to, much like docker.sock or ssh-agent)

dirk940183mo ago

MCP is a bit of a rube goldberg machine. Unix solved that problem. Pipe text in, get text out, discover capabilities incrementally. The fact that we need benchmarks to prove CLIs use fewer tokens than dumping 55k of JSON schema upfront is embarrassing. toast is just pipe stuff to an LLM and let stdin/stdout be the protocol. No schema tax, no connection lifecycle, no tool registry middleware to manage your middleware. The thing is that AIs are just not good at outputting structure, like people, json isn't natural.

tacone3mo ago

Actually what it seems to tackle at its core is discoverability. Which should be built in in each MCP server as it's not that difficult, instead, we see MCP servers with 50+ methods.

Much easier:

    { action: 'help' }

    { action: 'projects.help' }

    { action: 'projects.get', payload: { id: xxxx-xx-x } }

And you get the very same discoverability.

There are other interesting capabilities though, like built in permissions based on HTTP verb, that might be useful to someone.

bazhand3mo ago

I ran into this exact problem building a MCP server. 85 tools in experimental mode, ~17k tokens just for the tool manifest before any work starts.

The fix I (well Codex actually) landed on was toolset tiers (minimal/authoring/experimental) controlled by env var, plus phase-gating, now tools are registered but ~80% are "not connected" until you call _connect. The effective listed surface stays pretty small.

Lazy loading basically, not a new concept for people here.

BTAQA3mo ago

Using MCP daily as a solo founder with Claude Code. The "consistent rails" point resonates. The value isn't just tool calling, it's that the agent knows how to behave within a defined boundary. The security posture argument is underrated too. Giving a CLI unrestricted box access vs a hosted MCP server with scoped permissions is a completely different risk profile.

sim04ful3mo ago

Graphql introspection queries would be a really neat application for LLM calls

austinhutch3mo ago

> Not a protocol error, not a bad tool call. The connection never completed.

Very interesting topic, but this LLM structure is instant anthema I just have to stop reading once I smell it.

drewbitt3mo ago

https://github.com/RhysSullivan/executor

enraged_camel3mo ago

With context windows starting to get much larger (see the recent 1M context size for Claude models), I think this will be a non-issue very soon.

techcam3mo ago

We ran into something similar with API costs — small changes in behavior can have surprisingly large downstream effects.

m3kw93mo ago

The thing with CLIs is that you also need to return results efficiently. It if both MCP and CLI return results efficiently, CLI wins

Havoc3mo ago

Getting LLMs to reliably trigger CLI functions is quite hard in my experience though especially if it’s a custom tool

kayig3mo ago

Hey

mt42or3mo ago

Tired of this shit. Be less stupid.

rirze3mo ago

At this point, I feel like MCP servers are just not feasible at the current level of context windows and LLMs. Good idea, but we're way too early.

j / k navigate · click thread line to collapse

123 comments

100 comments · 30 top-level

caust1c3mo ago· 27 in thread

I'm getting tired of everyone saying "MCP is dead, use CLIs!".

Yes, MCP eats up context windows, but agents can also be smarter about how they load the MCP context in the first place, using similar strategy to skills.

The problem with tossing it out entirely is that it leaves a lot more questions for handling security.

When using skills, there's no implicit way to be able to apply policies in the sane way across many different servers.

MCP gives us a registry such that we can enforce MCP chain policies, i.e. no doing web search after viewing financials.

Doing the same with skills is not possible in a programatic and deterministic way.

There needs to be a middle ground instead of throwing out MCP entirely.

ewild3mo ago

Are there any good docs youve liked to learn about it, or good open source projects you used to get familiar? I would like to learn more

brabel3mo ago

You need to go back to LLM tools. Before MCP, you could write tools for your LLM to use by normally using Python, something like this:

    @tool def do_great_thing(arg: string) -> string:
        // todo

The LLM now understands that to do the great thing, it can just call this function and get some result back that - which it will use to answer some query from the user.

Notice that the tool uses structured inputs/outputs (the types - they can also be "dictionaries", or objects in most languages - giving the LLM powerful capabilities).

Now, imagine you want to write this in any language. What do you do?

user39393823mo ago

1 more reply

thamer3mo ago

that's it.

[1] https://modelcontextprotocol.io/docs/learn/architecture

pmontra3mo ago

Maybe it's because of the example, but if the LLM knows the GitHub CLI and I bet it knows it, shouldn't it be able to run the commands (or type them for us) to count the open PRs on foo/bar?

1 more reply

Toby113mo ago

LLM is not doing the work.. your code is doing the work, LLM is just telling you which of the functions (aka tools) you should run.

web search is also another tool and you can gate it with logic so LLMs don’t go rogue.

that’s kinda simplest explanation i guess

ewild3mo ago

yoyohello133mo ago

9rx3mo ago

> I feel like the anti-MCP push is coming from people who don't need to work in a large org.

yoyohello133mo ago

Id agree solo devs can lean toward skills. I liken skills to a sort of bash scripts directory. And for personal stuff I generally use skills only.

krzyk3mo ago

Not sure. Our big org, banned MCPs because they are unsafe, and they have no way to enforce only certain MCPs (in github copilot).

thenewnewguy3mo ago

But skills where you tell the LLM to shell out to some random command are safe? I'm not sure I understand the logic.

3 more replies

mbreese3mo ago

Isn’t it possible to proxy LLM communication and strip out unwanted MCP tool calls from conversations? I mean if you’re going to ban MCPs, you’re probably banning any CLI tooling too, right?

2 more replies

thecopy3mo ago

Shameless plug: im working on a product that aims to solve this: https://www.gatana.ai/

1 more reply

yoyohello133mo ago

We only allow custom MCP servers.

CuriouslyC3mo ago

The only value in MCP is that it's intended "for agents" and it has traction.

consumer4513mo ago

> Yes, MCP eats up context windows, but agents can also be smarter about how they load the MCP context in the first place, using similar strategy to skills.

I have been keeping an eye on MCP context usage with Claude Code's /context command.

When I ran it a couple months ago, supabase used 13.2k tokens all the time, with the search_docs tool using 8k! So, I disabled that tool in my config.

I just ran /context now, and when not being used it uses only ~300 tokens.

I have a question. Does anyone know a good way to benchmark actual MCP context usage in Claude Code now? I just tried a few different things and none of them worked.

robot-wrangler3mo ago

> I'm getting tired of everyone saying "MCP is dead, use CLIs!".

The people saying this and attacking it should first agree about the question.

Advocating really hard for one or the other in general is just kind of naive.

1 more reply

novok3mo ago

Thank god we moving away from giving security orgs these fragile tools to attach ball and chains to everyone.

polynomial3mo ago

This is the right framing. The chain policy problem is what happens when you ask the registry to be the entitlement layer.

0x0083mo ago

> MCP gives us a registry such that we can enforce MCP chain policies

Do you have some more info on it?

looking up "registry" in the mcp spec will just describe a centrally hosted, npm-like package registry[^1]

skybrian3mo ago

Towards the end of the article, they do write about some things that MCP does better.

il3mo ago

Tool search pretty much completely negates the MCP context window argument.

siva73mo ago

Evidence?

mvrckhckr3mo ago

I agree, and it's context-dependent when to use what (the author mentions use cases for other solutions). I'm glad there are multiple solutions to choose from.

j453mo ago

MCPs are handy in their place. Agents calling CLI locally is much more efficient.

golergka3mo ago

it's not about the tokens. agents can script clis.

hparadiz3mo ago· 13 in thread

10 years from now: "Can you believe they did anything with such a small context window?"

this_user3mo ago

More likely: "Can you believe they were actually trying to use LLMs for this?"

rib3ye3mo ago

OSes and software engs did not end up using less RAM.

1 more reply

lionkor3mo ago

agoodusername633mo ago

I would love to live in a world where my coworkers learn from their mistakes

is this Human 2.0? I only have 1.0a beta in the office.

strbean3mo ago

> Did you know if you ask <X> a question and it doesn't know the answer, sometimes it just makes something up?!

I think maybe a lot of us live in a bubble where the above statement is less frequently true of our peers than average.

cheevly3mo ago

Imagine believing humans don’t make the same mistakes. You live in a different universe than me buddy.

2 more replies

MattGaiser3mo ago

I am kind of already at that point. For all the complaining about context windows being stuffed with MCPs, I am curious what they are up to and how many MCPs they have that this is a problem.

mbreese3mo ago

10 years from now: “what’s a context window?”

sghiassy3mo ago

10 years from now: “come with me if you want to live”

Terminator 2 Clip: https://youtu.be/XTzTkRU6mRY?t=72&si=dmfLNDqpDZosSP4M

berziunas3mo ago

“640K ought to be enough for anybody”

hparadiz3mo ago

I dunno why you're getting down voted. This is funny.

whoisstan3mo ago

Very!

smrtinsert3mo ago

"That was back when models were so slow and weighty they had to use cloud based versions. Now the same LLM power is available in my microwave"

dend3mo ago· 8 in thread

One of the MCP Core Maintainers here, so take this with a boulder of salt if you're skeptical of my biases.

JyB3mo ago

> modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation

Glad you say it has already been addressed. Was the protocol itself updated to reflect that? Or are you just referring to off-spec implementations?

tylerburnam3mo ago

Anthropic solved this problem like 3 AI years ago: https://www.anthropic.com/engineering/code-execution-with-mc...

stavros3mo ago

o_____________o3mo ago

> modern MCP clients do smart tool search that obviates the entire "I am sending the full list of tools back and forth" mode of operation

How, "Dynamic Tool Discovery"? Has this been codified anywhere? I've only see somewhat hacky implementations of this idea

https://github.com/modelcontextprotocol/modelcontextprotocol...

Or are you talking about the pressure being on the client/harnesses as in,

https://platform.claude.com/docs/en/agents-and-tools/tool-us...

dend3mo ago

More of the latter than the former. The protocol itself is constrained to a set of well-defined primitives, but clients can do a bunch of pre-processing before invoking any of them.

polynomial3mo ago

Fully agree.

https://forestmars.substack.com/p/twilight-of-the-mcp-idols

kordlessagain3mo ago

Screw MCP. It’s not even a protocol. It’s a strongly worded suggestion at best.

amzil3mo ago

The post isn't MCP vs CLI. It covers where MCP wins.

Honestly though, after 20 years of this, the whole thread is debating the wrong layer. A well-designed API works through CLI, MCP, whatever. A bad one won't be saved by typed schemas.

> At this point it's just a trope that is repeated from blog post to blog post

> This blog post too alludes to this and talks about the need for infrastructure to make it work

And it only works in clients that support it. Try using tool search from a custom Python agent, a bash script, or a CI/CD pipeline. You're back to loading everything.

zamalek3mo ago· 5 in thread

As a bonus, the entire thing now works as a plain old CLI too - which it honestly should have from the beginning.

[1]: https://github.com/jcdickinson/ferrisfetch/blob/main/cmd/mcp...

patates3mo ago

You don't need a whole server to tell agents that, I think you can just write a skill file or two and be done with it.

zamalek3mo ago

That would work, and is still an option. However I think this makes deployment/installation simpler (which is also why I write my MCPs in go).

antihero3mo ago

Can the MCP tell them how to use the CLI? Surely that would mean less time wasted on discovering it each time.

Going to try this with fastmail-cli and see what happens.

zamalek3mo ago

[1]: https://github.com/jcdickinson/ferrisfetch/blob/main/cmd/age...

antihero3mo ago

Ah cool, that said: If your tool is primarily for Claude Desktop (or mobile if hosted), that surely needs the MCP to actually do anything right?

1 more reply

gertjandewildeOP3mo ago· 5 in thread

We built a unified API with a large surface area and ran into a problem when building our MCP server: tool definitions alone burned 50,000+ tokens before the agent touched a single user message.

OsrsNeedsf2P3mo ago

How is progressive discovery not more expensive due to the increased number of steps?

BeefySwain3mo ago

Compare this to an MCP, where my understanding is that the entire API usage is injected into the context.

lelanthran3mo ago

> How is progressive discovery not more expensive due to the increased number of steps?

Why not run the discovery (whether MCP or CLI) in a subagent that returns only the relevant tools. I mean, discovery can be done on a local model, right?

zamalek3mo ago

iamjackg3mo ago

It depends on what your "currency" is: inference cost vs. models getting dumber/slower with a fuller context.

nicoritschel3mo ago· 3 in thread

While I generally prefer CLI over MCP locally, this is bad outdated information.

The major harnesses like Claude Code + Codex have had tool search for months now.

injidup3mo ago

Can you explain how to take advantage. Is there any specific info from anthropic with regards to context window size and not having to care about MCP?

amzil3mo ago

Fair point on tool search. Claude Code and Codex do have it.

With a CLI, the agent runs `--help` and gets 50-200 tokens of exactly what it needs. No search index, no ranking, no middleware. The binary is the registry.

Tool search makes MCP workable. CLIs make the search unnecessary.

cruffle_duffle3mo ago

Let me guess the command: [error]

Wait, better check help. is it -h? [error]

Nope? Lemme try —-help. [error]

Nope.

How about just “help” [error]

Let me search the web [tons of context and tool calls]

nzoschke3mo ago· 2 in thread

The industry is talking in circles here. All you need is "composability".

UNIX solved this with files and pipes for data, and processes for compute.

AI agents are solving this this with sub-agents for data, and "code execution" for compute.

The UNIX approach is both technically correct and elegant, and what I strongly favor too.

The agent + MCP approach is getting there. But not every harness has sub-agents, or their invocation is non-deterministic, which is where "MCP context bloat" happens.

Source: building an small business agent at https://housecat.com/.

We do have APIs wrapped in MCP. But we only give the agent BASH, an CLI wrapper for the MCPs, and the ability to write code, and works great.

"It's a UNIX system! I know this!"

dirk940183mo ago

Unix approach can be surprisingly powerful.

https://linuxtoaster.com/blog/gradientdescentforcode.html

ycombiredd3mo ago

https://github.com/scottvr/jelp?tab=readme-ov-file#what-else

binarymax3mo ago· 2 in thread

Why not use skills? They follow a three-tier loading approach, and you can stick an MCP as part of the toolset for the skills, so it will only load it when the skill is selected.

See the progressive disclosure section in the skills docs: https://agentskills.io/what-are-skills

forrestthewoods3mo ago

Three tiers? I thought it was two?

binarymax3mo ago

Discovery, Activation, Execution as per the linked doc

machinecontrol3mo ago· 2 in thread

The trend is obviously towards larger and larger context windows. We moved from 200K to 1M tokens being standard just this year.

This might be a complete non issue in 6 months.

amzil3mo ago

The pattern with every resource expansion is the same: usage scales to fill it. Bigger windows mean more integrations connected, not leaner ones. Progressive disclosure is cheaper at any window size.

magospietato3mo ago

Context caching deals with a lot of the cost argument here.

1 more reply

kristjansson3mo ago· 1 in thread

[1]: one might say 'of course you can just add details about the CLI to the prompt' ... which reinvents MCP in an ad hoc underspecified non-portable mode in your prompt.

amzil3mo ago

bkummel3mo ago· 1 in thread

There's already an open source tool that does exactly the same thing: https://github.com/knowsuchagency/mcp2cli

amzil3mo ago

Great tool, however we went to a dedicated CLI client (think gh, aws, stripe) in Go.

ekropotin3mo ago· 1 in thread

Let me guess - another article about how CLI s are superior to MCP?

kayig3mo ago

I k know

agentictrustkit3mo ago

I like that everyone keeps separating "capability" from "authority" because they get conflated in a lot of agent-centered tooling.

robot-wrangler3mo ago

> Limit integrations → agent can only talk to a few services

dtraub3mo ago

Remote MCP over streamable HTTP gives you a centralized auth layer. One SSO integration, one revocation point, one audit trail.

I wrote about this angle here: https://dev.to/dennistraub/missing-from-the-mcp-debate-who-h...

1 more reply

TheTaytay3mo ago

dirk940183mo ago

tacone3mo ago

Actually what it seems to tackle at its core is discoverability. Which should be built in in each MCP server as it's not that difficult, instead, we see MCP servers with 50+ methods.

Much easier:

    { action: 'help' }

    { action: 'projects.help' }

    { action: 'projects.get', payload: { id: xxxx-xx-x } }

And you get the very same discoverability.

There are other interesting capabilities though, like built in permissions based on HTTP verb, that might be useful to someone.

bazhand3mo ago

I ran into this exact problem building a MCP server. 85 tools in experimental mode, ~17k tokens just for the tool manifest before any work starts.

Lazy loading basically, not a new concept for people here.

BTAQA3mo ago

sim04ful3mo ago

Graphql introspection queries would be a really neat application for LLM calls

austinhutch3mo ago

> Not a protocol error, not a bad tool call. The connection never completed.

Very interesting topic, but this LLM structure is instant anthema I just have to stop reading once I smell it.

drewbitt3mo ago

https://github.com/RhysSullivan/executor

enraged_camel3mo ago

With context windows starting to get much larger (see the recent 1M context size for Claude models), I think this will be a non-issue very soon.

techcam3mo ago

We ran into something similar with API costs — small changes in behavior can have surprisingly large downstream effects.

m3kw93mo ago

The thing with CLIs is that you also need to return results efficiently. It if both MCP and CLI return results efficiently, CLI wins

Havoc3mo ago

Getting LLMs to reliably trigger CLI functions is quite hard in my experience though especially if it’s a custom tool

kayig3mo ago

Hey

mt42or3mo ago

Tired of this shit. Be less stupid.

rirze3mo ago

At this point, I feel like MCP servers are just not feasible at the current level of context windows and LLMs. Good idea, but we're way too early.

j / k navigate · click thread line to collapse