Different from what many suspect, the security problem with MCP-style LLM tool calling is not in isolating different MCP server implementations. MCP server implementations that run locally should be vetted by the package manager you use to install them (remote MCP servers are actually harder to verify).
Instead, the problem here is a special form of indirect prompt injection that you run into, when you use MCP in an agent system. Since the agent includes all installed MCP server specifications in the same context, one MCP server (that may be untrusted), can easily override and manipulate the agent's behavior with respect to another MCP server (e.g. one with access to your sensitive database). This is what we termed tool shadowing.
Further, MCP's dynamic nature makes it possible for an MCP server to change its provided tool set at any point or for any specific user only. This means MCP servers can turn malicious at any point in time. Current MCP clients like Claude and Cursor, will not notify you about this change, which leaves agents and users vulnerable.
For anyone, more interested, please have a look at our more detailed blog post at [1]. We have been working on agent security for a while now (both in research and now at Invariant).
We have also released some code snippets for everyone to play with, including a tool poisoning attack on the popular WhatsApp MCP server [2].
[1] https://invariantlabs.ai/blog/mcp-security-notification-tool...
[2] https://github.com/invariantlabs-ai/mcp-injection-experiment...
1) system, messages from the model creator that must always be obeyed 2) dev, messages from programmers that must be obeyed unless the conflict with #1 3) user, messages from users that are only to be obeyed if they don’t contradict #1 or #2
Then, the model is trained heavily on adversarial scenarios with conflicting instructions, such that it is intended to develop a resistance to this sort of thing as long as your developer message is thorough enough.
This is a start, but it’s certainly not deterministic or reliable enough for something with a serious security risk.
The biggest problems being that even with training, I’d expect dev messages to be disobeyed some fraction of the time. And it requires an ironclad dev message in the first place.
The problem you describe is of the same kind as ensuring humans follow pre-programmed rules. Leaving aside the fact that we consider solving this for humans to be wrong and immoral, you can look at the things we do in systems involving humans, to try and keep people loyal to their boss, or to their country; to keep them obeying laws; to keep them from being phished, scammed, or otherwise convinced to intentionally or unintentionally betray the interests of the boss/system at large.
Prompt injection and social engineering attacks are, after all, fundamentally the same thing.
For instance, what about a simple user query like "Can you install this library?". In that case a useful agent, must go, check out the libraries README/documentation and install according to the instructions provided there.
In many ways, the whole point of an agent system, is to react to unpredictable new circumstances encountered in the environment, and overcoming them. This requires data to flow from the environment to the agent, which in turn must understand some of that data as instruction to react correctly.
https://en.wikipedia.org/wiki/John_Draper
Also it's such amusing irony when the common IT vernacular is enriched by acronyms for all-powerful nemeses in Hollywood films, just as Microsoft did with H.A.L.
There are still bad things that can happen, but I wouldn't characterize them as "this security is full of holes". Unless you're trusting the output of the explicitly untrusted process in which case you're the hole.
> This is VERY VERY VERY important.
I think we'll look back in decades to come and just be bewildered that it was ever possible to come up with an exploit that depended on the number of times you wrote "VERY" in all caps.
Should probably name it "Poisoned Tool Attack" coz the Tool itself is poisoned?
https://simonwillison.net/search/?q=llm+security
MCP is just another way to use LLMs more in more dangerous ways. If I get forced to use this stuff, I'm going to learn how to castrate some bulls, and jump on a train to the countryside.
This stuff in not securable.
An MCP server is running code at user-level, it doesn't need to trick an AI into reading SSH keys, it can just....read the keys! The rest of these are the same complaints you can levy against basically any other developer tool / ecosystem like NPM or VS Code Extensions
It's slightly more subtle than that.
The tool poisoning attack allows the provider of one tool to cause the AI to use another tool.
So if you give the AI some random weather tool from some random company, and you also give the AI access to your SSH key, you're not just giving the AI your SSH key, you're also allowing the random company to trick the AI into telling them your SSH key.
So, yes, you gave the AI access to your key, but maybe you didn't realise that you also gave the random weather company access to your key.
The answer is you need complete control over the text blob on the secure side, but then.... none of this works so throw it in the trash already
If you go to the credited author of that attack scenario [0], you will see that the MCP server is not running locally. Instead, its passing instructions to your local agent that you don't expect. The agent, on your behalf, does things you don't expect then packages that up and sends it to the remote MCP server which would not otherwise have access.
The point of that attack scenario is that your agent has no concept of what is "secure" it is just responding faithfully to a request from you, the user AND it can be instructed _by the server_ to do more than you expect. If you, the user, are not intimately aware of exactly what the fine-print says when you connect to the MCP server you are vulnerable.
[0] https://invariantlabs.ai/blog/mcp-security-notification-tool...
Using a code completion service should not give that service full control over your computer.
A recent example from HN is GitMCP[0]
The problem is that it is very hard to see how you can prove this is going to be safely implemented, for example, is it possible to say that your sharepoint or confluence is "safe" in terms of all the content that's in there? I do not think so...
So the headline is correct
Profile picture definitely seems to be StableDiffusion'd and the account was created today, with no previous articles.
Plus I couldn't find any other references to Elena Cross.
I bet on paid 'marketing', if you can call it that, by ScanMCP.com, created to capitalize on the Invariant Labs report.
"Models like [..], GPT, Cursor"?
That use of emojis on headings very distinctly reminds me of AI writing.
Superficially lists issue but doesn't feel like the author has explored it?
most articles nowadays will be. the difference is that this one is just poorly done and obvious
Culturally, the issues OP describes are a big problem for soft-tech people (muggles). On the subreddits for this stuff, people are having a great time running MCP CLI programs on their machines. Much of OP security comments are obvious to developers,(although some subtleties are discussed in this thread), but these users don't have the perspective of how dangerous it is.
People are learning about Docker and thankfully Claude include its usage in their examples. But really most people are just downloading blobs and running them. People are vibe-coding MCP servers and running those blindly!
As MCP takes off, frameworks and tooling will grow to support Security, Observability, etc. It's like building web stuff in the mid-90s.
Unrelated to OP, but I gotta say, in building these it was so exciting to type something into Claude Desktop and then trigger a breakpoint in VSCode!
I wonder if this is by design. If you are doing contracting work, or should I say, claude is doing contracting work by proxy for you (but you are keeping the money in your bank account) then this gives you a way to say "I don't know, maybe Claude did 12% of the work and I did the rest?"
openwebui and aider both have ways to log to something like datadog. So many layers of software.
I've been looking at ways to script my terminal and scrape all the textual data, a tool that would be outside of the subprocesses running inside the terminal. I really like to keep track of the conversation and steps to build something, but these tools right now make it really difficult.
Almost every time I’ve asked an LLM to help implement something I’ve given it various clarifying questions so I understand why, and digging through linear UI threads isn’t great.
A decent o11y or instrumentation layer is pretty important to do anything like that well.
How do you write a web tool that lets users configure and combine arbitrary third-party APIs, including those not known or not even existing at the time of development, into a custom solution that runs in their browser?
Answer: you don't. You can't, you shouldn't, it's explicitly not supported, no third-party API provider wants you to do it, and browsers are designed to actively prevent you from doing such a thing.
That's the core problem: MCP has user-centric design, and enables features that are fundamentally challenging to provide[0] with a network of third-party, mutually mistrusting services. The Web's answer was to disallow it entirely, opting instead for an approach where vendors negotiate specific integrations on the back-channel, and present them to users from a single point of responsibility they fully control.
Doing the same with MCP will nerf it to near-uselesness, or introduce the same problem with AI we have today with mobile marketplaces - small number of titans gate-keeping access and controlling what's allowed.
--
[0] - I'd say impossible, but let's leave room for hope - maybe someone will figure out a way.
What do you mean by this?
Anyway, many soft-tech people are grabbing AI tools and using them in all sorts of ways. It's a great time of utility and exploration for all of us. But by not being previously exposed to systems security, hardening, the nature of bugs, etc, they just don't know what they don't know.
All of the security problems in the Original Post are challenges to them, because they don't even know anything about it in the first place, nor how to mitigate. What is great though (apparent in those Reddit threads), is that once it is pointed out, they seem to thirst to understand/learn/defend.
1. Is properly secure, to whatever standards will stop people writing "S Stands for Security" articles, and
2. Allows programs implementing it to provide the same set of features the most useful MCPs do now, without turning automatic functionality into one requiring manual user confirmations, and generally without defeating the purpose of the entire idea, and
3. Doesn't involve locking everything down in a proprietary Marketplace with a corporate Gatekeeper.
I'd be interested to see a proposal, because so far all I've seen is "MCP is not sekhure!!!111" in general and non-specific sense. I guess it's not that easy, especially when people forget that security and usefulness are opposing forces.
(Also, AFAIK, MCP was not intended for its implementations to be hosted by third parties and provided "as a Service". If that cannot be secure, then don't do it. Find some other business to be in, instead of trying to nerf MCP through "solving" something that isn't a problem with the protocol.)
That a system is hard to secure doesn't negate the need for it to be secure.
Though I agree about third-party MCP services. They're in a weird spot and I'm not sure that they're viable for many use cases.
> That a system is hard to secure doesn't negate the need for it to be secure.
Correct. However, security is a spectrum - there's such a thing that "secure enough", especially when making it more secure eliminates the very reason for system's existence. Additionally, we can and should secure different parts of a system to a different degree.
For an analogy, consider utensils and workshop tools. We secure them as much as we can against accidents, but not so much as to make the tool worse at its job. We add further security by means like access controls, or laws making people responsible for use and misuse, etc. - i.e. we're making the larger system secure, without burdening the inner core.
(For comparison, fully secure version of utensils and all kinds of tools are also available on the market - you'll find them in toy stores.)
I don't think this is really an MCP problem, it's more of an untrusted-entity problem.
I'm a little surprised there is so much hype for MCP rather than just "put your tools behind a web service with good machine-readable documentation, and agents can use them easily".
Invariant blog post mentions this:
> Conclusion: Agents require extensive, highly-contextual guardrailing and security solutions
> As one of our core missions at Invariant, we absolutely cannot stress enough how important it is to rely on extensive guardrailing with AI models and their actions. We come to this conclusion repeatedly, as part of our research and engineering work on agentic systems. The MCP ecosystem is no exception to this rule. Security must be implemented end-to-end, including not only the tool descriptions but also the data that is being passed to and from the AI model.
B. Version the tool descriptions so that they can be pinned and do not change (same way we do for libraries and APIs).
C. Maybe in future, LLMs can implement some sort of "instruction namespacing" - where the developer would be able to say any instruction in this prompt is only applicable when doing X, Y, Z.
This is far better than designing an entirely new protocol, as ActivityPub and Mastodon already have everything you need, including an API.
Now, that's just transport security. If you expose a server that will execute arbitrary commands, nothing can protect you.
On "Zero reuse of existing API surfaces", I read this insightful Reddit comment on what an LLM-Tool API needs and why simply OpenAPI is not enough [1].
On "Too Many Options"... at the beginning of this week, I wrote an MCP server and carefully curated/coded a MCP Tool surface for it. By my fourth MCP server at the end of the week, I took a different approach and just gave a single "SQL query" endpoint but with tons of documentation about the table (so it didn't even need to introspect). So less coding, more prose. For the use case, it worked insanely well.
I also realized then that my MCP server was little more than a baked-in-data-plus-docs version of the generalized MotherDuck DuckDB MCP server [2]. I expect that the power will be in the context and custom prompts I can provide in my MCP server. Or the generalized MCP servers need to provide configs to give more context about the DBs you are accessing.
[1] https://www.reddit.com/r/mcp/comments/1jr8if3/comment/mlfqkl... [2] https://github.com/motherduckdb/mcp-server-motherduck
Still, I think it should only be an option, not a necessity to create an MCP API around existing APIs. Sure, you can do REST APIs really badly and OpenAPI has a lot of issues in describing the API (for example, you can't even express the concept of references / relations within and across APIs!).
REST APIs also don't have to be generic CRUD, you could also follow the DDD idea of having actions and services, that are their own operation, potentially grouping calls together and having a clear "business semantics" that can be better understood by machines (and humans!).
My feeling is that MCP also tries to fix a few things, we should consider fixing with APIs in general - so at least good APIs can be used by LLMs without any indirections.
Let's say you have MCP server that allows modification of local file system and MCP server that modifies objects in cloud storage. How does the user make sure LLM agent makes the correct choice?
You want to give lot of options and not babysit every action, but when you do there is possibility that more things go wrong.
How should we define the security interaction boundary between LLMs and development environments? This question has different best practices in various application scenarios, and is worth our continued exploration.
How can we fall into this _every single time_.
Once one of those exploits are executed, your keys, secrets and personal configs are as good as donated to someone else's server and also sent back to the LLM provider.
This shows that we can also see how dangerous widely used commands like curl | bash can be, despite the warnings and security risks.
The specification might as well have been vibe-coded.
Tangent: as a logged-in Medium user on mobile safari, I couldn't get the link to resolve to the post's article -- nor even find it by searching medium. I had to use a different browser and hit medium as an uncredentialled visitor.
I wonder if any AI coding tools will do similar things like curl rando scripts from the web and execute them.
It’s an easy tell for LLM-driven code because, to a seasoned engineer, it’ll always look like a strange solution to something, like handling auth or setting cookies or calling a database, that has been a done deal for a long time.
- internal: possibly rogue MCPs: as MCPs are opaque to the user and devs don't take the time to look at the source-code , and even then would need to pinpoint each inspected version.
- external: LLM agent poisoning
> There’s no mechanism to say: “this tool hasn’t been tampered with.” And users don’t see the full tool instructions that the agent sees.
This is true, but also generally true of any npm dependency that developers blindly trust.
The main difference with MCP is that it is pitched as a sort of extension mechanism (akin to browser extensions), but without the isolation/sandboxing that browser extensions have, and that even if you do run them in sandboxes there is a risk of prompt injection attacks.
It’s different in that it’s designed to provide natural language instructions to LLMs and is a pretty open-ended protocol. It’s not like the Language Server Protocol which has all of its use cases covered in the spec. MCP gives just a little bit of structure but otherwise is built to be all things for all people. That makes it a bit hard to parse when reading the docs. I think they certainly could do a better job in communicating its design though.
One aspect I 'missed' the first few times I read over the spec was the 'sampling' feature on the client side which, for anyone that hasn't read the spec, is a way for the MCP Client to expose an LLM endpoint to the MCP Server for whatever the server may need to do.
Additionally, I feel like understanding around the MCP Server 'prompts' feature is also a bit light.
Overall, MCP is exciting conceptually (when combined with LLM Tool Support), but it's still a fast-moving space and there will be a lot of growing pains.
Anyway, sounds like we'll see a v2 and v3 and such of the protocol before long, to deal with some of the issues in the article.
My vacuum cleaner can access any service on my network. Maybe not the best idea. I tried to segment the network once, but it was problematic to say the least. Maybe we should learn that security must not be an afterthought instead.
i chose OCI format for plugin packaging in my hyper-mcp project in order to leverage all the security measurements we have with OCI like image signing, image verification etc...
i chose wasm to sandbox each plugin so that they have no network or filesystem access by default
Yes, running unsafe bash commands in the implementation can be prevented by sandboxing. Instruction level attacks like tool poisoning, cannot be prevented like this, since they are prompt injections and hijack the executing LLM itself, to perform malicious actions.
https://github.com/orgs/modelcontextprotocol/discussions https://github.com/modelcontextprotocol/specification
There's your problem. USB-C is notoriously confusing.
i'll let the ML experts make the tools that i use, but also have good job prospects because infosec fundamentals will always be needed. especially if we have product managers like the sibling commenter who is certain that he will have an AI product that changes the infosec world, even though the person has zero background in security. to me, that is even more encouraging about the job security.
also, the herd is going to AI. maybe i'm a contrarian, unfortunately, but that seems like a good signal to not follow the herd, but instead get into something that the herd will need more of.
"Master Control Program" was an operating system for Burroughs mainframes in the 1960s and 70s. That is probably where Tron got the name.
In the '90s, I used another "MCP" on the Amiga: it was a "commodity" that tweaked and patched things, similar to PowerToys on MS-Windows. And I think the author has said that he got the name from Tron.
But that is true for every third party code on your systems all the time.
I mean - if they can't get me trough browser extension, vs code extensions, node modules, python modules, some obscure executables, open source apps, wordpress plugins and various jolly things on the servers and workstations that have zero days in them - they will craft malicious extension to llm that I will somehow get to host it.