MCP-B: A Protocol for AI Browser Automation

(mcp-b.ai)

336 points · miguelspizza · 11mo ago · 184 comments

118 comments · 34 top-level

jacquesm11mo ago· 14 in thread

Prediction: this will go the same way as RSS. Companies don't like you to be in control of how you use their data.

TeMPOraL11mo ago

Indeed. Though I guess a better example would be: it'll go the same way as REST APIs (which happen to be fundamentally the same thing as MCP anyway).

Remember the time when REST was the new hot thing, everyone started doing API-first design, and people thought it'll empower people by letting programs navigate services for them programmatically? Remember when "mashups" were the future?

It all died before it could come to pass, because businesses quickly remembered that all their money comes specifically from denying users those capabilities.

throwawaymaths11mo ago

REST did not die. it mostly became a mechanism for business managers to separate concerns between frontend and backend.

i wonder if mcp will become, "let the project people talk to the backend team and the frontend team separately and the AI will figure out the middle"

xienze11mo ago

> and people thought it'll empower people by letting programs navigate services for them programmatically?

I don’t think that concept died because of nefarious business-related reasons but rather that building true HATEOAS APIs is hard and the utility of “automatically navigable APIs” is quite limited. It’s a neat trick to point a generic API client at an API and crawl it automatically, but hardly anyone consumes APIs that way. We read the API docs and construct integrations suited to the task at hand, manually.

deepdarkforest11mo ago

I don't know about that. Zapier and automation apps were huge before agents, or even for integrations for Slack. There is definitely a big portion of tech products that have mutual benefits by providing good APIs to be in the same bubble

pyuser58311mo ago

How is MCP doing the same as REST?

I’m a REST developer learning MCP, and most of my effort is spent finding anything new to learn.

So I’m not surprised by this statement, but I’m a bit startled.

How are they the same thing?

sneak11mo ago

REST and MCP aren’t fundamentally the same thing. MCP is JSON-RPC, and includes special methods that allow you to enumerate the various functions and their signatures. REST apis have none of that, and use different verbs. JSON-RPC is always POST (which kills cacheability for common reads, unfortunately).
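
To make the enumeration point concrete, here is a minimal sketch of a JSON-RPC 2.0 dispatcher with MCP-style `tools/list` and `tools/call` methods. The `add` tool, the `handle` function, and the type names are hypothetical and only for illustration; a real MCP server would use an SDK and a transport rather than a bare function.

```typescript
// Minimal JSON-RPC 2.0 dispatcher illustrating MCP-style tool enumeration.
// All names here (handle, ToolDef, the "add" tool) are illustrative assumptions.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};

type JsonRpcResponse = {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
};

interface ToolDef {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema describing the tool's arguments
  run: (args: Record<string, unknown>) => unknown;
}

const tools: ToolDef[] = [
  {
    name: "add",
    description: "Add two numbers",
    inputSchema: {
      type: "object",
      properties: { a: { type: "number" }, b: { type: "number" } },
      required: ["a", "b"],
    },
    run: (args) => Number(args.a) + Number(args.b),
  },
];

function handle(req: JsonRpcRequest): JsonRpcResponse {
  if (req.method === "tools/list") {
    // Enumerate tools and their signatures; the implementation stays private.
    const listed = tools.map((t) => ({
      name: t.name,
      description: t.description,
      inputSchema: t.inputSchema,
    }));
    return { jsonrpc: "2.0", id: req.id, result: { tools: listed } };
  }
  if (req.method === "tools/call") {
    const params = req.params ?? {};
    const tool = tools.find((t) => t.name === params.name);
    if (!tool) {
      // -32602 is the JSON-RPC "Invalid params" code.
      return {
        jsonrpc: "2.0",
        id: req.id,
        error: { code: -32602, message: `Unknown tool: ${String(params.name)}` },
      };
    }
    const value = tool.run((params.arguments ?? {}) as Record<string, unknown>);
    return {
      jsonrpc: "2.0",
      id: req.id,
      result: { content: [{ type: "text", text: String(value) }] },
    };
  }
  // -32601 is the JSON-RPC "Method not found" code.
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
}
```

A client that knows nothing about the service can call `tools/list` first and then build a `tools/call` request from the returned schema, which is the part a plain REST API does not give you without separate documentation.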

SquareWheel11mo ago

Isn't RSS a smashing success? I changed readers after Google Reader died, but otherwise, my feeds have been working seamlessly for nearly 20 years. I rarely meet a site with updates that doesn't support RSS.

bayindirh11mo ago

Until recently, many sites had RSS functionality because the infrastructure they were using provided some form of automated RSS generation. Also, many sites stopped providing "full-content" RSS feeds, but gave pointers to the website itself to drive clicks.

"Real" RSS gives you the whole content. The blog platform I use does this, for example. They are not greedy people and just want to provide a blog platform so, they use the thing as it's supposed to be.

xnx11mo ago

Twitter, Instagram, TikTok, craigslist, eBay, Amazon, etc.

latexr11mo ago

> Prediction: this will go the same way as RSS.

Meaning what? RSS remains ubiquitous. It’s rare to find a website which doesn’t support it, even if the owners don’t realise it or link to it on their page. RSS remains as useful as it ever was. Even if some websites only share partial post content via RSS, it’s still useful to know when they are available (and can be used as an automation hook to get the full thing).

RSS is alive and well. It’s like if you wrote “this will go the same way as the microwave oven”.

57473m3n7Fur7h311mo ago

The built-in RSS reader in Firefox was removed. (But extensions exist to add RSS reader to Firefox.)

Google killed Google Reader. (Other products exist you can use instead.)

Facebook removed support for RSS feeds. (You can replace it with third party tools or API calls.)

It’s not dead dead, but it did seem to lose some momentum and support over time on several fronts.

wkat424211mo ago

It doesn't matter. Soon the AI will be able to click and scroll like a normal user. It's going to be another arms race.

theptip11mo ago

Maybe, but the market structure has inverted and the big guys now want to be in the intelligence layer, not content. (Content is being commoditized.)

Google can still sell ads as long as they own the eyeballs and the intelligence that’s engaging them.

Google did not want you using RSS because it cut out Google Search.

worldsayshi11mo ago

Unless it becomes useful enough that customers will go through the hassle of switching to companies that are "AI-ready".

fzysingularity11mo ago· 8 in thread

The contribution graph for the GitHub project is quite intriguing: https://github.com/MiguelsPizza/WebMCP/graphs/contributors

MiguelsPizza | 3 commits | 89++ | 410--

claude | 2 commits | 31,799++ | 0--

miguelspizzaOP11mo ago

I did some git history revising when I closed-sourced the extension for a bit, so these are not super accurate. Claude Code did write about 85% of the code, though.

Simon_O_Rourke11mo ago

How can you figure out that percentage? The commit logs?

fzysingularity11mo ago

Nice!

efitz11mo ago

You’re going to see this pattern a lot more in the future.

consumer45111mo ago

Claude's contributions graph is interesting. What is going on here? Does Claude Code commit as itself sometimes, but extremely rarely? I don't understand.

https://github.com/claude

handfuloflight11mo ago

If you ask it to commit it'll sign itself as the author.

gubicle11mo ago

That doesn't look right... if you look at the actual commits, they are all from

MiguelsPizza / Alex Nahas

https://github.com/MiguelsPizza/WebMCP/commits/main/

byteknight11mo ago

He rewrote history to hide it?

He admits it here https://news.ycombinator.com/item?id=44516104

throwanem11mo ago· 8 in thread

> If I asked you to build a table and gave you a Home Depot you probably would have a harder time than if I gave you a saw, a hammer and some nails.

I doubt that, first and not least because Home Depot stocks lumber.

bobmcnamara11mo ago

Home Depot also sells tables.

devoutsalsa11mo ago

Haha, that’s pretty close to any software I’ve ever written. Suck in a ton of open source dependencies, write comparatively little, and say “look what I made!” Buying a table, adorning it with a vase of fake flowers, and claiming to be a Senior Woodworking Engineer sounds about right. I’ll be a Principal after buying a new bed frame & putting a mattress on it.

throwanem11mo ago

Not good ones. But in any case the spec was not to provide a table, was it?

stuartjohnson1211mo ago

import table

table()

leptons11mo ago

You're supposed to "hallucinate" the lumber.

latexr11mo ago

And, I imagine, Home Depot might have better and more precision tools available, plus professionals who know how to use them.

miguelspizzaOP11mo ago

Fixed. Nice catch

throwanem11mo ago

Well, I've built tables before.

slt202111mo ago· 6 in thread

Could all of this be replaced simply by publishing an OpenAPI (Swagger) spec and using a universal Swagger MCP client?

This basically leaves it up to the user to establish an authenticated session manually.

Assuming Claude is smart enough to pick up the API key from the prompt/config, and can use a Swagger-based API client, wouldn't that be the same?

miguelspizzaOP11mo ago

That was everyone's first thought when MCP came out. Turns out it doesn't work too well since there are generally too many tools. People are doing interesting work in this space though.

randomaifreak11mo ago

Yeah, agreed. Tool overload is quite problematic. And then having to interact with the API for each website and their tools, with possibly clashing tool names, isn't ideal.

nilslice11mo ago

pls don't put an api key in a prompt

loandbehold11mo ago

It may or may not be an issue. It's ok to give it API key for test/qa system but probably not for prod.

loandbehold11mo ago

I found I can have Claude Code consume an API just by giving it a link to swagger.json in CLAUDE.md. It's very useful for ad hoc testing.

efitz11mo ago

Do it.

muratsu11mo ago· 6 in thread

This puts the burden on the website owner. If I go through the trouble of creating and publishing an MCP server for my website, I assume that through some directory or method I'll be able to communicate that with consumers (browsers & other clients). It would be much more valuable for website owners if you can automate the MCP creation & maintenance.

mindwok11mo ago

Pretty much every revolution in how we do things originates from the supplier. When websites became a thing the burden was on businesses to build them. Same with REST APIs. Same with mobile apps. As soon as there’s a competitive advantage to having the new thing, companies will respond if consumers demand it.

gavmor11mo ago

Am I going to start to choose products based on their compatibility with WebMCP?

miguelspizzaOP11mo ago

I think with AI tools you can pretty confidently build out an MCP server for your existing website. I plan to have good LLM docs for this very purpose.

For react in particular, lots of the form ecosystem (react hook form) can be directly ported to MCP tools. I am currently working on a zero config react hook form integration.

But yes, MCP-B is more "work" than having the agent use the website like a user. The admission here is that it's not looking like models will be able to reliably do browser automation like humans for a while. Thus, we need to make an effort to build out better tooling for them (at least in the short term)

rapind11mo ago

I think this is the practical way. The website owner (or rather the builder, since if you're running wordpress, we can assume MCP will be part of the package) is already responsible for the human interface across many devices, and also the search engine interface (robots.txt, sitemap.xml, metatags). Having a standard we can use to curate what the AI sees and how it can interact would be hugely beneficial.

There's space for both IMO. The more generic tool that figures it out on its own, and the streamlined tool that accesses a site's guardrails. There's also the backend service of course, which doesn't require the browser or UI, but as he describes this entails complexity around authentication and, I would assume, discoverability.

muratsu11mo ago

I agree with you that platforms like wordpress, shopify etc will likely ship MCP extensions to help with various use cases. Accompanied with a discovery standard similar to llms.txt, I think it will be beneficial too. My only argument is that platforms like this are also the most "templated" designs and it's already easy for AI to navigate them (since dom structure variance is small).

The bigger challenge I think is figuring out how to build MCPs easily for SaaS and other legacy portals. I see some push on the OpenAPI side of things which is promising but requires you to make significant changes to existing apps. Perhaps web frameworks (rails, next, laravel, etc) can agree on a standard.

mfrye011mo ago

I was thinking the same. Forward thinking sites might add this, but the vast majority of website owners probably wouldn't be able to figure this out.

Some middle ground where an agent reverse engineers the api as a starting point would be cool, then is promoted to use the "official" mcp api if a site publishes it.

mupuff123411mo ago· 6 in thread

I still don't understand MCP. If, according to all the AI companies, AI will soon replace devs, then why bother with MCP?

qayxc11mo ago

Lock-in. LLMs are today's hammer: everything looks like a nail now. LLMs are super useful for certain tasks (generating boilerplate code, generating tests, providing examples for API usage, summarising etc.), but the demo to me just illustrates a solution in desperate search for a problem. "Create A TODO" using a chatbot? That's an example gone wrong in so many ways and goes to show what happens if you start with a solution and work your way backwards to a use case without actually thinking about it yourself...

dominicrose11mo ago

A todo list is already a productivity tool and not an essential one. When I hear about productivity I can't help but think "but be productive doing what?"

What do we have to do that's so important we need AI, and not a chat AI but AI on steroids (supposedly)?

tracerbulletx11mo ago

It's pretty much standardizing on a couple endpoints for providing a list of resources/actions/prompt templates and calling to fetch those resources/actions/templates and feed them to the model context. It's really kind of trivial, but it's nice there's a standard I guess so you can write a service that anyone can use in their favorite client.

mupuff123411mo ago

I think that's also known by another word: "documentation".

volkandkaya11mo ago

Either AI will replace everyone and it doesn't matter what we did up until that point or it won't and building these systems will be useful.

What is your recommendation for companies? To take it to the extreme are you saying fire everyone and wait for AI?

surrealistic11mo ago

Because we're in the denial phase, doing expert systems all over again but this time on top of something that looks like NLP but isn't quite there.

mehdibl11mo ago· 4 in thread

From the blog post:

"The Auth problem At this point, the auth issues with MCP are well known. OAuth2.1 is great, but we are basically trying to re-invent auth for agents that act on behalf of the user. This is a good long term goal, but we are quickly realizing that LLM sessions with no distinguishable credentials of their own are difficult to authorize and will require a complete re-imagining of our authorization systems. Data leakage in multi-tenant apps that have MCP servers is just not a solved problem yet.

I think a very strong case for MCP is to limit the amount of damage the model can do and the amount of data it will ever have access to. The nice thing about client side APIs in multi-tenant apps is they are hopefully already scoped to the user. If we just give the model access to that, there's not much damage they can do.

It's also worth mentioning that OAuth2.1 is basically incompatible with internal Auth at Amazon (where I work). I won't go too much into this, but the implications of this reach beyond Amazon internal."

1. Oauth is not working in Amazon ==> need solution.

2. Oauth are difficult to authorize

3. limit the amount of damage the model can do WHILE "multi-tenant apps is they are hopefully already scoped to the user".

I feel from a security side there is an issue here in this logic.

Oauth for apps can be far more tuned than current web user permission as usually, user have modification permission, that you may not want to provide.

Oauth not implemented in Amazon, is not really an issue.

Also this means you backdoor the App with another APP you establish trust with it. ==> This is a major no go for security as all actions on MCP app will be logged in the same scope as USER access.

You might just copy your session ID/ Cookie and do the same with an MCP.

I may be wrong; the idea seems interesting, but from a security side I feel it's a bypass that will have a lot of compliance issues.

arkh11mo ago

Wait, isn't it already part of Oauth?

> https://datatracker.ietf.org/doc/html/rfc8693#name-delegatio...

miguelspizzaOP11mo ago

Not sure I understand. The model has no more access than the user does. proper security implementation still lies with the website owner

tehryanx11mo ago

I think the point is that you shouldn't be giving the agent the same privileges as the user. This is one of the biggest issues with how people are using agents rn imo. The agent should be treated as an untrusted user in your client, given restricted privileges scoped to only the exact access they need to perform a given task.
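
As a rough sketch of that idea, the agent can be handed a derived principal whose scopes are the intersection of what the user holds and what the task needs, instead of the user's full session. The scope names and the `delegate`/`authorize` helpers below are hypothetical and not part of MCP-B or any OAuth library.

```typescript
// Hypothetical scope model: the agent never inherits the user's full privileges.
type Scope = "read:orders" | "write:orders" | "delete:account";

interface Principal {
  id: string;
  scopes: ReadonlySet<Scope>;
}

// Derive an agent principal limited to scopes that are BOTH held by the user
// and explicitly requested for the current task.
function delegate(user: Principal, requested: Scope[]): Principal {
  const granted = requested.filter((scope) => user.scopes.has(scope));
  return { id: `${user.id}:agent`, scopes: new Set(granted) };
}

// Every tool call checks the calling principal, not the browser session.
function authorize(principal: Principal, needed: Scope): boolean {
  return principal.scopes.has(needed);
}
```

Because the agent gets its own principal id, its actions can also be logged separately from the user's, which speaks to the audit concern raised earlier in this thread.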

ImPostingOnHN11mo ago

When you say "the user", do you mean that if Alice set up the MCP, and Bob, Charlie, and Dave all access it, the MCP will only execute commands as Bob, or Charlie, or Dave, depending on who is accessing it?

orliesaurus11mo ago· 4 in thread

I don't get it from the homepage, feels like Selenium on the browser, since you built it can you explain ?

miguelspizzaOP11mo ago

Similar but also very different. Playwright and Selenium are browser automation frameworks. There is a Playwright-MCP server which lets your agent use Playwright for browser automation.

MCP-B is a different approach. Website owners create MCP servers `inside` their websites, and MCP-B clients are either injected by browser extensions or included in the website's JS.

Instead of visual parsing like Playwright, you get standard deterministic function calls.

You can see the blog post for code examples: https://mcp-b.ai/blogs

mhio11mo ago

A playwright-mcp server, or any bidi browser automation, should be equally capable of discovering/injecting and calling the same client JS exposed MCP-B site API?

It's like an OpenAPI definition but for JS/MCP? (outside of the extension to interact with that definition)

c0wb0yc0d3r11mo ago

What differentiates this from something like data-test-id attributes?

Nathanba11mo ago

what do you mean by "visual parsing like Playwright"? I'm pretty sure Playwright queries the DOM via js, there isn't inherently any visual parsing. Do you just mean that mcp-b has dedicated js APIs for each website? Your example is also pretty confusing, it looks like the website itself offers an "Increment by x" "tool" and then your first command to the website is to "subtract two from the count". So the AI model has to still understand the mcp tools offered by the website quite loosely and just calls them as needed? I suppose this is basically like using playwright except it doesn't have to parse the DOM (although it probably still does, I mean how else will it know that the "Increment by X" tool offered is in any way connected to the "count" you mention in your vague prompt?). And then the additional benefit is that it can call a js function instead of having to generate the DOM/js playwright calls to do it.

I mean all this MCP stuff certainly seems useful even though this example isn't so good, the bigger uses will be when larger APIs and interactions are offered by the website like "Make a purchase" or "sort a table" and the AI would have to implement very complex set of DOM operations and XHR requests to make that happen and instead of flailing to do that, it can call an MCP tool which is just a js function.

ActorNightly11mo ago· 4 in thread

This MCP stuff is leading dev down the wrong path. We should be focusing on llms using self discovery to figure out information.

teruakohatu11mo ago

I had that opinion too.

You can ask an agent to browse a web page and click a button etc. They will work out how to use a browser automation library.

But it’s not worth the cost, time spent waiting or the inconsistency between implementations.

MCP just offloads that overload, much like how they can use bash tools when they are quite capable of writing an implementation of grep etc.

ActorNightly11mo ago

The whole point is that you shouldn't have to worry about implementation. AI should do it for you.

ashwinsundar11mo ago

    We should be focusing on llms using self discovery to figure out information.

Can you expand? What does that mean, and why is it the right (or better) path?

ActorNightly11mo ago

Manually coding things is not how we get better AI. For AI to be truly useful in the area of figuring things out (i.e actually reasoning), one of the core components of a model would be building its own knowledge trees across multi modal information. So when you ask a model to do something, it should figure out how to do it on its own.

Abishek_Muthian11mo ago· 3 in thread

I haven’t used any MCP so far, but as a disabled person I see use cases in accessibility for MCPs doing browser/smartphone automation.

But any accessibility tool will be exploited by nefarious actors so I wonder how many main stream websites/apps would implement these MCP.

Has anyone tried any MCP for improving accessibility?

krashidov11mo ago

> But any accessibility tool will be exploited by nefarious actors so I wonder how many main stream websites/apps would implement these MCP.

How so?

Abishek_Muthian11mo ago

Android smartphone bot farms use the accessibility service to automate usage of apps.

Audio captchas are often used by bots.

mattlondon11mo ago

Anything that makes it easier to automate will make bad actors more efficient.

So people like ticket sales sites, eBay etc. It will make it easier for those sites to have all the tickets purchased or for auctions to be sniped etc.

FWIU, these sort of sites actually (currently at least) put on measures to try and stop bots using them for these reasons.

Flux15911mo ago· 3 in thread

This is an interesting take since web developers could add mcp tools into their apps rather than having browser agents having to figure out how to perform actions manually.

Is the extension itself open source? Or only the extension-tools?

In theory I should be able to write a chrome extension for any website to expose my own custom tools on that site right (with some reverse engineering of their APIs I assume)?

miguelspizzaOP11mo ago

The extension should be open source. I had it as a private submodule until today. Let me figure out why it's not showing up and get back to you.

The extension itself is an MCP server which can be connected to by other extensions over cross-extension messaging. Since the extension is part of the protocol, I'd like for the community to pull from the same important parts of the extension (MCPHub, content script) so they are consistent across extension implementations.

miguelspizzaOP11mo ago

Ok it's open source now

Flux15911mo ago

Thanks! Took a very quick look. It seems like the extension exposes tools for all domains that support mcp-b looking at DomainToolManager - does this mean if I have two tabs for a single domain you'll have duplicate tools per tab?

Haven't had enough time to look through all the code there - interesting problem I guess since a single domain could have multiple accounts connected (ex: gmail w/ account 0 vs account 1 in different tabs) or just a single account (ex: HN).

handfuloflight11mo ago· 3 in thread

Would it be possible to do this with any arbitrary website since we can execute JS client side?

miguelspizzaOP11mo ago

Yup! You just declare a standard MCP server and attach a TabServerTransport to it. Any TabClientTransport in the same Tab will be able to connect to it.

The examples focus mostly on extensions injecting clients at website load time, but you can ship a client with your server JavaScript. That being said, if the client and server live in the same script, I recommend just using the InMemoryTransports from the official SDK.
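
For readers who want a feel for what a same-page client/server pair looks like, here is a simplified sketch of a linked in-memory transport. It is not the MCP-B `TabServerTransport`, nor the official SDK's in-memory transport; `LinkedTransport`, `serveCounter`, and `createClient` are hypothetical names, and the message shape is a stripped-down stand-in for JSON-RPC.

```typescript
// Simplified stand-in for a transport pair living in the same script.
// Every name here is illustrative, not an MCP-B or MCP SDK API.
type Message = { id: number; method?: string; params?: unknown; result?: unknown };

class LinkedTransport {
  onmessage?: (msg: Message) => void;
  private peer?: LinkedTransport;

  // Create two endpoints that deliver messages to each other.
  static createPair(): [LinkedTransport, LinkedTransport] {
    const a = new LinkedTransport();
    const b = new LinkedTransport();
    a.peer = b;
    b.peer = a;
    return [a, b];
  }

  async send(msg: Message): Promise<void> {
    const peer = this.peer;
    await Promise.resolve(); // deliver asynchronously, as a message channel would
    peer?.onmessage?.(msg);
  }
}

// "Server" side: exposes one tool, increment, over the transport.
function serveCounter(transport: LinkedTransport): void {
  let count = 0;
  transport.onmessage = (msg) => {
    if (msg.method !== "increment") return;
    const by = Number((msg.params as { by?: number } | undefined)?.by ?? 1);
    count += by;
    void transport.send({ id: msg.id, result: count });
  };
}

// "Client" side: matches replies to requests by id.
function createClient(transport: LinkedTransport) {
  const pending = new Map<number, (reply: Message) => void>();
  let nextId = 1;
  transport.onmessage = (reply) => {
    pending.get(reply.id)?.(reply);
    pending.delete(reply.id);
  };
  return (method: string, params: unknown): Promise<Message> =>
    new Promise((resolve) => {
      const id = nextId++;
      pending.set(id, resolve);
      void transport.send({ id, method, params });
    });
}
```

Usage would be along the lines of `const [c, s] = LinkedTransport.createPair(); serveCounter(s); const call = createClient(c);` followed by `call("increment", { by: 2 })`. A tab transport plays the same role, with the page's messaging channel in place of the direct peer reference.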

imcritic11mo ago

Wouldn't sites be able to detect presence of scripts injected by your extension (to, say, refuse you services since site owner decided they would like their site to be used only by humans, not AI agents)?

gavmor11mo ago

Yeah, I am tempted to rig up a "generic" webpage MCP injected via greasemonkey just so I can use this UI for navigating the web.

ethanniser11mo ago· 2 in thread

this is super cool

wonder if it was inspired by `broadcast-mcp` [1] (hackathon project by me and a friend from may based on the same concept but not fleshed out)

1: https://x.com/RhysSullivan/status/1923956444153643443

miguelspizzaOP11mo ago

Ah no, first time seeing this. How were you interacting with the website server? Via extension or some way else?

ethanniser11mo ago

we would open the mcp site in a new tab or iframe then had a custom mcp transport based on `window.postMessage` just like you do https://github.com/RhysSullivan/broadcast-mcp/blob/main/pack...

this concept is awesome- glad someone really fleshed it out

TechDebtDevin11mo ago· 2 in thread

hmm, I have an MCP route, that fetches the page in a browser, returns and lets the LLM inject javascript onto the page to return whatever structured output it desires..Or whatever (kinda scarily). How is this different?

--Shoutout to Go-Rod https://pkg.go.dev/github.com/go-rod/rod@v0.116.2#Page

miguelspizzaOP11mo ago

Cool, I'll check it out!

I'll need to look a bit more, but at a glance, MCP-B is more putting the onus of browser automation (i.e. how the agent will interact with the web page) on the website owner. They get to expose exactly the functionality they want to the agent

TechDebtDevin11mo ago

Oh this is for the website owner. Yeah, mine is to make an arbitrary site interactable with an LLM. It can choose to get a map of the DOM/screenshot/extract by xml path/ and interact via a few different methods. But the PageEval() method from GO rod works pretty well

Would like to just provide a runtime for an LLM to solve captchas.

My main focus is (anti) bot detection.

cryptozeus11mo ago· 2 in thread

can someone explain like I am five?

lovelearning11mo ago

A website owner can publish their website's capabilities or data as "tools". AI agents and LLMs like ChatGPT, in response to user prompts, can consult these tools to figure out their next actions.

Example:

1. An author has a website for their self-published book. It currently checks book availability with their database when add to cart is clicked.

2. The website publishes "check book availability" and "add to cart" as "tools", using this MCP-B protocol.

3. A user instructs ChatGPT or some AI agent to "Buy 3 copies of author's book from https://theirbooksite"

4. The AI agent visits the site. Finds that it's MCP-B compliant. Using MCP-B, it gets the list of available tools. It finds a tool called "check book availability", and uses it to figure out if ordering 3 copies is possible. If yes, it'll next call "add to cart" tool on the website.

The website here is actively cooperating with the agent/LLM and supplying structured data. Instead of being a passive collection of UI elements that AI chatbots have to figure out based on UI layouts or UI captions, which are generally very brittle approaches.
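
The four steps above can be sketched in a few lines. This is a toy model under stated assumptions: `makeBookSite` and `buyCopies` are hypothetical, the tools live in a plain `Map` rather than behind an MCP transport, and a real agent would pick tools from their descriptions instead of hard-coded names.

```typescript
// Toy model of a site publishing tools and an agent consuming them.
// All names are illustrative; nothing here is the MCP-B API.
interface ToolResult {
  ok: boolean;
  detail: string;
}

interface Tool {
  description: string;
  call: (args: { copies: number }) => ToolResult;
}

// Step 2: the website publishes its capabilities as named tools.
function makeBookSite(stock: number) {
  const cart = { copies: 0 };
  const tools = new Map<string, Tool>([
    [
      "check book availability",
      {
        description: "Check whether the requested number of copies is in stock",
        call: ({ copies }) => ({ ok: copies <= stock, detail: `${stock} in stock` }),
      },
    ],
    [
      "add to cart",
      {
        description: "Add copies of the book to the cart",
        call: ({ copies }) => {
          if (copies > stock) return { ok: false, detail: "not enough stock" };
          cart.copies += copies;
          return { ok: true, detail: `${cart.copies} in cart` };
        },
      },
    ],
  ]);
  return { tools, cart };
}

// Steps 3 and 4: the agent discovers the tools, checks stock, then adds to cart.
function buyCopies(site: ReturnType<typeof makeBookSite>, copies: number): string {
  const check = site.tools.get("check book availability");
  const add = site.tools.get("add to cart");
  if (!check || !add) return "site does not expose the needed tools";
  const availability = check.call({ copies });
  if (!availability.ok) return `cannot order: ${availability.detail}`;
  return add.call({ copies }).detail;
}
```

The agent never touches the DOM here: it only sees tool names, descriptions, and structured results, which is what makes this less brittle than inferring behavior from UI layout.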

volkandkaya11mo ago

Example:

You have google docs and CMS open in 2 tabs

1. Ask to take your google doc and add it to the CMS

2. MCP tool takes the data from Google docs

3. MCP tool to convert text to CMS item

4. MCP tool to insert that CMS item

With the above you can view unique UIs for each stage as well, such as generating a table with CMS fields before accepting.

SchemaLoad11mo ago· 1 in thread

Not sure who the intended user is here? For frontend testing you actually do somewhat want the tests to break when the UI changes in major ways. And for other automation you'd be better off providing an actual API to use.

nicman2311mo ago

scrapers and me buying milk with a vlm

lewisjoe11mo ago· 1 in thread

This looks promising - thanks for open-sourcing this. This addresses the gap that most work happens in browsers while MCP assumes that work happens with AI clients.

I have a fundamental question though: how is it different from directly connecting my web app's JS APIs with tool calling functions and talking directly with a LLM server with tool-call support?

Is it the same thing, but with a protocol? or am I missing the bigger picture?

miguelspizzaOP11mo ago

Np thanks for reading! The difference is with MCP-B you don't have to integrate or maintain any AI chat functionality yourself.

It's a protocol which allows the user to bring their own model to interact with the tools on your website

p0w3n3d11mo ago· 1 in thread

I can see with my prophetic/logic eyes that free models will start to require captchas because people will start using MCP to automate browsers to use free LLMs. But captchas are ineffective against LLMs, so LLMs will fight automated LLMs from using them...

Sounds like a very strange world of robots fighting robots

falcor8411mo ago

In the stories, the robots eventually realize that they actually share common goals ...

Johnny_Bonk11mo ago· 1 in thread

So if I'm using Claude Code and developing a web app, and it's running on localhost:3000, can I use Claude Code to basically get UI information, browser console logs and other web dev feedback and useful information? Cause I installed it and added that file, but all I see is the 55 tools and 6 APIs when I open the browser extension. Not the stuff I need. And I also installed the extension tools, I think it was called.

miguelspizzaOP11mo ago

Ah, maybe I should make that more clear. The web app is an example of an MCP-B server and the extension is a client. When you visit MCP-b.ai with the extension, its tools will register.

miguelspizzaOP11mo ago· 1 in thread

Hey HN,

This was an idea I had while trying to build MCP servers internally at Amazon. Today I am open sourcing it. TLDR it's an extension of the Model Context Protocol which allows you to treat your website as an MCP server which can be discovered and called by MCP-B compliant web extensions.

You can read a more detailed breakdown here (with gifs): https://mcp-b.ai/blogs

miguelspizzaOP11mo ago

Oh and the code is here: https://github.com/MiguelsPizza/WebMCP

netrem11mo ago· 1 in thread

The product seems interesting, but the landing page I found very chaotic and gave up reading it. The individual pieces of information are fine I think, but the flow is poor and some info repeats. Was it AI generated?

miguelspizzaOP11mo ago

Yes it was mostly AI generated. I'm much more of a dev than a writer/marketer. Hopefully if this gains some traction I can pay someone to clean it up

metta2uall11mo ago· 1 in thread

Looks great. I love ideas that increase efficiency and reduce electricity usage.

Only nitpick is that the home page says "cross-browser" at the bottom but the extension is only available for Chrome..

miguelspizzaOP11mo ago

Ah yea I'll fix that. Nice catch

rapind11mo ago· 1 in thread

This looks great. I'd really like to add something like this to my application (public and admin side). I have users, especially on the admin side, that could really benefit.

miguelspizzaOP11mo ago

Thanks! I'd be happy to help onboard. Let me know!

hereforcomments11mo ago· 1 in thread

RIP QA engineers

volkandkaya11mo ago

Sounds like more work needed for QA engineers, they will have to test that MCP-B works well with a bunch of other sites.

abrookewood11mo ago

Looks similar to Elixir's Tidewave MCP server, which currently also supports Ruby: https://tidewave.ai/

Paraphrasing: Connect your editor's assistant to your web framework runtime via MCP and augment your agentic workflows and chats with: Database integration; Logs and runtime introspection; Code evaluation; and Documentation context.

Edit: Re-reading MCP-B docs, that is more geared towards allowing visitors to your site to use MCP, while Tidewave is definitely focussed on Developers.

xnx11mo ago

AI automation is exciting because it doesn't require any cooperation from the site.

It's nice when a site is user friendly (RSS, APIs, obvious JSON, etc.) but it is more powerful to be self sufficient.

nurettin11mo ago

This gave me an idea. Instead of writing/maintaining servers and whatnot, why not just open the browser and give [$LLM] access to the development port and let it rip using the puppeteer protocol?

devops00011mo ago

I still don’t understand. The browser is made for humans. We already invented APIs to communicate between machines. Why should a machine use a UI?

_1tem11mo ago

The entire point of AI Agents is that they should "just work" for websites that don't have APIs. Lots of websites simply have no incentive or resources to provide a good API.

damnever11mo ago

As far as I can tell, "API" and the webpages often have different authentication methods.

ge9611mo ago

Ultimate test for me, make me a payment system where I put in $1 and it gives me $2 back

calrain11mo ago

Do you really want to change 'everything'?

bpiroman11mo ago

Vite ...

roundrobins11mo ago

It's not every day that I catch tomorrow's huge hit today on a random HN post.

Better get ready to quit your day job and get funded buddy, as my 30 years worth of tech instincts tell me this will take off vertically!

pyuser58311mo ago

How is MCP doing the same as REST?

I’m a REST developer learning MCP, and most of my effort is spent finding anything new to learn.

So I’m not surprised by this statement, but I’m a bit startled.

How are they the same thing?

sneak11mo ago

SquareWheel11mo ago

bayindirh11mo ago

xnx11mo ago

Twitter, Instagram, TikTok, craigslist, eBay, Amazon, etc.

latexr11mo ago

> Prediction: this will go the same way as RSS.

RSS is alive and well. It’s like if you wrote “this will go the same way as the microwave oven”.

57473m3n7Fur7h311mo ago

The built-in RSS reader in Firefox was removed. (But extensions exist to add RSS reader to Firefox.)

Google killed Google Reader. (Other products exist you can use instead.)

Facebook removed support for RSS feeds. (You can replace it with third party tools or API calls.)

It’s not dead dead, but it did seem to lose some momentum and support over time on several fronts.

wkat424211mo ago

It doesn't matter. Soon the AI will be able to click and scroll like a normal user. It's going to be another arms race.

theptip11mo ago

Maybe, but the market structure has inverted and the big guys now want to be in the intelligence layer, not content. (Content is being commoditized.)

Google can still sell ads as long as they own the eyeballs and the intelligence that’s engaging them.

Google did not want you using RSS because it cut out Google Search.

worldsayshi11mo ago

Unless it becomes useful enough that customers will go through the hassle of switching to companies that are "AI-ready".

fzysingularity11mo ago· 8 in thread

The contributor stats for the GitHub project are quite intriguing: https://github.com/MiguelsPizza/WebMCP/graphs/contributors

MiguelsPizza | 3 commits | 89++ | 410--

claude | 2 commits | 31,799++ | 0--

miguelspizzaOP11mo ago

I did some git history rewriting when I closed-sourced the extension for a bit, so these are not super accurate. Claude Code did write about 85% of the code though.

Simon_O_Rourke11mo ago

How can you figure out that percentage? The commit logs?

fzysingularity11mo ago

Nice!

efitz11mo ago

You’re going to see this pattern a lot more in the future.

consumer45111mo ago

Claude's contributions graph is interesting. What is going on here? Does Claude Code commit as itself sometimes, but extremely rarely? I don't understand.

https://github.com/claude

handfuloflight11mo ago

If you ask it to commit it'll sign itself as the author.

gubicle11mo ago

That doesn't look right... if you look at the actual commits, they are all from

MiguelsPizza / Alex Nahas

https://github.com/MiguelsPizza/WebMCP/commits/main/

byteknight11mo ago

He rewrote history to hide it?

He admits it here https://news.ycombinator.com/item?id=44516104

throwanem11mo ago· 8 in thread

> If I asked you to build a table and gave you a Home Depot you probably would have a harder time than if I gave you a saw, a hammer and some nails.

I doubt that, first and not least because Home Depot stocks lumber.

bobmcnamara11mo ago

Home Depot also sells tables.

devoutsalsa11mo ago

throwanem11mo ago

Not good ones. But in any case the spec was not to provide a table, was it?

stuartjohnson1211mo ago

import table

table()

leptons11mo ago

You're supposed to "hallucinate" the lumber.

latexr11mo ago

And, I imagine, Home Depot might have better and more precision tools available, plus professionals who know how to use them.

miguelspizzaOP11mo ago

Fixed. Nice catch

throwanem11mo ago

Well, I've built tables before.

slt202111mo ago· 6 in thread

Could all of this be replaced simply by publishing an OpenAPI (Swagger) spec and using a universal Swagger MCP client?

This basically leaves it up to the user to establish an authenticated session manually.

Assuming Claude is smart enough to pick up the API key from the prompt/config and can use a Swagger-based API client, wouldn't that be the same?

miguelspizzaOP11mo ago

That was everyone's first thought when MCP came out. Turns out it doesn't work too well since there are generally too many tools. People are doing interesting work in this space though.
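To make the "too many tools" point concrete, here is a minimal Python sketch of the one-tool-per-operation mapping that a universal Swagger client would do. The function name `openapi_to_tools`, the fallback naming scheme, and the tiny spec are all assumptions for illustration, not part of MCP-B or of any real Swagger MCP client.

```python
from typing import Any, Dict, List

# HTTP verbs that can appear as operation keys under an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def openapi_to_tools(spec: Dict[str, Any]) -> List[Dict[str, str]]:
    """Hypothetical helper: emit one tool description per OpenAPI operation."""
    tools: List[Dict[str, str]] = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Fall back to a made-up name when the spec has no operationId.
            fallback = method + "_" + path.strip("/").replace("/", "_")
            name = operation.get("operationId") or fallback
            tools.append({"name": name, "description": operation.get("summary", "")})
    return tools


# A deliberately tiny, made-up spec: two paths already yield four tools.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/books": {
            "get": {"operationId": "listBooks", "summary": "List books"},
            "post": {"operationId": "createBook", "summary": "Create a book"},
        },
        "/books/{id}": {
            "get": {"summary": "Get one book"},
            "delete": {"operationId": "deleteBook"},
        },
    },
}

tools = openapi_to_tools(spec)
print(len(tools), [t["name"] for t in tools])
```

Since every operation becomes its own tool, a real API with hundreds of endpoints hands the model hundreds of tool definitions, which is the overload described above.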

randomaifreak11mo ago

Yeah agreed. Tool overload is quite problematic. And then having to interact with the API for each website and its tools, with possibly clashing tool names, isn't ideal.

nilslice11mo ago

pls don't put an api key in a prompt

loandbehold11mo ago

It may or may not be an issue. It's OK to give it an API key for a test/QA system but probably not for prod.

loandbehold11mo ago

I found I can have Claude Code consume an API just by giving it a link to swagger.json in CLAUDE.md. It's very useful for ad hoc testing.

efitz11mo ago

Do it.

muratsu11mo ago· 6 in thread

mindwok11mo ago

gavmor11mo ago

Am I going to start to choose products based on their compatibility with WebMCP?

miguelspizzaOP11mo ago

I think with AI tools you can pretty confidently build out an MCP server for your existing website. I plan to have good LLM docs for this very purpose.

For React in particular, lots of the form ecosystem (React Hook Form) can be directly ported to MCP tools. I am currently working on a zero-config React Hook Form integration.

rapind11mo ago

muratsu11mo ago

mfrye011mo ago

I was thinking the same. Forward thinking sites might add this, but the vast majority of website owners probably wouldn't be able to figure this out.

Some middle ground where an agent reverse engineers the API as a starting point would be cool, and is then promoted to using the "official" MCP API if a site publishes it.

mupuff123411mo ago· 6 in thread

I still don't understand MCP. If, according to all the AI companies, AI will soon replace devs, then why bother with MCP?

qayxc11mo ago

dominicrose11mo ago

A todo list is already a productivity tool and not an essential one. When I hear about productivity I can't help but think "but be productive doing what?"

What do we have to do that's so important we need AI, and not a chat AI but AI on steroids (supposedly)?

tracerbulletx11mo ago

mupuff123411mo ago

I think that's also known by another word - "documentation".

volkandkaya11mo ago

Either AI will replace everyone and it doesn't matter what we did up until that point or it won't and building these systems will be useful.

What is your recommendation for companies? To take it to the extreme are you saying fire everyone and wait for AI?

surrealistic11mo ago

Because we're in the denial phase, doing expert systems all over again but this time on top of something that looks like NLP but isn't quite there.

mehdibl11mo ago· 4 in thread

From the blog post:

1. OAuth is not working in Amazon ==> need solution.

2. OAuth flows are difficult to authorize.

3. Limit the amount of damage the model can do, WHILE "multi-tenant apps is they are hopefully already scoped to the user".

I feel from a security side there is an issue in this logic.

OAuth for apps can be tuned far more finely than current web user permissions, since users usually have modification permissions that you may not want to grant.

OAuth not being implemented in Amazon is not really an issue.

Also, this means you backdoor the app with another app that you establish trust with. ==> This is a major no-go for security, as all actions by the MCP app will be logged in the same scope as the user's own access.

You might as well just copy your session ID/cookie and do the same with an MCP.

I may be wrong, and the idea seems interesting, but from a security side I feel it's a bypass that will have a lot of issues with compliance.

arkh11mo ago

Wait, isn't it already part of OAuth?

> https://datatracker.ietf.org/doc/html/rfc8693#name-delegatio...

miguelspizzaOP11mo ago

Not sure I understand. The model has no more access than the user does. Proper security implementation still lies with the website owner.

tehryanx11mo ago

ImPostingOnHN11mo ago

orliesaurus11mo ago· 4 in thread

I don't get it from the homepage. It feels like Selenium in the browser. Since you built it, can you explain?

miguelspizzaOP11mo ago

Similar but also very different. Playwright and Selenium are browser automation frameworks. There is a Playwright-MCP server which lets your agent use Playwright for browser automation.

MCP-B is a different approach. Website owners create MCP servers `inside` their websites, and MCP-B clients are either injected by browser extensions or included in the website's JS.

Instead of visual parsing like Playwright, you get standard deterministic function calls.

You can see the blog post for code examples: https://mcp-b.ai/blogs
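As a language-neutral illustration of "deterministic function calls", here is a minimal Python sketch of a page that registers named tools which a client then lists and calls. It only models the idea: the real MCP-B code is JavaScript, and `PageToolServer`, `Tool`, and the todo example are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[[Dict[str, Any]], Any]


class PageToolServer:
    """Hypothetical in-page server holding the tools a site chooses to expose."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str,
                 handler: Callable[[Dict[str, Any]], Any]) -> None:
        self._tools[name] = Tool(name, description, handler)

    def list_tools(self) -> List[Dict[str, str]]:
        # What a client sees: names and descriptions, no DOM or pixels.
        return [{"name": t.name, "description": t.description}
                for t in self._tools.values()]

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(arguments)


# Made-up application state that would normally live in the page.
todos: List[str] = []


def add_todo(args: Dict[str, Any]) -> Dict[str, Any]:
    todos.append(args["text"])
    return {"count": len(todos)}


server = PageToolServer()
server.register("add_todo", "Add an item to the todo list", add_todo)

# The client calls a named function with structured arguments
# instead of locating and clicking a button.
result = server.call_tool("add_todo", {"text": "buy milk"})
print(server.list_tools())
print(result)
```

The same call with the same arguments always runs the same handler, which is the contrast with an agent guessing at page structure.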

mhio11mo ago

A playwright-mcp server, or any bidi browser automation, should be equally capable of discovering/injecting and calling the same client-JS-exposed MCP-B site API?

It's like an OpenAPI definition but for JS/MCP? (outside of the extension to interact with that definition)

c0wb0yc0d3r11mo ago

What differentiates this from something like data-test-id attributes?

Nathanba11mo ago

ActorNightly11mo ago· 4 in thread

This MCP stuff is leading devs down the wrong path. We should be focusing on LLMs using self discovery to figure out information.

teruakohatu11mo ago

I had that opinion too.

You can ask an agent to browse a web page and click a button etc. They will work out how to use a browser automation library.

But it’s not worth the cost, time spent waiting or the inconsistency between implementations.

MCP just offloads that overload, much like how they can use bash tools when they are quite capable of writing an implementation of grep etc.

ActorNightly11mo ago

The whole point is that you shouldn't have to worry about implementation. AI should do it for you.

ashwinsundar11mo ago

> We should be focusing on llms using self discovery to figure out information.

Can you expand? What does that mean, and why is it the right (or better) path?

ActorNightly11mo ago

Abishek_Muthian11mo ago· 3 in thread

I haven’t used any MCP so far, but as a disabled person I see use cases in accessibility for MCPs doing browser/smartphone automation.

But any accessibility tool will be exploited by nefarious actors, so I wonder how many mainstream websites/apps would implement these MCPs.

Has anyone tried any MCP for improving accessibility?

krashidov11mo ago

> But any accessibility tool will be exploited by nefarious actors, so I wonder how many mainstream websites/apps would implement these MCPs.

How so?

Abishek_Muthian11mo ago

Android smartphone bot farms use the accessibility service to automate usage of apps.

Audio captchas are often used by bots.

mattlondon11mo ago

Anything that makes it easier to automate will make bad actors more efficient.

So think of ticket sales sites, eBay, etc. It will make it easier for all the tickets on those sites to be bought up, or for auctions to be sniped, etc.

FWIU, these sorts of sites actually (currently at least) put measures in place to try to stop bots using them for these reasons.

Flux15911mo ago· 3 in thread

This is an interesting take since web developers could add mcp tools into their apps rather than having browser agents having to figure out how to perform actions manually.

Is the extension itself open source? Or only the extension-tools?

In theory I should be able to write a chrome extension for any website to expose my own custom tools on that site right (with some reverse engineering of their APIs I assume)?

miguelspizzaOP11mo ago

The extension should be open source. I had it as a private submodule until today. Let me figure out why it's not showing up and get back to you.

miguelspizzaOP11mo ago

Ok it's open source now

Flux15911mo ago

handfuloflight11mo ago· 3 in thread

Would it be possible to do this with any arbitrary website since we can execute JS client side?

miguelspizzaOP11mo ago

Yup! You just declare a standard MCP server and attach a TabServerTransport to it. Any TabClientTransport in the same Tab will be able to connect to it.

imcritic11mo ago

gavmor11mo ago

Yeah, I am tempted to rig up a "generic" webpage MCP injected via greasemonkey just so I can use this UI for navigating the web.

ethanniser11mo ago· 2 in thread

this is super cool

wonder if it was inspired by `broadcast-mcp` [1] (hackathon project by me and a friend from may based on the same concept but not fleshed out)

1: https://x.com/RhysSullivan/status/1923956444153643443

miguelspizzaOP11mo ago

Ah no, first time seeing this. How were you interacting with the website server? Via extension or some way else?

ethanniser11mo ago

we would open the mcp site in a new tab or iframe then had a custom mcp transport based on `window.postMessage` just like you do https://github.com/RhysSullivan/broadcast-mcp/blob/main/pack...

this concept is awesome- glad someone really fleshed it out
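For readers unfamiliar with the idea, a message-passing transport of this kind can be sketched in Python with an in-memory bus standing in for `window.postMessage`. This is only an illustration under assumptions: `FakeWindow`, `make_server`, and `Client` are invented names, and the result shapes are simplified rather than the real MCP ones. The JSON-RPC 2.0 envelope and the `tools/list` and `tools/call` method names are the ones MCP uses.

```python
import json
from typing import Any, Callable, Dict, List, Optional


class FakeWindow:
    """In-memory stand-in for a browser window's message bus."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def post_message(self, data: str) -> None:
        # Every listener sees every message, as with window.postMessage.
        for listener in list(self._listeners):
            listener(data)


def make_server(window: FakeWindow, tools: Dict[str, Callable[..., Any]]) -> None:
    def on_message(raw: str) -> None:
        msg = json.loads(raw)
        if "method" not in msg:
            return  # a response, not a request
        if msg["method"] == "tools/list":
            result: Dict[str, Any] = {"tools": [{"name": n} for n in tools]}
        elif msg["method"] == "tools/call":
            params = msg["params"]
            result = {"value": tools[params["name"]](**params["arguments"])}
        else:
            return
        window.post_message(json.dumps(
            {"jsonrpc": "2.0", "id": msg["id"], "result": result}))

    window.add_listener(on_message)


class Client:
    def __init__(self, window: FakeWindow) -> None:
        self._window = window
        self._next_id = 0
        self._responses: Dict[int, Any] = {}
        window.add_listener(self._on_message)

    def _on_message(self, raw: str) -> None:
        msg = json.loads(raw)
        if "result" in msg:  # ignore requests, keep responses by id
            self._responses[msg["id"]] = msg["result"]

    def request(self, method: str, params: Optional[Dict[str, Any]] = None) -> Any:
        self._next_id += 1
        request_id = self._next_id
        self._window.post_message(json.dumps(
            {"jsonrpc": "2.0", "id": request_id,
             "method": method, "params": params or {}}))
        # The fake bus is synchronous, so the response has already arrived.
        return self._responses.pop(request_id)


window = FakeWindow()
make_server(window, {"add": lambda a, b: a + b})
client = Client(window)

listed = client.request("tools/list")
total = client.request("tools/call", {"name": "add", "arguments": {"a": 2, "b": 3}})
print(listed, total)
```

A real browser transport would be asynchronous and would need to check message origins. The sketch skips both to keep the request/response matching visible.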

TechDebtDevin11mo ago· 2 in thread

--Shoutout to Go-Rod https://pkg.go.dev/github.com/go-rod/rod@v0.116.2#Page

miguelspizzaOP11mo ago

Cool, I'll check it out!

TechDebtDevin11mo ago

Would like to just provide a runtime for an LLM to solve captchas.

My main focus is (anti) bot detection.

cryptozeus11mo ago· 2 in thread

can someone explain like I am five?

lovelearning11mo ago

A website owner can publish their website's capabilities or data as "tools". AI agents and LLMs like ChatGPT, in response to user prompts, can consult these tools to figure out their next actions.

Example:

1. An author has a website for their self-published book. It currently checks book availability with their database when add to cart is clicked.

2. The website publishes "check book availability" and "add to cart" as "tools", using this MCP-B protocol.

3. A user instructs ChatGPT or some AI agent to "Buy 3 copies of author's book from https://theirbooksite"
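The book example above can be sketched in a few lines of Python. Everything here is made up for illustration: the stock numbers, the function names, and `buy`, which is a hard-coded stand-in for the plan an AI agent would derive from the user's prompt.

```python
from typing import Callable, Dict, List

# Hypothetical site state: stock levels and the visitor's cart.
stock: Dict[str, int] = {"my-book": 5}
cart: List[Dict[str, object]] = []


def check_book_availability(title: str, quantity: int) -> bool:
    """Tool 1: is the requested quantity in stock?"""
    return stock.get(title, 0) >= quantity


def add_to_cart(title: str, quantity: int) -> Dict[str, object]:
    """Tool 2: reserve the copies and put them in the cart."""
    if not check_book_availability(title, quantity):
        raise ValueError(f"only {stock.get(title, 0)} copies of {title!r} left")
    stock[title] -= quantity
    item: Dict[str, object] = {"title": title, "quantity": quantity}
    cart.append(item)
    return item


# The tools the site publishes, keyed by name.
TOOLS: Dict[str, Callable[..., object]] = {
    "check_book_availability": check_book_availability,
    "add_to_cart": add_to_cart,
}


def buy(title: str, quantity: int) -> str:
    """Stand-in for the agent: check availability first, then add to cart."""
    if not TOOLS["check_book_availability"](title, quantity):
        return "not enough copies in stock"
    TOOLS["add_to_cart"](title, quantity)
    return f"added {quantity} x {title} to cart"


message = buy("my-book", 3)
print(message)
print(cart)
```

The agent never touches the page's buttons. It only chooses which published tool to call next and with which arguments.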

volkandkaya11mo ago

Example:

You have google docs and CMS open in 2 tabs

1. Ask to take your google doc and add it to the CMS

2. MCP tool takes the data from Google docs

3. MCP tool to convert text to CMS item

4. MCP tool to insert that CMS item

With the above you can view unique UIs for each stage as well, such as generating a table with CMS fields before accepting.

SchemaLoad11mo ago· 1 in thread

nicman2311mo ago

Scrapers, and me buying milk with a VLM.

lewisjoe11mo ago· 1 in thread

This looks promising - thanks for open-sourcing this. This addresses the gap that most work happens in browsers while MCP assumes that work happens with AI clients.

I have a fundamental question though: how is it different from directly connecting my web app's JS APIs with tool calling functions and talking directly with a LLM server with tool-call support?

Is it the same thing, but with a protocol? or am I missing the bigger picture?

miguelspizzaOP11mo ago

Np thanks for reading! The difference is with MCP-B you don't have to integrate or maintain any AI chat functionality yourself.

It's a protocol which allows the user to bring their own model to interact with the tools on your website

p0w3n3d11mo ago· 1 in thread

Sounds like a very strange world of robots fighting robots

falcor8411mo ago

In the stories, the robots eventually realize that they actually share common goals ...

Johnny_Bonk11mo ago· 1 in thread

miguelspizzaOP11mo ago

Ah maybe I should make that more clear. The web app is an example of an MCP-B server and the extension is a client. When you visit MCP-b.ai with the extension, its tools will register.

miguelspizzaOP11mo ago· 1 in thread

Hey HN,

You can read a more detailed breakdown here (with gifs): https://mcp-b.ai/blogs

miguelspizzaOP11mo ago

Oh and the code is here: https://github.com/MiguelsPizza/WebMCP
