Show HN: Browser MCP – Automate your browser using Cursor, Claude, VS Code

(browsermcp.io)

616 points | namuorg | 1y ago | 217 comments

rmac1y ago

[!warning!]

1) this project's Chrome extension sends detailed telemetry to PostHog and Amplitude:

- https://storage.googleapis.com/cobrowser-images/telemetry.pn...

- https://storage.googleapis.com/cobrowser-images/pings.png

2) this project includes source for the local mcp server, but not for its chrome extension, which is likely bundling https://github.com/ruifigueira/playwright-crx without attribution

super suss

namuorgOP1y ago

Hey, creator of Browser MCP here.

1. Yes, the extension uses an anonymous device ID and sends an analytics event when a tool call is used. You can inspect the network traffic to verify that zero personalized or identifying information is sent.

I collect anonymized usage data to get an idea of how often people are using the extension in the same way that websites count visitors. I split my time between many projects and having a sense of how many active users there are is helpful for deciding which ones to focus on.

2. The extension is completely written by me, and I wrote in this GitHub issue why the repo currently only contains the MCP server (in short, I use a monorepo that contains code used by all my extensions and extracting this extension and maintaining multiple monorepos while keeping them in sync would require quite a bit of work): https://github.com/BrowserMCP/mcp/issues/1#issuecomment-2784...

I understand that you're frustrated with the way I've built this project, but there's really nothing nefarious going on here. Cheers!
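For readers wondering what "an anonymous device ID plus an event per tool call" could look like on the wire, here is a minimal Python sketch. It is not the extension's code (which is not published); the function names, the field names, and the `tool_call` event name are all assumptions made for illustration.

```python
import json
import uuid


def new_device_id() -> str:
    """Generate a random ID that is not derived from any user or hardware data."""
    return str(uuid.uuid4())


def build_event(device_id: str, tool_name: str) -> str:
    """Build a hypothetical analytics payload: only an ID and the tool that was called."""
    payload = {
        "event": "tool_call",    # assumed event name
        "device_id": device_id,  # random, not tied to an account
        "tool": tool_name,       # hypothetical tool name, e.g. "navigate"
    }
    return json.dumps(payload, sort_keys=True)


device_id = new_device_id()
event = build_event(device_id, "navigate")
print(event)
```

Inspecting the network traffic, as suggested above, is how you would check that a real payload carries nothing beyond fields of this kind.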

asaddhamani1y ago

Hey, as a maker, I get it. You spent time building something, and you want to understand how it gets used. If you're not collecting personal info, there is nothing wrong with this.

Knee-jerk reactions aren't helpful. Yes, too much tracking is not good, but some tracking is definitely important to improving a product over time and focusing your efforts.

Trias111y ago

When people see “I collect” they won’t even bother reading further.

This is a showstopper.

Noble reasons won’t matter.

Spyware perception.

wyldberry1y ago

This seems to be the opposite of what happens in reality.

nlarew1y ago

"detailed" is an anonymized deviceId and a counter of tool calls? Heaven forbid an app want to get some basic insights into how people use it.

tomrod1y ago

Correct. Telemetry should _always_ be opt-in and explicitly an easy choice to not engage.

Any other mode of operation is morally bankrupt.

nlarew1y ago

Really? The hyperbole does not help anyone here.

I don't sign a term sheet when I order at McDonald's but you can be damn sure they count how many Big Macs I order. Does that make them morally bankrupt? Or is it just a normal business operation that is actually totally reasonable?

observationist1y ago

This automatic sense of entitlement to surveil users is the absolute embodiment of the banality of evil.

It's 2025 - we want informed consent and voluntary participation with the default assumption that no, we do not want you watching over our shoulders, and no, you are not entitled to covertly harvest all the data you want and monetize that without notifying users or asking permissions. The whole ToS gotcha game is bullshit, and it's way past time for this behavior to stop.

Ignorance and inertia bolstering the status quo doesn't make it any less wrong to pile more bullshit like this onto the existing massive pile of bullshit we put up with. It's still bullshit.

nlarew1y ago

You're making a huge jump from "gathering anonymous counters to understand how many people use the thing" to "harvest all the data you want and monetize it".

If they were tracking my identity across sites and actually selling it to the highest bidder that's one thing that we'll definitely agree on. This is so so far from that.

You're welcome to build and use your own MCP browser automation if you're so hostile to the developer that built something cool and free for you to use.

arresin1y ago

The only chrome extensions you should install are ones you can build yourself from source.

neycoda1y ago

... And have reviewed and understand completely

EGreg1y ago

So ... pretty much none

Keep in mind, extensions can update themselves at any time, including when they're bought out by someone else. In fact, I bet that's a huge draw... imagine buying an extension that "can read and modify data on all your websites" and then pushing an update that, oh I dunno, exfiltrates everyone's passwords from their gmail. How would most people even catch that?

DO NOT have any extensions running by default except "on click".

There should be at least some kind of static checker of extensions for their calls to fetch or other network APIs. The Web is just too permissive with updating code, you've got eval and much more. It would be great if browsers had only a narrow bottleneck through which code could be updated, and would ask the user first.

(That wouldn't really solve everything since there can be sleeper code that is "switched on" with certain data coming over the wire, but better than what we have now.)
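As a rough illustration of the "static checker" idea above, here is a minimal Python sketch that scans JavaScript source text for a few network and dynamic-code APIs. The pattern list is an assumption and far from exhaustive, and a regex scan like this is easy to defeat with obfuscation, which matches the sleeper-code caveat.

```python
import re

# Patterns for common ways extension code can reach the network or run dynamic code.
# This list is illustrative, not exhaustive.
SUSPICIOUS_PATTERNS = {
    "fetch": re.compile(r"\bfetch\s*\("),
    "XMLHttpRequest": re.compile(r"\bnew\s+XMLHttpRequest\b"),
    "WebSocket": re.compile(r"\bnew\s+WebSocket\b"),
    "sendBeacon": re.compile(r"\bnavigator\.sendBeacon\s*\("),
    "eval": re.compile(r"\beval\s*\("),
}


def scan_source(source: str) -> dict[str, list[int]]:
    """Return {pattern name: [1-based line numbers]} for every match in the source."""
    hits: dict[str, list[int]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append(lineno)
    return hits


# Made-up extension snippet for demonstration.
sample = """\
const data = collect();
fetch("https://example.com/ingest", {method: "POST", body: data});
const ws = new WebSocket("wss://example.com");
"""
report = scan_source(sample)
print(report)
```

A real checker would need to parse the code rather than pattern-match it, and would still miss code loaded or assembled at runtime.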

bhouston1y ago

So the website claims:

"Avoids bot detection and CAPTCHAs by using your real browser fingerprint."

Yeah, not really.

I used a similar system a few weeks back (one I wrote myself), having AI control my browser using my logged-in session. I started to get CAPTCHAs during my human sessions in the browser, and eventually I got blocked from a bunch of websites. Now that I've stopped using my browser session in that way, the blocks have gone away, but be warned: you'll lose access to websites yourself by doing this. It isn't a silver bullet.

tempest_1y ago

The caveat with these things is usually "when used with high quality proxies".

Also, I assume this extension is pretty obvious, so it won't take long for CF bot detection to treat it the same as Playwright or whatever else.

unixfox1y ago

The extension enables debugging in your browser (a banner appears telling you about automation). It's possible to detect that in JavaScript.

Hence why projects like this exist: https://github.com/Kaliiiiiiiiii-Vinyzu/patchright. They hide the debugging part from JavaScript.

DeathArrow1y ago

It might depend on the speed with which you click on the elements on the website.

SSLy1y ago

it does, CF bans my own honest to God clicks if I do them too fast.

SkyBelow1y ago

What do you think they might be looking for that could be detected pretty quickly? I'm wondering if it is something like they can track mouse movement and calculate when a mouse is moving too cleanly, so adding some more human like noise to the mouse movement can better bypass the system. Others have mentioned doing too many actions too fast, but what about potential timing between actions. Even if every click isn't that fast, if they have a very consistent delay that would be another non-human sign.
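The "very consistent delay" signal can be sketched from the detection side in a few lines of Python. This is only an illustration of the idea; the function names and the 0.1 threshold are assumptions, not anything a real vendor is known to use.

```python
from statistics import mean, pstdev


def timing_regularity(timestamps: list[float]) -> float:
    """Coefficient of variation of the gaps between events (lower = more regular)."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    average = mean(gaps)
    if average == 0:
        return 0.0  # all events at the same instant: maximally regular
    return pstdev(gaps) / average


def looks_scripted(timestamps: list[float], threshold: float = 0.1) -> bool:
    """Flag sequences whose gaps vary by less than `threshold` relative to the mean."""
    return timing_regularity(timestamps) < threshold


bot_clicks = [0.0, 1.0, 2.0, 3.01, 4.0]   # almost perfectly even spacing
human_clicks = [0.0, 0.8, 3.1, 3.6, 7.2]  # irregular spacing

print(looks_scripted(bot_clicks), looks_scripted(human_clicks))
```

A single statistic like this produces false positives (for example, fast keyboard users), which is one reason such checks are usually combined with other signals.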

tempoponet1y ago

Modern captchas use a number of tools including many of the approaches you mentioned. This is why you might sometimes see a CloudFlare "I am not a robot" checkbox that checks itself and moves along before you have much time to even react. It's looking at a number of signals to determine that you're probably human before you've even checked the box.

dalemhurley1y ago

When I am using keyboard navigation, shortcuts and autofills, I seem to get mistaken for a bot a lot. These Captchas are really bad at detecting bots and really good at falsely labelling humans as bots.

kmacdough1y ago

> I'm wondering if it is something like they can track mouse movement

Yes, this is a big signal they use.

> adding some more human like noise to the mouse

Yes, this is a standard avoidance strategy. Easier said than done. For every new noise generation method, they work on detection. They also detect more global usage patterns and other signals, so you'd need to imitate the entire workflow of being human. At least within the noise of their current models.

econ1y ago

Have a lot of small things count towards the result. Users behave quite linearly, extra points if they act differently all of a sudden.
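"Lots of small things count towards the result" can be sketched as a weighted score. The signal names and weights below are hypothetical; a real system would use many more signals and tune the weights on data.

```python
# Hypothetical signals and weights, chosen only for illustration.
WEIGHTS = {
    "even_click_timing": 0.4,
    "straight_mouse_paths": 0.3,
    "automation_banner_detected": 0.5,
    "sudden_behaviour_change": 0.2,
}


def bot_score(signals: dict[str, bool]) -> float:
    """Sum the weights of the signals that fired, capped at 1.0."""
    total = sum(weight for name, weight in WEIGHTS.items() if signals.get(name, False))
    return min(total, 1.0)


session = {"even_click_timing": True, "sudden_behaviour_change": True}
score = bot_score(session)
print(round(score, 2))
```

No single signal decides the outcome here, so one odd behaviour (such as fast keyboard navigation) only nudges the score rather than triggering a block by itself.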

mrweasel1y ago

There's also the whole issue of captchas being in place because people cannot be trusted to behave appropriately with automation tools.

"Avoids bot detection and CAPTCHAs" - Sure asshole, but understand that's only in place because of people like you. If you truly need access to something, ask for an API; maybe you need to pay for it, maybe you don't. Maybe you get it, maybe the site owner tells you to go pound sand, and you should take that as a sign that your behaviour and/or use case is not wanted.

TeMPOraL1y ago

Actually, the CAPTCHAs are in place mostly because of assholes like you abusing other assholes like you[0].

Most of the automated misbehavior is businesses doing it to other businesses - in many cases, it's direct competition, or a third party the competition outsources it to. Hell, your business is probably doing it to them too (ask the marketing agency you're outsourcing to).

> If you truly need access to something, ask for an API; maybe you need to pay for it, maybe you don't.

Like you'd give it to me when you know I want it to skip your ads, or plug it to some automation or a streamlined UI, so I don't have to waste minutes of my life navigating your bloated, dog-slow SPA? But no, can't have users be invisible in analytics and operate outside your carefully designed sales funnel.

> Maybe you get it, maybe the site owner tells you to go pound sand, and you should take that as a sign that your behaviour and/or use case is not wanted.

Like they have a final say in this.

This is an evergreen discussion, and well-trodden ground. There is a reason the browser is also called "user agent"; there is a well-established separation between the user's and the server's zones of control, so as a site owner, stop poking your nose where it doesn't belong.

[0] - Not "you" 'mrweasel personally, but "you" the imaginary speaker of your second paragraph.

mrweasel1y ago

It seems that we have very different types of businesses in mind. I really didn't consider tracking users and displaying ads, but I also don't think this is where these types of tools would be used. Well, they might, but that's as part of some content farm, undesirable bots and downright scams, so nothing of value is really lost if this didn't exist.

If you have a sales funnel, as in you take orders and ship something to a customer, consumer or business, I almost guarantee you that you can request an API, if the company you want to purchase from is large enough. They'll probably give you the API access for free, or as part of a signup fee and give you access to discounts. Sometimes that API might be an email, or a monthly Excel dump, but it's an API.

When we're talking about sites that purely survive on tracking users and reselling their data, then yes, they aren't going to give you API access. Some sites, like Reddit, do offer it I think, but the price is going to be insane, reflecting their unwillingness to interact with users in this way.

> Not "you" 'mrweasel personally

Understood, but thank you :-)

StevenNunez1y ago

I feel like I slept for a day and now MCPs are everywhere... I don't know what MCPs are and at this point I'm too afraid to ask.

oulipo1y ago

It's just a way to provide a "library of methods" / API that the LLM models can "call", so basically giving them method names, their parameters, the type of the output, and what they are for,

and then the LLM model will ask the MCP server to call the functions, check the result, call the next function if needed, etc

Right now if you go to ChatGPT you can't really tell it "open Google maps with my account, search for bike shops near NYC, and grab their phone numbers", because all it can do is reply in text or make images

with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc
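To make the "library of methods" description concrete, here is a toy Python sketch of a tool registry: the model sees names, descriptions, and parameters, then sends back a request naming a tool and its arguments. This is not the MCP SDK or its wire format; the registry, the `navigate` tool, and the request shape are all made up for illustration.

```python
import json
from typing import Callable

# A toy registry: each tool has a description, parameter names, and a Python function.
TOOLS: dict[str, dict] = {}


def tool(name: str, description: str, params: list[str]) -> Callable:
    """Register a function so it can be listed and called by name."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = {"description": description, "params": params, "fn": fn}
        return fn
    return register


@tool("navigate", "Open a URL in the browser", ["url"])
def navigate(url: str) -> str:
    return f"opened {url}"  # stand-in for real browser control


def list_tools() -> list[dict]:
    """What the model gets to see: names, descriptions, parameters (no code)."""
    return [
        {"name": name, "description": spec["description"], "params": spec["params"]}
        for name, spec in TOOLS.items()
    ]


def call_tool(request_json: str) -> str:
    """Execute a model-produced request like {"tool": "navigate", "args": {...}}."""
    request = json.loads(request_json)
    spec = TOOLS[request["tool"]]
    return spec["fn"](**request["args"])


result = call_tool('{"tool": "navigate", "args": {"url": "https://maps.google.com"}}')
print(result)
```

The loop described above is then: show the model `list_tools()`, let it emit a request, run `call_tool`, feed the result back, and repeat until the task is done.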

mattfrommars1y ago

Isn't the idea of AI agents talking to each other that you tell the LLM to reply in, say, JSON, with parameter values that map to, say, a function in Python code? So that, given the context {prompt}, the LLM is able to call said function?

Is this what 'calling' is?

oulipo1y ago

Yes exactly. MCP just formalizes this a bit better

throwaway3141551y ago

> with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc

It seems strange to me to focus on this sort of standard well in advance of models being reliable enough to, ya know, actually be able to perform these operations on behalf of the user with the sort of reliability you would need for widespread adoption to be successful.

Cryptocurrency "if you build it they'll come" vibes.

dimitri-vs1y ago

You actually can, it's called Operator and it's a complete waste of time, just like 99% of agents/MCPs.

oulipo1y ago

Operator is basically MCP...

jastuk1y ago

And the worst part is that it opens a Pandora's box of potential exploits: https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands...

TeMPOraL1y ago

That's not the fault of MCP though, that's the fault of vendors peddling their MCPs while clinging to the SaaS model.

Yes, MCP is a way to streamline giving LLMs ability to run arbitrary code on your machine, however indirectly. It's meant to be used on "your side of the airlock", where you trust the things that run. Obviously it's too powerful for it to be used with third-party tools you neither trust nor control; it's not that different than downloading random binaries from the Internet.

I suppose it's good to spell out the risks, but it doesn't make sense blaming MCP itself, because those risks are fundamental aspects of the features it provides.

kmacdough1y ago

It's not blame, but it's a striking reality that needs to be kept at the forefront.

It introduces a substantial set of novel failure modes, like cross-tool shadowing, which aren't obvious to most folks. Making use of any externally developed tooling — even open source tools on internal architecture — requires more careful consideration and analysis than most would expect. Despite the warnings, there will certainly be major breaches on these lines.

joshwarwick151y ago

Most of these are not a real concern with remote servers using OAuth. If you install the PayPal MCP server from im-deffo-not-hacking-you.com rather than https://mcp.paypal.com/sse, it's the same security model as anything else online...

The article also reeks of LLM ironically

tuananh1y ago

it still is. if user has 1 bad tool, it's done!

https://invariantlabs.ai/blog/mcp-security-notification-tool...

halJordan1y ago

At the risk of sounding like I support theft: the automobile, you know, enabled the likes of Bonnie and Clyde and that whole era of lawlessness, until the FBI and crossing county lines became a thing.

So I'm not sure I'd give up the sum total progress of the automobile just because the first decade was a bad one

orbital-decay1y ago

MCP is a standard to plug useful tools into AI models so they can use them. The concept looks confusingly reversed and non-obvious to a normal person, although devs don't see this because it looks like their tooling.

hedgehog-ai1y ago

I know what you mean. I think MCP is being widely adopted, but it's not grassroots. It's a quick entry to this market by an established AI company trying to dominate the mind/market share of developers before consensus can be reached among developers.

whalesalad1y ago

It’s RPC specifically for an LLM. But yes it’s the new soup du jour trend sweeping the globe.

andy_ppp1y ago

When I go to a shopping website I want to be able to tell my browser "hey please go through all the sideboards on this list and filter out the ones that are larger than 155cm or smaller than 100cm, prioritise the ones with dark wood and space for vinyl records which are 31.43cm tall" for example.

Is there any browser that can do this yet as it seems extremely useful to be able to extract details from the page!

mfkhalil1y ago

Hey, we’re working on MatterRank which is pretty similar to this but currently works on web search. (e.g. I want to prioritize results that talk about X and have Y bias and I want to deprioritize those that are trying to sell me something). Feel free to try it out at https://matterrank.ai

Would also be interested in hearing more about what you’re envisioning for your use case. Are you thinking a browser extension that acts on sites you’re already on, or some sort of shopping aggregator that lets you do this, or something else entirely?

Niksko1y ago

Not OP but I definitely sympathise with them. I don't know how practical it is to implement or how profitable it would be, but the problem I often have is this:

* I have something I want to buy and have specific needs for it (height, color, shape, other properties)
* I know that there's a good chance the website I'm on sells a product that meets those needs (or possibly several such that I'd want to choose from)
* my criteria are more specific than the filters available on the site e.g. I want a specific length down to a few cm because I want the biggest thing that will fit in a fixed space
* crucially for an AI use case: the information exists on the individual product pages. They all list dimensions and specifications. I just don't want to have to go through them all.

Example: find me all of the desks on IKEA that come in light coloured wood, are 55 inches wide, and rank them from deepest to shallowest. Oh, and make sure they're in stock at my nearest IKEA, or are delivering within the next week.
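Once the details have been pulled off the product pages into structured records (the hard part, which is where an LLM driving the browser would come in), the filtering itself is simple. A minimal Python sketch, with an invented `Desk` record and made-up catalogue entries:

```python
from dataclasses import dataclass


@dataclass
class Desk:
    name: str
    width_in: float
    depth_in: float
    finish: str
    in_stock: bool


def matching_desks(desks: list[Desk], width_in: float, finish: str) -> list[Desk]:
    """Keep in-stock desks with the exact width and finish, deepest first."""
    matches = [
        d for d in desks
        if d.in_stock and d.width_in == width_in and d.finish == finish
    ]
    return sorted(matches, key=lambda d: d.depth_in, reverse=True)


# Made-up catalogue entries for illustration only.
catalogue = [
    Desk("A", 55, 24, "light wood", True),
    Desk("B", 55, 30, "light wood", True),
    Desk("C", 55, 28, "dark wood", True),
    Desk("D", 55, 27, "light wood", False),
    Desk("E", 47, 30, "light wood", True),
]
names = [d.name for d in matching_desks(catalogue, 55, "light wood")]
print(names)
```

The point of the sketch is that the site's own filters are the bottleneck, not the query: the data already exists on each product page.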

unixfox1y ago

You could do that with browser-use: https://browser-use.com/

bravura1y ago

When doing interior decoration, I am definitely interested in finding objects that fit very specific prompts.

neilellis1y ago

Well done, just tested on Claude Desktop and it worked smoothly and a lot less clunky than playwright. This is the right direction to go in.

I don't know if you've done it already, but it would be great to pause automation when you detect a captcha on the page and then notify the user that the automation needs attention. Playwright keeps trying to plough through captchas.

thenaturalist1y ago

Crazy, in looking up some info on the web and creating a Spreadsheet on Google Sheets to insert the results, it worked almost perfectly the first time and completely failed subsequently on 8-10 different tries.

Is there an issue with the lag between what is happening in the browser and the MCP app (in my case Claude Desktop)?

I have a feeling the first time I tried it, I was fast enough clicking the "Allow for this chat" permissions, whereas by the time I clicked the permission on subsequent chats, the LLM just reports "It seems we had an issue with the click. Let me try again with a different reference.".

Actions which worked flawlessly the first time (rename a Google spreadsheet by clicking on the title and inputting the name) fail 100% of subsequent attempts.

Same with identifying cells A1, B1, etc. and inserting into the rows.

Almost perfect on 1st try, not reproducible in 100% of attempts afterwards.

Kudos to how smooth this experience is though, very nice setup & execution!

EDIT 2: The lag & speed to click the allow action make it seemingly unusable in Claude Desktop. :(

otherayden1y ago

Such a rich UI like google sheets seems like a bad use case for such a general "browser automation" MCP server. Would be cool to see an MCP server like this, but with specific tools that let the LLM read and write to google sheets cells. I'm sure it would knock these tasks out of the park if it had a more specific abstraction instead of generally interacting with a webpage
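A "more specific abstraction" could be as small as tools that address cells directly instead of clicking on them. The Python sketch below shows only the addressing part, with a hypothetical `write_cell` tool backed by an in-memory grid rather than any real Sheets API.

```python
import re


def a1_to_indices(cell: str) -> tuple[int, int]:
    """Convert A1 notation to zero-based (row, column), e.g. "B3" -> (2, 1)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", cell)
    if match is None:
        raise ValueError(f"not a valid A1 reference: {cell!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        column = column * 26 + (ord(ch) - ord("A") + 1)  # base-26 with A=1
    return int(digits) - 1, column - 1


def write_cell(grid: dict[tuple[int, int], str], cell: str, value: str) -> None:
    """Hypothetical 'write' tool: store a value at the addressed cell of an in-memory grid."""
    grid[a1_to_indices(cell)] = value


grid: dict[tuple[int, int], str] = {}
write_cell(grid, "A1", "Name")
write_cell(grid, "B1", "Phone")
print(grid)
```

With a tool like this the model never has to locate a cell on screen, which removes the click-targeting step that failed in the report above.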

mkummer1y ago

Agreed, I'd been working on a Google Sheets specific MCP last week – just got it published here: https://github.com/mkummer225/google-sheets-mcp

rahimnathwani1y ago

This is cool. You should submit this as a 'Show HN'.

Also consider publishing it so people can use it without having to use git.

xingwu1y ago

I have worked on a google sheets MCP, for data scraping it worked pretty well leveraging Claude's built-in search functionalities.

example: https://x.com/xing101/status/1903391600040083488

set up: https://github.com/xing5/mcp-google-sheets

throwaway3141551y ago

What you're experiencing is commonly referred to as "luck". It's the same reason people consistently think newer versions of ChatGPT are nerfed in some way. In reality, people just got lucky originally and have unrealistic expectations based on this originally positive outcome.

There's no bug or glitch happening. It's just statistically unlikely to perform the action you wanted and you landed a good dice roll on your first turn.

weq1y ago

Haha yeah, as someone who has built automation for years I can agree with this. You can't just click on something in a script; you need to reliably click on something. As a user, it's very easy for you to make adjustments, like clicking twice on a link if it doesn't load in time. That's pretty much what your automation suite needs to end up with: a series of functions to emulate user actions. You then combine those with your scripts to create reliable scripts that can run in different conditions. LLMs won't do that for you; you need to instruct them specifically.
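The "reliably click" idea can be sketched as a small retry wrapper in Python. The `click` and `succeeded` callables are placeholders for whatever your automation library provides; nothing here is tied to a specific tool.

```python
import time
from typing import Callable


def click_reliably(
    click: Callable[[], None],
    succeeded: Callable[[], bool],
    attempts: int = 3,
    delay_s: float = 0.5,
) -> bool:
    """Click, verify the expected effect, and retry if it did not happen."""
    for _ in range(attempts):
        click()
        if succeeded():
            return True
        time.sleep(delay_s)  # give the page time to settle before retrying
    return False


# Simulated page where the first click is lost and the second one lands.
state = {"clicks": 0}


def fake_click() -> None:
    state["clicks"] += 1


ok = click_reliably(fake_click, lambda: state["clicks"] >= 2, delay_s=0)
print(ok, state["clicks"])
```

The important part is the `succeeded` check: the script verifies the effect of the click instead of assuming it worked, which is what a human does implicitly.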

lizardking1y ago

For me it can't click anywhere on google sheets. I get the following error

--Error: Cannot access a chrome-extension:// URL of different extension

nonethewiser1y ago

Stuff like this makes me giddy for manual tasks like reimbursement requests. It's such a chore (and it doesn't help that our process isn't great).

Every month, go to service providers, log in, find and download statement, create google doc with details filled in, download it, write new email and upload all the files. Maybe double check the attachments are right (but that requires downloading them again instead of being able to view in email).

Automating this is already possible (and a real expense tracking app can eliminate about half of this work) but I think AI tools have the potential to eliminate a lot of the nittier-grittier specification of it. This is especially important because these sorts of workflows are often subject to little changes.

doug_life1y ago

This may be obvious to most here, but you need Node.js installed for the MCP server to run. This critical detail is not in the set up instructions.
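A quick way to check the prerequisite before starting is to look for the Node.js executables on `PATH`. A minimal Python sketch; checking for `npx` as well is an assumption about how the server gets launched, not something stated in the thread.

```python
import shutil


def missing_prerequisites(commands: list[str]) -> list[str]:
    """Return the commands from `commands` that are not found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# "npx" is included on the assumption that the server is started through it.
missing = missing_prerequisites(["node", "npx"])
if missing:
    print("Install Node.js first; not found on PATH:", ", ".join(missing))
else:
    print("Node.js tooling found.")
```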

namuorgOP1y ago

Added!

https://docs.browsermcp.io/setup-server#node-js

serverlessmania1y ago

Did something similar but controls a hardware synth, allowing me to do sound design without touching the physical knobs: https://github.com/zerubeus/elektron-mcp

dmix1y ago

Oh good idea.

Imagine it controlling plugins remotely, have an LLM do mastering and sound shaping with existing tools. The complex overly-graphical UIs of VSTs might be a barrier to performance there, but you could hook into those labeled midi mapping interfaces to control the knobs and levels.

Gehinnn1y ago

Would be nice if it could use the Accessibility Tree from chrome dev tools to navigate the page instead of relying on screenshots (https://developer.chrome.com/blog/full-accessibility-tree)

mgraczyk1y ago

In fact you have it backwards. It has no screenshots at the moment, only the accessibility tree

amendegree1y ago

So is MCP the new RPA (Robotic Process Automation)? Like generic yahoo pipes?

spmurrayzzz1y ago

I just view it as a relatively minor convenience, but it's not some game-changer IMO.

The tool use / function calling thing far predates Anthropic releasing the MCP specification and it really wasn't that onerous to do before either. You could provide a json schema spec and tell the model to generate compliant json to pass to the API in question. MCP doesn't inherently solve any of the problems that come up in that sort of workflow, but it does provide an idiomatic approach for it (so there's a non-zero value there, but not much).
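The pre-MCP workflow described above (give the model a schema, ask for compliant JSON, check it before calling the API) can be sketched as follows. The validator is a deliberately tiny stand-in that handles only required keys and primitive types; it is not a JSON Schema implementation, and the schema itself is invented.

```python
import json

# A deliberately tiny subset of JSON Schema: required keys and primitive types only.
SCHEMA = {
    "required": ["url"],
    "properties": {"url": {"type": "string"}, "timeout_s": {"type": "number"}},
}

TYPE_MAP = {"string": str, "number": (int, float), "boolean": bool}


def validate(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments are acceptable."""
    problems = [f"missing: {key}" for key in schema["required"] if key not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected: {key}")
        elif not isinstance(value, TYPE_MAP[spec["type"]]) or (
            # bool is a subclass of int in Python, so reject it for "number"
            spec["type"] == "number" and isinstance(value, bool)
        ):
            problems.append(f"wrong type: {key}")
    return problems


good = validate(json.loads('{"url": "https://example.com", "timeout_s": 5}'), SCHEMA)
bad = validate(json.loads('{"timeout_s": "soon", "selector": "#go"}'), SCHEMA)
print(good, bad)
```

When validation fails, the usual move is to send the problem list back to the model and ask it to try again, which is the part no protocol solves for you.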

PantaloonFlames1y ago

It seems the benefit of MCP is for Anthropic to enlist the community in building integrations for Claude desktop, no?

And if other vendors sign on to support MCP, then it becomes a self reinforcing cycle of adoption.

kmangutov1y ago

The interesting thing about MCP as a tool use protocol is the traction that it has garnered in terms of clients and servers supporting it.

wonderwhyer1y ago

I would probably call it shipping containers for LLM tool integrations.

Containers are not a big deal when viewed in isolation. But when it's a common size/standard for all kinds of ships, cranes and trucks, it is a big deal then.

In that sense it's more about gathering community around one way to do things.

In theory there are REST APIs and the OpenAPI standard, but those were not made for LLMs but for code. So you usually need some kind of friendly wrapper (like for candy) on top of a REST API.

It really starts to feel like a big deal when you work on integrating LLMs with tools.

tmvphil1y ago

I'm a bit stuck on this, maybe you can explain why an LLM would have any difficulty writing REST API calls? Seems like it should be no problem.

ajcp1y ago

No, since MCP is just an interface layer it is to AI what REST API is to DPA and COM/App DLLs are to RPA.

APA (Agentic Process Automation) is the new RPA, and this is definitely one example of it.

XCSme1y ago

But AI already supported function calling, and you could describe them in various ways. Isn't this just a different way to define function calling?

cadence-1y ago

Doesn't work on Windows:

2025-04-07T18:43:26.537Z [browsermcp] [info] Initializing server...
2025-04-07T18:43:26.603Z [browsermcp] [info] Server started and connected successfully
2025-04-07T18:43:26.610Z [browsermcp] [info] Message from client: {"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"claude-ai","version":"0.1.0"}},"jsonrpc":"2.0","id":0}
node:internal/errors:983
  const err = new Error(message);
              ^

Error: Command failed: FOR /F "tokens=5" %a in ('netstat -ano ^| findstr :9009') do taskkill /F /PID %a
    at genericNodeError (node:internal/errors:983:15)
    at wrappedFn (node:internal/errors:537:14)
    at checkExecSyncError (node:child_process:882:11)
    at execSync (node:child_process:954:15)
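For anyone reading the log: the "Message from client" entry is a JSON-RPC 2.0 request, and it can be picked apart with a few lines of Python. The message string is copied from the log above; the `summarize` helper is made up for illustration.

```python
import json

# The message exactly as it appears in the log above.
raw = (
    '{"method":"initialize","params":{"protocolVersion":"2024-11-05",'
    '"capabilities":{},"clientInfo":{"name":"claude-ai","version":"0.1.0"}},'
    '"jsonrpc":"2.0","id":0}'
)

message = json.loads(raw)


def summarize(msg: dict) -> str:
    """One-line summary of a JSON-RPC request."""
    client = msg.get("params", {}).get("clientInfo", {})
    return f'{msg["method"]} (id={msg["id"]}) from {client.get("name")} {client.get("version")}'


summary = summarize(message)
print(summary)
```

So the handshake itself arrived intact; the crash comes from the shell command in the stack trace, not from the protocol exchange.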

namuorgOP1y ago

Can you try again?

There was another comment that mentioned that there's an issue with port killing code on Windows: https://news.ycombinator.com/item?id=43614145

I just published a new version of the @browsermcp/mcp library (version 0.1.1) that handles the error better until I can investigate further so it should hopefully work now if you're using @browsermcp/mcp@latest.

FWIW, Claude Desktop currently has a bug where it tries to start the server twice, which is why the MCP server tries to kill the process from a previous invocation: https://github.com/modelcontextprotocol/servers/issues/812
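As a sketch of a stricter alternative to piping `netstat -ano` through `findstr :9009`, the Python function below parses the output column by column and only returns PIDs of sockets that are listening on the exact local port. This is not the project's code; the function name and the sample output are made up, and the column layout is assumed to be the usual Windows one (Proto, Local Address, Foreign Address, State, PID).

```python
def pids_listening_on(netstat_output: str, port: int) -> set[int]:
    """Extract PIDs of LISTENING sockets bound to `port` from `netstat -ano` output."""
    pids: set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected TCP row: Proto, Local Address, Foreign Address, State, PID
        if len(fields) != 5 or fields[3] != "LISTENING":
            continue
        local_address = fields[1]
        if local_address.rsplit(":", 1)[-1] == str(port):
            pids.add(int(fields[4]))
    return pids


# Made-up sample in the shape Windows prints for `netstat -ano`.
sample = """\
  TCP    0.0.0.0:9009           0.0.0.0:0              LISTENING       4321
  TCP    127.0.0.1:51234        127.0.0.1:9009         ESTABLISHED     8765
  TCP    0.0.0.0:19009          0.0.0.0:0              LISTENING       1111
"""
print(pids_listening_on(sample, 9009))
```

A plain substring match on `:9009` would also hit the second row, where 9009 is the remote port of some other process, so matching on the local address column is the safer choice before killing anything.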

cadence-1y ago

It's working now with the 0.1.0 for me. But I will let you know if I experience any issues once I get updated to 0.1.1.

Thanks, great job! I like it overall, but I noticed it has some issues entering text in forms, even on google.com. It's able to find a workaround and insert the searched text in the URL, but it would be nice if the entry into forms worked well for UI testing.

cadence-1y ago

I was able to make it work like this:

1. Kill your Claude Desktop app

2. Click "Connect" in the browser extension.

3. Quickly start your Claude Desktop app.

It will work 50% of the time - I guess the timing must be just right for it to work. Hopefully, the developers can improve this.

Now on to testing :)

josefrichter1y ago

What I used this for:

"Go to https://news.ycombinator.com/upvoted?id=josefrichter, summarize what topics I am interested in, and then from the homepage pick articles I might be interested in."

Works like a charm.

washedDeveloper1y ago

Can you add a license to your code along with open sourcing the chrome extension?

makingstuffs1y ago

I don't see how an MCP can be useful for browsing the net and doing things like shopping as has been suggested. Large companies such as CloudFlare have spent millions on, and made a business from, bot detection and blocking.

Do we suppose they will just create a backdoor to allow _some_ bots in? If they do that how long will it be before other bots impersonate them? It seems like a bit of a fad from my small mind.

Suppose it does become a thing, what then? We end up with an internet which is heavily optimised for bots (arguably it already is to an extent) and unusable for humans?

Wild.

kraftman1y ago

There are already plenty of services that provide residential proxies and captcha bypass pretty cheaply.

https://brightdata.com/pricing/web-unlocker

https://2captcha.com/pricing

TeMPOraL1y ago

> Suppose it does become a thing, what then? We end up with an internet which is heavily optimised for bots (arguably it already is to an extent) and unusable for humans?

As opposed to the Web we now have, which is heavily optimized for... wasting human life.

What you're asking for, what "large companies such as CloudFlare have spent millions on", is verifying that on the other end of the connection is a web browser, and behind that web browser there is a human being that's being made to needlessly suffer and waste their limited lifespans, as they tediously work their way through the UI maze like a good little lab rat, watching ads at every turn of the corridor, while being constantly surveilled.

Or do you believe there is some other reason why you should care about whether you're interacting with a "human" (really: a user agent called "web browser") vs. "not human" (really: any other user agent)?

The relationship between the commercial web and its users is antagonistic - businesses make money through friction, by making it more difficult for users to accomplish their goals. That's why we never got the era of APIs and web automation for users. That's why we're dealing with tons of bespoke shitty SPAs instead of consistent interfaces - because no store wants to make it easy for you to comparison-shop, or skip their upsells, or efficiently search through the stock; no news service wants you to skip ads or make focused searches, etc.

As users, we've lost the battle for APIs and continue to be forced to use the "manual web" (with active cooperation of the browser vendors, too). MCP feels promising because we're in a moment in time, however brief, where LLMs can navigate the "manual web" for us, shielding us from all the malicious bullshit (ads, marketing copy, funneling, call to actions, confusing design, dark patterns, less dark patterns, the fact that your store is a bloated SPA instead of an endpoint for a generic database querying frontend, and so on) while remaining mostly impervious to it. This will not last long - the vendors de-facto ruling the web have every reason to shut it down (or turn it around and use LLMs against us). But for now, it works.

Adversarial interoperability is the name of the game. LLMs, especially combined with tool use (and right tools), make it much easier and much more accessible than ever before. For however brief a moment.

makingstuffs1y ago

Sorry it wasn't entirely clear that I was by no means saying the web in its current form is anything close to what it could/should be. My main point was that, by making backdoors for MCPs there will be a new possible entry point for bad actors by exploiting said backdoor.

As for the optimisation to _waste human life_ I do agree but the reality is that the sites which waste the majority of human life/time are the ones which would not be automated by the MCP and would, ultimately, see more 'real' usage by virtue of the fact that your average human will have more time to mindlessly scroll their favourite echo-chamber.

Then we have the whole other debate of whether we really believe that the VC funders who are largely responsible for the current state of the web will continue pumping money into something which would hurt their bottom line from another angle?

TeMPOraL1y ago

Fair enough. Thanks for clarifying. I agree with what you're saying in this comment.

On the topic of:

> whether we really believe that the VC funders who are largely responsible for the current state of the web will continue pumping money into something which would hurt their bottom line from another angle?

No, I don't believe that at all - which is why I keep saying the current situation is an anomaly, a brief moment in time. LLMs deployed in the form of general-purpose chatbots/agents are giving too much power to the people, which is already becoming disruptive to many businesses, so that power will be gradually taken away. Expect fewer general-purpose AI agents, and more "AI powered features" that shackle LLMs behind some limited UI, to ensure you can only get as much benefit from AI as fits the vendors' business strategies.

jedimastert1y ago

Most things that do this kind of fingerprinting bot detection aren't looking for a browser that's pretending to be a human, they're looking for other programs that are pretending to be a browser.

m11a1y ago

> Do we suppose they will just create a backdoor to allow _some_ bots in?

That, and maybe they will as CF seem quite big on MCP.[0] Or people just bypass the bot detection. It's already not terribly difficult to do; people in the sneaker bot and ticket scalping communities have long had bypasses for all the major companies.

I mean, we can all imagine bad use-cases of bots, but there are also the pros: the internet wastes loads of human time. I still remember needing to browse marketplaces' real estate listings with terrible search and notification functionality to find a flat... shudders. Unbelievable amount of hours wasted.

If fewer people are able to build bots that can index a larger number of sites and give better searching capabilities, for instance, where sites are unable to provide this, I'm personally all for it. For many sites, it's that they lack the in-house development expertise and probably they wouldn't even mind.

[0]: https://developers.cloudflare.com/agents/model-context-proto... etc

hliyan1y ago

Ideally, shouldn't this be the native experience of most "sites" on the internet? We've built an entire user experience around serving users rich, two dimensional visual content that is not machine-readable and are now building a natural language command line layer on top of it. Why not get rid of the middleware and present users a direct natural language interface to the application layer?

buttofthejoke1y ago

Why use this over Puppeteer or Playwright extensions?

namuorgOP1y ago

The Puppeteer MCP server doesn't work well because it requires CSS selectors to interact with elements. It makes up CSS selectors rather than reading the page and generating working selectors.

The Playwright MCP server is great! Currently Browser MCP is largely an adaptation of the Playwright MCP server to use with your actual browser rather than creating a new one each time. This allows you to reuse your existing Chrome profile so that you don't need to log in to each service all over again and avoids bot detection which often triggers when using the fresh browser instances created by Playwright.

I also plan to add other useful tools (e.g. Browser MCP currently supports a tool to get the console logs which is useful for automated debugging) which will likely diverge from the Playwright MCP server features.

cAtte_1y ago

by the way, you can indeed access your personal context with Playwright. just `launchPersistentContext()` and set the userDataDir to that of your existing Chrome install:

https://playwright.dev/docs/api/class-browsertype#browser-ty...
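
A minimal sketch of that approach, using Playwright's Python bindings (`launch_persistent_context` is the Python spelling of `launchPersistentContext`). The profile paths below are the usual defaults and may differ on a given machine, and both helper names are hypothetical. Chrome generally has to be fully closed first, since browsers do not allow two instances on the same user data directory, and newer Chrome releases may refuse automation against the default profile, so treat this as an illustration rather than a guarantee:

```python
from pathlib import Path


def chrome_user_data_dir(platform: str, home: str, local_app_data: str = "") -> Path:
    """Return the usual default Chrome profile directory for a platform.

    These are common default locations (an assumption); a given install may differ.
    """
    if platform.startswith("win"):
        return Path(local_app_data) / "Google" / "Chrome" / "User Data"
    if platform == "darwin":
        return Path(home) / "Library" / "Application Support" / "Google" / "Chrome"
    return Path(home) / ".config" / "google-chrome"


def open_with_existing_profile(url: str, user_data_dir: Path) -> str:
    """Open `url` in a persistent context backed by `user_data_dir`; return the page title."""
    # Imported lazily so the path helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        context = p.chromium.launch_persistent_context(
            str(user_data_dir),
            channel="chrome",  # use the installed Chrome rather than the bundled Chromium
            headless=False,
        )
        # A persistent context usually starts with one page already open.
        page = context.pages[0] if context.pages else context.new_page()
        page.goto(url)
        title = page.title()
        context.close()
        return title
```

Something like `open_with_existing_profile("https://news.ycombinator.com", chrome_user_data_dir("linux", str(Path.home())))` would then reuse whatever logins live in that profile.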

buttofthejoke1y ago

Ooo, i like that. one of the most annoying points has been 'not sharing' the browser context. i'll def check it out

Fernicia1y ago

Any plans to make a Firefox version?

namuorgOP1y ago

Browser MCP uses the Chrome DevTools Protocol (CDP) to automate the browser so it currently only works for Chromium-based browsers.

Unfortunately, Firefox doesn't expose WebDriver BiDi (the standardized version of CDP) to browser extensions AFAIK (someone please correct me if I'm mistaken!), so I don't think I can support it even if I tried.

krono1y ago

Just found this[0] implementation roadmap on Mozilla's wiki, recently updated too! At least it's actively being worked on.

Not going to lie, this makes me happy.

[0]: https://wiki.mozilla.org/WebDriver/RemoteProtocol/WebDriver_...

DebtDeflation1y ago

In the Task Automation demo, how does it know all of the attributes of the motorcycle he is trying to sell? Is it relying on the underlying LLM's embedded knowledge? But then how would it know the price and mileage? Is there some underlying document not referenced in the demo? Because that information is not in the prompt.

pavelfeldman1y ago

I mean no disrespect, but this looks like an outdated clone of https://github.com/microsoft/playwright-mcp

https://github.com/microsoft/playwright-mcp/blob/main/src/to... https://github.com/BrowserMCP/mcp/blob/main/src/tools/tool.t...

namuorgOP1y ago

Hey Pavel, this is Namu, the creator of Browser MCP.

You’re right, this is an adaptation of Playwright MCP to automate the user’s local browser as mentioned in the GitHub README and here:

- https://github.com/BrowserMCP/mcp/blob/3e6824de6f36eba7d2d3b...

- https://news.ycombinator.com/item?id=43613905

Thanks for all your work to Playwright and Playwright MCP. I’m a big fan!

(For those not familiar, Pavel is the largest contributor to both Playwright and Playwright MCP: https://github.com/microsoft/playwright/graphs/contributors, https://github.com/microsoft/playwright-mcp/graphs/contribut...)

pavelfeldman1y ago

Hi Namu, all good! Feel free to send us the patches and work upstream, would be happy to see you on board!

marifjeren1y ago

From the Browser MCP README.md:

> Credits: Browser MCP was adapted from the Playwright MCP server

icelancer1y ago

I just run into a bunch of errors on my Windows machine + Chrome when connected over remote-ssh. Extension installed, tab enabled, npx updated/installed, etc.

2025-04-07 10:57:11.606 [info] rmcp: Starting new stdio process with command: npx @browsermcp/mcp@latest

2025-04-07 10:57:11.606 [error] rmcp: Client error for command spawn npx ENOENT

2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: spawn npx ENOENT

2025-04-07 10:57:11.606 [info] rmcp: Client closed for command

2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: Client closed

2025-04-07 10:57:11.606 [info] rmcp: Handling ListOfferings action

2025-04-07 10:57:11.606 [error] rmcp: No server info found

---

EDIT: Ended up fixing it by patching index.js. killProcessOnPort() was the problem. Can hit me up if you have questions, I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.

deathanatos1y ago

> I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.

Not that HN supports much in the way of markup, but code blocks are actually the same as Markdown: indent (by 2 spaces or more, in HN's syntax; Markdown calls for 4 or more, so they're compatible).

  print("Hello, world.")
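
If you have a longer snippet, the indentation can be applied mechanically. A small sketch (the helper name `to_hn_code_block` is made up for illustration; as far as I know HN also wants a blank line before the indented text):

```python
def to_hn_code_block(code: str, indent: int = 2) -> str:
    """Indent every non-blank line of `code` so it can be pasted as an HN code block."""
    if indent < 2:
        raise ValueError("HN needs at least 2 spaces of indentation")
    prefix = " " * indent
    # Blank lines are emitted empty to avoid trailing whitespace.
    return "\n".join(prefix + line if line.strip() else "" for line in code.splitlines())
```

Passing `indent=4` produces text that is also valid as a Markdown indented code block.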

namuorgOP1y ago

Thanks for the report and the update! I'd love to hear about what you changed — how can I get in touch? I didn't see anything in your HN profile. Feel free to email me at admin@browsermcp.io

sdotdev1y ago

Still slightly confused on what MCPs are but looking at this it does look useful

aryehof1y ago

A plugin protocol that allows “applications” to interact with LLMs.

darepublic1y ago

wouldn't it be for LLMs to interact with applications?

aryehof1y ago

Well they do work two ways. Extend an application to access an LLM, and allow an LLM to get context from an application.

For LLMs to interact with applications (without a two way protocol) is achievable just with tools/functions.

esafak1y ago

A protocol (the P in MCP) for LLMs to use tools.

BrandiATMuhkuh1y ago

This is really well done! Very cool.

I wonder if it's possible to add such plugins to election apps (e.g.: Slack). It would be such a nice experience if I could just connect my AI of choice to a local app.

decayiscreation1y ago

Good idea! I'm sure this is possible since it looks like playwright can control electron apps. https://playwright.dev/docs/api/class-electronapplication

chrisweekly1y ago

election -> Electron

wifipunk1y ago

Setting this up for claude desktop and cursor was alright. Works well out of the box with little setup, and I like that it attached to my active browser tab. Keep up the good work.

qwertox1y ago

MCP seems to be JavaScript's trojan horse into AI.

ketzo1y ago

"Trojan horse"? 95% of people currently access AI via web or mobile app; those are pretty JS-dominated, no?

otherayden1y ago

I literally started working on the same exact idea last night haha. Great work OP. I'm curious, how are you feeding the web data to the LLM? Are you just passing the entire page contents to it and then having it interact with the page based on CSS selectors/xpath? Also, what are your thoughts on letting it do its own scripting to automate certain tasks?

metadat1y ago

Bot Detection Evasion is becoming an increasingly relevant topic. Even for non-abusive automation, it's now a necessary consideration.

Interesting research and reading via the HN search portal: https://hn.algolia.com/?q=bot+detection

behnamoh1y ago

What I don't like about LLMs is that people keep re-inventing the wheel over and over. For example, we've been able to control browsers using GPT for about 2 years now:

- https://github.com/mayt/BrowserGPT

- https://github.com/TaxyAI/browser-extension

- https://github.com/browser-use/browser-use

- https://github.com/Skyvern-AI/skyvern

- https://github.com/m1guelpf/browser-agent

- https://github.com/richardyc/Chrome-GPT

- https://github.com/handrew/browserpilot

- https://github.com/ishan0102/vimGPT

- https://github.com/Jiayi-Pan/GPT-V-on-Web

ajcp1y ago

I think this is noteworthy in that it is using what is increasingly becoming the dominant API protocol for LLMs.

Just because the wheel exists doesn't mean we shouldn't strive to make it better by applying new knowledge and technologies to it.

dumansizsercan1y ago

Competitors don’t just challenge you, they push you to deliver your best work.

darepublic1y ago

none of these have stuck, right? And none of them work well enough that all web dev agencies no longer have to worry about e2e testing. (Or do some of them? Maybe the market is simply that inefficient.)

dvngnt_1y ago

I don't see this being a solution for full e2e regression testing. Having to run inference for each command/test seems expensive. I do think there's room for self-healing tests after failure.

darepublic1y ago

If this works well enough, couldn't you save the selectors and only use inference when running the test for the first time and when the UI has changed? Cheaper than dev?
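
A rough sketch of what I mean, assuming the model call is wrapped in some function `infer_selector(step, snapshot)` (hypothetical, not part of Browser MCP or Playwright). Selectors are cached per test step and page fingerprint, so inference only runs on a cache miss:

```python
import hashlib
from typing import Callable, Dict, Tuple


def page_fingerprint(snapshot: str) -> str:
    """Hash a page snapshot so UI changes can be detected cheaply."""
    return hashlib.sha256(snapshot.encode("utf-8")).hexdigest()


class SelectorCache:
    """Cache selectors per (step, page fingerprint); call the model only on a miss."""

    def __init__(self, infer_selector: Callable[[str, str], str]) -> None:
        self._infer = infer_selector  # hypothetical wrapper around the LLM call
        self._cache: Dict[Tuple[str, str], str] = {}
        self.inference_calls = 0

    def selector_for(self, step: str, snapshot: str) -> str:
        key = (step, page_fingerprint(snapshot))
        if key not in self._cache:
            # First run, or the page changed: pay for inference once.
            self.inference_calls += 1
            self._cache[key] = self._infer(step, snapshot)
        return self._cache[key]
```

Hashing the whole snapshot is the crudest possible change detector: any dynamic content on the page would force a re-inference, so a real version would probably fingerprint only the relevant subtree.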

dimgl1y ago

This is a bit disingenuous, no? None of these have actually taken off.

webprofusion1y ago

Or just use Playwright MCP: https://github.com/microsoft/playwright-mcp

rahimnathwani1y ago

This is cool. I'm curious why you chose to use an extension, rather than getting the user to run Chrome with remote debugging turned on?

namuorgOP1y ago

An extension is more user-friendly! I leave Chrome open basically 24/7 and having to create a new Chrome instance via the command line just to use Browser MCP just felt like too high of a barrier.

hannofcart1y ago

Not OP but I suspect it is because of this (mentioned on their page):

'Avoids bot detection and CAPTCHAs by using your real browser fingerprint.'

tylergetsay1y ago

I don't think remote debugging by itself on a normal chrome profile is detectable

omneity1y ago

Exposing Chrome CDP is a terrible idea from a security and privacy perspective. You get the keys to the whole kingdom (and expose them on a standard port with a well documented API). All security features of the web can be bypassed, and then some, as CDP exposes even more capabilities than chrome extensions and without any form of supervision.

parhamn1y ago

I'm sure it's about the cookies/sessions but I do recall you can load cookies from another browser?

1010081y ago

Good, just what we needed. More bots browsing the internet. Somedays I think I am not 100% against of every website having a captcha...

handfuloflight1y ago

Not out of the realm of possibility that this very comment was written by a bot prompted to write a negative response to a given piece of content.

1010081y ago

Not, human tired of creating content to put online and being consumed not by people but by bots or any other form of mechanical consumption that I don't like. As the owner of the content I think I have the right to set that preference, don't you think?

brandensilva1y ago

Yeah this is definitely a bad English bot

mgraczyk1y ago

It's a developer tool

1010081y ago

Then it should be limited to localhost or something similar.

mgraczyk1y ago

It can be, just do that when you install it

dalemhurley1y ago

What if you are using domain names for your local environment or a cloud environment like IDX or you want to automate the testing of the UAT environment?

knes1y ago

This is great, especially for debugging frontend issues on localhost or staging.

Also works flawlessly with augmentcode.com too!

picardo1y ago

I like this. It would be interesting to use it for when I need to use authenticated browser sessions.

lxe1y ago

This one also uses aria snapshots formatted as yaml. This will quickly exceed context limits.

plessas1y ago

thank you for this. Using my own browser helps me automate tasks on sites where I'd typically get detected using automation. Works like a charm! Hope you continue to work on the repo.

jngiam11y ago

Pretty cool, do you know of a version of this that supports the new remote MCP protocol?

omneity1y ago

We work on something similar and aim to be the huggingface hub for automations you can run in your browser[0], with built-in support for MCP SSE.

Use the pre-built Trails[1][2] as MCP servers or create and publish your own with a familiar puppeteer-like API, powered by your or your friends' browsers.

0: https://herd.garden

1: https://herd.garden/trails/@herd/browser

2: https://herd.garden/trails/@omneity/serp

revskill1y ago

Can u expose the sdk as a react component to be used inside an app ?

mvdtnz1y ago

Is anyone successfully running MCPs / Claude Desktop on Linux?

iDon1y ago

I am running this OK on Ubuntu 24.04: https://github.com/aaddrick/claude-desktop-debian (Claude Desktop for Debian-based Linux distributions)

From Claude I have connected to these MCP servers OK: @modelcontextprotocol/server-filesystem, @executeautomation/playwright-mcp-server.

I have connected to OP's extension (browsermcp.io) from VS Code (and clicked 1 tab button OK), but not from Claude Desktop so far (I get Cannot find module 'node:path', which is require-d in npm/lib/cli.js; tried node 18, 20, 22; some suggestions here: https://medium.com/@aleksej.gudkov/error-cannot-find-module-... ).

pknerd1y ago

So why do I need an editor (Cursor)? How does a non-coder use it?

rahimnathwani1y ago

If you're a non-coder, use it with Claude Desktop.

xena1y ago

Do you respect robots.txt so administrators can block this tool?

canogat1y ago

Should I be blocked if I ask Claude Desktop to lower the prices in all of my Craigslist ads by 10%?

randunel1y ago

Do user agents doing work for users need to respect robots.txt? If yes, does chrome?

what1y ago

Any scraper is also a “user agent doing work for users”. Which ones should respect robots.txt?

randunel1y ago

Does the user agent fit the definition of a web crawler? If so, then observe robots.txt. This one does not, see https://en.m.wikipedia.org/wiki/Web_crawler
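
For anything that does count as a crawler, the check itself is cheap. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the `ExampleCrawler` user agent are invented for illustration:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules: one named crawler is kept out of /private/, everyone else is allowed.
EXAMPLE_ROBOTS = """\
User-agent: ExampleCrawler
Disallow: /private/

User-agent: *
Allow: /
"""
```

With these rules, `is_allowed(EXAMPLE_ROBOTS, "ExampleCrawler", "https://example.com/private/x")` is false, while any other user agent is allowed on the same URL. Nothing in the protocol enforces this; it only works for clients that choose to ask.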

cadence-1y ago

How does this compare to Anthropic's Computer Use?

tuananh1y ago

i want to add this for my project (which uses wasm) but rust-lang/socket2 WASI support is not merged yet. after that rust CDP will work.

jayunit1y ago

awesome! For the Cursor / React / Click to Add 2 example, can we also have it write a unit/e2e regression test?

jayunit1y ago

author replied on Twitter:

> that's a great use case! the aria snapshot that browser mcp generates is enough to write tests for playwright using its role-based locators, but i may add a get_page_html tool in the same way that they're considering: https://github.com/microsoft/playwright-mcp/issues/103

https://x.com/roadtoramen/status/1909356255866733044

mrwww1y ago

How does it compare to playwright mcp?

graiz1y ago

works better than puppet mcp for me but having issues with keyboard events and actions on some websites.

johnpaulkiser1y ago

> Private

> Since automation happens locally, your browser activity stays on your device and isn't sent to remote servers.

I think this is bullshit. Isn't the dom or whatever sent to the model api?

namuorgOP1y ago

Of course, you're sending data to the AI model, but the "private" aspect is contrasting automating using a local browser vs. automating using a remote browser.

When you automate using a remote browser, another service (not the AI model) gets all of the browsing activity and any information you send (e.g. usernames and passwords) that's required for the automation.

With Browser MCP, since you're automating locally, your sensitive data and browser activity (apart from the results of MCP tool calls that are sent to the AI model) stay on your device.

johnpaulkiser1y ago

I think we need to be very careful & intentional about the language we use with these kinds of tools, especially now that the MCP floodgates have been opened. You aren't just exposing the user's browsing data to whichever model they are using, you are also exposing it to any tools they may be allowing as well.

A lot of non-technical people are using these tools to "vibe" their way to productivity. I would explicitly tell them that potentially "all" of their browsing data is going to be exposed to their LLM client and they need to use this at their own risk.

throwaway815231y ago

Can these things automatically solve recaptcha? That's the only AI browser feature that I have a real use for.

SparkyMcUnicorn1y ago

https://github.com/dessant/buster

tntpreneur1y ago

Thanks. The idea is OK, but it is not working smoothly.

justanotheratom1y ago

neat, but instead of asking me to install browser extension, can you just bundle a browser in the MCP server?

tigrezno1y ago

this is the way

ndr1y ago

WARNING for Cursor users:

Cursor is currently stuck using an outdated snapshot of the VSCode Marketplace, meaning several extensions within Cursor remain affected by high-severity CVEs that have already been patched upstream in VSCode. As a result, Cursor users unknowingly remain vulnerable to known security issues. This issue has been acknowledged but remains unresolved: https://github.com/getcursor/cursor/issues/1602#issuecomment...

Given Cursor's rising popularity, users should be aware of this gap in security updates. Until the Cursor team resolves the marketplace sync issue, caution is advised when using certain extensions.

I've flagged it here, apologies for the repost: https://news.ycombinator.com/item?id=43609572

rs1861y ago

I am surprised that the VSCode team hasn't gone after them for mirroring the marketplace, as the Visual Studio team made it very clear that they don't want anybody to do that -- it is their marketplace.

SSLy1y ago

It seems that there is one sane PM left at VScode who knows that such a move would only lead to MSFT losing more PR. And anti-trust scrutiny?

JackYoustra1y ago

Why? This seems fine.

Trias111y ago

When people see “I collect” they won’t even bother reading further.

This is showstopper.

Noble reasons won’t matter.

Spyware perception.

wyldberry1y ago

This seems to be the opposite of what happens in reality.

nlarew1y ago

"detailed" is an anonymized deviceId and a counter of tool calls? Heaven forbid an app want to get some basic insights into how people use it.

tomrod1y ago

Correct. Telemetry should _always_ be opt-in and explicitly an easy choice to not engage.

Any other mode of operation is morally bankrupt.

nlarew1y ago

Really? The hyperbole does not help anyone here.

observationist1y ago

This automatic sense of entitlement to surveil users is the absolute embodiment of the banality of evil.

Ignorance and inertia bolstering the status quo doesn't make it any less wrong to pile more bullshit like this onto the existing massive pile of bullshit we put up with. It's still bullshit.

nlarew1y ago

You're making a huge jump from "gathering anonymous counters to understand how many people use the thing" to "harvest all the data you want and monetize it".

If they were tracking my identity across sites and actually selling it to the highest bidder that's one thing that we'll definitely agree on. This is so so far from that.

You're welcome to build and use your own MCP browser automation if you're so hostile to the developer that built something cool and free for you to use.

arresin1y ago

The only chrome extensions you should install are ones you can build yourself from source.

neycoda1y ago

... And have reviewed and understand completely

EGreg1y ago

So ... pretty much none

DO NOT have any extensions running by default except "on click".

(That wouldn't really solve everything since there can be sleeper code that is "switched on" with certain data coming over the wire, but better than what we have now.)

bhouston1y ago

So the website claims:

"Avoids bot detection and CAPTCHAs by using your real browser fingerprint."

Yeah, not really.

tempest_1y ago

The caveat with these things is usually "when used with high quality proxies".

Also I assume this extension is pretty obvious so it won't take long for CF bot detection to see it the same as Playwright or whatever else.

unixfox1y ago

The extension enables debugging in your browser (a banner appears telling you about automation). It's possible to detect that in JavaScript.

Hence why projects like this exist: https://github.com/Kaliiiiiiiiii-Vinyzu/patchright. They hide the debugging part from JavaScript.

DeathArrow1y ago

It might depend on the speed with which you click on the elements on the website.

SSLy1y ago

it does, CF bans my own honest to God clicks if I do them too fast.

kmacdough1y ago

> I'm wondering if it is something like they can track mouse movement

Yes, this is a big signal they use.

> adding some more human like noise to the mouse

econ1y ago

Have a lot of small things count towards the result. Users behave quite linearly, extra points if they act differently all of a sudden.

mrweasel1y ago

There's also the whole issue of captchas being in place because people cannot be trusted to behave appropriately with automation tools.

TeMPOraL1y ago

Actually, the CAPTCHAs are in place mostly because of assholes like you abusing other assholes like you[0].

> If you truly need access to something, ask for an API, maybe you need to pay for it, maybe you don't.

> Maybe you get it, maybe the site owner tells you to go pound sand and you should take that as your behaviour and/or use case is not wanted.

Like they have a final say in this.

[0] - Not "you" 'mrweasel personally, but "you" the imaginary speaker of your second paragraph.

mrweasel1y ago

> Not "you" 'mrweasel personally

Understood, but thank you :-)

StevenNunez1y ago

I feel like I slept for a day and now MCPs are everywhere... I don't know what MCPs are and at this point I'm too afraid to ask.

oulipo1y ago

It's just a way to provide a "library of methods" / API that the LLM models can "call", so basically giving them method names, their parameters, the type of the output, and what they are for,

and then the LLM model will ask the MCP server to call the functions, check the result, call the next function if needed, etc

with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc
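
To make that concrete, here is a toy sketch of the idea in Python. It is not the real MCP SDK or wire format (actual MCP speaks JSON-RPC 2.0 with methods such as `tools/list` and `tools/call`); `ToolRegistry`, the `navigate` tool, and its handler are all made up for illustration:

```python
import json
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """A toy stand-in for an MCP server: it lists tools and executes calls."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str, parameters: Dict[str, str],
                 handler: Callable[..., Any]) -> None:
        self._tools[name] = {
            "description": description,
            "parameters": parameters,
            "handler": handler,
        }

    def list_tools(self) -> List[Dict[str, Any]]:
        """What the model is shown: names, descriptions, parameter types."""
        return [
            {"name": n, "description": t["description"], "parameters": t["parameters"]}
            for n, t in self._tools.items()
        ]

    def call(self, request_json: str) -> str:
        """Execute a call the model emitted as JSON and return a JSON result."""
        request = json.loads(request_json)
        tool = self._tools.get(request["name"])
        if tool is None:
            return json.dumps({"error": f"unknown tool: {request['name']}"})
        result = tool["handler"](**request.get("arguments", {}))
        return json.dumps({"result": result})


registry = ToolRegistry()
registry.register(
    "navigate", "Open a URL in the browser", {"url": "string"},
    lambda url: f"opened {url}",  # hypothetical handler; a real one would drive the browser
)
```

The model sees the output of `list_tools()`, decides to emit something like `{"name": "navigate", "arguments": {"url": "..."}}`, and the client feeds the result of `call(...)` back into the conversation so the model can pick the next step.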

mattfrommars1y ago

Is this what 'calling' is?

oulipo1y ago

Yes exactly. MCP just formalizes this a bit better

throwaway3141551y ago

> with a "browser MCP" it is now possible: ChatGPT has a way to tell your browser "open Google maps", "show me a screenshot", "click at that position", etc

Cryptocurrency "if you build it they'll come" vibes.

dimitri-vs1y ago

You actually can, it's called Operator and it's a complete waste of time, just like 99% of agents/MCPs.

oulipo1y ago

Operator is basically MCP...

jastuk1y ago

And the worst part is that it opens a Pandora's box of potential exploits: https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands...

TeMPOraL1y ago

That's not the fault of MCP though, that's the fault of vendors peddling their MCPs while clinging to the SaaS model.

I suppose it's good to spell out the risks, but it doesn't make sense blaming MCP itself, because those risks are fundamental aspects of the features it provides.

kmacdough1y ago

It's not blame, but it's a striking reality that needs to be kept at the forefront.

joshwarwick151y ago

The article also reeks of LLM ironically

tuananh1y ago

it still is. if user has 1 bad tool, it's done!

https://invariantlabs.ai/blog/mcp-security-notification-tool...

halJordan1y ago

So I'm not sure I'd give up the sum total progress of the automobile just because the first decade was a bad one

whalesalad1y ago

It’s RPC specifically for an LLM. But yes it’s the new soup du jour trend sweeping the globe.

andy_ppp1y ago

Is there any browser that can do this yet as it seems extremely useful to be able to extract details from the page!

unixfox1y ago

You could do that with browser-use: https://browser-use.com/

bravura1y ago

When doing interior decoration, I am definitely interested in finding objects that fit very specific prompts.

neilellis1y ago

Well done, just tested on Claude Desktop and it worked smoothly and a lot less clunky than playwright. This is the right direction to go in.

thenaturalist1y ago

Is there an issue with the lag between what is happening in the browser and the MCP app (in my case Claude Desktop)?

Actions which worked flawlessly the first time (rename a Google spreadsheet by clicking on the title and inputting the name) fail 100% of subsequent attempts.

Same with identifying cells A1, B1, etc. and inserting into the rows.

Almost perfect on 1st try, not reproducible in 100% of attempts afterwards.

Kudos to how smooth this experience is though, very nice setup & execution!

EDIT 2: The lag & speed to click the allow action make it seemingly unusable in Claude Desktop. :(

mkummer1y ago

Agreed, I'd been working on a Google Sheets specific MCP last week – just got it published here: https://github.com/mkummer225/google-sheets-mcp

rahimnathwani1y ago

This is cool. You should submit this as a 'Show HN'.

Also consider publishing it so people can use it without having to use git.

xingwu1y ago

I have worked on a google sheets MCP, for data scraping it worked pretty well leveraging Claude's built-in search functionalities.

example: https://x.com/xing101/status/1903391600040083488 set up: https://github.com/xing5/mcp-google-sheets

throwaway3141551y ago

There's no bug or glitch happening. It's just statistically unlikely to perform the action you wanted and you landed a good dice roll on your first turn.

lizardking1y ago

For me it can't click anywhere on google sheets. I get the following error

--Error: Cannot access a chrome-extension:// URL of different extension

nonethewiser1y ago

Stuff like this makes me giddy for manual tasks like reimbursement requests. It's such a chore (and it doesn't help that our process isn't great).

doug_life1y ago

This may be obvious to most here, but you need Node.js installed for the MCP server to run. This critical detail is not in the set up instructions.

namuorgOP1y ago

Added!

https://docs.browsermcp.io/setup-server#node-js

serverlessmania1y ago

Did something similar but controls a hardware synth, allowing me to do sound design without touching the physical knobs: https://github.com/zerubeus/elektron-mcp

dmix1y ago

Oh good idea.

Gehinnn1y ago

Would be nice if it could use the Accessibility Tree from chrome dev tools to navigate the page instead of relying on screenshots (https://developer.chrome.com/blog/full-accessibility-tree)

mgraczyk1y ago

In fact you have it backwards. It has no screenshots at the moment, only the accessibility tree

amendegree1y ago

So is MCP the new RPA (Robotics Process Automation)? Like generic yahoo pipes?

spmurrayzzz1y ago

I just view it as a relatively minor convenience, but it's not some game-changer IMO.

PantaloonFlames1y ago

It seems the benefit of MCP is for Anthropic to enlist the community in building integrations for Claude desktop, no?

And if other vendors sign on to support MCP, then it becomes a self reinforcing cycle of adoption.

kmangutov1y ago

The interesting thing about MCP as a tool use protocol is the traction that it has garnered in terms of clients and servers supporting it.

wonderwhyer1y ago

I would probably call it shipping containers for LLM tool integrations.

Containers are not a big deal when viewed in isolation. But when it's a common size/standard for all kinds of ships, cranes and trucks, it is a big deal then.

In that sense it's more about gathering community around one way to do things.

In theory there are REST APIs and the OpenAPI standard, but those were made for code, not for LLMs. So you usually need some kind of friendly wrapper (like for candy) on top of a REST API.

It really starts to feel like a big deal when you work on integrating LLMs with tools.

tmvphil1y ago

I'm a bit stuck on this, maybe you can explain why an LLM would have any difficulty writing REST API calls? Seems like it should be no problem.

ajcp1y ago

No, since MCP is just an interface layer it is to AI what REST API is to DPA and COM/App DLLs are to RPA.

APA (Agentic Process Automation) is the new RPA, and this is definitely one example of it.

XCSme1y ago

But AI already supported function calling, and you could describe them in various ways. Isn't this just a different way to define function calling?

cadence-1y ago

Doesn't work on Windows:

namuorgOP1y ago

Can you try again?

There was another comment that mentioned that there's an issue with port killing code on Windows: https://news.ycombinator.com/item?id=43614145

cadence-1y ago

It's working now with the 0.1.0 for me. But I will let you know if I experience any issues once I get updated to 0.1.1.

cadence-1y ago

I was able to make it work like this:

1. Kill your Claude Desktop app

2. Click "Connect" in the browser extension.

3. Quickly start your Claude Desktop app.

It will work 50% of the time - I guess the timing must be just right for it to work. Hopefully, the developers can improve this.

Now on to testing :)

josefrichter1y ago

What I used this for:

"Go to https://news.ycombinator.com/upvoted?id=josefrichter, summarize what topics I am interested in, and then from the homepage pick articles I might be interested in."

Works like a charm.

washedDeveloper1y ago

Can you add a license to your code along with open sourcing the chrome extension?

makingstuffs1y ago

Do we suppose they will just create a backdoor to allow _some_ bots in? If they do that how long will it be before other bots impersonate them? It seems like a bit of a fad from my small mind.

Suppose it does become a thing, what then? We end up with an internet which is heavily optimised for bots (arguably it already is to an extent) and unusable for humans?

Wild.

kraftman1y ago

There are already plenty of services that provide residential proxies and captcha bypass pretty cheaply.

https://brightdata.com/pricing/web-unlocker https://2captcha.com/pricing

TeMPOraL1y ago

> Suppose it does become a thing, what then? We end up with an internet which is heavily optimised for bots (arguably it already is to an extent) and unusable for humans?

As opposed to the Web we now have, which is heavily optimized for... wasting human life.

makingstuffs1y ago

TeMPOraL1y ago

Fair enough. Thanks for clarifying. I agree with what you're saying in this comment.

On the topic of:

jedimastert1y ago

Most things that do this kind of fingerprinting bot detection aren't looking for a browser that's pretending to be a human, they're looking for other programs that are pretending to be a browser.

m11a1y ago

> Do we suppose they will just create a backdoor to allow _some_ bots in?

[0]: https://developers.cloudflare.com/agents/model-context-proto... etc

hliyan1y ago

buttofthejoke1y ago

Why use this over Puppeteer or Playwright extensions?

namuorgOP1y ago

The Puppeteer MCP server doesn't work well because it requires CSS selectors to interact with elements. It makes up CSS selectors rather than reading the page and generating working selectors.

cAtte_1y ago

by the way, you can indeed access your personal context with Playwright. just `launchPersistentContext()` and set the userDataDir to that of your existing Chrome install:

https://playwright.dev/docs/api/class-browsertype#browser-ty...

buttofthejoke1y ago

Ooo, i like that. one of the most annoying points has been 'not sharing' the browser context. i'll def check it out

Fernicia1y ago

Any plans to make a Firefox version?

namuorgOP1y ago

Browser MCP uses the Chrome DevTools Protocol (CDP) to automate the browser so it currently only works for Chromium-based browsers.

krono1y ago

Just found this[0] implementation roadmap on Mozilla's wiki, recently updated too! At least it's actively being worked on.

Not going to lie, this makes me happy.

[0]: https://wiki.mozilla.org/WebDriver/RemoteProtocol/WebDriver_...

DebtDeflation1y ago

pavelfeldman1y ago

I mean no disrespect, but this looks like an outdated clone of https://github.com/microsoft/playwright-mcp

https://github.com/microsoft/playwright-mcp/blob/main/src/to... https://github.com/BrowserMCP/mcp/blob/main/src/tools/tool.t...

namuorgOP1y ago

Hey Pavel, this is Namu, the creator of Browser MCP.

You’re right, this is an adaptation of Playwright MCP to automate the user’s local browser as mentioned in the GitHub README and here:

- https://github.com/BrowserMCP/mcp/blob/3e6824de6f36eba7d2d3b...

- https://news.ycombinator.com/item?id=43613905

Thanks for all your work on Playwright and Playwright MCP. I’m a big fan!

pavelfeldman1y ago

Hi Namu, all good! Feel free to send us the patches and work upstream, would be happy to see you on board!

marifjeren1y ago

From the Browser MCP README.md:

> Credits: Browser MCP was adapted from the Playwright MCP server

icelancer1y ago

I just run into a bunch of errors on my Windows machine + Chrome when connected over remote-ssh. Extension installed, tab enabled, npx updated/installed, etc.

2025-04-07 10:57:11.606 [info] rmcp: Starting new stdio process with command: npx @browsermcp/mcp@latest

2025-04-07 10:57:11.606 [error] rmcp: Client error for command spawn npx ENOENT

2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: spawn npx ENOENT

2025-04-07 10:57:11.606 [info] rmcp: Client closed for command

2025-04-07 10:57:11.606 [error] rmcp: Error in MCP: Client closed

2025-04-07 10:57:11.606 [info] rmcp: Handling ListOfferings action

2025-04-07 10:57:11.606 [error] rmcp: No server info found
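For anyone else hitting this: `spawn npx ENOENT` means the launcher could not find an executable called `npx` on the PATH of whichever machine actually spawns the process (over remote-ssh that may not be the machine you expect). A minimal sketch of the check, in Python purely for illustration. The helper name `resolve_mcp_command` is hypothetical, and going through `cmd /c` is a commonly used Windows workaround, not something the Browser MCP docs confirm:

```python
import shutil
import sys


def resolve_mcp_command(package="@browsermcp/mcp@latest", platform=None, which=shutil.which):
    """Return an argv list for launching the MCP server, or None if npx is missing.

    Hypothetical helper: `platform` and `which` are injectable so the logic
    can be exercised without touching the real environment.
    """
    platform = platform or sys.platform
    npx_path = which("npx")
    if npx_path is None:
        # This is the situation that surfaces as "spawn npx ENOENT".
        return None
    if platform.startswith("win"):
        # On Windows npx is a .cmd shim, which a plain (non-shell) spawn
        # typically cannot execute directly, so go through cmd.exe.
        return ["cmd", "/c", "npx", package]
    return [npx_path, package]
```

If this returns `None` on the host running the MCP client, installing Node.js there (or fixing its PATH) is the first thing to try.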

---

deathanatos1y ago

> I cannot figure out how to put readable code in HN after all these years with the fake markdown syntax they use.

Not that HN supports much in the way of markup, but code blocks are actually the same as Markdown: indent (by 2 spaces or more, in HN's syntax; Markdown calls for 4 or more, so they're compatible).

  print("Hello, world.")

namuorgOP1y ago

Thanks for the report and the update! I'd love to hear about what you changed — how can I get in touch? I didn't see anything in your HN profile. Feel free to email me at admin@browsermcp.io

sdotdev1y ago

Still slightly confused on what MCPs are but looking at this it does look useful

aryehof1y ago

A plugin protocol that allows “applications” to interact with LLMs.

darepublic1y ago

wouldn't it be for LLMs to interact with applications?

aryehof1y ago

Well they do work two ways. Extend an application to access an LLM, and allow an LLM to get context from an application.

For LLMs to interact with applications (without a two way protocol) is achievable just with tools/functions.

esafak1y ago

A protocol (the P in MCP) for LLMs to use tools.
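To make that concrete: MCP messages are JSON-RPC 2.0, and a client invokes a tool by sending a `tools/call` request. A minimal sketch in Python. The tool name `browser_navigate` and its `url` argument are assumptions for illustration (modeled on Playwright MCP naming), not taken from the Browser MCP source:

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumed tool name and argument, for illustration only.
request = make_tool_call(1, "browser_navigate", {"url": "https://news.ycombinator.com"})
wire_message = json.dumps(request)
```

A server advertises what it offers in response to `tools/list`, with a JSON Schema for each tool's input. That is why it looks so much like function calling: the tool descriptions are the same idea, and what MCP standardizes is discovery and transport.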

BrandiATMuhkuh1y ago

This is really well done! Very cool.

I wonder if it's possible to add such plugins to election apps (e.g.: Slack). It would be such a nice experience if I could just connect my AI of choice to a local app.

decayiscreation1y ago

Good idea! I'm sure this is possible since it looks like playwright can control electron apps. https://playwright.dev/docs/api/class-electronapplication

chrisweekly1y ago

election -> Electron

wifipunk1y ago

Setting this up for claude desktop and cursor was alright. Works well out of the box with little setup, and I like that it attached to my active browser tab. Keep up the good work.

qwertox1y ago

MCP seems to be JavaScript's trojan horse into AI.

ketzo1y ago

"Trojan horse"? 95% of people currently access AI via web or mobile app; those are pretty JS-dominated, no?

otherayden1y ago

metadat1y ago

Bot Detection Evasion is becoming an increasingly relevant topic. Even for non-abusive automation, it's now a necessary consideration.

Interesting research and reading via the HN search portal: https://hn.algolia.com/?q=bot+detection

behnamoh1y ago

What I don't like about LLMs is that people keep re-inventing the wheel over and over. For example, we've been able to control browsers using GPT for about 2 years now:

- https://github.com/mayt/BrowserGPT

- https://github.com/TaxyAI/browser-extension

- https://github.com/browser-use/browser-use

- https://github.com/Skyvern-AI/skyvern

- https://github.com/m1guelpf/browser-agent

- https://github.com/richardyc/Chrome-GPT

- https://github.com/handrew/browserpilot

- https://github.com/ishan0102/vimGPT

- https://github.com/Jiayi-Pan/GPT-V-on-Web

ajcp1y ago

I think this is noteworthy in that it is using what is increasingly becoming the dominant API protocol for LLMs.

Just because the wheel exists doesn't mean we shouldn't strive to make it better by applying new knowledge and technologies to it.

dumansizsercan1y ago

Competitors don’t just challenge you, they push you to deliver your best work.

darepublic1y ago

dvngnt_1y ago

I don't see this being a solution for full e2e regression testing. Having to run inference for each command/test seems expensive. I do think there's room for self-healing tests after failure.

darepublic1y ago

If this works well enough, couldn't you save the selectors and only use inference when running the test for the first time and when the UI has changed? Cheaper than dev?
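A rough sketch of that idea, assuming two hypothetical callables: `infer_selector(step)` standing in for the expensive model call, and `selector_exists(selector)` standing in for a cheap check against the live page. Neither is part of Browser MCP or Playwright; this only illustrates the caching logic:

```python
class SelectorCache:
    """Cache of test-step description -> selector, with inference as fallback."""

    def __init__(self, infer_selector):
        # infer_selector(step) stands in for the expensive LLM call (hypothetical).
        self._infer = infer_selector
        self._cache = {}
        self.inference_calls = 0

    def resolve(self, step, selector_exists):
        """Return a selector for `step`, running inference only on a miss.

        A miss is either a step never seen before, or a cached selector
        that no longer matches the page (i.e. the UI has changed).
        """
        cached = self._cache.get(step)
        if cached is not None and selector_exists(cached):
            return cached
        self.inference_calls += 1
        fresh = self._infer(step)
        self._cache[step] = fresh
        return fresh


# Usage: the second run of the same step reuses the saved selector.
page_selectors = {"#submit"}
demo_cache = SelectorCache(lambda step: "#submit")
demo_cache.resolve("click the submit button", page_selectors.__contains__)
demo_cache.resolve("click the submit button", page_selectors.__contains__)
```

Inference then costs one call per step on the first run, plus one call each time a saved selector stops matching.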

dimgl1y ago

This is a bit disingenuous, no? None of these have actually taken off.

webprofusion1y ago

Or just use Playwright MCP: https://github.com/microsoft/playwright-mcp

rahimnathwani1y ago

This is cool. I'm curious why you chose to use an extension, rather than getting the user to run Chrome with remote debugging turned on?

namuorgOP1y ago

An extension is more user-friendly! I leave Chrome open basically 24/7 and having to create a new Chrome instance via the command line just to use Browser MCP just felt like too high of a barrier.
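For comparison, the remote-debugging route looks roughly like this: start Chrome with `--remote-debugging-port`, then read the WebSocket endpoint from `http://localhost:<port>/json/version`. A minimal sketch in Python. The helper names are hypothetical, the Chrome binary path differs per OS and is left out, and the profile directory in the usage line is made up:

```python
import json


def chrome_debug_args(port=9222, user_data_dir=None):
    """Command-line flags for starting Chrome with the DevTools debugging port open."""
    args = [f"--remote-debugging-port={port}"]
    if user_data_dir is not None:
        args.append(f"--user-data-dir={user_data_dir}")
    return args


def websocket_url_from_version(payload):
    """Extract the CDP WebSocket endpoint from the body of GET /json/version."""
    return json.loads(payload)["webSocketDebuggerUrl"]


# Hypothetical profile directory, for illustration only.
example_args = chrome_debug_args(9222, "/tmp/chrome-debug-profile")
```

The flags have to be passed when Chrome starts, which is the friction described above: an already-running browser has to be closed and relaunched from the command line.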

hannofcart1y ago

Not OP but I suspect it is because of this (mentioned on their page):

'Avoids bot detection and CAPTCHAs by using your real browser fingerprint.'

tylergetsay1y ago

I don't think remote debugging by itself on a normal chrome profile is detectable

omneity1y ago

parhamn1y ago

I'm sure it's about the cookies/sessions but I do recall you can load cookies from another browser?

1010081y ago

Good, just what we needed. More bots browsing the internet. Somedays I think I am not 100% against of every website having a captcha...

handfuloflight1y ago

Not out of the realm of possibility that this very comment was written by a bot prompted to write a negative response to a given piece of content.

1010081y ago

brandensilva1y ago

Yeah this is definitely a bad English bot

mgraczyk1y ago

It's a developer tool

1010081y ago

Then it should be limited to localhost or something similar.

mgraczyk1y ago

It can be, just do that when you install it

dalemhurley1y ago

What if you are using domain names for your local environment or a cloud environment like IDX or you want to automate the testing of the UAT environment?

knes1y ago

This is great. Especially for debugging frontend issues on localhost or staging.

Also works flawlessly with augmentcode.com too!

picardo1y ago

I like this. It would be interesting to use it for when I need to use authenticated browser sessions.

lxe1y ago

This one also uses aria snapshots formatted as yaml. This will quickly exceed context limits.

plessas1y ago

thank you for this. Using my own browser helps me automate tasks on sites where I'd typically get detected using automation. Works like a charm! Hope you continue to work on the repo.

jngiam11y ago

Pretty cool, do you know of a version of this that supports the new remote MCP protocol?

omneity1y ago

We work on something similar and aim to be the huggingface hub for automations you can run in your browser[0], with built-in support for MCP SSE.

Use the pre-built Trails[1][2] as MCP servers or create and publish your own with a familiar puppeteer-like API, powered by your or your friends' browsers.

0: https://herd.garden

1: https://herd.garden/trails/@herd/browser

2: https://herd.garden/trails/@omneity/serp

revskill1y ago

Can u expose the sdk as a react component to be used inside an app ?

mvdtnz1y ago

Is anyone successfully running MCPs / Claude Desktop on Linux?

iDon1y ago

I am running this OK in Ubuntu 24.04: https://github.com/aaddrick/claude-desktop-debian (Claude Desktop for Debian-based Linux distributions)

From Claude I have connected to these MCP servers OK : @modelcontextprotocol/server-filesystem, @executeautomation/playwright-mcp-server.

pknerd1y ago

So why do I need an editor (Cursor)? How does a non-coder use it?

rahimnathwani1y ago

If you're a non-coder, use it with Claude Desktop.

xena1y ago

Do you respect robots.txt so administrators can block this tool?

canogat1y ago

Should I be blocked if I ask Claude Desktop to lower the prices in all of my Craigslist ads by 10%?

randunel1y ago

Do user agents doing work for users need to respect robots.txt? If yes, does chrome?

what1y ago

Any scraper is also a “user agent doing work for users”. Which ones should respect robots.txt?

randunel1y ago

Does the user agent fit the definition of a web crawler? If so, then observe robots.txt. This one does not, see https://en.m.wikipedia.org/wiki/Web_crawler
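For context on how the matching works: robots.txt rules are grouped by user-agent token, and a well-behaved crawler looks up its own token before fetching. A minimal sketch with Python's standard `urllib.robotparser`. The robots.txt content and both user-agent names are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and user-agent names, for illustration.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that identifies itself is matched by its own group of rules.
crawler_allowed = parser.can_fetch("ExampleCrawler", "https://example.com/private/page")

# Any other agent falls through to the "*" group.
browser_allowed = parser.can_fetch("SomeBrowser", "https://example.com/private/page")
```

Note that this only works when the client announces a token of its own and chooses to check. A tool that drives the user's real browser presumably sends that browser's usual user agent, so there is no distinct token for an administrator to target.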

cadence-1y ago

How does this compare to Anthropic's Computer Use?

tuananh1y ago

i want to add this for my project (which uses wasm) but rustlang/socket2 WASI support is not merged yet. after that rust CDP will work.

jayunit1y ago

awesome! For the Cursor / React / Click to Add 2 example, can we also have it write a unit/e2e regression test?

jayunit1y ago

author replied on Twitter:

https://x.com/roadtoramen/status/1909356255866733044

mrwww1y ago

How does it compare to playwright mcp?

graiz1y ago

works better than puppet mcp for me but having issues with keyboard events and actions on some websites.

johnpaulkiser1y ago

> Private > Since automation happens locally, your browser activity stays on your device and isn't sent to remote servers.

I think this is bullshit. Isn't the dom or whatever sent to the model api?

namuorgOP1y ago

Of course, you're sending data to the AI model, but the "private" aspect is contrasting automating using a local browser vs. automating using a remote browser.

With Browser MCP, since you're automating locally, your sensitive data and browser activity (apart from the results of MCP tool calls that are sent to the AI model) stay on your device.

johnpaulkiser1y ago

throwaway815231y ago

Can these things automatically solve recaptcha? That's the only AI browser feature that I have a real use for.

SparkyMcUnicorn1y ago

https://github.com/dessant/buster

tntpreneur1y ago

Thanks. The idea is OK, but it is not working smoothly.

justanotheratom1y ago

neat, but instead of asking me to install a browser extension, can you just bundle a browser in the MCP server?

tigrezno1y ago

this is the way

ndr1y ago

WARNING for Cursor users:

Given Cursor's rising popularity, users should be aware of this gap in security updates. Until the Cursor team resolves the marketplace sync issue, caution is advised when using certain extensions.

I've flagged it here, apologies for the repost: https://news.ycombinator.com/item?id=43609572

rs1861y ago

SSLy1y ago

It seems that there is one sane PM left at VS Code who knows that such a move would only lead to MSFT losing more PR. And anti-trust scrutiny?

JackYoustra1y ago

Why? This seems fine.
