Social media can be intellectually stimulating and educational, but it's also easy to get sucked into ideological sniping and flamewars, even if you didn't go looking for it. The emotional and intellectual energy spent flaming strangers on the Internet is a complete waste of human capital.
With an API like this, I assume you could have a browser extension that could de-snarkify content before showing it to you. You could ask the LLM to preserve all factual content from the post, but to de-claw any aggressive or snarky language. If you really wanted to have fun, you could ask it to turn anything written in an aggressive tone into something that sounds absurd or incompetent, so that the more aggressive the post, the more it would make the author look silly.
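A minimal sketch of how such an extension might phrase the request. Everything here is an assumption for illustration: the `PromptSession` interface only mirrors the general shape of a session object with an async `prompt()` method, and the instruction wording and function names are made up.

```typescript
// Hypothetical minimal shape of an on-device model session: anything with an
// async prompt() method fits. This is an assumption, not a full API typing.
interface PromptSession {
  prompt(input: string): Promise<string>;
}

// Build the instruction text. The wording is illustrative only.
function buildDesnarkPrompt(post: string): string {
  return [
    "Rewrite the post below in a neutral, respectful tone.",
    "Preserve every factual claim, number, and link.",
    "Remove insults, sarcasm, and personal attacks.",
    "Do not add new claims or opinions.",
    "",
    "POST:",
    post.trim(),
  ].join("\n");
}

// Rewrite one post. If the model fails or returns nothing, fall back to the
// original text so a broken model never hides content from the reader.
async function desnark(session: PromptSession, post: string): Promise<string> {
  try {
    const rewritten = await session.prompt(buildDesnarkPrompt(post));
    return rewritten.trim() || post;
  } catch {
    return post;
  }
}
```

Falling back to the original post is a deliberate choice in this sketch: the filter should degrade to "no filter", not to a blank feed.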
This could have a double benefit. For the reader, it insulates them from the personal attacks of random strangers on the Internet. Don't get me wrong, there is a time and a place for real, charged arguments about important issues that affect us all. But there is little to be gained from having those fights with strangers; on the contrary, I think it poisons the body politic when strangers are screaming at each other.
For the writer, it takes away any incentive to be snarky or rude. If other people filter their content this way, there's no point in trying to be mean to them, and no "race to the bottom" for who can be more nasty.
I want the option to engage with the substance of new developments in the world, technology, etc. without the drama. I don't want to be drawn into the drama of strangers (who could, for all I know, just be bots or ragebaiting AIs).
If I want drama, there's plenty of it on TV, or I could talk to my friends about what is going on with people I actually know.
The anti-pattern, in my mind, is logging on to engage with substantive content and being inadvertently drawn into flamewars with strangers.
Sure, you might say this sort of thing is boiling flavor out of your food, but... boiling the bacteria out of what you consume isn't a bad thing.
My wish list:
- Eliminate ALL clickbait titles and ads. I only want to see a dry factual title.
- For any given topic, I only care about the main article (with the option to only see a summary, unless it's a high-quality blog) and a couple of substantive comments; the rest is junk I don't want to see.
The current state of popular social media sites means I don't use them at all (except HN, which is trending in the same direction due to saturation with AI), but every other week or so I end up wasting a few hours, which I'd like to avoid entirely.
Ideally this would lead to 98% of content being filtered or summarised out, so that over time I only use the internet for looking things up with intention. I want this to remove the majority of the "entertainment" value from the internet (by default) so that time and energy can be refocused on real life and high-quality sources (books) only.
DeArrow works for YouTube, at least. uBlock Origin or the Brave browser works for ads. Not sure why you'd need an AI to remove ads...
I can manually “hold” emails so they don’t go in the “sort out my email” woodchipper. It’s been life-changing.
But... It's the type of idea that is unpredictable as it comes into contact with reality. If it works, it probably works very differently from the initial idea of how it will work.
I see the merit in such a proposal. It's the linguistic equivalent to boiling the food you consume, instead of eating it raw with all the associated bad stuff.
The problem is, as you said, that this plan is unlikely to be as rosy as it's portrayed and probably has a lot of drawbacks in real life.
Interesting to think about and explore, though.
I love this "de-snarkifier" idea and it seems to have broad interest. I couldn't resist hacking (well, vibe coding[1]) a "Snarknada" prototype to explore the viability, including patterns for low-latency and accuracy.
You’ve hit on exactly why we think on-device is the right move for this class of use cases. If you tried to "de-snark" an entire infinite-scrolling feed via a cloud API, the token costs would be astronomical for a developer. Plus, people (rightly) don't want to send their private social feeds or DMs to a third-party server just to clean up the tone.
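To make the cost argument concrete, here is a rough back-of-the-envelope sketch. Every number you feed it is a placeholder: the field names, the assumption that output is about as long as input, and any price you plug in are hypothetical, not real pricing or usage data.

```typescript
// All inputs are hypothetical placeholders, not real prices or usage data.
interface FeedCostInputs {
  postsPerUserPerDay: number;  // posts rewritten per user per day
  tokensPerPost: number;       // input tokens per post
  users: number;               // daily active users of the extension
  usdPerMillionTokens: number; // blended cloud price, hypothetical
}

// Assumes the rewritten post is roughly as long as the original, so each
// post costs about 2x its input tokens (input + output).
function dailyCloudCostUsd(i: FeedCostInputs): number {
  const tokensPerDay = i.postsPerUserPerDay * i.tokensPerPost * 2 * i.users;
  return (tokensPerDay / 1e6) * i.usdPerMillionTokens;
}
```

The point is only that the bill scales with posts times users, which is exactly the product an infinite-scrolling feed maximizes; on-device inference moves that cost to hardware the user already owns.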
Moving this to the device should make high-frequency "Semantic Mutation" financially and technically viable for the first time. If you (or anyone else) start building this more seriously than my PM vibe-coded toy and hit specific friction points, I’d love to hear about them: it helps us prioritize the roadmap.
[1]: If you're using a coding agent (Cursor, Claude Code, etc.), I recommend pointing it to https://www.npmjs.com/package/built-in-ai-skills-md-agent-md. Most models were trained on the now-obsolete window.ai namespace, and this skill file helps them use the current APIs correctly.
It's something I feel is finally viable to combat at zero cost to the user.
This plus WebMCP would also allow it to serve as a form of automod on websites that you authenticate with. Imagine a world where your social media profile has an automod of its own, powered locally; you could use it to steer your feed or to mute, block, or moderate as need be. Even without WebMCP, I have been working on making it autodetect HTML elements and extract UGC (comments, threads, etc.) automatically for moderation, since my initial tests with a small group found that hardcoded extraction would break on websites with frequent UI changes or a lot of A/B testing.
Even better, the concept would allow you to also use it to hide certain spoilers (imagine sports or new movies that just came out and you want to not have to hide away from all socials).
Didn't find any contact details on your new HN account, but in a few weeks I will be able to reach out to you with it fleshed out. :)
We have a community of nearly 14k that we will distribute this to.
Also, what is toxic to one person is not toxic to another, depending on their subjective choices. How will you solve for this without everyone just seeing what they want to see, even if reality is not like that? I feel that will just enhance the problems of social media rather than reduce them.
It kind of falls apart when you start to think of edge cases rather than the "hey this tool will keep morons off my feed!" mentality.
I agree that what is toxic to one person is not toxic to another, but think that this is largely because many people enjoy seeing their perceived enemies attacked. In other words, it comes down to a viewpoint bias: attacking my group/viewpoint is toxic, while attacking other groups/viewpoints is good and noble.
My ideal is that a de-snarkifier would be strongly instructed to be viewpoint neutral; to filter based on whether the comment is being respectful, without regard to the views being expressed.
My idea would backfire if other people program their filter to reinforce their own biases by favoring content that they agree with and creating or amplifying personal attacks on their perceived enemies. That would be unfortunate, but ultimately we can only control what we do; each person gets to make their own decision.
But keep in mind the actual experience for users is not great; the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back. That's unfixable until operating systems start reliably shipping their own prebaked models that an API like this could plug into.
Maybe the next big thing will be some software subscription premium offers with a bunch of 5090s as an extra.
What's a bigger issue is that the models on most standard PCs are both tiny and slow. I was going to try using the Prompt API to change the text of (Infocom) text adventures on the fly. But for many PCs, this will currently be too slow to be feasible.
With MoE models, you could fetch expert layers from the network on demand by issuing HTTP range queries for the corresponding offset, similar to how bittorrent downloads file chunks from multiple hosts. You'd still have to download shared layers, but time to first token would now be proportional to active-size rather than total-size. Of course this wouldn't be totally "offline" inference anymore, but for a web browser feature that's not a key consideration.
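A small sketch of the range-request half of this idea, assuming a hypothetical index that maps each expert to a byte span inside a single model file. The `ExpertSpan` type and its field names are invented for illustration; only the `Range: bytes=start-end` header format (inclusive on both ends) is standard HTTP.

```typescript
// Hypothetical index entry: where one expert's weights live in the model file.
interface ExpertSpan {
  layer: number;
  expert: number;
  offset: number; // byte offset of the first byte
  length: number; // size in bytes
}

// Build the value of an HTTP Range header for one expert.
// Byte ranges are inclusive on both ends, hence the "- 1".
function rangeHeader(span: ExpertSpan): string {
  if (span.length <= 0) {
    throw new Error("span length must be positive");
  }
  const last = span.offset + span.length - 1;
  return `bytes=${span.offset}-${last}`;
}
```

The resulting string would go into the `Range` header of a request to the model URL, and the server would need to support range requests and answer with 206 Partial Content for this to work.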
This is a common misconception, probably due to the unfortunate naming. Expert layers are not "expert" at any particular subject, and active-size only refers to the activated layers per token. You'd still need all (or nearly all) of the layers for any particular query, even if some layers have a very low chance of being activated.
All in all, you'd be better off lazy loading the entire model; at least you'd know you have the capability to run inference from then on.
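A back-of-the-envelope sketch of why nearly all experts end up being needed. It assumes, unrealistically, that the router picks experts uniformly at random and independently for each token; real routers are learned and uneven, so treat the output as illustrative. The function name and the example sizes are made up.

```typescript
// Expected fraction of the experts in one MoE layer that get activated at
// least once while processing `tokens` tokens, under a uniform-routing
// assumption. A given expert is skipped by one token with probability
// (1 - topK / experts), so it is skipped by all tokens with that
// probability raised to the number of tokens.
function expectedFractionTouched(experts: number, topK: number, tokens: number): number {
  return 1 - Math.pow(1 - topK / experts, tokens);
}

// Hypothetical layer with 64 experts and 4 active per token:
// one token touches 4/64 of the experts, but a 100-token exchange is
// expected to touch more than 99% of them.
const afterOneToken = expectedFractionTouched(64, 4, 1);
const afterHundredTokens = expectedFractionTouched(64, 4, 100);
```

Under these assumptions the on-demand approach only saves bandwidth for the first handful of tokens; after that, almost everything has been fetched anyway.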
Here's to hoping that that dystopia will never happen.
fantastic!
> the model download is orders of magnitude greater than downloading the browser itself, and something that needs to happen before you get your first token back
sure, but does this mean the model is lazily downloaded? that is, if I used this and mine was the first call to the model, would the user be waiting until the model was downloaded at that point?
that sounds like a horrible user experience - maybe chrome reduces the confusion by showing a download dialog status or similar?
also, any idea what the on disk impact is?
So it's once per browser, not once per site.
You can track the download state yourself and pop whatever UI you want.
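A sketch of what tracking the download could look like. The typings below are a hand-written approximation of the `availability()` / `create({ monitor })` / `downloadprogress` shape described in Chrome's Prompt API documentation as I understand it; treat them as an assumption and check the current docs. `createWithProgress` and `ModelFactory` are hypothetical names.

```typescript
// Approximate typings for the parts of the API used here (assumption).
interface DownloadProgressEvent {
  loaded: number; // fraction downloaded, from 0 to 1
}
interface CreateMonitor {
  addEventListener(type: "downloadprogress", cb: (e: DownloadProgressEvent) => void): void;
}
interface ModelFactory<S> {
  availability(): Promise<"unavailable" | "downloadable" | "downloading" | "available">;
  create(options: { monitor(m: CreateMonitor): void }): Promise<S>;
}

// Create a session, reporting download progress as a whole-number percentage.
// Returns null when the model cannot run on this device at all.
async function createWithProgress<S>(
  factory: ModelFactory<S>,
  onProgress: (percent: number) => void,
): Promise<S | null> {
  if ((await factory.availability()) === "unavailable") {
    return null;
  }
  return factory.create({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        onProgress(Math.round(e.loaded * 100));
      });
    },
  });
}
```

In a page, the intended use would be passing the browser's model object as `factory` and wiring `onProgress` to a progress bar or status line; taking the factory as a parameter also makes the function easy to exercise with a fake.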
If it turns out useful enough I'm sure browsers will just start including it as (perhaps optional?) part of installation.
This just exposes an API for sites to use. If they wanted to do the types of spying you're cynically suggesting, they could just add it without an API and you'd be none the wiser. Chrome contains closed source components so you wouldn't even know.
It’s a totally valid question, and transparency is the only way this can work. On-device processing is an important core design goal of these APIs.
There are NO logs of the input / output interactions sent to any server, not even for training purposes. The only metrics we have are on performance, stability, and other generic API usage signals like any other APIs. These are all controlled by existing user preferences in Chrome.
It would actually be pretty interesting to see if it's possible to decentralize the compute: break a larger prompt down and send the pieces to a bunch of browsers, using a subagent pattern or something like RLM, with each browser working on a smaller part of the prompt to generate something useful.
Plus even if you really wanted to do that, WebGPU exists and has for a while right?
> This feels like a lot of work for low reward
Low per-device reward combined with a high user count - either by large legitimate players or by botnets - has been the monetisation strategy of most online enterprises.
There are a lot of ways this API could go, e.g. more powerful models eventually, or perhaps integration with cloud models. For example, I could see Google trying to default Gemini as the model for users signed into Chrome.
Edit: simple example is a spam bot
- Gemini Nano-1: 46% MMLU, 1.8B
- Gemini Nano-2: 56% MMLU, 3.25B
- Gemma4 E2B: 60.0% MMLU, 2.3B
- Gemma4 E4B: 69.4% MMLU, 4.5B
Sources:
- https://huggingface.co/google/gemma-4-E2B-it
- https://android-developers.googleblog.com/2024/10/gemini-nan...
Note that the article here was last updated 2025-09-21, and as of that time it was already on Gemini Nano 3.
Yes; "With the Prompt API, you can send natural language requests to Gemini Nano in the browser."
And do you guys communicate with other browsers when doing something like this to try to settle on something common? I don't mean via the W3C but practically; it's a small world after all.
The target usage for the prompt API is anything that would benefit from the general capabilities of a language model, and can't be encompassed by the more-specific APIs for summarization/writing/rewriting. Realistic use cases currently are things like sentiment analysis, keyword extraction, etc. I have a number of ideas on how to integrate it into my current retirement project around Japanese flashcards, e.g. generating example sentences. If the small (~10 GiB) model class keeps getting smarter, the class of things possible on-device in this way gets larger and larger over time.
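As an illustration of the sentiment-analysis case, here is a sketch that asks for structured output and validates it. It assumes a session whose `prompt()` accepts a JSON Schema through a `responseConstraint` option, which is my understanding of Chrome's documentation; verify against the current docs. The `ConstrainedSession` interface, the schema, and the prompt wording are all illustrative.

```typescript
// Approximate shape of a session that supports structured output (assumption).
interface ConstrainedSession {
  prompt(input: string, options?: { responseConstraint?: object }): Promise<string>;
}

type Sentiment = "positive" | "neutral" | "negative";

// JSON Schema restricting the answer to one of three labels.
const SENTIMENT_SCHEMA = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
  },
  required: ["sentiment"],
  additionalProperties: false,
};

// Validate the raw model output instead of trusting it blindly.
// Returns null for anything that is not a well-formed answer.
function parseSentiment(raw: string): Sentiment | null {
  try {
    const value: unknown = JSON.parse(raw);
    if (typeof value === "object" && value !== null && "sentiment" in value) {
      const s = (value as { sentiment: unknown }).sentiment;
      if (s === "positive" || s === "neutral" || s === "negative") {
        return s;
      }
    }
  } catch {
    // Not valid JSON: fall through to the null result below.
  }
  return null;
}

async function classifySentiment(session: ConstrainedSession, text: string): Promise<Sentiment | null> {
  const raw = await session.prompt(`Classify the sentiment of this text:\n${text}`, {
    responseConstraint: SENTIMENT_SCHEMA,
  });
  return parseSentiment(raw);
}
```

Keeping the parser separate from the model call means that a small model's occasional malformed answer turns into a `null` the caller can handle, not an exception.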
We definitely communicated with other browsers. There were the standing WebML Community Group meetings at the W3C every few weeks. There were async discussions like https://github.com/mozilla/standards-positions/issues/1213 and https://github.com/WebKit/standards-positions/issues/495 . (Side note, I love the contrast between Mozilla's helpful in-depth feedback and WebKit's... less helpful feedback.) There was also a bit of a debacle where the W3C Technical Architecture Group tried to give "feedback" but the feedback ended up being AI-generated slop... https://github.com/w3ctag/design-reviews/issues/1093 .
But overall, yeah, the goal with the prompt API, as with all web APIs, is to put something out there for discussion as early as possible, and get input from the broad community, especially including other browsers, to see if it's something that they are interested in collaborating on. https://www.chromium.org/blink/guidelines/web-platform-chang... (which I also wrote) goes into how the Chromium project thinks about such collaboration in general.
While many AI integrations focus on text communication or chat-style interaction, a lot of software benefits from non-text interfaces.
I believe at some point OSes and browsers should provide an API to manage models so you'll have access to on-device/remote ones with a simplified interface for the app. Making something standardized that is cross-platform would be fantastic. It also needs to be on mobile devices, so the players that can easily make it happen are mostly Apple and Google. (Meta will follow or vice-versa I guess)
Key-point: it shouldn't be exclusive to promoted models.
So the app would be able to query and get the right model(s).
(1) https://developer.apple.com/documentation/foundationmodels
I'm not particularly happy about that outcome as I wish we had more locally run AI models for reasons of privacy and efficiency, so this is more just a warning that at present there are some severe tradeoffs.
1 - https://sendcheckit.com/blog/ai-powered-subject-line-alterna...
Thanks for the write-up and the comparison, but more importantly for using the API in production!
You’re highlighting the "state of the art" gap we’re working to close. Cloud models will always have the advantage of massive parameter counts, but our bet is that for a huge class of simpler or high-volume tasks, the upsides of on-device (e.g. zero-cost, permission-less start with no quotas/infra, network-resilience, privacy) make it a compelling trade-off.
The models have been getting better at a rapid clip, and the team is heads-down on optimizing performance and reliability. To that end, we're always grateful for feedback. If you hit specific bugs, crashes, or quality regressions, filing a report with repro steps is the best way to help us improve. You can file those on crbug.com under the "Chromium > Blink > AI" component.
The parameters are not part of this initial release but can be added back with the origin trial you discovered.
It's a tiny script that looks up the RSS feed and uses the content to generate summaries; quite a nice fit with our static site. Sometime I'd like to extend it to ask different questions about the content.
If you want to do anything interesting you need transformers.js and a decent model. Qwen 0.9B is where things start working usefully.
I haven't pushed out a full version[1] which uses ducklake-wasm + this to make a completely local SQL answering machine, but for now all it does is retype prompts in the browser.
I agree with others this fits better in the OS, or hey maybe Apple sells a time-machine sort of NAS with neural engine chips.
see: https://github.com/Arthur-Ficial/fenster
and: https://news.ycombinator.com/item?id=47923692
hard work so far
Who's gonna make it call tools?
Or a LocalNet API that integrates with trusted hardware devices on your local network. As a trial (Chrome beta programme — strictly limited but here’s 3x signup links to share with your friends) you can adjust your Google Next Mini underfloor heating directly from Chrome!
Or a DirectCast API that lets you stream <video> elements to a device of your choice even over a VPN. As a Chrome trial, you can use your Google Cloud account to stream directly from YouTube Premium to any linked Google Chromecast devices you own!