Computer use in Gemini 3.5 Flash

(blog.google)

155 points by swolpers 7h ago | 97 comments

38 comments · 14 top-level

airstrike 6h ago

Computer use is such a terrible idea. It's slow, insecure, error prone, expensive.

I guess if you're trying to get people to tokenmaxx it may look like a valid strategy, but ain't no way this will be delightful to users.

I think it's a symptom of just not understanding how LLMs should interface with the OS because we're still in their early days.

Eventually there'll be an iPhone moment for the ergonomics of LLM usage outside of coding

gdudeman 5h ago

Computer use is a great idea. It gets the job done when nothing else will.

If you're a person trying to get their job done at a big company, but half your job is in 1-2 proprietary tools or is stuck behind an API you can't program against, computer use can allow you, a non-techie, to do your job more efficiently.

I think it's an awesome way to circumvent gate keepers and the IT department to let people accomplish their goals.

uejfiweun 4h ago

Yeah, it's not that computer use is the most theoretically optimal paradigm, but there's a reasonable case that given the constraints of modern software systems and how they're built, that it's the most realistically optimal paradigm.

thorum 5h ago

The “correct”, elegant way for AI to interact with existing software would take decades and billions of dollars to build. Someone would have to do the hard work of building new APIs, solving decades of accessibility issues, etc.

Or you can show an AI screenshots and ask it where to click.

sarreph 5h ago

I disagree if your application is networked. Most SaaS is built on RESTful APIs that can be converted trivially into interfaces / contracts for tool use.
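A minimal sketch of the kind of REST-to-tool conversion described above. It assumes a hypothetical, simplified endpoint description (verb, path, summary, parameters) and emits a generic JSON-Schema-style tool definition. The `Endpoint`/`Param` types, the naming scheme, and the output shape are illustrative assumptions, not any particular vendor's tool-calling format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    type: str          # JSON Schema type, e.g. "string", "integer"
    description: str
    required: bool = True


@dataclass
class Endpoint:
    method: str        # HTTP verb, e.g. "GET"
    path: str          # path template, e.g. "/listings/{id}"
    summary: str
    params: list = field(default_factory=list)  # list of Param


def endpoint_to_tool(ep: Endpoint) -> dict:
    """Map one (hypothetical) REST endpoint to a JSON-Schema-style tool definition."""
    # Derive a tool name such as "get_listings_id" from the verb and path template.
    slug = ep.path.strip("/").replace("/", "_").replace("{", "").replace("}", "")
    name = f"{ep.method.lower()}_{slug}"
    properties = {
        p.name: {"type": p.type, "description": p.description} for p in ep.params
    }
    required = [p.name for p in ep.params if p.required]
    return {
        "name": name,
        "description": ep.summary,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


# Hypothetical search endpoint, used only to illustrate the mapping.
search = Endpoint(
    method="GET",
    path="/listings/search",
    summary="Search rental listings by city and maximum nightly price.",
    params=[
        Param("city", "string", "City to search in"),
        Param("max_price", "integer", "Maximum nightly price in USD", required=False),
    ],
)
print(json.dumps(endpoint_to_tool(search), indent=2))
```

The mechanical part really is small; what the sketch glosses over is auth, pagination, and whether the API is public at all, which is the point the replies below push on.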

chatmasta 5h ago

So you can either wait for every application to do that, or at least make it possible for an LLM to do it… or you can make the LLM use a computer interface that works with every application by definition.

jubilanti 4h ago

it takes decades and billions of dollars to develop APIs?

orbital-decay 5h ago

Spreadsheet is such a terrible idea. It may look like a valid tool, but ain't no way it's delightful to users. Most of the time people need a database instead. Eventually there'll be an iPhone moment for this.

Meanwhile, the entire world economy:

api 5h ago

It's great for testing and QA automation for UIs. It's also possibly good for the vision impaired.

orbital-decay 4h ago

UI QA only works well if your model plausibly matches the average user behavior and/or real-world edge cases. These models are far from that, and they are much less random than you'd like them to be for fuzzing (mode collapse).

nzach 5h ago

> Computer use is such a terrible idea. It's slow, insecure, error prone, expensive.

And yet having an agent able to use a computer on your behalf is really useful.

Recently I gave a NixOS VM to my Hermes agent and it has been a good experience. I don't really care if it destroys the machine, since I can just roll back to an earlier version, and for any meaningful data he creates for me I make sure he creates a repo, commits, and pushes to my private Gitea instance.

dbbk 5h ago

> And yet having an agent able to use a computer on your behalf is really useful.

I honestly cannot think of a single use case

airstrike 5h ago

> And yet having an agent able to use a computer on your behalf is really useful.

It is, but there's no need for it to be viewing your screen, browsing websites and watching ads.

That stuff is for humans, not for LLMs.

nzach 5h ago

Sure, I don't want an agent watching MY screen. That's why I gave him his own environment, and pretty quickly he discovered that you can open chrome and make it render to a framebuffer, this way he is able to 'view' the website. And apparently with this he is able to bypass a lot of 'anti-bot' measures.

satvikpendem 6h ago

There's still no MCP support in the Gemini app, which is very useful to get various pieces of info as a user just via chatting. For example I recently wanted to get an Airbnb and wanted to filter by specific criteria including house image analysis and Gemini couldn't do it so I had to do it in Codex.

anticorporate 6h ago

Yeah, it seems like this is the biggest missing feature from the Gemini ecosystem.

If I can't connect MCP, there's really no selling point for me to use Gemini from my watch, car, smart speaker, etc. If I'm already bound to using my own front end, then I'm only evaluating Gemini as a model/API, at which point it has many competitors that may be cheaper or better fit for the task.

thejaycampbell 6h ago

agreed... this is where they lost me too

mitchell_h 5h ago

I'm fairly convinced Claude's strongest point is the app. AI users aren't anywhere near as mature or smart as youtube/hn would have folks believe. The claude app is amazing for bridging that gap.

dr_dshiv 4h ago

Didn’t it take them like 2 days to build the first one?

dr_dshiv 4h ago

Didn’t it take them like 2 days to build it?

tonyrice 6h ago

This is why I don't always use the official Gemini Web app. Lately I've found that it's more useful to utilize a CLI. I'm looking forward to the day they add MCP in the web.

pregseahorses 6h ago

Gemini CLI now requires an Antigravity subscription.

singingtoday 5h ago

CLI doesn't work with my subscription..

solarkraft 4h ago

They only recently fixed the bug where stopping the model mid-generation lost the entire session.

The Gemini apps suck.

villgax 6h ago

Will it skip Ads lol

humblyCrazy 6h ago

I looked at their demo and it does not

chatmasta 5h ago

Better question might be will it skip recaptcha?

smallstepforman 3h ago

Today I asked Gemini to extract a table from a PDF appendix and create a C++ data table with its contents. After 15 or so iterations with corrections and new mistakes, it eventually gave up. I was floored when it said “I’m sorry, I cannot do this simple task, I’ve exceeded my error threshold and cannot do this task for you. My LLM prediction engine invents data instead of doing a simple data copy/reformat”.

Stunned to see that Gemini threw its digital arms in the air and gave up.

YuechenLi 1h ago

So... has Google provided a Codex/Claude Code equivalent to Gemini yet? I would like to use Gemini for coding tasks, but that's kind of difficult to do as I don't even know how to get Gemini to even "clone this repo and read the code in it for static analysis", much less open PRs in repos.

ChatGPT/Codex can do it, Claude can do it, why can't Gemini?

And no, I don't mean going through Antigravity, and personally I'm wary about LLMs having unsupervised access on my computer without explicit policy, so I really think Google is putting the cart before the horse here.

mlmonkey 6h ago

It's funny how in their own graph, https://storage.googleapis.com/gweb-uniblog-publish-prod/ima... Gemini 3.5 Flash is beaten hands down by both Opus 4.8 and GPT 5.5, and yet the graph is drawn as if Gemini wins ... :-D

arjunchint 1h ago

Pretty doubtful about computer use/screenshotting based approaches.

With Retriever AI, we construct custom accessibility trees to represent web pages, and we just switched over to DeepSeek v4 Flash for a nearly 100x cost decrease.

We also had great success just reverse engineering the underlying APIs of websites and then writing code to hit them. This approach of using screenshots to take actions on a webpage to trigger the underlying network calls the website is making seems too naive.
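To make the accessibility-tree idea concrete, here is a minimal sketch that reduces raw HTML to a numbered outline of interactive elements, the sort of compact text an LLM can act on instead of a screenshot. It uses only Python's standard `html.parser`; the set of "interactive" tags, the role/name rules, and the `[id] role "name"` output format are simplifying assumptions, and this is not how Retriever AI (or a browser's real accessibility tree) actually works.

```python
from html.parser import HTMLParser
from typing import Optional

# Tags treated as actionable. A real accessibility tree also uses ARIA roles,
# visibility, and computed accessible names, which this sketch ignores.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}


class InteractiveElementParser(HTMLParser):
    """Collects interactive elements into a flat, numbered list of nodes."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: list = []
        self._open: Optional[dict] = None  # link/button whose inner text we are collecting

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE:
            return
        a = dict(attrs)
        node = {
            "id": len(self.nodes),
            "role": a.get("role") or ("link" if tag == "a" else tag),
            "name": a.get("aria-label") or a.get("placeholder") or "",
        }
        self.nodes.append(node)
        # Unlabeled links and buttons take their name from their inner text.
        if tag in {"a", "button"} and not node["name"]:
            self._open = node

    def handle_endtag(self, tag):
        if tag in {"a", "button"}:
            self._open = None

    def handle_data(self, data):
        if self._open is not None:
            self._open["name"] = (self._open["name"] + " " + data.strip()).strip()


def outline(html: str) -> str:
    """Render the page as one '[id] role "name"' line per interactive element."""
    parser = InteractiveElementParser()
    parser.feed(html)
    parser.close()
    return "\n".join(f'[{n["id"]}] {n["role"]} "{n["name"]}"' for n in parser.nodes)


page = """
<form>
  <input type="text" placeholder="Search listings">
  <button>Search</button>
  <a href="/help">Help <b>center</b></a>
</form>
"""
print(outline(page))
```

The model then only has to answer with an element id (e.g. "click 1"), which is far cheaper in tokens than sending a screenshot per step.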

ai_fry_ur_brain 55m ago

I have basically unlimited access to every SOTA model and I opt for gemini flash 3.5 9/10 times I use an LLM.

LLMs are mostly useless, but when I do use them it's with Gemini. If they're going to waste my time 95% of the time, I might as well get it over with fast.

revolvingthrow 5h ago

People using google’s models: am I holding it wrong or are the guardrails really overtuned?

I had the dubious pleasure of testing gemini of late and I kept running into refusals. How do I transfer a sim number from one provider to another? No. What should I consider when making backups on ntfs less prone to data loss and more bitrot resistant? No. Evaluate this piece of code? No.

I’m not sure if it’s cold feet from the mythos situation or what, but it reminds me of the dark days where you couldn’t use ai for much of anything. But then I go to chatgpt 5.5 and it does mostly everything I want outside of the usual cybersecurity boogeyman that you run into now and then.

fridder 5h ago

I wonder if it will be better at building TUIs. It has been absolutely abysmal at interacting with them and building them.

beastman82 6h ago

No UI like their competitors Claude CoWork or Codex. This is vaporware

knollimar 5h ago

Where is 3.5 pro?

zuzululu 5h ago

Performance is quite impressive given that it's 3x cheaper than 5.5.

paganartifact 2h ago

Who are these people talking about "agentic" stuff, and furthermore who are the people who can't stfu about "MCP"??

Literally 90%+ comments on HN personify their alleged use of AI in a way that is in NO WAY related to how the tool is really used.

Using LLMs for building software has NOTHING to do with those concepts. Nobody has "agents". That literally only exists in marketing. It's not even how it works.

AT ALL

Useless forum
