hodgehog11 3mo ago

That's really fascinating. Every real world use case I've tried on Gemini (especially math-related) absolutely slaughtered the performance of ChatGPT in speed and quality, not even close. As an Android user, the Gemini app is also far superior, since the ChatGPT app still doesn't properly display math equations, among plenty of other bugs.

dudeinhawaii 3mo ago

I have to agree with you but I'll remain a skeptic until the preview tag is dropped. I found Gemini 2.5 Pro to be AMAZING during preview and then its performance and quality unceremoniously dropped month after month once it went live. Optimizations in favor of speed/costs no doubt, but it soured me on jumping ship during preview.

Anthropic pulled something similar with 3.6 initially, with a preview that had massive token output and then a real release with barely half -- which significantly curtails certain use cases.

That said, to date, Gemini has outperformed GPT-5 and GPT-5.1 on any task I've thrown at them together. Too bad Gemini CLI is still barely useful and prone to the same infinite loop issues that have plagued it for over a year.

I think Google has genuinely released a preview of a model that leapfrogs all other models. I want to see if that is what actually makes it to production before I change anything major in my workflows.

verdverm 3mo ago

It's generally anecdotal and vibes when people make claims that some AI is better than another for things they do. There are too many variables and not enough eval for any of it to hold water imo. Personal preferences, experience, brand loyalty, and bias are at play too.

it's contemporary vim vs emacs at this point

hodgehog11 (OP) 3mo ago

I get what you're saying because this is typically true (this is a strong motivator for my current research) but I don't think it applies here and OpenAI seems to agree with me. Some cases are clear: GPT-5 is clearly better than Llama 3 for example. If there is a sizeable enough difference across virtually all evals, it is typically clear that one LLM is a stronger performer than another.

Experiences aside, Gemini 3 beats GPT-5 on enough evals that it seems fair to say that it is a better model. This appears in line with public consensus, with a few exceptions. Those exceptions seem to be centered around search.

kristofferR 3mo ago

Try doing some more casual requests.

When I asked both ChatGPT 5.1 Extended Thinking and Gemini 3 Pro Preview High for the best daily casual socks, both responses were okay and had a lot of the same options, but while the ChatGPT response included pictures, specs scraped from the product pages and working links, the Gemini response had no links. After asking for links, Gemini gave me ONLY dead links.

That is a recurring experience; Gemini seems to be supremely lazy to its own detriment quite often.

A minute ago I asked for the best CR2032 deal for Aqara sensors in Norway, and Gemini recommended the long-discontinued IKEA option, because it didn't bother to check for updated information. ChatGPT, on the other hand, actually checked prices and stock status for all the options it gave me.

tootie 3mo ago

I would further argue the apps are all like 99% the same. They also work just fine through a browser without installing anything.

bdhtu 3mo ago

What do you mean? It renders LaTeX fine on Android.

hodgehog11 (OP) 3mo ago

Some LaTeX, but not all, especially for larger equations. I will admit it has gotten a lot better in recent updates, since it seemed thoroughly broken for quite a while in its early days.

null_deref 3mo ago

I had a problem where ChatGPT rendered math to me from right to left. Sure thing YMMV

deaux 3mo ago

You're using paid ChatGPT, set to 5.1 with Thinking?

hu3 3mo ago

Not OP, but yes and yes.

I pay for Claude, Gemini and ChatGPT.

Gemini 3 replaced ChatGPT for me and if things don't change I'll cancel ChatGPT for lack of usefulness.

croes 3mo ago

One might think that benchmarks do not say much about individual usage and that an objective assessment of the performance of AIs is difficult.

At least, thanks to the hype, RAM and SSDs are becoming more expensive, which eats up all the savings from using AI and the profits from increased productivity /s?
