Claude vs. Gemini: Testing on 1M Tokens of Context

(every.to)

145 points · dshipper · 10mo ago · 44 comments

32 comments · 9 top-level

HackerThemAll · 10mo ago · 6 in thread

What people seem to miss is that the interactive chat mode for all the models, including the best and newest (Gemini 2.5 Pro, 2.5 Flash, 2.5 Flash Lite, and older ones), is totally free. I mean that when you work from the chat at https://aistudio.google.com/, the entire 1M context window is free of charge. You really get a very good AI for nothing.

https://i.imgur.com/pgfRrZY.png

7thpower · 10mo ago

Funny you mention this. I literally just finished an hour of prototyping with the AI Studio context window loaded up, and then got frustrated when I couldn’t see where I stood on billing (I knew it couldn’t be that much, but I still like to know).

I assumed that because I’m on a paid tier it would still cost something beyond a certain usage amount, but I guess not.

cma · 10mo ago

Can you opt out of them training on your data in that free tier?

relatedtitle · 10mo ago

If you have cloud billing enabled you can still use it for free and they say they don't train on it. https://ai.google.dev/gemini-api/docs/billing#paid-api-ai-st...

matesz · 10mo ago

Gemini's free tier allows maybe 5 messages on average, for 2.5 Pro at least, and that is not usable.

I’m using Claude Pro as my daily driver, plus the Gemini and ChatGPT free tiers.

rat9988 · 10mo ago

> Gemini's free tier allows maybe 5 messages on average, for 2.5 Pro at least, and that is not usable.

Not on AI Studio.

HackerThemAll · 10mo ago

You are clearly confirming my comment above.

irthomasthomas · 10mo ago · 6 in thread

So sonnet-4 is faster than gemini-2.5-flash at long context. That is surprising, especially since Gemini runs on those fast TPUs.

curl-up · 10mo ago

Note that in the first test (the only one where output length is reported), Gemini Pro returned more than 3x the amount of text in less than 2x the time. In my experience with Gemini, that time was probably spent mainly on thinking, the length of which is not reported here. So looking at pure output TPS, Gemini is faster, but without clear info on the thinking time or length, it's impossible to judge.
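A minimal sketch of that throughput comparison. The word counts are the ones the article reports as quoted elsewhere in this thread; the two timings are hypothetical placeholders chosen only to satisfy "less than 2x the time", not measurements, and thinking time is not separated out because it is not reported.

```python
def words_per_second(words: int, seconds: float) -> float:
    """Average output rate over the whole response, thinking time included."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words / seconds

# Word counts as quoted from the article in this thread.
claude_words = 500
gemini_pro_words = 1591

# HYPOTHETICAL timings, for illustration only (the thread gives no numbers).
claude_seconds = 30.0
gemini_pro_seconds = 55.0  # less than 2x Claude's hypothetical time

claude_rate = words_per_second(claude_words, claude_seconds)
gemini_pro_rate = words_per_second(gemini_pro_words, gemini_pro_seconds)

print(f"Claude:     {claude_rate:.1f} words/s")
print(f"Gemini Pro: {gemini_pro_rate:.1f} words/s")
```

With any timings that fit the "3x the text in under 2x the time" description, the Gemini rate comes out higher, and subtracting unreported thinking time would only widen the gap.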

jbellis · 10mo ago

if they left them both on defaults, flash is thinking-by-default and sonnet 4 is no-thinking-by-default

bitpush · 10mo ago

> Claude’s overall response was consistently around 500 words—Flash and Pro delivered 3,372 and 1,591 words by contrast.

It isn't clear from the article whether the time they quote is time-to-first-token or time-to-completion. If it is the latter, then it makes sense that the Gemini models would take longer even with similar token throughput.

lugao · 10mo ago

Anthropic also uses TPUs for inference.

irthomasthomas · 10mo ago

Do they rent them from Google? Or are they a different brand?

netdur · 10mo ago

Output tokens must be generated in order (autoregressive decoding). Inputs don’t have that constraint, so prefill runs in parallel. With stronger kernels, KV-cache handling, and batching, Claude can outrun Gemini.

koakuma-chan · 10mo ago · 5 in thread

I really doubt you can fit all Harry Potter books in 1M tokens.

PeterStuer · 10mo ago

The series is 1,084,170 words. At, let's say, 1.4 tokens per word, this would not fit, but it is getting close.
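The arithmetic behind that estimate, as a rough sketch. The 1.4 tokens-per-word ratio is the guess stated above, not a measured value; real ratios depend on the tokenizer and the text.

```python
SERIES_WORDS = 1_084_170    # word count quoted above
TOKENS_PER_WORD = 1.4       # assumed ratio, varies by tokenizer
CONTEXT_WINDOW = 1_000_000  # the 1M-token window under discussion

def estimate_tokens(words: int, tokens_per_word: float) -> int:
    """Estimate a token count from a word count and an assumed ratio."""
    return round(words * tokens_per_word)

estimated = estimate_tokens(SERIES_WORDS, TOKENS_PER_WORD)
fits = estimated <= CONTEXT_WINDOW

print(f"Estimated tokens: {estimated:,}")
print(f"Fits in {CONTEXT_WINDOW:,} tokens: {fits}")
```

Under this assumption the series lands at roughly 1.5M tokens, so it overshoots a 1M window by about half.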

magicalhippo · 10mo ago

How do they do if you test[1] them for attention deficit disorder?

[1]: https://www.imdb.com/title/tt0766092/quotes/?item=qt1440870

koakuma-chan · 10mo ago

It's 2M tokens for Gemini.

gcr · 10mo ago

The entire HP series is about one million words.

koakuma-chan · 10mo ago

Harry Potter and the Order of the Phoenix alone is 400K tokens.

arnaudsm · 10mo ago · 3 in thread

https://archive.is/sb7D5

thefourthchime · 10mo ago

Does anyone else have trouble with the archive rendering of that? It seemed to also have the pop up.

sebastienbarre · 10mo ago

You can delete the div with id=subscribe-popup from the dev tools for a better view.

skarz · 10mo ago

Try one of these. They have the popup but you can dismiss it.

https://ghostarchive.org/archive/JlE5T

https://web.archive.org/web/20250812172455/https://every.to/...

daft_pink · 10mo ago · 2 in thread

I’m really curious how well they perform with a long chat history. I find that Gemini often gets confused when the context is long enough and starts responding to prior prompts, whether I'm using the CLI or its Gem chat window.

XenophileJKO · 10mo ago

In my experience, Gemini is REALLY bad about context blending. It can't keep track of what I said and what it said, even in a conversation under 200K tokens. It blends concepts and statements together, then refers to some fabricated hybrid fact or comment.

Gemini has done this in ways that I haven't seen in the recent or current generation models from OpenAI or Anthropic.

It really surprised me that Gemini performs so well in multi-turn benchmarks, given that tendency.

IanCal · 10mo ago

I’ve not experimented with the recent models for this but older Gemini models were awful for this - they’d lie about what I’d said or what was in their system prompt even with short conversations.

akomtu · 10mo ago · 1 in thread

IMO, a good contest between LLMs would be data compression. Each LLM is given the same pile of text, and then asked to create compact notes that fit into N pages of text. Then the original text is replaced with their notes and they need to answer a bunch of questions about the original text using the notes alone.
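One way such a contest could be scored, as a rough sketch. Everything here is hypothetical: the `summarize` and `answer` callables stand in for real model calls, the word budget stands in for "N pages", and substring matching is a deliberately crude grader.

```python
from typing import Callable, Sequence, Tuple

Summarizer = Callable[[str, int], str]  # (text, max_words) -> notes
Answerer = Callable[[str, str], str]    # (notes, question) -> answer

def score_model(
    text: str,
    questions: Sequence[Tuple[str, str]],  # (question, expected answer)
    summarize: Summarizer,
    answer: Answerer,
    max_words: int,
) -> float:
    """Compress the text into notes, then answer from the notes alone."""
    if not questions:
        raise ValueError("need at least one question")
    notes = summarize(text, max_words)
    if len(notes.split()) > max_words:
        return 0.0  # notes exceeded the budget
    correct = sum(
        1
        for question, expected in questions
        if expected.lower() in answer(notes, question).lower()
    )
    return correct / len(questions)

# Stub "models" so the sketch runs without any API: truncate, then echo.
def truncate_summarizer(text: str, max_words: int) -> str:
    return " ".join(text.split()[:max_words])

def echo_answerer(notes: str, question: str) -> str:
    return notes

text = "The capital of France is Paris. The capital of Japan is Tokyo."
questions = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]
score = score_model(text, questions, truncate_summarizer, echo_answerer, max_words=6)
print(f"score: {score}")
```

The stub keeps only the first six words, so it can answer the first question but not the second; a real run would swap in actual model calls and a stricter grader.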

rafaelmn · 10mo ago

Summarization? I'm pretty sure there are benchmarks for this, because people used summarization to build search indexes (at least they did a few years ago when I was working on this, and there were benchmarks).

dang · 10mo ago

Related ongoing thread:

Claude Sonnet 4 now supports 1M tokens of context - https://news.ycombinator.com/item?id=44878147 - Aug 2025 (160 comments)

sm1100 · 10mo ago

I built a tool that lets you prompt Gemini and Claude at the same time so you can compare their answers side by side. You should check it out: www.tantyai.com

ozbonus · 10mo ago

Mess o youxwh to yt h!
