Well, for one, Anthropic mostly uses Google TPUs and Amazon Inferentia2 chips, not Nvidia NVL72s. That's because... Google and Amazon are major investors in Anthropic.
Secondly, you missed the entire AI industry trend in 2024-2025: the failure of the GPT-4.5 pretrain run and the pullback from GPT-4 to GPT-4 Turbo to GPT-4o, each smaller in parameter count than the last. GPT-4 is ~1.6T, GPT-4 Turbo is generally considered 1/2 to 1/4 of that, and GPT-4o is smaller still (details below).
Thirdly, we KNOW that GPT-4o runs on Microsoft Maia 100 hardware, which has 64GB of memory per chip. That gives a hard limit on the size of GPT-4o and tells us it's a much smaller distilled version of GPT-4. Microsoft says each server has 4 Maia 100 chips and 256GB total, and we know Microsoft uses Maia 100s to serve GPT-4o on Azure! So we know that quantized GPT-4o fits in 256GB, and GPT-4 does not. GPT-4o cannot be some much larger model that requires a large cluster to serve; that would drop performance below what we see on Azure.
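To make the memory argument concrete, here's a back-of-envelope sketch (Python; the bytes-per-param values are assumed quantization widths and the model sizes are the leaked figures, not confirmed configs, and KV cache/activation memory is ignored, which only makes the fit harder):

```python
# Weight-memory check against one Maia 100 server: 4 chips x 64GB HBM = 256GB.
# Bytes-per-param values below are assumed quantization widths for illustration.

SERVER_GB = 4 * 64  # Microsoft's published Maia 100 server configuration

def weights_fit(params_b: float, bytes_per_param: float) -> bool:
    """True if the bare weights alone fit in one server's 256GB."""
    return params_b * bytes_per_param <= SERVER_GB

print(weights_fit(1600, 0.5))  # GPT-4 at ~1.6T even at 4-bit: False (800GB needed)
print(weights_fit(250, 1.0))   # a ~250B model at 8-bit: True (250GB)
print(weights_fit(300, 0.5))   # a ~300B model at 4-bit: True (150GB)
```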
Fourthly, it is not publicly KNOWN, but leaks put GPT-4o at 200B-300B parameters, which also makes the "GPT-4-sized model" theory nonsense. This matches the Microsoft Maia server math above.
Fifthly, OpenAI's Head of Research has since confirmed that o1, o3, and GPT-5 use the same pretrain run as 4o, so they would be the same size.[1] That means GPT-5 is not some 1T+ model! SemiAnalysis confirms that the only pretrain run since 4o is 4.5, a ~10T model that everyone knows was a failed run.
Sixthly, Amazon Bedrock and Google Vertex serve models at approximately similar effective memory bandwidths when you back them out of tokens/sec, giving ~4900GB/sec for Google Vertex. Opus 4.5 aligns very well with ~100B active params.
42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
For GLM 4.7, that makes 143 * 32B = 4576B active parameters read per second, and for Llama 3.3, we get 70 * 70B = 4900B; at 8-bit weights that is ~4900GB/sec of weight traffic, as sketched below. There are calculations for Amazon Bedrock in the Opus 4.5 launch thread comparing it to gpt-oss-120b, with similar conclusions.
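Here's a minimal sketch of that back-calculation (Python; it assumes decode is memory-bandwidth bound and ~1 byte per active parameter, i.e. 8-bit serving; batching and speculative decoding shift the constant, but same-platform comparisons still hold):

```python
# During decode every generated token reads all active weights once, so
# tokens/sec ~= effective_bandwidth_GBps / active_weight_GB (at 1 byte/param).
# Inverting: active params (billions) ~= bandwidth / tokens_per_sec.

EFFECTIVE_BW_GBPS = 4900  # anchored on dense Llama 3.3 70B: 70 tps * 70GB

def implied_active_params_b(tps: float, bw: float = EFFECTIVE_BW_GBPS) -> float:
    """Active parameter count in billions implied by an observed tokens/sec."""
    return bw / tps

print(implied_active_params_b(143))  # GLM 4.7 -> ~34B (known: 32B active)
print(implied_active_params_b(70))   # Llama 3.3 70B -> 70B (dense, by construction)
print(implied_active_params_b(42))   # Opus 4.6 -> ~117B, i.e. a ~100B-active model
```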
Seventhly, Anthropic distilled Opus 4/4.1 into Opus 4.5, which is why it runs ~3x faster than Opus 4 while costing 1/3 as much in API fees.
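Under the same bandwidth-bound model, speed and per-token cost both scale roughly with active parameter count, so the ~3x speedup and ~1/3 price are mutually consistent. A toy sketch (Python; both speeds here are hypothetical figures picked to show the ratio, not measurements):

```python
# Toy consistency check: if decode is bandwidth-bound, tokens/sec scales
# inversely, and $/token directly, with active weights read per token.

old_tps, new_tps = 14.0, 42.0  # hypothetical before/after speeds; the ratio is the point
speedup = new_tps / old_tps    # 3.0x faster
active_ratio = 1 / speedup     # ~1/3 the active weights read per token
print(f"{speedup:.0f}x faster is consistent with ~{active_ratio:.0%} of the active params, hence ~1/3 the price")
```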
Eighthly, no respectable model has an active-parameter ratio ("sparsity") below 3% these days; push it ridiculously low and you get Llama 4. Every single cutting-edge model sits around 3-5% sparsity. Knowing the active param count for Opus 4.5 therefore gives you a very good estimate of its total param count.
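Putting the last two points together, a quick bracket (Python; the ~100B active figure is the estimate from the bandwidth calculation above, not a confirmed number):

```python
# Bracket total params from an active-param estimate and a 3-5% active ratio.

ACTIVE_B = 100  # assumed active params for Opus 4.5 (billions), per the tps math

for active_fraction in (0.03, 0.05):
    total_t = ACTIVE_B / active_fraction / 1000  # billions -> trillions
    print(f"{active_fraction:.0%} active -> ~{total_t:.1f}T total params")

# 3% active -> ~3.3T total params
# 5% active -> ~2.0T total params
```

Either way you land around 2-3T total, nowhere near 10T.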
The entire AI industry is moving AWAY from multi-trillion-parameter models. Everything now is about getting more out of the parameters you have, not hyperscaling like GPT-4.5, which was shown to be a bad way forward.
Nobody thinks Opus 4.5 is bigger than around 2T in size (so not 10T). Opus 4/4.1 may have been ~6T, but that's it. Any guess of 10T or above is patently ridiculous for both Opus 4/4.1 and Opus 4.5.
[1] https://x.com/petergostev/status/1995744289079656834