Opus 4.6 likely has on the order of 100B active parameters. OpenRouter lists the following throughputs on Google Vertex:
42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
For GLM 4.7, that gives 143 tps × 32B = 4576B parameters per second, and for Llama 3.3, 70 tps × 70B = 4900B. The slightly higher figure for Llama 3.3 makes sense, since dense models are easier to optimize. Taking GLM's figure as the parameter throughput for Opus 4.6, we get a lower bound of 4576B / 42 ≈ 109B active parameters. (This assumes that all three models use the same number of bits per parameter and run on the same hardware.)
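The arithmetic above can be sketched as a few lines of Python, using the throughput numbers from the OpenRouter listings (the assumption of equal bits per parameter and identical hardware carries over):

```python
# Back-of-envelope estimate of Opus 4.6's active parameter count from
# OpenRouter throughput figures (tokens/sec) for models with known
# active-parameter counts. Assumes all models use the same number of
# bits per parameter and run on the same hardware.

reference_models = {
    # name: (tokens per second, active parameters in billions)
    "GLM 4.7": (143, 32),
    "Llama 3.3 70B": (70, 70),
}

# Parameter throughput: active parameters processed per second (billions).
for name, (tps, active_b) in reference_models.items():
    print(f"{name}: {tps * active_b}B params/sec")
# GLM 4.7: 4576B params/sec
# Llama 3.3 70B: 4900B params/sec

# Lower bound for Opus 4.6: take the smaller parameter throughput (GLM's)
# and divide by Opus 4.6's observed 42 tokens/sec.
opus_tps = 42
glm_param_throughput = 143 * 32
print(f"Opus 4.6 lower bound: {round(glm_param_throughput / opus_tps)}B")
# Opus 4.6 lower bound: 109B
```

Using Llama's higher figure of 4900B params/sec instead would give 4900 / 42 ≈ 117B, so the estimate is not very sensitive to which reference model is used.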