For an 8B parameter model.
Opus is estimated at 500B-2T parameters. At that scale you’re past reticle limits and need HBM and multi-die packaging, which means you’ve essentially built an inference ASIC (like Groq or Etched) rather than something categorically cheaper than GPUs. The “burned into silicon” advantage mostly evaporates at frontier scale.