gpt-oss-120b does over 600 tokens/s prompt processing (PP) on all but one backend.
nemotron-3-super manages 260 tokens/s PP at best.
For token generation it's a similar story: roughly 50 tokens/s vs 15 tokens/s.
That really bogs down agentic tooling. A model needs to be categorically better, not just marginally better, to justify cutting output speed in half or worse.