But you can’t get Fable-level performance. OSS has reliably trailed the frontier by like 4-7 months for years now.
Hard to imagine where things go from here. GLM-5.3 will be released some day, with Fable-class capabilities, and the (MAGA) US government will still be faffing around in their alt-reality cinematic bullshitiverse.
For comparison, the current agent swarm challenge on HF is at 508 tok/s on an A10G GPU:
https://huggingface.co/spaces/gemma-challenge/gemma-dashboar...
> Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s.
Wow. Limiting access to models for any reason other than not being physically able to provide it should be a crime against humanity or the planet or something. So much immediate efficiency left on the table for stupid reasons.