Fable 5 pushed Gemma 4 to 255 tok/s on WebGPU

(xcancel.com)

48 points · kirubakaran · 9d ago · 22 comments

17 comments · 6 top-level

freedomben · 9d ago · 6 in thread

More of a meta comment, but I really wish Anthropic would say something about their plans for Fable. We're all just kind of left here floating and aimless, with no idea of what to expect.

rst · 9d ago

They're kind of at the mercy of the US government on this, and the government seems to have them in the position you describe.

freedomben · 9d ago

Agreed, though it sounds like they could add KYC stuff and restore access for US citizens. I utterly hate that we're at that point and I think it's ludicrous for privacy and just common sense, but it would be nice to know if that's their plan or not for example. Or if their plan is to just wait for the government to decide on something, or if they're planning to sue, or whatever.

1 more reply

pornel · 9d ago

It's so great that the US is against AI regulation and gives corporations freedom to innovate /s

1 more reply

cyanydeez · 9d ago

some of us knew the cloud was unreliable and chose a better path.

aspenmartin · 9d ago

Well, I would say you just chose ANOTHER sensible path with tradeoffs, just like frontier API/subscriptions. Yes, you get 100% control, cheaper inference, not having to stare at status.claude.com for like 2 hours per week, and complete privacy (assuming local hosting or self-hosting on servers).

But you can't get Fable-level performance. OSS has reliably trailed the frontier by like 4-7 months for years now.

1 more reply

jauntywundrkind · 9d ago

The US Government has demanded a solution to the Halting Problem squared and by George (Washington) these tinpot fascists are going to get what they demand.

Hard to imagine where things go from here. GLM-5.3 will be released some day, with Fable-class capabilities, and the (MAGA) US government will still be faffing around in their alt-reality cinematic bullshitiverse.

mike_hearn · 9d ago · 3 in thread

That's very impressive. What's the best way to run these kernels natively on a Mac? I saw that there's a way to plug Claude into Apple's Foundation Models framework, and there's a CLI tool that can access models via that framework. It might be useful to have something so fast and good available via a small CLI tool for various purposes, especially when connected with a small suite of tools I have for things like file editing, showing, simple agentic purposes etc.

thomspoon · 9d ago

Either ollama or omlx; both are pretty dang performant. Omlx lets you run Claude Code locally, though, as long as you bootstrap it with the right model.

mike_hearn · 8d ago

Omlx is really nice, thanks for the recommendation!

karussell · 9d ago

Why would you need Omlx? For the speedup?

1 more reply

exabrial · 9d ago · 2 in thread

Apologies for a dumb question, but is this someone running Fable 5 on their own machine and pushing it to 255 tok/s? How is that possible (how did a person acquire the model)?

r_lee · 9d ago

I'm assuming it means that someone used Fable 5 to implement Gemma 4 in WebGPU and it performs at 255 tok/s

LoganDark · 9d ago

It literally says they used it before it was shut down.

nmfisher · 9d ago

It's not immediately clear, but this seems to be 250 tok/s on an M4 Max.

For comparison, the current agent swarm challenge on HF is at 508 tok/s on an A10G GPU:

https://huggingface.co/spaces/gemma-challenge/gemma-dashboar...

scotty79 · 9d ago

> It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible.

> Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s.

Wow. Limiting access to models for reasons other than being physically unable to provide them should be a crime against humanity or the planet or something. So much immediate efficiency left on the table for stupid reasons.

LoganDark · 9d ago

I miss Fable. It worked so well -- it was so confident and would actually make decisions on its own that I agreed with. Opus 4.8 feels so dumb now.
