
zone411 · 1y ago

Yes, that's very fast. The same query got 249 tokens/s on Groq, which is known for its fast AI inference, and 25 tokens/s on Together.ai. However, it's unclear what quantization, if any, was used, and this is just a spot check, not a true benchmark.

https://www.zdnet.com/article/cerebras-did-not-spend-one-min...
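A minimal sketch of the arithmetic behind these figures, assuming throughput is simply output tokens divided by wall-clock generation time (the comment doesn't say how it was measured). The helper names `tokens_per_second` and `summarize_runs` and the sample run values are hypothetical; only the 249 and 25 tokens/s figures come from the comment. The second helper shows why one run is a spot check: with a single sample there is no spread to report.

```python
import statistics


def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Throughput of one generation: output tokens / wall-clock seconds (assumed definition)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_tokens / elapsed_s


def summarize_runs(rates: list[float]) -> dict:
    """Summarize repeated runs; a single run (a spot check) has no standard deviation."""
    return {
        "runs": len(rates),
        "median": statistics.median(rates),
        "stdev": statistics.stdev(rates) if len(rates) > 1 else None,
    }


# The two spot-check figures quoted in the comment.
groq_tps = 249.0
together_tps = 25.0
print(f"Groq vs Together.ai: {groq_tps / together_tps:.2f}x")  # Groq vs Together.ai: 9.96x

# Hypothetical: a 500-token response generated in 2.0 seconds.
print(tokens_per_second(500, 2.0))  # 250.0

# Hypothetical repeated runs of the same query.
print(summarize_runs([240.0, 249.0, 255.0]))
```

Reporting a median over several runs, on the same model at the same precision, is closer to a benchmark than a single measurement.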


Tetraslam · 1y ago

I met them at an MIT event last week; they don't quantize any models.
