Better HN
0 points
boroboro4
1y ago
They get more than this. For prefill we can reach 70% matmul utilization; for generation it is lower, but we'll get above 50% there too eventually.
bjornsing
1y ago
And even when you get to 100% utilization you’ll still be wasting a crazy amount of gates / die area, plus you’re paying the Nvidia tax. There is no way in hell that will go on for 10 years if we have good AGI but inference is too expensive.