Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy
(cerebras.net)
40 points
panabee
2y ago
1 comment
free_bip
2y ago
Specifically, this is Llama 2, not Llama 3; I was a bit disappointed by that. Also, it wasn't totally clear from the article: will this actually increase GPU inference speed or decrease GPU memory usage?