Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy

(cerebras.net)

40 points · panabee · 2y ago · 1 comment

Specifically, this is Llama 2, not Llama 3, and I was a bit disappointed by that. It also wasn't totally clear from the article: will this actually increase GPU inference speed and decrease GPU memory usage?
