LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale
(arxiv.org)
7 points
ofirpress
3y ago
1 comment
ofirpress
OP
3y ago
Cool new efficient inference method that halves the memory footprint of large language models without degrading performance!
More from the author about this at:
https://twitter.com/Tim_Dettmers/status/1559892888326049792
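For a rough sense of where a 2x figure comes from, here is a back-of-the-envelope sketch. It assumes the weights are stored as 16-bit floats before and 8-bit integers after, and it counts weights only (no activations, and nothing kept in higher precision). The parameter count and the helper name `weight_memory_gb` are hypothetical, chosen only for illustration.

```python
# Back-of-the-envelope weight memory, assuming 16-bit weights before and 8-bit after.
BYTES_FP16 = 2  # a 16-bit float takes 2 bytes
BYTES_INT8 = 1  # an 8-bit integer takes 1 byte


def weight_memory_gb(num_params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


num_params = 175_000_000_000  # hypothetical model size, for illustration only
fp16_gb = weight_memory_gb(num_params, BYTES_FP16)
int8_gb = weight_memory_gb(num_params, BYTES_INT8)
print(f"16-bit: {fp16_gb:.0f} GB, 8-bit: {int8_gb:.0f} GB, ratio: {fp16_gb / int8_gb:.1f}x")
```

The ratio is 2x regardless of the parameter count, since it depends only on the bytes per weight.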