LLM.int8(): 8-Bit Matrix Multiplication for Transformers at Scale

(arxiv.org)

7 points · ofirpress · 3y ago · 1 comment

1 comment

Cool new efficient inference method that halves memory use for large language models without degrading performance!
