Catloafdev
8h ago
Nobody runs unquantized; there's literally no reason to. Q8 would be the largest anyone actually runs on consumer hardware for inference.
2 comments · 1 top-level
bityard
4h ago
Halving the precision of the weights is not a free lunch...
Catloafdev
OP
2h ago
Q8 is virtually lossless. The quantization loss is much more noticeable around Q4 and below. FP16->Q8 on consumer hardware is 2x the speed at ~99.99% of the quality.
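
To make the Q8 vs. Q4 gap concrete, here is a rough sketch that round-trips some synthetic weights through 8-bit and 4-bit quantization and compares the reconstruction error. Everything in it is an assumption for illustration: the function names are hypothetical, the weights are random Gaussian values rather than real model weights, and the scheme (symmetric absmax scaling per block of 32) is only similar in spirit to common block formats, not the exact layout of any particular library. Weight reconstruction error is also not the same thing as output quality, so treat it as intuition for why 4-bit loses more, not as a benchmark.

```python
import random

def quantize_roundtrip(weights, bits, block_size=32):
    """Quantize to signed `bits`-bit integers with one absmax scale per block,
    then dequantize back to floats. Illustrative scheme, not a real file format."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        # Avoid dividing by zero when a block is all zeros.
        scale = absmax / qmax if absmax > 0 else 1.0
        for w in block:
            q = max(-qmax, min(qmax, round(w / scale)))  # integer code
            out.append(q * scale)                        # reconstructed weight
    return out

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

# Synthetic stand-in for a weight tensor (assumption: small Gaussian values).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err_q8 = rms_error(weights, quantize_roundtrip(weights, 8))
err_q4 = rms_error(weights, quantize_roundtrip(weights, 4))

print(f"8-bit RMS error: {err_q8:.6f}")
print(f"4-bit RMS error: {err_q4:.6f}")
print(f"4-bit / 8-bit error ratio: {err_q4 / err_q8:.1f}x")
```

With this scheme the rounding step for each block is `absmax / 127` at 8 bits and `absmax / 7` at 4 bits, so the 4-bit grid is far coarser for the same range of values.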