0 points
umangsh
3y ago
0 comments
30B fp16 takes ~500 ms/token on M2 Max 96GB. Interestingly, that's about the same speed as 65B quantized to q4.
65B fp16 is ungodly slow, ~300,000 ms/token (about 5 minutes per token) on the same machine.
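
For anyone wanting to sanity-check these numbers, here is a rough back-of-the-envelope sketch of weight-only memory for the three configurations. It assumes nominal parameter counts (30e9 and 65e9), 2 bytes per parameter for fp16, and exactly 4 bits per parameter for q4. Real quantized formats carry extra scale data, and the KV cache and activations are ignored, so actual usage will be somewhat higher. The function name `weight_gb` is just illustrative.

```python
# Rough weight-only memory estimate. Assumptions (not from the original
# comment): nominal parameter counts, 16 bits/param for fp16, exactly
# 4 bits/param for q4. Ignores KV cache, activations, and quantization
# metadata, so real usage is higher than these figures.
GB = 1e9  # decimal gigabytes


def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the model weights in GB."""
    return params_billions * 1e9 * bits_per_param / 8 / GB


RAM_GB = 96  # the M2 Max configuration mentioned in the comment

configs = {
    "30B fp16": (30, 16),
    "65B q4": (65, 4),
    "65B fp16": (65, 16),
}

for name, (params, bits) in configs.items():
    size = weight_gb(params, bits)
    verdict = "fits" if size < RAM_GB else "does not fit"
    print(f"{name}: ~{size:.1f} GB of weights, {verdict} in {RAM_GB} GB")
```

Under these assumptions, 65B fp16 needs roughly 130 GB for weights alone, which is more than the 96 GB of unified memory. A plausible (but unverified) reading of the ~300,000 ms/token figure is that the weights are being paged in from disk on every token.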