0 points | umangsh | 3y ago | 0 comments

30B fp16 takes ~500 ms/token on an M2 Max with 96GB. Interestingly, that's about the same speed as 65B quantized to q4.

65B fp16 is ungodly slow, ~300,000 ms/token on the same machine.
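
For a rough sense of why the 65B fp16 number falls off a cliff, here is a back-of-the-envelope sketch of weight memory (parameter count times bytes per parameter). The nominal parameter counts, the flat 4 bits per weight for q4, and the helper names `weight_gb` and `fits_in_ram` are all assumptions for illustration. Real quantized formats carry some extra overhead, and the KV cache and activations need memory on top of the weights. Under those assumptions, 65B fp16 is the only one of the three that does not fit in 96GB, which would plausibly force swapping, though the comment itself does not state the cause.

```python
# Rough weight-memory estimate: parameter count times bytes per parameter.
# Assumptions (not from the original comment): nominal parameter counts,
# and exactly 4 bits per weight for q4 with no scale/zero-point overhead.

BYTES_PER_PARAM = {"fp16": 2.0, "q4": 0.5}
RAM_GB = 96  # the M2 Max configuration mentioned in the comment


def weight_gb(params_billions: float, fmt: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[fmt]


def fits_in_ram(params_billions: float, fmt: str, ram_gb: float = RAM_GB) -> bool:
    """True if the weights alone are no larger than the given RAM size."""
    return weight_gb(params_billions, fmt) <= ram_gb


if __name__ == "__main__":
    for params, fmt in [(30, "fp16"), (65, "q4"), (65, "fp16")]:
        size = weight_gb(params, fmt)
        fits = fits_in_ram(params, fmt)
        print(f"{params}B {fmt}: ~{size:.1f} GB of weights, fits in {RAM_GB} GB: {fits}")
```

This only counts the weights, so treat "fits" as a necessary condition rather than a guarantee.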
