
0 points · jazzyjackson · 1y ago · 0 comments

I'm returning my 96GB M2 Max. It can run unquantized Llama 3.3 70B, but tokens per second is slow as molasses, and I still couldn't find any use for it; I just kept going back to Perplexity when I actually needed to find an answer to something.



Tepix · 1y ago

Interesting. You're using the FP8 version, I'm guessing? How many tokens/s are you getting, and with which software? MLX?
