0 points · int_19h · 3y ago · 0 comments

Unless you're doing training, is there much point in 8-bit for this model size? My understanding is that the larger the model, the less it is affected by quantization; for 65b, 4-bit gives you a ~2% perplexity penalty over 8-bit.
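
For a sense of scale on that ~2% figure: perplexity is the exponential of the mean per-token negative log-likelihood, so a perplexity ratio translates into a fixed per-token loss gap. A minimal sketch, assuming natural-log token probabilities; the function names are made up for illustration and are not from any particular library:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood), natural log."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def loss_gap_for_ratio(ppl_ratio):
    """Per-token loss increase (in nats) implied by a perplexity ratio."""
    return math.log(ppl_ratio)

if __name__ == "__main__":
    # A model that assigns probability 0.25 to every token has perplexity 4.
    print(perplexity([math.log(0.25)] * 8))
    # A 2% perplexity penalty is a loss gap of ln(1.02) nats per token.
    print(loss_gap_for_ratio(1.02))
```

So a 2% perplexity penalty is about 0.02 nats of extra loss per token, which says nothing by itself about which kinds of prompts absorb that loss.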

2 comments · 1 top-level

valine · 3y ago · 1 in thread

I don’t buy perplexity as a good benchmark for how useful an LLM is. In my experience playing with many LLaMA models of various sizes and quantization levels, the higher-bit models perform significantly better on complex questions.

int_19h (OP) · 3y ago

Perplexity is a rough metric for sure, but the non-linear dependence of quantization loss on model size is also directly observable. I definitely agree that 7b and 13b give better results with 8-bit, but with 30b the difference is much more subtle.

It should also be noted that the quantization method makes a big difference. In particular, if you were experimenting with llama.cpp, its original quantization scheme was considerably inferior to GPTQ. And for GPTQ, parameters such as group size can also make a difference.
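
To illustrate why bit width and group size matter, here is a rough sketch of plain round-to-nearest quantization with one scale and offset per group of weights. This is not GPTQ (which additionally adjusts the remaining weights to compensate for rounding error) and not llama.cpp's format; the function names, the synthetic Gaussian weights, and the chosen group sizes are all assumptions made for the example.

```python
import random

def quantize_groupwise(weights, bits=4, group_size=128):
    """Round-to-nearest quantization with a separate scale/offset per group.

    Returns the dequantized weights so the error can be measured directly.
    """
    levels = (1 << bits) - 1  # number of steps between min and max
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant groups
        for w in group:
            q = round((w - lo) / scale)    # integer code in [0, levels]
            out.append(lo + q * scale)     # dequantize
    return out

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def demo(seed=0, n=4096):
    """Quantize synthetic weights under a few (bits, group_size) settings."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.02) for _ in range(n)]  # hypothetical weights
    results = {}
    for bits, group_size in [(8, 128), (4, 1024), (4, 128), (4, 32)]:
        deq = quantize_groupwise(weights, bits, group_size)
        results[(bits, group_size)] = mean_squared_error(weights, deq)
    return results

if __name__ == "__main__":
    for (bits, gs), mse in demo().items():
        print(f"{bits}-bit, group size {gs}: MSE = {mse:.3e}")
```

Smaller groups cover a narrower value range, so each quantization step is finer and the reconstruction error drops, at the cost of storing more scales and offsets. Weight reconstruction error is only a proxy here; it does not directly predict perplexity or answer quality.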