
LoganDark · 2y ago · 0 points

> Would love to hear folks' inference setups, the A100 was... not fast - but I didn't spend any time trying to make it fast.

What do you mean? I get something like 25 tokens per second on an RTX 3060 12 GB. Try using quantized weights; the full-size ones are only for training.
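For anyone wondering why quantized weights matter on a 12 GB card, here is a rough sketch. It assumes a hypothetical 13B-parameter model (the thread doesn't say which model was used) and counts weight storage only, ignoring activations and the KV cache. The quantizer is plain symmetric round-to-nearest 4-bit, which is *not* GPTQ; GPTQ picks its rounding more carefully to reduce error, but the storage savings are similar. The function names are made up for illustration.

```python
def weight_memory_gb(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone (no activations, no KV cache)."""
    return num_params * bits_per_weight / 8 / 1e9


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns integer codes plus a single float scale shared by the group.
    (Simple RTN for illustration, not GPTQ.)
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp to the signed range [-qmax - 1, qmax] in case of rounding edge cases.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from codes and scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    n = 13_000_000_000  # hypothetical model size
    print(f"fp16:  {weight_memory_gb(n, 16):.1f} GB")  # too big for 12 GB
    print(f"4-bit: {weight_memory_gb(n, 4):.1f} GB")   # fits, with room to spare

    group = [0.7, -0.3, 0.1, 0.0]
    codes, scale = quantize_rtn(group)
    print(codes, dequantize(codes, scale))
```

Real 4-bit formats also store a scale (and sometimes a zero point) per group of weights, so the effective size is a little above 4 bits per weight.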



lumost · 2y ago

Aye, I was on quantized weights using GPTQ.

