What do you mean? I get around 25 tokens per second on an RTX 3060 12GB. Try using quantized weights; the full-precision ones are really only needed for training.
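
For a rough sense of why quantization matters on a 12 GB card, here is a back-of-the-envelope sketch. The 7B parameter count is just an assumption for illustration (the thread doesn't name a model), and this only counts the weights, not the KV cache or activations, so real usage will be somewhat higher.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 7e9  # hypothetical model size, not taken from the thread
VRAM_GIB = 12   # RTX 3060 12GB

# Compare common weight precisions against the available VRAM.
for name, bits in [("fp32", 32), ("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weight_gib(N_PARAMS, bits)
    verdict = "fits" if size < VRAM_GIB else "does not fit"
    print(f"{name:>5}: {size:5.1f} GiB -> {verdict} in {VRAM_GIB} GiB")
```

Under that assumption the fp16 weights alone are already over 12 GiB, while the 8-bit and 4-bit versions leave room to spare.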