Better HN
androiddrew · 3d ago
Could you share what you are using for inference and how you are running it? I have a 64G VRAM/128G system RAM setup.
sosodev · 3d ago
Most people are using something from the llama.cpp family for inference; llama-server is my go-to. The Unsloth guides describe how to configure inference for your model of choice.
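For reference, a minimal sketch of launching llama-server (the HTTP server that ships with llama.cpp) against a local GGUF model. The model filename and the layer/context values here are placeholders, not from the thread; tune them to your hardware:

```shell
# Assumes llama.cpp is already built and a GGUF model has been downloaded.
# -ngl offloads that many layers to the GPU; with 64G of VRAM you can
# usually offload everything for mid-size models.
./llama-server \
  -m ./models/your-model.gguf \
  -c 8192 \
  -ngl 99 \
  --port 8080
```

Once running, llama-server exposes an OpenAI-compatible API, so you can point any OpenAI-style client at `http://localhost:8080/v1/chat/completions`.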