0 points · Zambyte · 1y ago · 0 comments

For my single-machine homelab GPU server, the significant convenience benefits outweigh the higher TPS (tokens per second) that vLLM offers. If I were hosting it for something more critical than just myself and a few friends chatting with it, sure, the throughput would matter more. Being able to just paste a model name into Open WebUI and run it is important to me, though.

Still, it is worth knowing about both so you can decide between them for your use case.

4 comments · 1 top-level

Der_Einzige · 1y ago · 3 in thread

Running any HF model on vLLM is as simple as pasting a model name into one command in your terminal.

Zambyte (OP) · 1y ago

What command is it? Because that was not at all my experience.

Der_Einzige · 1y ago

vllm serve… Hugging Face gives vLLM run instructions for every model on its website.

iAMkenough · 1y ago

Had to build it from source to run on my Mac, and the experimental support doesn't seem to include these latest Gemma 3 QAT models on Apple Silicon.
