0 points
verdverm
12d ago
There is a bug in llama-cpp for qwen/gemma models; use vLLM instead
2 comments · 1 top-level
pdyc
12d ago
What bug, and what does it affect?
verdverm
OP
11d ago
It's a prompt cache invalidation bug: the cached prompt gets thrown out, so all of the input is reprocessed instead of being loaded from the cache.
There are other reasons to prefer vLLM over llama-cpp as well.
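For anyone unfamiliar with prompt caching, here is a rough toy sketch of the idea in Python. It is not llama-cpp's or vLLM's actual code; the function names and the `cache_valid` flag are made up for illustration. The assumption is that a server keeps the tokens from the previous request and only has to process whatever comes after the longest shared prefix. If the cache is wrongly invalidated, the shared prefix counts for nothing and the whole prompt is processed again.

```python
def common_prefix_len(cached, new):
    """Count how many leading tokens the two sequences share."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


def tokens_to_process(cached_tokens, prompt_tokens, cache_valid=True):
    """Return how many prompt tokens still need processing.

    cache_valid is a hypothetical flag standing in for whatever
    condition decides that the cached state may be reused.
    """
    if cache_valid:
        reused = common_prefix_len(cached_tokens, prompt_tokens)
    else:
        reused = 0  # cache invalidated: nothing can be reused
    return len(prompt_tokens) - reused


# Turn 2 of a chat repeats turn 1 and appends two new tokens.
turn1 = [1, 2, 3, 4, 5]
turn2 = turn1 + [6, 7]

print(tokens_to_process(turn1, turn2))                     # only the new suffix
print(tokens_to_process(turn1, turn2, cache_valid=False))  # the whole prompt
```

With long chat histories or big system prompts the difference between those two numbers is what you feel as latency on every turn.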