
0 points · verdverm · 12d ago · 0 comments

There is a bug in llama-cpp for qwen/gemma models, use vLLM instead


2 comments · 1 top-level

pdyc · 12d ago · 1 in thread

What is the bug, and what does it affect?

verdverm (OP) · 11d ago

It's a prompt cache invalidation bug that causes the entire input to be reprocessed instead of being loaded from the cache.

There are other reasons to prefer vLLM to llama-cpp as well.
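
To illustrate why this kind of bug hurts, here is a toy sketch of prompt (prefix) caching in general. It is not llama-cpp or vLLM code; `ToyPromptCache`, `tokens_to_process`, and the `cache_valid` flag are hypothetical names made up for the example, and the token counts are invented. The idea is only that a working cache lets the server skip the prefix it has already processed, while an invalidated cache makes it process every token again.

```python
def common_prefix_len(a, b):
    """Number of leading tokens that two token lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPromptCache:
    """Toy model of prompt caching (hypothetical, not a real library API)."""

    def __init__(self):
        self.cached_tokens = []

    def tokens_to_process(self, prompt_tokens, cache_valid=True):
        # cache_valid=False simulates an invalidation bug: the cache is dropped.
        if not cache_valid:
            self.cached_tokens = []
        reused = common_prefix_len(self.cached_tokens, prompt_tokens)
        self.cached_tokens = list(prompt_tokens)
        # Only the tokens after the shared prefix need a fresh forward pass.
        return len(prompt_tokens) - reused


turn1 = list(range(1000))                 # first request: 1000 tokens
turn2 = turn1 + list(range(1000, 1050))   # follow-up: same prefix + 50 new tokens

healthy = ToyPromptCache()
healthy.tokens_to_process(turn1)
print(healthy.tokens_to_process(turn2))                   # 50

buggy = ToyPromptCache()
buggy.tokens_to_process(turn1)
print(buggy.tokens_to_process(turn2, cache_valid=False))  # 1050
```

In a long chat or agent loop, where each request repeats the whole history plus a little new text, that difference is paid on every turn.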
