throwdbaaway
2mo ago
Using ik_llama.cpp to run a 27B 4bpw quant on an RTX 3090, I get 1312 tok/s PP and 40.7 tok/s TG at zero context, dropping to 1009 tok/s PP and 36.2 tok/s TG at 40960 context.
The 35B A3B is faster, but it didn't do too well in my limited testing.
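(Numbers like these come from llama.cpp's bundled `llama-bench` tool; a minimal sketch of an invocation is below. The model filename is a placeholder, and exact flag support may differ between mainline llama.cpp and the ik_llama.cpp fork.)

```shell
# llama-bench reports PP (prompt processing) and TG (token generation)
# rates in tok/s. The model path below is a placeholder.
#   -m    model file (e.g. a ~4bpw GGUF quant of a 27B model)
#   -ngl  number of layers offloaded to the GPU (99 = all)
#   -p    prompt length for the PP benchmark
#   -n    number of tokens to generate for the TG benchmark
./llama-bench -m ./model-27b-q4.gguf -ngl 99 -p 512 -n 128
```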
ranger_danger
2mo ago
With regular llama.cpp on a 3070 Ti I get 60 tok/s TG with the 9B model; it's quite impressive.