syntaxing
3y ago
Supposedly the new update with GPU offloading will bring that up to 10 tokens per second! 1 token per second is painfully slow; that’s about 30 seconds for a sentence.
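To make the arithmetic concrete, here is a rough sketch. The 30-token sentence length is an assumption implied by "30 seconds at 1 token per second", and `seconds_for` is a hypothetical helper, not part of any library.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Return the time needed to generate `tokens` at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


# Assumption: a sentence is roughly 30 tokens (30 s at 1 token/s).
SENTENCE_TOKENS = 30

for rate in (1, 10):
    elapsed = seconds_for(SENTENCE_TOKENS, rate)
    print(f"{rate} tokens/s: {elapsed:.0f} s per sentence")
```

Under that assumption, going from 1 to 10 tokens per second would cut a sentence from about 30 seconds to about 3.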
int_19h
3y ago
10 tokens / second is what you get running llama-30b entirely on the GPU. A 65b model will be slower than that since there's more compute involved.
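As a very rough back-of-the-envelope sketch, assume the time per token grows linearly with parameter count. That is a simplification: real throughput also depends on quantization, memory bandwidth, and whether the whole model fits in VRAM. `scaled_rate` is a hypothetical helper for illustration only.

```python
def scaled_rate(known_rate: float, known_params_b: float, target_params_b: float) -> float:
    """Estimate tokens/s for a larger model.

    Assumption: per-token time scales linearly with parameter count,
    so throughput scales inversely with it.
    """
    if known_params_b <= 0 or target_params_b <= 0:
        raise ValueError("parameter counts must be positive")
    return known_rate * known_params_b / target_params_b


# 10 tokens/s for a 30b model is the figure quoted above; 65b is the larger model.
estimate = scaled_rate(known_rate=10.0, known_params_b=30.0, target_params_b=65.0)
print(f"Estimated rate for 65b: {estimate:.1f} tokens/s")
```

Under this linear assumption a 65b model lands somewhere below half the 30b rate, which is consistent with "slower than that" but should not be read as a measurement.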