0 points
FergusArgyll
2y ago
You shouldn't have to quantize it that much. Maybe you're running a lot of other programs while running inference?
Also, try using pure llama.cpp; AFAIK it adds the least overhead.
regularfry
2y ago
Getting more value out of phi-2-sized models is where you really want to be on lower-end M1s.