Better HN
lostmsu · 1y ago · 0 points
All of this is true only as long as no software is running parallel inference of multiple LLM queries. Once that happens, the Macs will hit the wall.
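The point can be illustrated with a toy cost model (all numbers below are hypothetical, chosen only to show the shape of the effect): single-query decoding is dominated by streaming the weights from memory, so a Mac's high memory bandwidth keeps it competitive at batch size 1. But the weights only need to be streamed once per step regardless of batch size, so batched parallel serving amortizes that cost and the bottleneck shifts to raw compute.

```python
# Toy cost model for one decode step (illustrative assumptions, not measurements):
# a step is dominated by streaming the weights once, plus a small per-query
# compute cost for each request in the batch.
WEIGHT_STREAM_MS = 50.0   # hypothetical time to stream the weights once
PER_QUERY_MS = 2.0        # hypothetical extra compute per query in the batch

def step_time_ms(batch_size):
    # One batched decode step: weights streamed once, compute scales with batch.
    return WEIGHT_STREAM_MS + batch_size * PER_QUERY_MS

def throughput(batch_size):
    # Total tokens per second across the whole batch.
    return batch_size / (step_time_ms(batch_size) / 1000.0)

# Batching 8 queries costs barely more per step than 1 query,
# so aggregate throughput grows almost linearly with batch size --
# until the compute term, not the bandwidth term, dominates.
```

Under these (made-up) numbers, a batch of 8 delivers more than 6x the throughput of serving queries one at a time, which is exactly the workload where hardware with weak compute relative to its memory bandwidth falls behind.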
ryao · 1y ago
People interested in running multiple LLM queries in parallel are not people who would consider buying Apple Silicon.
int_19h · 1y ago
There are other ways to parallelize even a single query for faster output, e.g. speculative decoding with small draft models.
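A minimal sketch of that idea, using toy deterministic stand-ins for the two models (both model functions and their rules are hypothetical; in a real system the "verify" step is a single batched forward pass of the large model over all draft positions, which is where the speedup comes from):

```python
def target_model(ctx):
    # Toy stand-in for the large model: greedy next token is a fixed
    # deterministic function of the last two tokens.
    return (ctx[-1] + ctx[-2]) % 10

def draft_model(ctx):
    # Toy stand-in for the small draft model: usually agrees with the
    # target, but guesses wrong whenever the last token is divisible by 3.
    t = target_model(ctx)
    return (t + 1) % 10 if ctx[-1] % 3 == 0 else t

def greedy_decode(prompt, n_tokens):
    # Reference: plain greedy decoding with the target model alone.
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model cheaply proposes k tokens, one after another.
        ctx = list(out)
        draft = []
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model verifies the proposals; with greedy decoding a
        #    draft token is accepted iff the target would have emitted it.
        ctx = list(out)
        accepted = 0
        for t in draft:
            if target_model(ctx) != t:
                break
            ctx.append(t)
            accepted += 1
        out.extend(draft[:accepted])
        # 3. The verification pass also yields one guaranteed target token
        #    (the correction at the first mismatch, or a bonus token if all
        #    k drafts were accepted).
        out.append(target_model(out))
    return out[len(prompt):len(prompt) + n_tokens]
```

The output is identical to plain greedy decoding with the target model; the win is that several sequential target-model steps are replaced by one parallel verification pass plus cheap draft steps.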