Better HN
lostmsu · 1y ago · 0 points
All of this is true only as long as no software is running parallel inference of multiple LLM queries. Once that happens, the Macs will hit the wall.
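The point can be illustrated with a toy cost model (all numbers below are hypothetical, chosen only to show the shape of the effect): single-query decoding is dominated by streaming the weights from memory, so a Mac's high memory bandwidth keeps it competitive at batch size 1. But the weights only need to be streamed once per step regardless of batch size, so batched parallel serving amortizes that cost and the bottleneck shifts to raw compute.

```python
# Toy cost model for one decode step (illustrative assumptions, not measurements):
# a step is dominated by streaming the weights once, plus a small per-query
# compute cost for each request in the batch.
WEIGHT_STREAM_MS = 50.0   # hypothetical time to stream the weights once
PER_QUERY_MS = 2.0        # hypothetical extra compute per query in the batch

def step_time_ms(batch_size):
    # One batched decode step: weights streamed once, compute scales with batch.
    return WEIGHT_STREAM_MS + batch_size * PER_QUERY_MS

def throughput(batch_size):
    # Total tokens per second across the whole batch.
    return batch_size / (step_time_ms(batch_size) / 1000.0)

# Batching 8 queries costs barely more per step than 1 query,
# so aggregate throughput grows almost linearly with batch size --
# until the compute term, not the bandwidth term, dominates.
```

Under these (made-up) numbers, a batch of 8 delivers more than 6x the throughput of serving queries one at a time, which is exactly the workload where hardware with weak compute relative to its memory bandwidth falls behind.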
ryao · 1y ago
People interested in running multiple LLM queries in parallel are not people who would consider buying Apple Silicon.
int_19h · 1y ago
There are other ways to parallelize even a single query for faster output, e.g. speculative decoding with small draft models.
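A minimal sketch of that idea, using toy deterministic stand-ins for the two models (both model functions and their rules are hypothetical; in a real system the "verify" step is a single batched forward pass of the large model over all draft positions, which is where the speedup comes from):

```python
def target_model(ctx):
    # Toy stand-in for the large model: greedy next token is a fixed
    # deterministic function of the last two tokens.
    return (ctx[-1] + ctx[-2]) % 10

def draft_model(ctx):
    # Toy stand-in for the small draft model: usually agrees with the
    # target, but guesses wrong whenever the last token is divisible by 3.
    t = target_model(ctx)
    return (t + 1) % 10 if ctx[-1] % 3 == 0 else t

def greedy_decode(prompt, n_tokens):
    # Reference: plain greedy decoding with the target model alone.
    out = list(prompt)
    for _ in range(n_tokens):
        out.append(target_model(out))
    return out[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft model cheaply proposes k tokens, one after another.
        ctx = list(out)
        draft = []
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model verifies the proposals; with greedy decoding a
        #    draft token is accepted iff the target would have emitted it.
        ctx = list(out)
        accepted = 0
        for t in draft:
            if target_model(ctx) != t:
                break
            ctx.append(t)
            accepted += 1
        out.extend(draft[:accepted])
        # 3. The verification pass also yields one guaranteed target token
        #    (the correction at the first mismatch, or a bonus token if all
        #    k drafts were accepted).
        out.append(target_model(out))
    return out[len(prompt):len(prompt) + n_tokens]
```

The output is identical to plain greedy decoding with the target model; the win is that several sequential target-model steps are replaced by one parallel verification pass plus cheap draft steps.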