The Qwen team shows how parallel streams of inference-time thinking tokens can be far more efficient than a single serial stream.
Compared to scaling up parameters alone, their technique can achieve the same performance gain with roughly 22x less additional memory and 6x less additional latency.
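The core idea can be illustrated with a toy sketch: instead of making the model bigger, run P parallel streams through the same shared model, each with its own learned input transform, and aggregate the streams with learned weights. Everything below is a hypothetical stand-in (the linear `model`, the per-stream transforms, the softmax aggregation weights), not the team's actual implementation; it only shows why memory grows slowly — the P streams share one set of model weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Stand-in for a shared, frozen base model (here: a fixed linear map).
    W = np.array([[2.0, 0.0], [0.0, 3.0]])
    return x @ W

P = 4  # number of parallel streams
# Hypothetical per-stream learnable input transforms (random for the sketch).
transforms = [rng.normal(size=(2, 2)) for _ in range(P)]
# Hypothetical learnable aggregation logits, turned into weights via softmax.
logits = rng.normal(size=P)
weights = np.exp(logits) / np.exp(logits).sum()

def parallel_scale(x):
    # All P streams reuse model's weights, so added memory is only the small
    # transforms; the P forward passes could run as one batched call.
    streams = np.stack([model(x @ T) for T in transforms])  # shape (P, n, d)
    return np.tensordot(weights, streams, axes=1)           # weighted aggregate
```

Because the extra state per stream is tiny relative to the base model, scaling P adds far less memory than scaling the model's parameters would, which is the intuition behind the efficiency figures above.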