
0 points · zozbot234 · 1mo ago · 0 comments

Whether something is "impractical" depends on your expectations. High-latency unattended inference is definitely viable, even though it doesn't align much with what's being run in hyperscale datacenters.


1 comment · 1 top-level

dns_snek · 1mo ago

I'd like to meet the person who's been using a 1 token/second system as their primary LLM for at least a few weeks. Anyone?

I think 1 token/second is optimistic here - and even then it's over 11 days per million tokens.
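For reference, the arithmetic behind that figure can be checked with a quick sketch. The helper name `days_for_tokens` is made up for illustration, and it assumes a constant generation rate with no time spent on prompt processing, which is probably generous:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def days_for_tokens(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock days needed to generate num_tokens at a steady rate.

    Hypothetical helper: assumes the rate never varies and ignores
    prompt processing, restarts, and idle time.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # One million tokens at 1 token/second.
    print(f"{days_for_tokens(1_000_000, 1.0):.2f} days")  # 11.57 days
```

At half that rate the same million tokens would take a bit over 23 days, which is why the exact tokens/second number matters so much here.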
