Assuming 50 input tokens per second, you could still be waiting over ten minutes for a full 32k-token prompt.
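A quick back-of-the-envelope check, using only the numbers above (a 32k-token prompt processed at a flat 50 tokens/second):

    prompt_tokens = 32 * 1024      # 32k-token prompt
    prefill_rate = 50              # input tokens processed per second
    seconds = prompt_tokens / prefill_rate
    print(seconds / 60)            # ~10.9 minutes just to ingest the prompt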
What you are talking about is highly optimized inference that uses accelerators, batching, and speculative decoding to achieve high throughput. Once you have that, compute is irrelevant except in terms of cost; but if all you have is a small consumer-grade GPU, you will be compute-limited at the extreme end of your context window.