0 points · grepfru_it · 9mo ago · 0 comments

I am curious about the need for 70 t/sec.

Aeolun · 9mo ago

Waiting minutes for your call to succeed is too frustrating?

Depends entirely on the use case. Not every LLM workflow is a chatbot

No, but if you're not latency-sensitive you should probably be using DeepSeek v3 (cheaper than flash, significantly smarter).

High-concurrency voice AI systems.

Why are you self-hosting that?