undefined | Better HN

0 pointsrpdillon4mo ago0 comments

You're pointing out a bunch of high capex costs (hardware, SRAM), but then concluding that their opEx is greater than their revenue on a per unit basis. Are they really losing money on every token? It seems that using hardware acceleration would decrease inference costs and they could make it up on unit economics over time.

But I'm just reasoning from first principles. I don't have any specific data about them.

0 comments

aurareturn4mo ago

  It seems that using hardware acceleration would decrease inference costs and they could make it up on unit economics over time.

Nvidia GPUs are accelerators too. The reason they can do this so fast is because they're storing entire models in SRAM.

rpdillonOP4mo ago

There are degrees of acceleration. My understanding, limited as it is, is that groq and cerebras are using highly optimized acceleration to achieve their token generation rates, far beyond that in a regular GPU, and this leads to lower costs per token.

Is this incorrect?

aurareturn4mo ago

Yes, they're called ASICs on Grog. But Cerebras has more general cores that can do more complex things. Inference is mostly limited by bandwidth though.

j / k navigate · click thread line to collapse

0 pointsrpdillon4mo ago0 comments

But I'm just reasoning from first principles. I don't have any specific data about them.

0 comments

aurareturn4mo ago

  It seems that using hardware acceleration would decrease inference costs and they could make it up on unit economics over time.

Nvidia GPUs are accelerators too. The reason they can do this so fast is because they're storing entire models in SRAM.

rpdillonOP4mo ago

Is this incorrect?

aurareturn4mo ago

Yes, they're called ASICs on Grog. But Cerebras has more general cores that can do more complex things. Inference is mostly limited by bandwidth though.

j / k navigate · click thread line to collapse