gjsman-1000 21d ago

Based on some research, that chip looks like it would cost about $300-$400 to manufacture, die only.

For an 8B parameter model.

Opus is estimated at 500B-2T parameters. At that scale you’re past reticle limits and need HBM and multi-die packaging, which means you’ve essentially built an inference ASIC (like Groq or Etched) rather than something categorically cheaper than GPUs. The “burned into silicon” advantage mostly evaporates at frontier scale.
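
To put a rough number on that scale gap, here is a minimal Python sketch. It assumes, purely for illustration, that one die holds about 8B parameters (like the chip discussed here) and that density does not improve with scale. `PARAMS_PER_DIE` and `dies_needed` are hypothetical names, not anything published by a vendor.

```python
# Assumption: one die holds about 8B parameters, as with the chip in this thread.
PARAMS_PER_DIE = 8 * 10**9


def dies_needed(n_params: int, params_per_die: int = PARAMS_PER_DIE) -> int:
    """Ceiling division: how many dies are needed to hold n_params weights."""
    return (n_params + params_per_die - 1) // params_per_die


# The 500B-2T range quoted above for a frontier model.
for label, n_params in (("500B", 500 * 10**9), ("2T", 2 * 10**12)):
    print(f"{label}: about {dies_needed(n_params)} dies at 8B parameters per die")
```

Under that assumption the quoted range works out to dozens to hundreds of dies, which is consistent with the point about multi-die packaging.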

mixermachine 21d ago

The cutting-edge, maximum-size models will likely stay in the GPU space for a long time. But these models are not needed for most general requests. With a fine-tuned 30B quantized model you can serve a large portion of requests with around 32GB of RAM. Free users will likely only get these kinds of models.
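
The 32GB figure can be sanity-checked with simple arithmetic. The sketch below counts weight storage only. The bit widths are illustrative assumptions (the comment does not say which quantization is meant), and KV cache and activations need extra room on top. `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 30e9  # the 30B model from the comment above

# Assumed bit widths: 16-bit baseline, then 8-bit and 4-bit quantization.
for bits in (16, 8, 4):
    gb = weight_memory_gb(N_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:.0f} GB")
```

At 8 bits per weight the weights take about 30 GB, which barely fits in 32GB. At 4 bits they take about 15 GB, leaving headroom for context.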

At some point we will get these models in hardware and the cost per token will be minimal.

zozbot234 21d ago

> With a fine-tuned 30B quantized model you can serve a large portion of requests with around 32GB of RAM. Free users will likely only get these kinds of models.

These are exactly the kinds of models that you can easily run locally by repurposing existing hardware. Depending on how long you're willing to wait for the answer, running locally even gives you strictly better outcomes for simple Q&A queries.

(Long-context and agentic use cases are admittedly much harder to fit under that model, since non-AI uses for the high-end hardware you'd realistically need for those are rather more limited, and they're hit by the ongoing hardware shortage.)

mixermachine 20d ago

For programmers, maybe. I do this too. But think about all the regular users out there. Your dad and your mum, maybe even your grandparents. This is a huge market too, and for that we can use these special chips at scale.

robkop 20d ago

There are a lot of tradeoffs to play with. Those inference ASICs may not carry the gradient, but they are still optimised for larger batches and to run any model. They need enough memory for the weights, wide batch inference, and ideally leftovers for kv cache efficiency.

For personal inference you’re given a lot more room to play in - much of it poorly explored today - enough to challenge the argument that the cost advantages evaporate.

tomrod 21d ago

Does the cost scale linearly or superlinearly? What does the $300-$400 price data point tell us in relation to the parameter density?

No gotchas here. I genuinely don't know whether 8B parameters is in a zone with significantly decreasing marginal returns -- it's too far out of my knowledge area, but I'm genuinely curious.

avidiax 21d ago

Die size increases cost exponentially, by decreasing chips per wafer and decreasing yield.
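
A minimal sketch of why cost per good die grows faster than die area, using two textbook approximations: a dies-per-wafer formula with an edge-loss term, and the Poisson yield model (yield falls exponentially with area). The wafer cost and defect density below are made-up illustrative values, not figures for any real process.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> float:
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return gross - edge_loss


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson model: probability that a die has zero defects."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)


def cost_per_good_die(die_area_mm2: float, wafer_cost: float, defects_per_cm2: float) -> float:
    """Wafer cost spread over the dies that are expected to work."""
    good_dies = dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)
    return wafer_cost / good_dies


WAFER_COST = 10_000.0  # hypothetical wafer price in dollars
DEFECT_DENSITY = 0.1   # hypothetical defects per cm^2

for area in (100, 200, 400, 800):
    cost = cost_per_good_die(area, WAFER_COST, DEFECT_DENSITY)
    print(f"{area:>3} mm^2: about ${cost:,.0f} per good die")
```

With these assumed inputs, an 8x larger die costs far more than 8x as much per working part, because both factors in the denominator shrink at once.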

I expect that this kind of burned-in model is also very difficult to verify (how do you know if some of the weights are off?), and not amenable to partial disablement to increase yield. For CPUs, you just laser-disable bad cores. You can't forgo part of a neural net.

robkop 20d ago

You can ablate surprisingly large chunks of a model with next to no effect. You can try this easily: download an open-weight model and load it in torch.

Obviously it’s not ideal, but you could likely have a single-digit percentage of all weights affected and still have a useful model (many caveats here, e.g. locality of damaged weights matters, distribution of errors matters, fail high/low matters, …)
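
For anyone who wants to try the experiment, here is a rough PyTorch sketch of the ablation step. It models defects as uniformly random weights stuck at zero, which is only one of the failure modes listed above. `ablate_random_weights` is a hypothetical helper, and the tiny `nn.Sequential` is a stand-in for a real open-weight model, which you would load yourself and then evaluate before and after.

```python
import torch
from torch import nn


def ablate_random_weights(model: nn.Module, fraction: float, seed: int = 0) -> int:
    """Zero a random `fraction` of every parameter tensor in place.

    Returns the number of individual weights that were zeroed.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    generator = torch.Generator().manual_seed(seed)
    zeroed = 0
    with torch.no_grad():
        for param in model.parameters():
            # Independent coin flip per weight: True marks a "defective" weight.
            mask = torch.rand(param.shape, generator=generator) < fraction
            param.masked_fill_(mask.to(param.device), 0.0)
            zeroed += int(mask.sum())
    return zeroed


# Stand-in network (assumption); swap in a real open-weight model for the actual test.
toy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
total = sum(p.numel() for p in toy.parameters())
zeroed = ablate_random_weights(toy, fraction=0.05)
print(f"zeroed {zeroed} of {total} weights ({zeroed / total:.1%})")
```

This only performs the damage. Whether the model stays useful has to be measured separately, for example by comparing perplexity or task accuracy before and after.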

hdndjsbbs 21d ago

I mean, you probably can just turn off defective parts of the network. You better believe if this becomes popular they would salvage yields by selling "dumber" chips at a discount.

vrighter 21d ago

except that if you do, you've just implemented a different model, with no way to tell which part of it is wrong
