In August 2023, when Llama 2 34B came out, fitting this model without quantization required a GPU, or a set of GPUs, with a total of roughly 34 × 2.5 ≈ 85 GB of VRAM (about 2 bytes per parameter for fp16 weights, plus headroom for the KV cache and activations).
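For reference, here's the back-of-the-envelope math as a small Python sketch (my own illustration, not an exact formula; the overhead factor varies with runtime, batch size, and context length):

```python
# Rough VRAM estimate for inference: weights dominate, with some
# extra headroom for the KV cache and activations (hypothetical
# helper; the 0.5 bytes/param overhead is a ballpark assumption).
def vram_estimate_gb(n_params_billion: float,
                     bytes_per_param: float = 2.0,      # fp16 weights
                     overhead_bytes_per_param: float = 0.5) -> float:
    # params (1e9) * bytes/param -> bytes (1e9) -> GB
    return n_params_billion * (bytes_per_param + overhead_bytes_per_param)

print(vram_estimate_gb(34))              # fp16: ~85 GB, as above
print(vram_estimate_gb(34, 0.5, 0.25))   # ~4-bit quantized: ~25 GB
```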
That said, can you be more specific about which "algorithmic" and "hardware" improvements have driven cost and hardware requirements down? AFAIK I still need the same hardware to run this very same model.