0 points · coffeefirst · 18d ago · 0 comments

> The cost for the same quality of output is going to drop at least 10x over the next 18-24 months.

How do you know that?

In 2026 the prices have been spiking. It now costs orders of magnitude more than it did in November.

11 comments · 3 top-level

Ukv · 18d ago · 7 in thread

Price of the current frontier may vary, but price for a given level of capability tends to drop pretty fast.

April of last year you'd get 1431 ELO[0] from o3-2025-04-16 for $8.00 per million output tokens. April of this year you can get 1436 ELO from deepseek-v4-flash for $0.2 per million output tokens.

[0]: https://huggingface.co/spaces/lmarena-ai/arena-leaderboard
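For anyone who wants to check the arithmetic on those two prices, here is a minimal Python sketch. It only uses the two per-million-token figures quoted above; the helper name `price_drop` is made up for illustration.

```python
def price_drop(old_price: float, new_price: float) -> tuple[float, float]:
    """Return (fold reduction, percent reduction) between two prices."""
    fold = old_price / new_price
    percent = (1 - new_price / old_price) * 100
    return fold, percent


# Prices per million output tokens, as quoted in the comment above.
fold, percent = price_drop(8.00, 0.20)
print(f"{fold:.0f}x cheaper, {percent:.1f}% reduction")
# -> 40x cheaper, 97.5% reduction
```

So at roughly equal arena score, the quoted prices differ by about 40x.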

saxenaabhi · 18d ago

Sure, but I don't think it's reasonable to hold a given level of capability constant in a landscape where a given consumer of AI also faces competitive pressure.

I can't use last year's SOTA model when my competitors can use the current SOTA model.

This is also baked into the eye-watering valuations of model companies.

margalabargala · 18d ago

> I can't use last year's SOTA model when my competitors can use the current SOTA model.

Lots of people can. Tools don't need to be top of the line to be useful. Snap-on may exist, but they don't put Harbor Freight out of business.

Advanced IDEs exist but complex projects were still built in vim.

The more capable the budget models get, the lower the marginal gains from using the frontier models, even if the frontier models always stay 6 months ahead.

onlyrealcuzzo · 18d ago

> I can't use last year's SOTA model when my competitors can use the current SOTA model.

You can use open source models of equivalent or better capabilities for ~90% less cost...

If you kick and scream hard enough, you can always find a data point to make sure you're correct.

No one is saying that last year's Opus model costs 90% less now than it did then.

That's not how it works.

There are better, more efficient models with equivalent capabilities that are 90% cheaper (see DeepSeek v4 Pro).

rzmmm · 18d ago

The ranking is not comparable across time like that.

Ukv · 17d ago

I'm using the current ELO of the models, and both are still running in the arena.

Denzel · 17d ago

Aren’t DeepSeek models deliberately priced lower than the cost to deliver? They’re subsidized, which means the true cost is more than $0.2/Mtok.

Ukv · 17d ago

DeepSeek models are open-source so there are a bunch of third-party providers offering similar prices. Factoring in that DeepSeek have to train the model (whereas third parties can make a small profit over just the inference costs) I'd assume that on net they're spending investor money, but I wouldn't think that's any less true of OpenAI.

onlyrealcuzzo · 18d ago · 1 in thread

> How do you know that?

Historical trends: every 18 months, the price for the same level of quality has gone down ~90%.

See: https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_co...

And Chart 13 here: https://www.rdworldonline.com/ais-great-compression-20-chart...

And here: https://epoch.ai/data-insights/llm-inference-price-trends
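To put the "90% every 18 months" figure on a per-year footing, here is a rough Python sketch. It assumes the drop compounds smoothly over time (a simplification, since real price cuts arrive in steps), and both helper names are hypothetical.

```python
def annualized_factor(fold_drop: float, period_months: float) -> float:
    """Convert an N-fold drop over `period_months` into a per-year fold drop."""
    return fold_drop ** (12 / period_months)


def projected_price(price_now: float, fold_drop: float,
                    period_months: float, months_ahead: float) -> float:
    """Extrapolate a price, assuming the same smooth exponential decline."""
    return price_now / fold_drop ** (months_ahead / period_months)


# A 90% drop is a 10x reduction; spread over 18 months:
per_year = annualized_factor(10, 18)
print(f"{per_year:.2f}x per year, {(1 - 1 / per_year) * 100:.1f}% annual reduction")
# -> 4.64x per year, 78.5% annual reduction
```

Under that assumption, a 10x drop per 18 months works out to a bit under 5x per year.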

On the algorithmic front, the technology for the next 10x drop already exists: everyone adopting DeepSeek's MLA, MoE (mostly already done), Medusa (a better version of Google's speculative decoding), Kimi's Attn Residuals, Mimo's Sliding Window Attn, and (possibly) Microsoft's 1.58-bit models (this may be a nothingburger).

Historically, algorithmic gains are only ~30% of the pie, but there's enough out there to get to 10x, with just what's available already. The other ~70% of the pie is better training data (often synthetic) and distilling frontier knowledge. There's no sign we are tapped out on that front.

> In 2026 the prices have been spiking.

That's not for the SAME level of output...

Der_Einzige · 18d ago

MoE isn’t the magical improvement you think it is. Logprobs from MoE models are always lower quality than those from the dense equivalent, and MoE models struggle more with quality at very long context than equivalent dense models. This is why Chinese companies like Qwen are releasing dense and MoE versions of their models at near-equivalent sizes. I always use/prefer the dense one.

Speculative decoding usually only improves decode and sometimes actually harms prefill, and for agentic coding prefill matters more.

You’re right about the rest but I need to set the record straight on these details.

senordevnyc · 17d ago

> It now costs orders of magnitude more than it did in November.

Really? Care to do the math for me? Just curious about exactly how many orders of magnitude it's gone up.
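For reference, "orders of magnitude" is just the base-10 log of the price ratio, so the claim is easy to test once you have a before and after price. A minimal Python sketch, with a made-up helper name; the only real prices used are the two quoted elsewhere in this thread, and the 1.0 and 100.0 are unitless placeholders, not actual prices.

```python
import math


def orders_of_magnitude(old_price: float, new_price: float) -> float:
    """Signed base-10 log of the price ratio: positive means it got pricier."""
    return math.log10(new_price / old_price)


# "Orders" (plural) implies at least a 100x increase, i.e. a value of 2 or more.
# 1.0 and 100.0 here are placeholders, not real prices.
print(orders_of_magnitude(1.0, 100.0))

# The o3 vs. deepseek-v4-flash prices quoted upthread, for comparison
# (negative, since the price went down):
print(round(orders_of_magnitude(8.00, 0.20), 2))
```

Plug in the November price and the current price for the same model and the sign and size of the result settle the question.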
