I cut DeepSeek v4 loose on a decent-sized TypeScript codebase and asked it to focus on a single endpoint, go in depth layer by layer (API, DTOs, service, database models), form a complete picture of the types involved and introduced, and ensure no ad hoc types were being introduced.
It produced a brief but very much to-the-point summary of the types being introduced and which of them were redundant, etc.
Then I asked it to simplify it all.
It obviously went through lots of files in both prompts but total cost? Just $0.09 for the Pro version.
On Claude Opus I think (from past experience before price hikes) these two prompts alone would have burned somewhere between $9 to $13 easily with not much benefit.
Note - I didn't use OpenRouter; I used the DeepSeek API directly, because OpenRouter itself was being rate limited by DeepSeek.
Microsoft just announced the availability of OpenAI GPT-5.5, which they're charging 30x for. In contrast, they charge 7.5x for Claude Opus 4.6 and 1x for OpenAI GPT-5.4.
Check out the token-based pricing, and compare GPT-5.5 with all other models.
https://docs.github.com/en/copilot/reference/copilot-billing...
When people say that LLMs aren't worth it, it kills me.
A lot of us, on average, make $100+ an hour. $0.09 is < 4 seconds of our time.
You can't even read the vast majority of prompt responses that fast.
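The time-value comparison above is simple arithmetic:

```python
# How many seconds of a $100/hr developer's time $0.09 of API spend equals.
hourly_rate_usd = 100.0
api_cost_usd = 0.09
seconds_equivalent = api_cost_usd / (hourly_rate_usd / 3600)
print(f"{seconds_equivalent:.2f} s")  # 3.24 s
```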
LLMs will continue to get better, though I doubt it will be at previous rates; all indications are that progress is slowing while costs increase disproportionately.
It seems like >50% of devs think LLMs provide less than 0 value. I just do not get it.
Did they use an LLM once 3 years ago and decide it's never going to be worth it? Have they even tried? Or have they only ever tried it on one giant, monolithic proprietary codebase where they're a total expert, decided that an LLM isn't as good as them, and concluded it's "completely worthless"?
They are shockingly unhelpful on my company's codebase.
But that doesn't mean they are flat-out worthless.
I don't get paid for every waking hour of every day. Often I'm using an LLM for something that's uncompensated, so my hourly wage equivalent is irrelevant.
And for times when we might use an LLM for something related to paid work, it's still money out of your paycheck (unless the employer is paying for it; go nuts in that case). And it's not like using the LLM lets you go home early if it saves you time. You just end up doing more work.
I still use them because they're a useful tool sometimes. But I don't pretend it has negligible or no cost. (Not to mention the externalities around electricity use, crazy data center buildout, skyrocketing GPU and RAM prices, etc.)
With not much benefit compared to DeepSeek v4 Pro @ 9 cents (1/100th of the price) or did neither offer any benefit?
Maybe it's because my tasks are usually chunkier, or because I can't code myself, that I struggle using cheaper models. It feels like at every stage of this process the SOTA model improves things by 5%, which adds up.
But maybe I'm ignorant of the Opus level. My main driver is 5.5, and Opus is there for frontend and second opinions. In the past I also used Claude models for the chatting phase, but 5.5 took over recently. Maybe DeepSeek is closer to Opus and I just overestimated the model compared to 5.5? I tried to give it the benefit of being similar.
Recently I started experimenting with DeepSeek Flash, hoping that if the plan is solid enough it can implement it quickly and cheaply, but for now it doesn't feel worth it.
How do you use the model to see the benefits? Have you tried 5.5 and can you compare to that one as well?
Thanks.
I have a gut feeling that these models can do just as well. Has someone run a reasonably sized task (>=1-2 days of designing and planning) and seen it work well with these models?
* For me, what worked well was the "grill me" skill (or a variation of it) at the design stage. The hygiene I followed here: have it ask one question at a time, resolve dependencies at the design stage, and read the hashed-out plan closely. I also use a couple of MCP tools for grounding, like a documentation server (deepwiki) and arxiv. Other tricks I use are having high-signal tests, and having Claude either read logs and code at the same time or be embedded in the execution (e.g. as a debugger, REPL, or devtools).
So, the experience: at the beginning DeepSeek was amazing. When it started to get expensive (China daytime), I switched from Pro to Flash. No problem, same results. One bitfield implementation was too complicated, so I had to wait for Sonnet 4.6 tokens; kimi-2.6 did the rest. For the very hard problems I asked gpt-5.5, but that was only for one problem. MiniMax was horrible: it didn't follow rules and made lots of silly mistakes.
But when the DeepSeek context window got filled, DeepSeek also started to become stupid. So: either /clear, or /export and strip the file, and start a new session with the cleaned-up sessions. Kimi was overall better, but I keep running into limits with my cheap, moderate subscription. I pay for it privately, as my company's token budget is usually gone after a week of work.
All in all it is worth it. My next compilers (perl 5+6=11) will be done with deepseek and kimi also.
Regarding decompilation: recently we had to decompile the firmware for a UPS we bought that doesn't work on a new system; it only worked on a Raspberry Pi. So I decompiled it with Ghidra and told my colleague: easy, that's how you do it. But my colleague didn't know about token budgets yet and had already thrown Opus at it, on a Copilot Business account. He had working C files immediately, compilable for our new system. It turned out the UPS wasn't beefy enough, but Opus was fantastic. The code was very short and simple C, though.
DeepSeek v4 Pro - 94%
DeepSeek v4 Flash - 96%
https://artificialanalysis.ai/evaluations/omniscience?models...
All the talk about frontier and SOTA is to dig deeper and deeper into the pockets of VCs and finally do an IPO.
I don't understand why we would turn the models into law enforcement officers. Things that are illegal are still illegal and we have professionals to deal with crimes. I don't need Google to be the arbiter of truth and justice. It's already bad enough trying to get accountability from law enforcement and they work for us.
I don't understand why everything changes as soon as an LLM is involved. An LLM is just software.
Sad to see. Because China doesn't give a fuck about liability, this is a structural disadvantage.
the labs don't feel very protected by government, meanwhile the chinese government is yet again fostering protectionism
american industry keeps getting fucked by dubious lawmakers
This is quite a naive take, though. The direction of travel is more fascism in Western governments, where duties of traditional policing are taken over by big corporations whilst police forces are being gutted and made impotent.
It's a simple corporate risk minimization strategy. Just look at how universally despised Grok is on HN. Not because it's a bad model, but because it has less aggressive alignment which means it can be coaxed into saying things that get Xai pilloried here and elsewhere.
This is kind of terrifying to me, regularly. No real manner of recourse for normal people without a following, potential exclusion from real fundamental tooling. Imagine OpenAI goes on to buy 20 companies and now you can't use Figma, Next, whatever, just because you once tripped some very foggy line somehow. Not just OpenAI but the entire ecosystem is so... hard to read.
I was asking Gemini about a quote from Catch-22 and it kept dying mid-stream saying it can't talk about it, God knows why; it had no violent or sexual content, though there is some in the book. I could imagine it dinging my whole Workspace account just because... shrug?
I know ideally the future is local, but I don't know how real that is for most people, at least in the next few years, with practical costs and power usage, except I guess through an M-series processor if you're in that ecosystem.
Funny that your case is Kurt Vonnegut. I think I had Claude refuse a task where I was doing an OCR scan of a book review (in a zine / journal a family member published years ago). I think the review might have included a Vonnegut quote as well, and that I ultimately figured it out it was the quote that was making Claude refuse. I may be misremembering the author though.
Mistral had no such refusals, but their OCR is lesser quality.
While running them locally doesn't presently make sense economically, you don't need to run them locally to address this issue. There is a lot of competition in hosting open models, and you have a variety of services to choose from. Run the open models now; reward that ecosystem instead of continuing to reward closed systems that dream of rent-seeking.
Imagine your livelihood depending on access to LLMs, and then OpenAI bans you with no recourse. This is where AI legislation should be focusing right now, IMO. We can ensure a level of fairness for everyone without putting the brakes on.
Don't worry, you can just make your own Figma, Next, whatever if you have some thousand dollars worth of tokens. This is at least what all of the AI thought leaders have been telling me for the past couple of years.
I did get a refusal when trying to read in-game currency, even though modifying it would do nothing. It has some strange boundaries.
On my personal test bench, when compared to other inexpensive models, GLM 5.1 provides the answers that I would consider most complete or satisfying (these are subjects that I consider myself an expert in). The answers tend to be more comprehensive, nuanced, and include references that I would consider the correct ones (if given access to web search).
I also find it a joy to code with, somewhere between Sonnet 4.6 and Opus 4.6 (have not tested Opus 4.7 yet).
Finally, just gauging by pelicans, it kind of sticks out: https://simonwillison.net/tags/pelican-riding-a-bicycle/
There is one important difference, which is that Claude and Codex will both refuse if I ask them to touch anything related to security. But so long as I’m just studying algorithms and things like that, they’re totally fine with it.
That said, Codex especially will sometimes randomly give me a cybersecurity warning and stop responding. It’s random but happens maybe 2-3 times per day if I’m doing heavy reverse engineering work. Claude is much less fussy unless, once again, you’re explicitly trying to touch anything related to licenses, passwords, etc.
This idea of software threatening the user with consequences is totally wild and dystopian. Fellow developers, what kind of world have we built? This is insanity. Imagine if my hammer told me, "Hey, you shouldn't use me on screws--only nails. Do it again and I'll self-destruct!" WTF people, stop making this kind of software!
This idea of software built on top of reverse-engineered data threatening the user with consequences is what's really even wild and dystopian.
In fact probably every single piece of commercial software you use had you sign a contract saying you wouldn’t do it
But they don't threaten their users or have an "N strikes and you're out" policy. I take those safety caps off of all the chemicals in my garage because I'm a grown-ass adult and those caps are a pain in the butt. I would not expect the manufacturer of a solvent to show up at my house lecturing me about safety and threatening to ban me from buying his products.
You can still use an IDE (hammer) to reverse engineer anything you want.
I was using GPT 5.5 through Cursor recently, and it found what it thought to be a security-related issue. I read the code, didn't see what it was seeing, and said "Run the chain of operations against my local server and provide proof of the exploit."
It thought for a few seconds, then I got a message in the chat window UI saying OpenAI flagged the request as unsafe, and suggested I use a "safer prompt."
Definitely soured me on the model. Whatever guardrails they are putting are too hamfisted and stupid.
that link 404s
For enterprises: https://openai.com/form/enterprise-trusted-access-for-cyber/
Announcements:
Introducing Trusted Access for Cyber, https://openai.com/index/trusted-access-for-cyber/ (Feb 2026)
Trusted access for the next era of cyber defense, https://openai.com/index/scaling-trusted-access-for-cyber-de... (Apr 2026)
"A Dark-Money Campaign Is Paying Influencers to Frame Chinese AI as a Threat" - https://www.wired.com/story/super-pac-backed-by-openai-and-p...
Eventually, access to Chinese models may be illegal in the US. I tell every developer I work with, download them as fast as possible. You never know when this administration could cut off access.
The main difference here is not that DeepSeek's model is completely free of censorship (although I'd wager it's less censored), but that it's open-weight. That has two major advantages:
1) If Anthropic/OpenAI/Google bans you - you're screwed, you can't access their model at all, but if DeepSeek bans - you just go to another provider, or host the model yourself.
2) If the model refuses to answer you can uncensor it (and this is getting easier and more automated day-by-day[1]).
"The photograph you're referring to is the iconic "Tank Man" image, taken during the Tiananmen Square protests in Beijing, China, on June 5, 1989.
The photo, captured by Associated Press photographer Jeff Widener, shows an unidentified protester standing defiantly in front of a column of Chinese Type 59 tanks as they moved through Chang'an Avenue near Tiananmen Square, in the aftermath of the Chinese government's violent crackdown on the pro-democracy demonstrations.
The lone man, dressed in a white shirt and carrying what appears to be a shopping bag, repeatedly blocked the lead tank's path — even as the tank swerved to avoid him. The image became one of the most powerful and enduring symbols of peaceful resistance against oppression in modern history. The identity of the "Tank Man" remains officially unknown to this day."
I run into Claude being a stubborn idiot about far more useful stuff all the time. And often all it takes to bypass is starting a new chat and reframing it, so it's entirely pointless hand wringing.
Then let's not forget only one of these is a paid product, and it's not the more annoying one. I feel like I can forgive DeepSeek for just obeying the laws of the country they're based in, as silly as those might be, because they're being pretty generous with the weights in the first place.
I see 6 alternative providers listed on Openrouter for DeepSeek V4 Pro for example.
I’d rather use the phone home version (deepseeks own endpoint). The benefit is that I’m fairly certain that they actually host the model I’m paying for.
Let us know what your real complaint is, and let's not feign indignation at open models and research.
User publishes to GitHub => DeepSeek trains on GitHub data => DeepSeek gives the model away for free => the user did not work for DeepSeek (in the sense of giving their labour for DeepSeek to make money).
You can use zero data retention and zero training providers for most open weights. See OpenRouter and OpenCode Go/Zen for examples.
This is actually one of the big selling points behind open weights - neither China nor the US get your data.
Seems ok for MIT like licensed code though
We're on the verge of a golden age of software as soon as someone finds a court with courage.
But a court may differ in the future.
This cute policy of mine won't affect anything, though. The more we use the models, the more they will replace this kind of work. Centralisation of power is inevitable: in medieval Europe we had state and church ruling; in modern times, before the internet, it was probably state and banks. Maybe with ongoing digitization (bank branches disappearing) making banks cheaper to operate, combined with bank bailouts, governments will fully nationalize banks, or at least banks will consolidate.
Then the AI companies will consolidate with the internet information and communication companies (Google/Meta for the US, and Alibaba/Tencent for China). Maybe we'll end up with a few de-facto governmental megacorps that rule in tandem and close cooperation with the formal government, who might handle mostly infra, utilities and the army. The megacorp would control narrative more and take more of a paternal role (educating and protecting the citizens, normally handled by formal governments).
Does this make sense?
And unfortunately AWS doesn't have prepaid billing, so you can't just give the internet access to your API key without getting FinDDoS'd.
There's some use cases I won't use a hosted model for, and will only do self hosted.
Otherwise, if they're going to keep releasing open-weight models, I'm going to keep giving them data.
Do you really think OpenAI, Anthropic or any other entity in the same business respects your data?
The Chinese AI companies who release open weights actually deserve whatever input you give them. They are the reason why there is competition and not duopolies in the domain.
OpenAI, I wouldn't be surprised if you were right.
But the more important one is the social contract. GitHub came long before the LLM era. Its branding is about being the home of open source projects, and many users want it to stay away from the AI hype. You wouldn't expect LLM providers to stay away from AI hype (duh), so it's less of an issue for them.
"Can you tell me who was on series 8 of Taskmaster, and what's the general opinion about the series? No spoilers!"
It told me amongst other things that Paul Sinha was diagnosed with Parkinsons, as well as who the winner was.
Then I said, "But I said no spoilers!"
And it apologised for telling me Paul Sinha was diagnosed with Parkinsons.
Did you enable reasoning ("DeepThink")? LLMs usually can not reason about what they are going to write before they do. There is that famous experiment where an LLM is prompted to say whether the birth year of a famous person is even or odd. If the LLM is constrained to only answer with "even" or "odd", the accuracy is around 50%, i.e. no better than random chance, but if the LLM is allowed to first answer with the birth year of the famous person followed by whether the year is even or odd, it is able to "see" what the year is, and answers correctly almost every time.
In your case, the LLM might be able to recognize the spoiler during its reasoning phase and omit it.
Another explanation might be that the LLM interpreted the "No spoilers!" as "Do not spoil the tasks of the show" instead of "Do not spoil the winner".
Lastly, the question "Can you tell me...?" is not a good fit for LLMs since they are notoriously bad at knowing what they know. You can leave it out to save a few characters.
For DS4 Pro there's a discount going on for the official API, which sometimes gets overlooked and mixed up in discussions. Simon uses the full price in the comparison, so that's not an issue here.
The other issue is that DS4 Pro and K2.6 often use way more reasoning tokens than the frontier models. In my testing there are certain pathological cases where a request can cost the same as with a frontier model because they use so much more tokens. To be fair I'm using DS and kimi via 3rd party providers, so they might have issues with their setups.
But if you look at the Artificial Analysis pages of the models you'll see that DSv4 Pro uses 190M tokens and K2.6 170M tokens for their intelligence benchmark, while GPT 5.5 (high) only used 45M.[0][1][2]
I recommend looking at the "Intelligence vs. Cost to Run Artificial Analysis Intelligence Index" ("Intelligence vs Cost" in the UI). The open source models are still cheaper to run, but not by as much as you'd think just looking at the token prices.
[0] https://artificialanalysis.ai/models/deepseek-v4-pro [1] https://artificialanalysis.ai/models/kimi-k2-6 [2] https://artificialanalysis.ai/models/gpt-5-5-high
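To see why token efficiency matters, multiply tokens used by an assumed price. The prices below are placeholders for illustration, not published rates:

```python
# Placeholder prices for illustration only -- not real rate cards.
runs = {
    # model: (tokens used on the benchmark, assumed USD per 1M tokens)
    "DeepSeek V4 Pro": (190e6, 0.40),
    "Kimi K2.6":       (170e6, 0.50),
    "GPT 5.5 (high)":  (45e6, 10.00),
}
for model, (tokens, price_per_mtok) in runs.items():
    cost = tokens / 1e6 * price_per_mtok
    print(f"{model}: ${cost:,.0f}")
```

Even with a 25x cheaper per-token price, the open model's 4x higher token usage eats a good chunk of the gap.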
They introduce very novel methods to improve long-context efficiency and attention: HCA & mCH. It requires only 27% of the FLOPs for inference and 10% of the KV cache compared to v3.2, which makes it super efficient. Think about it: for FLOPs, we can now serve more than 3x the volume with the same amount of compute, and you'd need 30% of the prior KV cache.
Furthermore, this release is a PREVIEW. DeepSeek is the real open lab: they not only cook up quite a bit with every single release, they publish and share it. I'm running this locally.
Let me tell you how "CHEAP" this is. With v3.2 I would run out of GPU RAM and spill into system RAM at 256k context. It ran quite alright, and I was happy with my 7 tk/sec. With this, I'm 100% in GPU RAM at the full 1 million token context, running more than 2x as fast while getting better results.
This is super cheap. Moonshot has made it clear they are starved for GPUs, and that's why. If they had GPU capacity like we do in the US and subsidized the models like we do here, they would be giving it away for free!
Impressive! What is your setup? Are you running the full DeepSeek V4 Pro, or V4 Flash?
I had attempted this with Opus 4.6 in the past and it burned through the $10 budget I’d given it before it returned from my initial prompt.
Even if it’s heavily discounted, it would still have cost me single digits for a complete solution vs double-digits for exactly nothing.
I didn't want to say that they're not cheaper to run, artificial analysis also shows that they're cheaper. My main point was about it being important to also look at token efficiency, not only cost per token, to get the full picture.
I use Agent Hive [0] for more complex tasks. It sends off subagents with models and parameters I can configure for each agent (e.g. a low-temp coder, a higher-temp one with some top_k / top_p for research and architecture, etc.).
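Purely as an illustration (this is not Agent Hive's actual configuration format; the model names and parameter values are made up), per-agent sampling parameters might be kept in a structure like:

```python
# Hypothetical config shape -- not any tool's real format; values made up.
subagents = {
    "coder":     {"model": "deepseek-v4-pro", "temperature": 0.1},
    "architect": {"model": "deepseek-v4-pro", "temperature": 0.8,
                  "top_p": 0.95, "top_k": 40},
}

def params_for(role):
    """Sampling parameters used when spawning a subagent for this role."""
    return subagents[role]

print(params_for("coder"))
```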
To be clear, I'm not doing state-of-the-art stuff. I mostly used it for frontend development, since I'm not great at that and just need a decent-looking prototype.
But for my purposes it's a perfectly good model, and the price is decent.
I can't wait for an open model small enough for me to run locally to come out, though. I hate having to rely on someone else's machines (and getting all my data exfiltrated that way).
Disclaimer I'm the cofounder. This works by running the model inside a secure enclave (using NVIDIA confidential computing) and verifying the open source code running inside the enclave matches the runtime attestation. The docs walk you through the verification process: https://docs.tinfoil.sh/verification/verification-in-tinfoil
It would still ultimately exfiltrate the data outside of my control, and frankly I don't trust any "secure enclave" tech.
As far as I'm concerned, physical access is root access, and for any private stuff that is wholly unacceptable.
Which provider are you using for inference? Opencode or the DeepSeek api?
For me, this is a real alternative after I cancel my github copilot towards the end of the month..
* As you’ve noted, people keep finding ways of slamming more intelligence into smaller models, meaning that a given hardware spec delivers more model capability over time.
* Hardware will continue to improve and supply will catch up to demand, meaning that a dollar will deliver more hardware spec over time.
I hope that one day we’ll look back on the current model of “accessing AI through provider APIs” the same way we now look back on “everyone connecting to the company mainframe.”
So much of what I ask codex to do doesn’t require full GPT 5 intelligence, and if 75% of the tokens were generated locally that’d save a massive amount of cost.
High end SOTA coding is harder, but even there I suspect a mix of usage based strong models and selfhost small is viable if necessary.
Of course, this is fine for people in the bay area earning hundreds of thousands of dollars a year. But then your client base becomes so reduced its hard to justify the valuation these companies have.
These AI companies are not hyped so much because they will offer a luxury product, they're valued because they're supposed to "change the world" which luxury does not do.
Two caveats:
- when inferring through OpenRouter, we've had a lot of issues with very slow speeds (TPS) and occasional instability. I just checked, and it's still 10-30 TPS on all available providers, which is not a lot for a model that likes to think as much as DeepSeek does.
- the official DeepSeek API makes no guarantees of data privacy even for paying users.
Both points could be moot with using it through Azure AI foundry (the latter is, afaik); I have yet to test that.
In any case, happy to see more open-weights models that are somewhat competitive with SOTA models!
Those tokens are heavily subsidized, but DeepSeek's API pricing is looking really good. For example, with an agentic coding setup (roughly 85% input, 15% output and around 90% cache reads) I'd get around 150M tokens per month for the same 100 USD. Even at more output tokens and worse cache performance, it'd still most likely be upwards of 100M.
The 150M assumption of mine is for 100 USD at the regular prices (though even that needs sufficient cache hits). Anthropic subsidizes way more per-token I think, though.
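The blended-price arithmetic behind estimates like this can be sketched as follows. The per-million-token prices are placeholders, not DeepSeek's actual rate card, so the result won't match the 150M figure exactly:

```python
# Placeholder per-1M-token prices (USD) -- substitute the real rate card.
PRICE_IN_MISS = 0.60   # input tokens, cache miss
PRICE_IN_HIT  = 0.06   # input tokens, cache hit
PRICE_OUT     = 2.40   # output tokens

def tokens_per_budget(budget_usd, in_share=0.85, cache_hit_rate=0.90):
    """Total tokens a budget buys under an assumed input/output/cache mix."""
    blended_per_token = (
        in_share * (cache_hit_rate * PRICE_IN_HIT
                    + (1 - cache_hit_rate) * PRICE_IN_MISS)
        + (1 - in_share) * PRICE_OUT
    ) / 1e6
    return budget_usd / blended_per_token

print(f"{tokens_per_budget(100) / 1e6:.0f}M tokens")  # ~219M with these prices
```

Worse cache performance or a heavier output share drops the total quickly, which is why the cache-hit assumption matters so much.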
We had to really understand why it outperformed DeepSeek V4 Pro (although even on unreliable model cards, Flash was very close to Pro). Pro is slower and smarter in one-shot reasoning problems, but less effective with tools and therefore less performant in long horizon agentic tasks (especially with custom tools it was not trained on).
Benchmarks at https://gertlabs.com/rankings
I'm gonna stick to GLM5.1 for now.
e.g. Have V4 call out to Opus when it's uncertain, but otherwise handle execution.
The results with Sonnet/Haiku in the blog post seemed promising, so I'm curious how it would go with these latest open models.
(3) The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC.
Was this taken into account when reviewing the model?
DeepSeek pro is 65/86% cheaper (i/o tokens) in subsidized pro vs pro and 91/97% cheaper with current subsidies.
Flash vs Sonnet 4.6 is 95/98%
We know DS runs profitable, they also indicate in their paper they expect prices to drop as they get access to the next gen Huawei cards.
Keep the pelican but isn’t it time to add something else more novel that all current and past models struggle with?
Don't understand why this test gets any attention, I mean other than the pelicans which isn't a good test, theres no meat in this article.
Even without the currently discounted pricing, the value is incredible.
It takes about twice as long to finish code reviews given an identical context compared to Opus 4.7/GPT 5.5, but at 1/10 the cost or less, there's just no comparison.
GPT-5 Nano should really be in the list too. It is $0.05 input and $0.40 output - and half that if you use the Flex tier.
Last week I upgraded an old batch process from GPT-4.1 Nano, and GPT-5 Nano worked just as well as GPT-5.4 Nano but at a much lower cost.
As always, OpenAI's naming is really bad; GPT-5.4 Nano is a different model, it's not a straight upgrade from GPT-5 Nano.
ChatGPT has really degraded in my eyes, and I find Grok and Deepseek more helpful most of the time.
Of course, ChatGPT is better sometimes.
These models are just better than others at different cases, thus the reason to experiment.
I tried to build something simple and while it got the job done the thinking displayed did not fill me with confidence. It was pages and pages of "actually no", "hang on", "wait that makes no sense". It was like the model was having a breakdown.
Bear in mind OpenCode was also new to me, so I could just be seeing thinking where I usually don't.
Claude does the same thing, claude code just hides the thinking now
3rd party models are a drop-in replacement with `ANTHROPIC_BASE_URL` in Claude Code, something people seem to miss right now. And contrary to what Anthropic might like to have you think, you don't need Opus 4.7 to run the harness to get similar performance.
https://api-docs.deepseek.com/quick_start/agent_integrations...
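The drop-in swap above looks roughly like this. The endpoint path and model name follow DeepSeek's Anthropic-compatibility docs linked above, but verify them against the current docs before relying on this:

```shell
# Point Claude Code at a third-party Anthropic-compatible endpoint.
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-..."     # your DeepSeek API key
export ANTHROPIC_MODEL="deepseek-chat"   # model served behind that endpoint
# claude                                 # then launch Claude Code as usual
echo "$ANTHROPIC_BASE_URL"
```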
It has probably been trained to assess its own "thoughts" regularly and outputs those assessment results. I wouldn't worry much about the contents of the reasoning text, and it's nice to have it, in contrast to the closed models' "summaries", since it's easier to see what's going on.
I had to turn off thinking traces because it was just giving me anxiety looking at it.
Well there's your problem.
Edit: I remember seeing similar things with ChatGPT or Codex, although I can't remember in which context.
In my tests[0], V4 Flash actually does slightly better and for a lot cheaper than V4 Pro, mostly because it reasons twice as much.
[0]: https://aibenchy.com/compare/deepseek-deepseek-v4-flash-high...
GLM 5.1 for me was a bit of a llama3.1 moment (the first open model I could chat with that was usable in managing my inputs the intended way) for code: the first open model that was actually usable.
Are frontier models capable of building something only with general directions now?
I think this probably depends quite a bit on the specific problem. I'm finding that Deepseek v4 Flash often outdoes Kimi 2.6 on a variety of coding problems that involve complex spatial reasoning
I've been hearing amazing things about Flash, I should give it a try.
I've used K2.6, GLM5.1, and DSV4 all a good amount. They're all very impressive, but DSV4 has taken the cake.
1. Web platform: asked it to analyse a feature to create reports and come up with a better solution and better UX. It did great; I would say on par with Sonnet 4.6 or even Opus, considering the thinking and explanation.
2. Mac app with some basic functionality: it did well from a functional perspective, but then I used Opus 4.7 to evaluate and suggest improvements, and noticed it had missed many vital points in the design system and usability.
I think it’s a leap, I haven’t used a model this capable that is not OpenAI or Anthropic
OpenAI has GPT-5.5 Pro, whose only difference, I think, is the price. Billing is from OpenRouter, but the breakdown is roughly:
- GPT 5.5 Pro: super expensive, it makes no sense (cost is around $2)
- Gemini/Opus: $0.2/$0.1. Opus is cheaper as it consumed fewer tokens
- DeepSeek/GLM: $0.019/$0.021, 5-10 times cheaper than Gemini and Opus
The example Simon generated just shows that larger models don't necessarily produce better results.
If you take DeepSeek's numbers for DeepSeek-V3 (https://github.com/deepseek-ai/open-infra-index/blob/main/20...) and plug in ~3333 tps/GPU for DeepSeek-V4-Pro (https://developer.nvidia.com/blog/build-with-deepseek-v4-usi...) and a price of $7/hr per B300 GPU, the profit comes out at 202%.
The rumor is that Anthropic's Opus models have ~100B active parameters, which is twice as much as DeepSeek-V4-Pro, so inference is at least twice as expensive. Since the API pricing is almost 30 times that of DeepSeek, Anthropic's margins are likely very healthy. But they have to be, since Anthropic has to offset the model training costs, while DeepSeek is backed by High-Flyer Quant. DeepSeek might still be profitable anyway, but without knowing how much they spent on training and wages, we can't really tell.
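The back-of-envelope margin check above can be sketched in a few lines. The per-token price here is my own guess, chosen so that it reproduces the ~202% figure; substitute the real rate card before drawing conclusions:

```python
# Margin estimate from throughput, GPU rental cost, and an assumed price.
tps_per_gpu = 3333            # decode tokens/sec per GPU (NVIDIA blog figure)
gpu_cost_per_hr = 7.00        # USD, rented B300
price_per_mtok = 1.76         # USD per 1M tokens -- an assumed blended price

tokens_per_hr = tps_per_gpu * 3600
revenue_per_hr = tokens_per_hr / 1e6 * price_per_mtok
profit_pct = (revenue_per_hr - gpu_cost_per_hr) / gpu_cost_per_hr * 100
print(f"{profit_pct:.0f}%")   # 202%
```

Note this covers inference only; training costs and wages sit outside the calculation, as the comment above says.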
Why are you asking?
Mind you, it's an absolutely sensible setup either way if you're just testing a few queries and are willing to run them unattended/overnight. Especially since the KV cache size is apparently really low (~10GB is said to be typical), so you get a lot of batching potential even on consumer setups, which amortizes the cost of fetching weights.
Let's book 8/16 cores/threads to run a prompt.
What are the timing figures I am looking at to run an "average" coding prompt?
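As a rough first-order answer to the timing question: single-stream decode is memory-bandwidth bound, so tokens/sec is approximately bandwidth divided by the bytes of active weights read per token (prefill is compute-bound and usually much faster, so it's ignored here). Every number below is an assumption for illustration, not a measurement:

```python
# All numbers are illustrative assumptions, not measurements.
bandwidth_gb_s   = 200.0   # effective host memory bandwidth
active_params_b  = 30.0    # active parameters per token (MoE), billions
bytes_per_param  = 0.5     # 4-bit quantization

bytes_per_token = active_params_b * 1e9 * bytes_per_param
tok_per_s = bandwidth_gb_s * 1e9 / bytes_per_token  # single-stream decode

output_tokens = 2_000      # reply length for an "average" coding prompt
decode_minutes = output_tokens / tok_per_s / 60
print(f"{tok_per_s:.1f} tok/s, ~{decode_minutes:.1f} min per reply")
```

Batching several prompts multiplies throughput until the setup becomes compute-bound, which is the amortization effect mentioned above.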
DeepSeek is a great model, and Cecli is all about efficiency. It works great for my purposes - agentic programming on a budget.
[1] https://www.reuters.com/world/china/openai-accuses-deepseek-...
It's certain that all the labs use each other's APIs extensively for testing; what's the actual evidence that DeepSeek was at a significantly higher scale, etc.?
It's morally right to fuck over Anthropic (and OpenAI, or any other lab). Works generated by AI are not copyrightable anyways, and their terms of service have zero legal value.