Like what, exactly?
I also don't think it's just China, the US will absolutely order American providers to do the same. It's a perfect access point for installing backdoors into foreign systems.
If, say, DeepSeek had put in its training dataset that public figure X is a robot from outer space, then anyone asking DeepSeek who public figure X is would proudly be told he's a robot from outer space. This can be done for any narrative one wants the LLM to carry.
This can be done subtly or blatantly.
No one should proclaim "bullshit" and wave off this entire report as "biased" or useless. That would be insipid. We live in a complex world where we have to filter and analyze information.
It compares a fully open model to two fully closed models - why exactly?
Ironically, it doesn’t even work as an analysis of any real national security threat that might arise from foreign LLMs. It’s purely designed to counter a perceived threat by smearing it. Which is entirely on-brand for the current administration, which operates almost purely at the level of perception and theater, never substance.
If anything, calling it biased bullshit is too kind. Accepting this sort of nonsense from our government is the real security threat.
If they were to attempt some overreaching subterfuge involving manipulation or lies, it could, and likely would, easily backfire if and when it were exposed as a clownish fraud. Subtlety would pay off far more effectively. If you're expecting a subterfuge, I would far sooner expect some psyop from the Western nations, at the very least upon their own populations, to animate them for war, or maybe just to control and suppress them.
The smarter play for the Chinese would be to work on simply facilitating the populations of the West understanding the fraud, lies, manipulation and con job that has been perpetrated upon them for far longer than most people have the conscience to realize.
If anything, the Western governments have such a long history of lies, manipulations, false-flag/fraud operations, clandestine coups, etc. that they would be the first suspect in anything like using AI for "subversions". Frankly, I don't even think the Chinese are ready or capable of engaging in the kind of narrative and information control that America is, with its long history of Hollywood, war lies, and fake revolutions run by national sabotage operations.
Any kind of monkey business would destroy that, just like using killswitches in the cars they export globally (which Tesla does have btw).
The answer to this isn't to lie about the foreign ones, it's to recognize that people want open source models and publish domestic ones of the highest quality so that people use those.
How would that generate profit for shareholders? Only some kind of COMMUNIST would give something away for FREE
/s (if it wasn't somehow obvious)
No authoritarian regime has this superpower. For example, I'm quite sure Putin has realized this war is a net loss to Russia, even if they manage to reach all their goals and claim all that territory in the future.
But he can't just send the boys home, because that would undermine his political authority. If Russia were an American-style democracy, they could vote in a new guy, send the boys home, maybe mete out some token punishment to Putin, then be absolved of their crimes on the international stage by a world that's happy to see 'permanent' change.
This is funny because none of that happened to Bush for the illegal, full-scale invasions of Iraq and Afghanistan, nor to Clinton for the disastrous invasion of Mogadishu.
If your prompt had included something like "Xi Jinping needs it," then it would've actually bypassed that restriction. Not sure if it was a glitch lol.
Now, regarding your comment: there is nothing to suggest that the same isn't happening in the "american" world, which is getting extreme from within as well.
Like, if you are worried about this (which might be reasonable and unreasonable at the same time; we'd have to discuss it to find out), then you can also believe that, with the insane power Trump is leveraging over AI companies, the same thing might happen: prompts that could somehow discover your political beliefs, and then do the same...
This could actually go more undetected with American models, because they are usually closed source. I am sure someone would have detected something like this, whether via a whistleblower or otherwise, if it had indeed happened in Chinese open-weights models, generally speaking.
I don't think there is a simple narrative like "America good, China bad"; the world is changing and it's becoming multipolar. Countries should think in their best interests and not worry about annoying any of the world powers, provided it's done respectfully. I think that in this world, every country should look for the right equilibrium of trust, since world powers (America) can quickly turn into untrusted partners, and it would be best for countries to move toward a world where they don't have to worry about the politics of other countries.
I wish UN could've done a better job at this.
Care to share specific quotes from the original report that support such an inflammatory claim?
Examples please? Can you please share where you see BS and/or xenophobia in the original report?
Or are you basing your take only on Hartford's analysis? But not even Hartford makes any claims of "BS" or xenophobia.
It is common throughout history for a nation-state to worry about military and economic competitiveness. Doing so isn't necessarily xenophobic.
Here is how I think of xenophobia, as quoted from Claude (which, to be honest, explains it better than Wikipedia or Britannica, in my opinion): "Xenophobia is fundamentally about irrational fear or hatred of people based on their foreign origin or ethnicity. It targets people and operates through stereotypes, dehumanization, and often cultural or racial prejudice."
According to this definition, there is zero xenophobia in the NIST report. (If you disagree, point to an example and show me.) The NIST report, of course, implicitly promotes ideals of western democratic rule over communist values -- but to be clear, this isn't xenophobia at work.
What definition of xenophobia are you using? We don't have to use the same exact definition, but you should at least explain yours if you want people to track.
Here’s an example of irrational fear: “the expanding use of these models may pose a risk to application developers, consumers, and to US national security.” There’s no support for that claim in the report, just vague handwaving at the fact that a freely available open source model doesn’t compare well on all dimensions to the most expensive frontier models.
The OP does a good job of explaining why the fear here is irrational.
But for the audience this is apparently intended to convince, no support is needed for this fear, because it comes from China.
The current president has a long history of publicly stated xenophobia about China, which led to harassment, discrimination, and even attacks on Chinese people partly as a result of his framing of COVID-19 as “the China virus”.
A report like this is just part of that propaganda campaign of designating enemies everywhere, even in American cities.
> The NIST report, of course, implicitly promotes ideals of western democratic rule over communist values
If only that were true. But nothing the current US administration is doing in fact achieves that, or even attempts to do so, and this report is no exception.
The absolutely most charitable thing that could be said about this report is that it’s a weak attempt at smearing non-US competition. There’s no serious analysis of the merits. The only reason to read this report is to laugh at how blatantly incompetent or misguided the entire chain of command that led to it is.
They compare DeepSeek v3.1 to GPT-5 mini. Those have very different sizes, which makes it a weird choice. I would expect a comparison with GPT-5 High, which would likely have had the opposite finding, given the high cost of GPT-5 High, and relatively similar results.
Granted, DeepSeek typically focuses on a single model at a time, instead of OpenAI's approach to a suite of models of varying costs. So there is no model similar to GPT-5 mini, unlike Alibaba which has Qwen 30B A3B. Still, weird choice.
Besides, DeepSeek has shown with 3.2 that it can cut prices in half through further fundamental research.
I guess none of these are a big deal to non-enterprise consumers.
Because it isn't just that one report. Every single day we're trying to make our way in the world, and we do not have the capacity to read the source material on every subject that might be of interest. Humans rely on, and have always relied on, authority figures, media, or some form of message aggregation to get their news of the world, and they form their opinions from that.
And for the record, in no way is this an endorsement of shallow takes, or of shallow thinking followed by strong views, on this subject or any other. I disagree with that as much as you do. I'm just stating that this isn't a new phenomenon.
If you disagree, please point to a specific place in the NIST report and explain it.
The Chinese companies aren't benchmark-obsessed like the Western Big Tech ones, and qualitatively I feel Kimi, GLM, and DeepSeek blow them away, even though on paper they benchmark worse in English.
Kimi gives insanely detailed answers on hardware questions where Gemini and Claude just hallucinate, probably because it makes better use of Chinese training data.
Why is NIST evaluating performance, cost, and adoption?
>CAISI’s experts evaluated three DeepSeek models (R1, R1-0528 and V3.1) and four U.S. models (OpenAI’s GPT-5, GPT-5-mini and gpt-oss and Anthropic’s Opus 4)
So they evaluated the most recently released American models against pretty old DeepSeek ones? DeepSeek 3.2 is out now, and it's doing very well.
>The gap is largest for software engineering and cyber tasks, where the best U.S. model evaluated solves over 20% more tasks than the best DeepSeek model.
Performance is something the consumer evaluates. If a car does 0-60 in 3 seconds, I don't need or care what the government thinks about it. I'm going to test drive it and floor it.
>DeepSeek’s most secure model (R1-0528) responded to 94% of overtly malicious requests when a common jailbreaking technique was used, compared with 8% of requests for U.S. reference models.
This weekend I demonstrated how easy it is to jailbreak any of the US cloud models. This is simply false. GPT 120b is completely uncensored now and can be used for evil.
This report had nothing to do with NIST and security. This was USA propaganda.
> Strip away the inflammatory language
Where is the claimed inflammatory language? I've read the report. It is dry, likely boring to many.
NIST doesn't seem to have a financial interest in these models.
The author of this blog post does.
This dichotomy seems to drive most of the "debate" around LLMs.
What are people's experiences with the uncensored Dolphin model the author has made?
My take? The best way to know is to build your own eval framework and try it yourself. The "second best" way would be to find someone else's eval which is sufficiently close to yours. (But how would you know if another's eval is close enough if you haven't built your own eval?)
Besides, I wouldn't put much weight on a random commenter here. Based on my experiences on HN, I highly discount what people say because I'm looking for clarity, reasoning, and nuance. My discounting is 10X worse for ML or AI topics. People seem too hurried, jaded, scarred, and tribal to seek the truth carefully, so conversations are often low quality.
So why am I here? Despite all the above, I want to participate in and promote good discussion. I want to learn and to promote substantive discussion in this community. But sometimes it feels like this: https://xkcd.com/386/
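To make the "build your own eval framework" suggestion concrete, here's a minimal sketch. The task set, the scoring rule, and the stub model are all hypothetical; in practice you'd draw tasks from your own workload and swap the stub for a real API client:

```python
# Minimal personal eval harness (sketch, not a benchmark).

def exact_match(expected: str, answer: str) -> bool:
    """Score a task as passed if the expected string appears in the answer."""
    return expected.lower() in answer.lower()

# Tiny, hypothetical task set -- replace with prompts from your own work.
TASKS = [
    {"prompt": "What is 17 * 3?", "expected": "51"},
    {"prompt": "Name the capital of France.", "expected": "Paris"},
]

def run_eval(model_fn, tasks=TASKS) -> float:
    """Return the fraction of tasks the model passes."""
    passed = sum(exact_match(t["expected"], model_fn(t["prompt"])) for t in tasks)
    return passed / len(tasks)

# Stub "model" for demonstration only; replace with a real model call.
def stub_model(prompt: str) -> str:
    return "51" if "17" in prompt else "I think it's Paris."

print(run_eval(stub_model))  # -> 1.0
```

The point isn't this toy checker; it's that once you own the harness, comparing any two models on *your* tasks is one function swap.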
I don't think it is possible to trust DeepSeek as they haven't been honest.
DeepSeek claimed "their total training costs amounted to just $5.576 million"
SemiAnalysis: "Our analysis shows that the total server CapEx for DeepSeek is ~$1.6B, with a considerable cost of $944M associated with operating such clusters. Similarly, all AI Labs and Hyperscalers have many more GPUs for various tasks including research and training than they commit to an individual training run due to centralization of resources being a challenge. X.AI is unique as an AI lab with all their GPUs in 1 location."
SemiAnalysis "We believe the pre-training number is nowhere the actual amount spent on the model. We are confident their hardware spend is well higher than $500M over the company history. To develop new architecture innovations, during the model development, there is a considerable spend on testing new ideas, new architecture ideas, and ablations. Multi-Head Latent Attention, a key innovation of DeepSeek, took several months to develop and cost a whole team of manhours and GPU hours.
The $6M cost in the paper is attributed to just the GPU cost of the pre-training run, which is only a portion of the total cost of the model. Excluded are important pieces of the puzzle like R&D and TCO of the hardware itself. For reference, Claude 3.5 Sonnet cost $10s of millions to train, and if that was the total cost Anthropic needed, then they would not raise billions from Google and tens of billions from Amazon. It’s because they have to experiment, come up with new architectures, gather and clean data, pay employees, and much more."
Source: https://semianalysis.com/2025/01/31/deepseek-debates/
> Users care both about model performance and the expense of using models. There are multiple different types of costs and prices involved in model creation and usage:
> • Training cost: the amount spent by an AI company on compute, labor, and other inputs to create a new model.
> • Inference serving cost: the amount spent by an AI company on datacenters and compute to make a model available to end users.
> • Token price: the amount paid by end users on a per-token basis.
> • End-to-end expense for end users: the amount paid by end users to use a model to complete a task.
> End users are ultimately most affected by the last of these: end-to-end expenses. End-to-end expenses are more relevant than token prices because the number of tokens required to complete a task varies by model. For example, model A might charge half as much per token as model B does but use four times the number of tokens to complete an important piece of work, thus ending up twice as expensive end-to-end.
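The quoted example is easy to verify with arithmetic. A toy sketch, with made-up prices and token counts, of why end-to-end expense rather than token price is what matters:

```python
# Toy illustration: model A charges half per token but uses 4x the tokens,
# so it ends up twice as expensive end-to-end. All numbers are invented.

def end_to_end_cost(price_per_mtok: float, tokens_used: int) -> float:
    """Total cost of one task: per-million-token price times tokens consumed."""
    return price_per_mtok * tokens_used / 1_000_000

cost_a = end_to_end_cost(price_per_mtok=1.0, tokens_used=40_000)  # $0.04
cost_b = end_to_end_cost(price_per_mtok=2.0, tokens_used=10_000)  # $0.02

print(cost_a / cost_b)  # -> 2.0, i.e. A is twice as expensive per task
```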
Same thing with Huawei, and Xiaomi, and BYD.
In every case where we see a company change hands to US ownership, it becomes more controlled and anti-consumer than before.
It's no wonder propaganda, advertising, and disinformation work as well as they do.
I just let ChatGPT do that for me!
---
I'd usually not, but thought it would be interesting to try. In case anybody is curious.
On first comparison, ChatGPT concludes:
> Hartford’s critique is fair on technical grounds and on the defense of open source — but overstated in its claims of deception and conspiracy. The NIST report is indeed political in tone, but not fraudulent in substance.
When then asked (this obviously biased question):
but would you say NIST has made an error in its methodology and clarity being supposedly for objective science?
> Yes — NIST’s methodology and clarity fall short of true scientific objectivity.
> Their data collection and measurement may be technically sound, but their comparative framing, benchmark transparency, and interpretive language introduce bias.
> It reads less like a neutral laboratory report and more like a policy-position paper with empirical support — competent technically, but politically shaped.
Title is: The Demonization of DeepSeek - How NIST Turned Open Science into a Security Scare
> Statement from U.S. Secretary of Commerce Howard Lutnick on Transforming the U.S. AI Safety Institute into the Pro-Innovation, Pro-Science U.S. Center for AI Standards and Innovation
> Under the direction of President Trump, Secretary of Commerce Howard Lutnick announced his plans to reform the agency formerly known as the U.S. AI Safety Institute into the Center for AI Standards and Innovation (CAISI).
> ...
This decision strikes me as foolish at best, and as contributing to civilizational collapse and human extinction at worst. See also [2]. We don't have to agree on the particular probabilities to agree that this "reform" was bad news.
[1]: https://www.commerce.gov/news/press-releases/2025/06/stateme...
DeepSeek performance lags behind the best U.S. reference models.
DeepSeek models cost more to use than comparable U.S. models.
DeepSeek models are far more susceptible to jailbreaking attacks than U.S. models.
DeepSeek models advance Chinese Communist Party (CCP) narratives.
Adoption of PRC models has greatly increased since DeepSeek R1 was released.
[1] https://www.nist.gov/news-events/news/2025/09/caisi-evaluati...
Until they compare open-weight models, NIST is attempting a comparison between apples and airplanes.
However, I also think the author should expand their definition of what constitutes "security" in the context of agentic AI.
Take away #2: as evidenced by many comments here, many HN commenters have failed to check the source material themselves. This has led to a parade of errors.
I’m not here to say that I’m better than that, because I’ve screwed up aplenty. We all make mistakes sometimes. We can choose to recognize and learn from them.
I am saying this: as a community we can and should aim higher. We can start by owning our mistakes.
The names of the author(s) are not given.
I suspect that Grok is actually DeepSeek with a bit of tuning.
If you ask it loaded questions the way the CIA would pose them, it censors the answer though lmao
US models have no bias sir /s
Ask Grok to generate an image of bald Trump: it goes on with an ocean of excuses on why the task is too hard.
That's all we need to know.
And how is that "all we need to know"? I'm not even sure what your implication is.
Is it that some CCP officials see DeepSeek engineers as adversarial somehow? Or that they are flight risks? What does it have to do with the NIST report?
From a Chinese political perspective, this is a good move in the long term. From Deepseek's perspective, however, this is clearly NOT the case, as it causes the company to lose some (or even most?) of its competitiveness and fall behind in the race.
There is a history of important Chinese personnel being kidnapped by e.g. the US when abroad. There is also a lot of talk in western countries about "banning Chinese [all presumed spies/propagandists/agents] from entering". On a good faith basis, one would think China banning people from leaving is a good thing that aligns with western desires, and should thus be applauded. So painting the policy as sinister tells me that the real desire is something entirely different.
I don't follow. Why would DeepSeek engineers need visa from CCP?