The max token output is only 8K (32K thinking tokens). o1's is 128K, which is far more useful, and it doesn't get stuck like R1 does.
The hype around the DeepSeek release is insane and I’m starting to really doubt their numbers.
I've also compared o1 and (online-hosted) r1 on Qt/C++ code, being a KDE Plasma dev, and my impression so far was that the output is roughly on par. I've given both models some tricky tasks about dark corners of the meta-object system in crafting classes etc. and they came up with generally the same sort of suggestions and implementations.
I do appreciate that "asking about gotchas with few definitive solutions, even if they require some perspective" and "rote day-to-day coding ops" are very different benchmarks due to how things are represented in the training data corpus, though.
My standard test is to ask the model to write a QSyntaxHighlighter subclass that uses TreeSitter to implement syntax highlighting. O1 can do it after a few iterations, but R1’s output has been a mess. That said, its thought process revealed a few issues that I then fixed in my canonical implementation.
If an org consistently finds one model performs worse on their corpus than another, they aren't going to keep using it because it ranks higher in some set of benchmarks.
For instance, Fireworks offers R1 with 164K/164K. They are far more expensive than DeepSeek, though.
I mean, couldn't that be because they're just overwhelmed by users at the moment?
> And the output is very bad - it mashes together the header and cpp file
That sounds way worse, and like, not something caused by being hugged to death though.
Aider recently stated DeepSeek is at the top of their benchmark[1], so I'm inclined to believe it isn't all hype.
It’s just not as impressive as people make it out to be. It might be better than o1 on Python or JavaScript that's all over the training data, but o1 is overwhelmingly better at anything outside the happy path.
https://en.wikipedia.org/wiki/Illegal_number
> An AACS encryption key (09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0) that came to prominence in May 2007 is an example of a number claimed to be a secret, and whose publication or inappropriate possession is claimed to be illegal in the United States.
This is a silly take for anyone in tech. Any binary sequence is a number. Any information can be, for practical purposes, rendered in binary [1].
Getting worked up about restrictions on numbers works as a meme for the masses because it sounds silly, but technically it's tantamount to arguing against privacy, confidentiality, the concept of national secrets, IP as a whole, et cetera.
[1] https://en.m.wikipedia.org/wiki/Shannon%27s_source_coding_th...
> Any piece of digital information is representable as a number; consequently, if communicating a specific set of information is illegal in some way, then the number may be illegal as well.
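The quoted point can be sketched in a few lines of Python: any byte string round-trips through a single integer, and the AACS key in the quote above is just the hex digits of one such integer. (The message string below is my own illustrative choice, not from the article.)

```python
# Sketch: any digital information is representable as a single integer.
# The hex string is the AACS key quoted above, read as one big-endian number.
aacs_key = bytes.fromhex("09F911029D74E35BD84156C5635688C0")
as_number = int.from_bytes(aacs_key, "big")

# The mapping works for arbitrary data and is reversible:
message = "any information can be a number".encode("utf-8")
n = int.from_bytes(message, "big")            # data -> integer
restored = n.to_bytes((n.bit_length() + 7) // 8, "big")  # integer -> data
assert restored == message
```

So a law restricting a specific piece of information is, inescapably, a law restricting a specific integer.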
> It depends on where you live. In many places, collecting rainwater is completely legal and even encouraged, but some regions have regulations or restrictions.
United States: Most states allow rainwater collection, but some have restrictions on how much you can collect or how it can be used. For example, Colorado has limits on the amount of rainwater homeowners can store.
Australia: Generally legal and encouraged, with many homes using rainwater tanks.
UK & Canada: Legal with few restrictions.
India & Many Other Countries: Often encouraged due to water scarcity.
That's not the same thing as a number being illegal at all. Here, watch this:
> I claim breathing is illegal in the United States
There, now breathing is claimed to be illegal in the United States.
Open source means two things in spirit:
(a) You have everything you need to be able to re-create something, and at any step of the process change it.
(b) You have broad permissions how to put the result to use.
The "open source" models from both Meta so far fail either both or one of these checks (Meta's fails both). We should resist the dilution of the term open source to the point where it means nothing useful.
That's why terms like "libre" were born to describe certain kinds of software. And that's what you're describing.
This is a debate that started, like, twenty years ago or something when we started getting big code projects that were open source but encumbered by patents so that they couldn't be redistributed, but could still be read and modified for internal use.
But I think my argument still stands? Users can run DeepSeek locally, so unless the US Gov't wants to reach book-burning levels of idiocy, there is no feasible way to ban the American public from running DeepSeek, no?
https://www.federalregister.gov/documents/2023/11/01/2023-24...
>(k) The term “dual-use foundation model” means an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by: ...
It orders the Secretary of Commerce to "solicit input from the private sector, academia, civil society, and other stakeholders through a public consultation process on potential risks, benefits, other implications, and appropriate policy and regulatory approaches related to dual-use foundation models for which the model weights are widely available".
Congress has never ceded power to anyone. They hold legislative authority and the power of the purse, and wield them as they see fit. The special interests campaigning about this are extreme reactionaries whose stated purpose is to make government ineffective.
https://en.wikipedia.org/wiki/Export_of_cryptography_from_th...
Of course Joe Sixpack can throw their code up anywhere, but Joe Corporation gets wrecked if they try to sell it.
https://developer.apple.com/documentation/security/complying...
For example, this is enforced by the App Store.
Read the two following sections of my blog post:
1. "Distilled language models"
2. "DeepSeek: Less supervision"