The max token output is only 8K (32K thinking tokens). o1's is 128K, which is far more useful, and it doesn't get stuck like R1 does.
The hype around the DeepSeek release is insane and I’m starting to really doubt their numbers.
I've also compared o1 and (online-hosted) r1 on Qt/C++ code, being a KDE Plasma dev, and my impression so far was that the output is roughly on par. I've given both models some tricky tasks about dark corners of the meta-object system in crafting classes etc. and they came up with generally the same sort of suggestions and implementations.
I do appreciate that "asking about gotchas with few definitive solutions, even if they require some perspective" and "rote day-to-day coding ops" are very different benchmarks due to how things are represented in the training data corpus, though.
My standard test is to ask the model to write a QSyntaxHighlighter subclass that uses TreeSitter to implement syntax highlighting. O1 can do it after a few iterations, but R1’s output has been a mess. That said, its thought process revealed a few issues that I then fixed in my canonical implementation.
If an org consistently finds one model performs worse on their corpus than another, they aren't going to keep using it because it ranks higher in some set of benchmarks.
For instance, Fireworks offers R1 with 164K/164K. They are far more expensive than DeepSeek, though.
I mean, couldn't that be because they're just overwhelmed by users at the moment?
> And the output is very bad - it mashes together the header and cpp file
That sounds way worse, and like, not something caused by being hugged to death though.
Aider recently stated DeepSeek is at the top of their benchmark[1], so I'm inclined to believe it isn't all hype.
It’s just not as impressive as people make it out to be. It might be better than o1 on Python or JavaScript that's all over the training data, but o1 is overwhelmingly better at anything outside the happy path.
https://en.wikipedia.org/wiki/Illegal_number
> An AACS encryption key (09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0) that came to prominence in May 2007 is an example of a number claimed to be a secret, and whose publication or inappropriate possession is claimed to be illegal in the United States.
This is a silly take for anyone in tech. Any binary sequence is a number. Any information can be, for practical purposes, rendered in binary [1].
Getting worked up about restrictions on numbers works as a meme for the masses because it sounds silly, but technically it's tantamount to arguing against privacy, confidentiality, the concept of national secrets, IP as a whole, et cetera.
[1] https://en.m.wikipedia.org/wiki/Shannon%27s_source_coding_th...
> Any piece of digital information is representable as a number; consequently, if communicating a specific set of information is illegal in some way, then the number may be illegal as well.
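The quoted point can be sketched in a few lines of Python: any byte string round-trips through a single integer, and the AACS key in the quote above is just the hex digits of one such integer. (The message string below is my own illustrative choice, not from the article.)

```python
# Sketch: any digital information is representable as a single integer.
# The hex string is the AACS key quoted above, read as one big-endian number.
aacs_key = bytes.fromhex("09F911029D74E35BD84156C5635688C0")
as_number = int.from_bytes(aacs_key, "big")

# The mapping works for arbitrary data and is reversible:
message = "any information can be a number".encode("utf-8")
n = int.from_bytes(message, "big")            # data -> integer
restored = n.to_bytes((n.bit_length() + 7) // 8, "big")  # integer -> data
assert restored == message
```

So a law restricting a specific piece of information is, inescapably, a law restricting a specific integer.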
> It depends on where you live. In many places, collecting rainwater is completely legal and even encouraged, but some regions have regulations or restrictions.
United States: Most states allow rainwater collection, but some have restrictions on how much you can collect or how it can be used. For example, Colorado has limits on the amount of rainwater homeowners can store.
Australia: Generally legal and encouraged, with many homes using rainwater tanks.
UK & Canada: Legal with few restrictions.
India & Many Other Countries: Often encouraged due to water scarcity.
That's not the same thing as a number being illegal at all. Here, watch this:
> I claim breathing is illegal in the United States
There, now breathing is claimed to be illegal in the United States.
Open source means two things in spirit:
(a) You have everything you need to be able to re-create something, and at any step of the process change it.
(b) You have broad permissions how to put the result to use.
The "open source" models from both Meta so far fail either both or one of these checks (Meta's fails both). We should resist the dilution of the term open source to the point where it means nothing useful.
That's why terms like "libre" were born to describe certain kinds of software. And that's what you're describing.
This is a debate that started, like, twenty years ago or something when we started getting big code projects that were open source but encumbered by patents so that they couldn't be redistributed, but could still be read and modified for internal use.
But I think my argument still stands? Users can run DeepSeek locally, so unless the US Gov't wants to reach book-burning levels of idiocy, there is no feasible way to ban the American public from running DeepSeek, no?
https://www.federalregister.gov/documents/2023/11/01/2023-24...
>(k) The term “dual-use foundation model” means an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by: ...
It orders the Secretary of Commerce to "solicit input from the private sector, academia, civil society, and other stakeholders through a public consultation process on potential risks, benefits, other implications, and appropriate policy and regulatory approaches related to dual-use foundation models for which the model weights are widely available".
Congress has never ceded power to anyone. They hold legislative authority and the power of the purse, and wield them as they see fit. The special interests campaigning about this are extreme reactionaries whose stated purpose is to make government ineffective.
https://en.wikipedia.org/wiki/Export_of_cryptography_from_th...
Of course Joe Sixpack can throw their code up anywhere, but Joe Corporation gets wrecked if they try to sell it.
https://developer.apple.com/documentation/security/complying...
For example, this is enforced by the App Store.
Read the two following sections of my blog post:
1. "Distilled language models"
2. "DeepSeek: Less supervision"