(with the caveat that all we have right now are accusations that DeepSeek made use of OpenAI data - it might just as well turn out that DeepSeek really did work independently, and you really could have gotten o1-like performance with much less compute)
> In this study, we demonstrate that reasoning capabilities can be significantly improved through large-scale reinforcement learning (RL), even without using supervised fine-tuning (SFT) as a cold start. Furthermore, performance can be further enhanced with the inclusion of a small amount of cold-start data.
Is this cold-start data what OpenAI is claiming came from their output? If so, what's the big deal?
> To collect such data, we have explored several approaches: using few-shot prompting with a long CoT as an example, directly prompting models to generate detailed answers with reflection and verification, gathering DeepSeek-R1-Zero outputs in a readable format, and refining the results through post-processing by human annotators.
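For the first of those approaches, here's a toy sketch of what "few-shot prompting with a long CoT as an example" could look like. The prompt text and helper function are illustrative, not DeepSeek's actual pipeline:

```python
# Hypothetical sketch: assemble a few-shot prompt whose example
# demonstrates a long chain of thought, to elicit similar traces
# from a base model when collecting cold-start data.
COT_EXAMPLE = """Question: What is 17 * 24?
Reasoning: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.
Answer: 408"""

def build_cold_start_prompt(question: str) -> str:
    return (
        "Solve the problem. Show your reasoning step by step, "
        "then state the final answer.\n\n"
        f"{COT_EXAMPLE}\n\n"
        f"Question: {question}\nReasoning:"
    )

print(build_cold_start_prompt("A train travels 120 km in 1.5 hours. Average speed?"))
```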
Maybe they needed OpenAI's output for their process. But now that their model is open source, anyone can use it as their cold start and spend just as little.
"From scratch" is a moving target. No one who makes their model with massive data from the net is really doing anything from scratch.
They are using the current SOTA tools and models to build new models for cheaper.
It is no better for OpenAI in this scenario either: any competitor can easily copy the results of their expensive training without spending the same, i.e. there is a second-mover advantage and no economic incentive to be the first mover.
To put it another way, the $500 billion Stargate investment will be worth just $5 billion once the models become available for consumption, because that is all it will take to replicate the same outcomes with the new techniques, even if the cold start needed o1 output for RL.
My understanding is that this effectively builds on OpenAI's very expensive initial work and provides a "nearly as good" model that is orders of magnitude cheaper to train and run, one that also provides a basis to continue building and improving without OpenAI, and without human bottlenecks.
That cuts OAI off at the knees in terms of market viability after billions have been spent. If DS can iterate and match the capabilities of the current in-development OAI models in the next year, it may come down to regulatory capture and government intervention to ensure OpenAI's viability as a company.
Human reasoning, as it exists today, is the result of tens of thousands of years of intuition slowly distilled down to efficient abstract concepts like "numbers", "zero", "angles", "cause", "effect", "energy", "true", "false", ...
I don't know what reasoning from scratch would look like without training on examples from other reasoning beings, the way human children do.
First you must invent the universe.
To quote DeepSeek directly:
> DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.
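For context, the paper describes the rewards for that RL step as rule-based: an accuracy reward for verifiable answers plus a format reward for the think/answer structure. Here's a minimal sketch of what such a reward function might look like; the tag names match the paper's description, but the weights are illustrative:

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Sketch of R1-Zero-style rule-based rewards: format + accuracy.
    Weights here are illustrative, not from the paper."""
    reward = 0.0
    # Format reward: reasoning and answer wrapped in the expected tags.
    if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>",
                 completion, re.DOTALL):
        reward += 0.5
    # Accuracy reward: extracted answer matches the reference exactly.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        reward += 1.0
    return reward
```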
This manifold is constructed via learning a decontextualized pattern space on a given set of inputs. Given the inherent probabilistic nature of sampling, true reasoning is expressed in terms of probabilities, not axioms. It may be possible to discover axioms by locating fixed points or attractors on the manifold, but ultimately you're looking at a probabilistic manifold constructed from your input set.
But I don't think you can untie this "reasoning" from your input data. It's possible you will find "meta-reasoning", or similar structures found in any sufficiently advanced reasoning manifold, but these highly decontextualized structures might be entirely useless without proper recontextualization, necessitating that a reasoning manifold is trained on input whose patterns follow learnable underlying rules, if the manifold is to be useful for processing input of that kind.
Decontextualization is learning, decomposing aspects of an input into context-agnostic relationships. But recontextualization is the other half of that, knowing how to take highly abstract, sometimes inexpressible, context-agnostic relationships and transform them into useful analysis in novel domains.
This doesn't mean a well-trained model can't reason about input it hasn't encountered before, just that the input needs to be in some way causally connected to the same laws which governed the input the manifold was trained on.
I'm sure we could create a fully generalized reasoning manifold which could handle anything, but I don't see how we possibly get that without first considering and encountering all possible inputs. But these inputs still have to have some form of constraint governed by laws that must be learned through sampling, otherwise you'd just be training on effectively random data.
The other commenter who suggested simply generating all possible sentences and training on internal consistency should probably consider Gödel's incompleteness theorems, and that internal consistency isn't enough to accurately model and interpret the universe. One could construct a thought experiment about an isolated brain in a jar with effectively unlimited neuronal connections, but no sensory connection to the outside world. It's possible, with enough connections, that the likelihood of the brain conceiving of true events it hasn't actually encountered does increase meaningfully. But the brain still has nothing to validate against, and can't simply assume that because something is internally logically consistent, that it must exist or have existed.
Let's just assume that the cost of training can be externalized to other people for free.
If other players can access that data with relatively less effort, then it's futile trying to train your models and improve upon them, as clearly you don't have an architectural moat, just a training moat.
Kind of like an office scene where an introverted hard worker does all the tedious work, while his extroverted colleague promotes it as his own and gains the credit.
The big question really is: are we doing it wrong? Could we have created o1 for a fraction of the price? Will o4 cost less to train than o1 did?
The second question, naturally, is: if we create a smarter LLM, can we use it to create another LLM that is even smarter?
It would have been fantastic if DeepSeek could have come out with an o3 competitor before o3 even became publicly available. That way we would have known for sure that we’re doing it wrong, because then either we could have used o1 to train a better AI, or we could have just trained in a smarter and cheaper way.
Whether or not you could have, you can now.
The model already embodies the "total sum of a massive amount of compute" used to create it; if it's possible to reuse that embodied compute to create a better model, that's good for the world. Forcing everyone to redo all that compute for themselves is, conversely, bad for the world.
We don't make people figure out how to domesticate a cow every time they want a hamburger. Or test hundreds of thousands of filaments before they can have a lightbulb. Inventions, once invented, exist as giants to stand upon. The inventor can either choose to disclose the invention and earn a patent for exclusive rights, or they can try to keep it a secret and hope nobody reverse engineers it.
All of this should have been clear anyway from the start, but that's the Internet for you.
Hmm, I think the narrative of the rise of LLMs is that once the output of humans has been distilled by the model, the human isn't necessary.
As far as I know, DeepSeek adds only a little to the transformer architecture, while o1/o3 added a special "reasoning component" - if DeepSeek is as good as o1/o3, even taking data from it, then it seems the reasoning component isn't needed.
Distillation is a term of art in AI and it is fundamentally incorrect to talk about distilling human-created data. Only an AI model can be distilled.
https://en.m.wikipedia.org/wiki/Knowledge_distillation#Metho...
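Concretely, the term of art means something like the following: a student model is trained to match a teacher model's softened output distribution, not raw text. A minimal sketch of the standard distillation loss (Hinton et al., 2015), assuming PyTorch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    # The teacher's temperature-softened probabilities are the soft targets.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(teacher || student), scaled by T^2 so gradient magnitudes stay
    # comparable across temperatures.
    return F.kl_div(log_student, soft_targets,
                    reduction="batchmean") * temperature ** 2
```

Note this requires access to the teacher's logits, which is exactly why "distilling" human-created text is a category error in the strict sense.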
It seems clear that the term can be used informally to denote the boiling down of human knowledge; indeed, it was used that way before AI appeared in the popular imagination.
- v2/v3 (not r1) seem to be cloned from o1/4o output, and perform worse (this cost the oft-repeated 5ish mm USD)
- r1 is specifically a reasoning step (using RL) _on top of_ v2/v3 and performs similarly to o1 (the cost of this is _not reported anywhere_)
- In the o1 blog post, they specifically say they use RL to add reasoning to LLMs: https://openai.com/index/learning-to-reason-with-llms/
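On that RL step: the DeepSeek papers describe it as GRPO, which scores each sampled completion relative to the group sampled for the same prompt instead of training a separate value network. A minimal sketch of the group-relative advantage computation (the rewards here are made up):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """GRPO-style advantages: normalize each completion's reward against
    the group sampled for the same prompt, so no critic model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# e.g. 8 completions sampled for one prompt, scored by a rule-based reward:
rewards = np.array([1.5, 0.5, 0.0, 1.5, 0.5, 0.0, 0.0, 1.5])
print(group_relative_advantages(rewards))
```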
I did not think this, nor did I think this was what others assumed. The narrative, I thought, was that there is little point in paying OpenAI for LLM usage when a much cheaper, similar / better version can be made and used for a fraction of the cost (whether it's on the back of existing LLM research doesn't factor in)
If the narrative is actually that DeepSeek can only reach whatever heights OpenAI has already gotten to with some new tricks, then markets will probably refocus on OpenAI's innovations and price things accordingly, even if the initial cost is huge. It also means OpenAI probably needs a better moat to protect its interests.
I'm not sure where the reality is exactly, but market reactions so far have basically followed that initial narrative and now the rebuttal.
The latter could be a one-time thing, and/or OpenAI could still use their financial might to leverage those innovations and get even better with them.
However, the former destroys their business model and no amount of intelligence and innovation from OpenAI protects them from being copied at a fraction of the cost.
How do you know this?
> If the narrative is actually that DeepSeek can only reach whatever heights OpenAI has already gotten to with some new tricks, then markets will probably refocus on OpenAI's innovations and price things accordingly
Why? If every innovation OpenAI is trying to keep as secret sauce becomes commoditized quickly and cheaply, then why would markets care about any innovations they have? They will be unable to monetize them.
That's what I thought and assumed. This is the narrative that's been running through all the major news outlets.
It didn't even occur to me that DeepSeek could have been training their models using the output of other models until reading this article.
But HOW they are necessary is what has changed. They went from building blocks to stepping stones. From a business standpoint, that's very damaging to OAI and the other players.
And is this related to the lottery ticket hypothesis?
I have a question (disclaimer: reinforcement learning noob here):
Is there a risk of broken telephone with this?
Kinda like repeatedly compressing an already compressed image eventually leads to a fuzzy blur.
If that is the case then I’m curious how this is monitored and/or mitigated.
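The concern is real and goes by "model collapse" in the literature: each generation trained only on the previous generation's output loses fidelity, especially in the tails. A toy illustration (not how any lab actually trains, just the compression analogy in code):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Real" data: samples from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=200)

for generation in range(20):
    # Fit a toy "model" (just a mean and std) to the current data...
    mu, sigma = data.mean(), data.std()
    # ...then train the next generation ONLY on that model's samples,
    # never on the original data.
    data = rng.normal(loc=mu, scale=sigma, size=200)
    print(f"gen {generation:2d}: mu={mu:+.3f} sigma={sigma:.3f}")

# mu and sigma drift as a random walk and the tails thin out, because
# each generation sees only the previous generation's approximation.
```

Common mitigations are anchoring each round on real or verified data (e.g. rule-checked answers) rather than chaining purely on model output.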
That is where artificial intelligence is going: copying things from other things. Will there be an AI Eureka moment where it deviates and knows where and why it is wrong?
It seems like if they did in fact distill, then what we have found is that you can create a worse copy of the model for ~$5M in compute by training on its outputs.
Everyone is standing on the shoulders of giants.
Better benchmark scores can be cooked
But if you leave someone in the tech industry of SV/SF long enough, they'll start to get high on their own supply and think they're entitled to insane amounts of value, so...