> If you find the kernels not working on some other platforms, you may add `DISABLE_AGGRESSIVE_PTX_INSTRS=1` to `setup.py` to disable this, or file an issue.
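For context, a switch like that is typically wired up in `setup.py` as an environment variable that adds a preprocessor define to the nvcc flags, which the CUDA sources then check to fall back to plain loads. A minimal sketch of that pattern (the helper name and the flag list are assumptions, not DeepEP's actual code):

```python
import os

def build_nvcc_flags(env=os.environ):
    """Return nvcc flags, gating the aggressive PTX instructions behind
    DISABLE_AGGRESSIVE_PTX_INSTRS (hypothetical helper, illustrative only)."""
    flags = ["-O3"]
    if env.get("DISABLE_AGGRESSIVE_PTX_INSTRS") == "1":
        # The CUDA sources would check this macro and skip the undocumented
        # instruction variants on platforms where they misbehave.
        flags.append("-DDISABLE_AGGRESSIVE_PTX_INSTRS")
    return flags
```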
Make Open AI open.
Or else you'll lose to the ecosystem.
https://allenai.org/blog/olmo2
They literally share everything you need to recreate their model, including the data itself. This is what they say on that link above:
> Because fully open science requires more than just open weights, we are excited to share a new round of OLMo updates, including weights, data, code, recipes, intermediate checkpoints, and instruction-tuned models, with the broader language modeling community!
We need:
1. Open pretraining datasets, including the tooling used to label and maintain them
2. Open model, training, and inference code, ideally with a research paper that explains the approach and results. (Typically we get the latter, but I've seen cases where it's omitted.)
3. Open pretrained foundation model weights, fine-tunes, etc.
Open AI = Data + Code + Paper + Weights
These datasets are huge, and it's practically impossible to make sure they are clean of illegal or embarrassing stuff.
As far as I know, the only truly open-source model that is competitive is OLMo 2 from AI2:
https://allenai.org/blog/olmo2
They even released an app recently, which is also open source, that does on-device inference:
https://allenai.org/blog/olmoe-app
They also have this other model called Tülu 3, which outperforms DeepSeek V3:
Let's say you took GCC, modified its sources, compiled your code with it, and released your binaries along with the modified GCC source code. And you are claiming that your software is open source. Well, it wouldn't be.
Releasing training data is extremely hard, as licensing and redistribution rights for that data are difficult to tackle. And it is not clear what exactly the benefits of releasing it are.
It's a return to the FREEWARE / SHAREWARE model.
This is the language we need to use for "open" weights.
So even in the worst case (doing this for the wrong reasons): thank you DeepSeek, you are actually doing what OpenAI lied through their teeth to the whole world about doing for years.
You rock.
In the space of international relations, right and wrong don't apply nearly as much. Is open sourcing this any more "wrong" than the export ban on high end Nvidia GPUs?
The open sourcing by DeepSeek (presumably with CCP consent) just happens to be good for both the CCP and the broader open source AI community at the same time, but don't take it as some kind of principled stance by them.
Finding ways to take away other countries' competitive advantages is a major activity of all governments, large and small.
Once again, DeepSeek is more open than the $157B+ one that is claiming to be "Open".
Almost no one is talking about Meta's Llama, and everyone should expect them to release Llama 4 with reasoning.
The objective is to not be squeezed in the middle of the race to zero.
While we also pretend that H100s were difficult to get or access because of the US sanctions, and the hubris of believing their edicts blanket the globe?
Am I understanding this correctly?
Can someone offer an explanation for the rest of us of why this is so important?
There are documented combinations of parameters for those instructions, but if you fuzz (search for new combinations in a random or organized way in the hope that some will work the way you want), you can find new ones with unexpected effects or advantages (in various ways, like not polluting caches, speed, ...).
Which is the case, for example, for `ld.global.nc.L1::no_allocate.L2::256B`, which DeepSeek uses: it provides significant acceleration while being reliable (although it doesn't work on all architectures, so they have a way to disable it).
Or did I get that wrong?
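To make the above concrete, a modifier like that is issued from CUDA code via inline PTX. A hedged sketch of the pattern (the wrapper name is hypothetical, the vector width is an assumption, and the modifier combination is undocumented, hence the compile-time escape hatch mentioned in the thread):

```cuda
#include <cuda_runtime.h>

// Sketch: a non-coherent global load that asks L1 not to allocate the line
// and hints a 256-byte L2 fetch, via inline PTX. Hypothetical wrapper; the
// modifier combination is found by fuzzing rather than documented, so it is
// guarded by a macro for architectures where it misbehaves.
__device__ __forceinline__ int4 ld_nc_no_allocate(const int4 *ptr) {
    int4 v;
#ifndef DISABLE_AGGRESSIVE_PTX_INSTRS
    asm volatile(
        "ld.global.nc.L1::no_allocate.L2::256B.v4.b32 {%0, %1, %2, %3}, [%4];"
        : "=r"(v.x), "=r"(v.y), "=r"(v.z), "=r"(v.w)
        : "l"(ptr));
#else
    // Fallback: an ordinary read-only (non-coherent) load.
    v = __ldg(ptr);
#endif
    return v;
}
```

The `.nc` (non-coherent) path and `__ldg` fallback are standard; only the `L1::no_allocate` plus `L2::256B` pairing is the "aggressive" part that gets disabled on unsupported platforms.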