https://openrouter.ai/deepseek/deepseek-r1-0528/providers
May 28th update to the original DeepSeek R1. Performance is on par with OpenAI o1, but it's open-sourced, with fully open reasoning tokens. It's 671B parameters in size, with 37B active per inference pass.
Fully open-source model.
I remember there's a project, "Open R1", that last I checked was working on gathering its own training material. It looks active, but I'm not sure how far along they've gotten:
There are a few efforts at fully open data / open weights / open code models, but none of them have reached leading-edge performance.
out of curiosity, does anyone do anything "useful" with that knowledge? it's not like people can just randomly train models..
what you're saying is just that it's non-reproducible, which is a completely valid but separate issue
We have numerous artifacts to reason about:
- The model code
- The training code
- The fine tuning code
- The inference code
- The raw training data
- The processed training data (which might vary across various stages of pre-training and potentially fine-tuning!)
- The resultant weights
- The inference outputs (which also need a license)
- The research papers (hopefully it's described in literature!)
- The patents (or lack thereof)
The term "open source" is wholly inadequate here. We need a 10-star grading system for this.
This is not your mamma's C library.
AFAICT, DeepSeek scores 7/10, which is better than OpenAI's 0/10 (they don't even let you train on the outputs).
This is more than enough to distill new models from.
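That "distill from outputs" workflow amounts to collecting teacher completions into a supervised fine-tuning set. A minimal sketch, where the record format and the example pair are illustrative (adapt to whatever your SFT trainer actually expects):

```python
import json

# Assumed: (prompt, teacher_completion) pairs already collected from a model
# whose license permits training on outputs (e.g. R1, per the comment above).
pairs = [
    ("What is 2+2?", "<think>2 + 2 = 4</think>\nThe answer is 4."),
]

def to_sft_record(prompt, completion):
    # Chat-style record that common fine-tuning stacks accept.
    return {"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": completion},
    ]}

lines = [json.dumps(to_sft_record(p, c)) for p, c in pairs]
# Write "\n".join(lines) to train.jsonl and point your SFT trainer at it.
```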
Everybody is laundering training data, and it's rife with copyrighted data, PII, and pilfered outputs from other commercial AI systems. Because of that, I don't expect we'll see much legally open training data for some time to come. In fact, the first fully open training data of adequate size (not something like LJSpeech) is likely to be 100% synthetic or robotically-captured.
The companies that create these models can't answer that question! Models get jailbroken all the time to ignore alignment instructions. The robust refusal logic normally sits on top of the model, i.e. looking at the responses and flagging anything they don't want to show to users.
The best tool we have for understanding whether a model is refusing to answer a problem or actually doesn't know is mechanistic interpretability, which you only need the weights for.
This whole debate is weird; even with traditional open-source code you can't tell the intent of a programmer, what sources they used to write that code, etc.
Hugging Face has a leaderboard, and it seems dominated by models that are fine-tunings of various common open-source models, yet they don't seem to be broadly used:
- live benchmarks (livebench, livecodebench, matharena, SWE-rebench, etc)
- benchmarks that do not have a fixed structure, like games or human feedback benches (balrog, videogamebench, arena)
- (to some extent) benchmarks without existing/published answers (putnambench, frontiermath). You could argue that someone could hire people to solve those or pay off the benchmark devs, but it's much more complicated.
Most of the benchmarks that don't try to tackle future contamination are much less useful, that's true. Unfortunately, HLE kind of ignored it (they plan to add a hidden set to test for contamination, but once the answers are there, it's a lost game IMHO); I really liked the concept.
Edit: it is true that these benchmarks are focusing only on a fairly specific subset of the model capabilities. For everything else vibe check is your best bet.
I think you just described SATs and other standardized tests
no idea why they can't just wait a bit to coordinate stuff. bit messy in the news cycle.
it's almost as if they don't care about creating a proper buzz.
I'm working on the new one!
of course we can run any model if we quantize it enough, but I think the OP was talking about the unquantized version.
For the hobby version you would presumably buy a used server and a used GPU. DDR4 ECC RAM can be had for a little over $1/GB, so you could probably build the whole thing for around $2k.
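Rough math on that build, where only the ~$1/GB RAM price comes from the comment above; the figure for the rest of the box is a placeholder guess:

```python
ram_gb = 768                # enough headroom for a ~4-bit quant of R1
ram_cost = ram_gb * 1.10    # DDR4 ECC at "a little over $1/GB"
other_parts = 1000          # used server, GPU, PSU: a rough placeholder
total = ram_cost + other_parts
print(round(total))         # in the ballpark of the $2k figure above
```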
Mobo was some kind of mining rig from AliExpress for less than $100. GPU is an inexpensive NVIDIA TESLA card that I 3D printed a shroud for (added fans). Power supply is a cheap 2000-watt Dell server PSU off eBay....
[1] https://bsky.app/profile/engineersneedart.com/post/3lmg4kiz4...
There's already a 685B parameter DeepSeek V3 for free there.
Truthfully, it's just not worth it. You either run these things so slowly that you're wasting your time, or you have to buy four or five figures' worth of hardware that's going to sit mostly unused.
This guy ran a 4-bit quantized version with 768GB RAM: https://news.ycombinator.com/item?id=42897205
There are a couple of guides for setting it up "manually" on EC2 instances so you're not paying Bedrock's per-token prices. Here's one [1] that calls for four g6e.48xlarge instances (192 vCPUs, 1536GB RAM, and 8x L40S Tensor Core GPUs with 48GB of memory per GPU each).
A quick Google tells me that a g6e.48xlarge is something like $22k USD per month?
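Taking that figure at face value, the four-instance setup from [1] works out to roughly the following (the $22k/month number is from the comment, not a quoted AWS price):

```python
monthly_per_instance = 22_000   # USD, rough figure from the comment above
instances = 4                   # the guide's g6e.48xlarge count
hours_per_month = 730           # average hours in a month

total_monthly = monthly_per_instance * instances
hourly = total_monthly / hours_per_month
print(total_monthly, round(hourly, 2))  # 88000 total, ~120 USD/hour
```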
[0] https://aws.amazon.com/bedrock/deepseek/
[1] https://community.aws/content/2w2T9a1HOICvNCVKVRyVXUxuKff/de...
Software: client of choice to https://openrouter.ai/deepseek/deepseek-r1-0528
Sorry, I'm being cheeky here, but realistically, unless you want to shell out $10k for the equivalent of a Mac Studio with 512GB of RAM, you are best off using other services or a small distilled model based on this one.
If speed is truly not an issue, you can run DeepSeek on pretty much any PC with a large enough swap file, at a speed of about one token every 10 minutes assuming a plain old HDD.
Something more reasonable would be a used server CPU with as many memory channels as possible and DDR4 RAM, for less than $2000.
But before spending big, it might be a good idea to rent a server to get a feel for it.
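A back-of-envelope check on the one-token-every-10-minutes claim, with assumed numbers (R1's ~37B active parameters per token is from the thread; the quantization level and HDD throughput are guesses):

```python
active_params = 37e9        # parameters touched per token in R1's MoE
bytes_per_param = 1.0       # assume ~8-bit quantization
hdd_throughput = 150e6      # bytes/sec, optimistic sequential HDD read

# Worst case, every active weight gets paged in from disk for each token:
seconds_per_token = active_params * bytes_per_param / hdd_throughput
print(round(seconds_per_token))  # ~4 min; random access easily pushes this toward 10
```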
With an average of 3.6 tokens/sec, answers usually take 150-200 seconds.
Whilst the Chinese intelligence agencies won't have much power over you.
Having said that, I'm paranoid too. But if I wasn't they'd have got me by now.
You can, for instance, use them to extract information such as postal codes from strings, or to translate and standardize country names written in various languages (e.g. Spanish, Italian, and French to English), etc.
I'm sure people will have more advanced use cases, but I've found them useful for that.
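A minimal sketch of that extraction pattern against an OpenAI-style chat endpoint (the system prompt and model name are illustrative assumptions; POST the payload to your provider of choice, e.g. openrouter's chat completions endpoint, with your API key):

```python
import json

def build_extraction_request(text, instruction,
                             model="deepseek/deepseek-r1-0528"):
    # Build an OpenAI-style chat payload for a one-shot extraction task.
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Reply with only the extracted value, nothing else."},
            {"role": "user", "content": f"{instruction}\n\n{text}"},
        ],
        "temperature": 0,  # keep extraction output as deterministic as possible
    }

payload = build_extraction_request(
    "Ship to: 10 Downing St, London SW1A 2AA, UK",
    "Extract the postal code from this address.",
)
print(json.dumps(payload)[:60])
```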
[0] https://xcancel.com/glitchphoton/status/1927682018772672950
74% smaller: from 713GB down to 185GB.
Use the magic incantation -ot ".ffn_.*_exps.=CPU" to offload the MoE expert layers to RAM, allowing the non-MoE layers to fit in < 24GB of VRAM at 16K context! The rest sits in RAM & disk.
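For context, that flag belongs to a llama.cpp invocation. A hedged sketch, where only the -ot pattern comes from the post and the model filename and other flags are illustrative:

```shell
# -ot keeps the MoE expert tensors (matched by the regex) in system RAM,
# so the remaining dense layers fit in <24GB of VRAM at 16K context.
# Model path and other flags are examples; adjust to your setup.
./llama-cli \
  -m DeepSeek-R1-0528-IQ1_S.gguf \
  --n-gpu-layers 99 \
  -ot ".ffn_.*_exps.=CPU" \
  -c 16384
```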
This whole “building moats” and buying competitors fascination in the US has gotten boring, obvious and dull. The world benefits when companies struggle to be the best.
edit: most providers are offering a quantized version...