DeepSeek v3 beats Claude sonnet 3.5 and way cheaper

(huggingface.co)

48 points · helloericsf · 1y ago · 9 comments

9 comments · 4 top-level

patrickhogan1 · 1y ago · 3 in thread

It does not beat Claude Sonnet 3.5 on SWE Bench (42 to Claude's 50). The post picks 4 benchmarks out of the hundreds available and then decides it "beats" Claude Sonnet 3.5.

helloericsf (OP) · 1y ago

True. More benchmark metrics here: https://x.com/deepseek_ai/status/1872242657348710721/photo/2

fragmede · 1y ago

What are the 100 coding benchmarks? I'm only aware of 7, and it beats Claude on 5 of them.

patrickhogan1 · 1y ago

I'm not aware of 100 coding benchmarks, but there are over 100 LLM benchmarks. This makes sense, as there will eventually be at least one benchmark for each human task.

In addition to automated benchmarks, there are also human-rated evaluations, such as Chatbot Arena.

I manually tested DeepSeek v3 against Claude 3.5 Sonnet. In my human evaluation, Claude 3.5 Sonnet outperformed DeepSeek v3, and it also outperforms DeepSeek v3 on SWE Bench. Therefore, the title of the post claiming "DeepSeek v3 beats Claude 3.5 Sonnet and is way cheaper" is wrong.

That said, I was surprised by how well it performed. It's fast. Ironically, I have a paid Claude Team Plan. At the same time I was conducting the evaluations, Claude was experiencing performance issues (https://status.anthropic.com) and DeepSeek v3 was not. This is telling for the state of chip sale restrictions.

sam_goody · 1y ago · 2 in thread

What are the minimum and recommended amounts of RAM, hard disk space, CPU or GPU to run this locally?

As someone who just follows this stuff from afar, it is hard for me to conceptualize if this is a SaaS-only model, or if it means we are getting to the point where you can have an A1 model on a local machine.

hamsterDog123 · 1y ago

Yes of course you can run it on your local machine... But the architecture of this specific model makes it extremely inefficient to run this locally for a single user. Here's why:

- to LOAD the model, you need at least 768GB of VRAM, which means 10xH100 GPUs or similar.

- to QUERY the model, it then uses one of the 37GB layers to perform the computation at any given time, which means that each GPU can process 2 queries concurrently - (37 * 2 < 80) - and the queries are very fast because of this.

So a single-user setup would involve a crazy expensive rack of 10 H100 GPUs that can essentially process 20 concurrent requests almost as quickly as it can process 1 request in single-user mode...

The result is that the model is extremely cheap to operate if served as a SaaS, but ridiculously expensive for a single-user setup.
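The numbers in this comment can be sanity-checked with a few lines of Python. This is only a rough sketch that takes the comment's figures at face value (768GB to load, 37GB per in-flight query, and the 80GB per GPU implied by the `37 * 2 < 80` comparison); the function and variable names are made up for illustration, and it is not a real capacity-planning model.

```python
import math

# Figures copied from the comment above; all names here are illustrative.
MODEL_VRAM_GB = 768   # VRAM the comment says is needed to load the model
GPU_VRAM_GB = 80      # per-GPU memory implied by the "37 * 2 < 80" check
PER_QUERY_GB = 37     # per-query working set, as stated in the comment


def gpus_needed(model_gb: int, gpu_gb: int) -> int:
    """Smallest number of GPUs whose combined memory reaches model_gb."""
    return math.ceil(model_gb / gpu_gb)


def queries_per_gpu(gpu_gb: int, per_query_gb: int) -> int:
    """How many per-query working sets fit within one GPU's memory."""
    return gpu_gb // per_query_gb


gpus = gpus_needed(MODEL_VRAM_GB, GPU_VRAM_GB)
per_gpu = queries_per_gpu(GPU_VRAM_GB, PER_QUERY_GB)
concurrent = gpus * per_gpu

print(f"GPUs: {gpus}, queries per GPU: {per_gpu}, concurrent: {concurrent}")
```

Under those assumptions the script reproduces the comment's 10 GPUs and 20 concurrent requests, which is the basis for the "cheap as SaaS, expensive for one user" conclusion.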

Mithriil · 1y ago

Whole model is 671B parameters. Downloadable from Hugging Face, as 163 LFS files of around 4.3GB each. About 700GB total.

Recommended RAM: more than most PCs have.
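As a quick check of the download size, the sketch below multiplies out the figures quoted in this comment (163 files of about 4.3GB, 671B parameters). The variable names are illustrative, and the bytes-per-parameter figure is just a derived ratio from those rounded numbers, not a statement about the exact weight format.

```python
# Figures copied from the comment above; names are illustrative.
NUM_LFS_FILES = 163
AVG_FILE_GB = 4.3          # "around 4.3GB" per file, so the total is approximate
PARAMS_BILLIONS = 671

total_gb = NUM_LFS_FILES * AVG_FILE_GB

# GB divided by billions of parameters gives bytes per parameter.
bytes_per_param = total_gb / PARAMS_BILLIONS

print(f"approx. download size: {total_gb:.1f} GB")
print(f"approx. bytes per parameter: {bytes_per_param:.2f}")
```

The total lands at roughly 700GB, matching the comment, and works out to about one byte per parameter.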

helloericsf (OP) · 1y ago

HF link: https://huggingface.co/deepseek-ai/DeepSeek-V3

Aider link: https://aider.chat/docs/leaderboards/

Pricing ($0.14/$0.28 per 1M tokens) reference: https://x.com/xingyaow_/status/1872145835699691675?ref_src=t...

LiveBench via reddit: https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
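To get a feel for the quoted pricing, here is a minimal cost sketch. It assumes the two figures ($0.14/$0.28 per 1M tokens) are the input and output prices respectively, which the comment does not spell out, and the token counts in the example are hypothetical.

```python
# Assumption: first figure is the input price, second is the output price.
INPUT_USD_PER_M = 0.14
OUTPUT_USD_PER_M = 0.28


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a workload, given per-million-token prices."""
    return (input_tokens / 1_000_000) * INPUT_USD_PER_M + (
        output_tokens / 1_000_000
    ) * OUTPUT_USD_PER_M


# Hypothetical workload: 2M input tokens and 500k output tokens.
cost = request_cost(2_000_000, 500_000)
print(f"estimated cost: ${cost:.2f}")
```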

Jet_Xu · 1y ago

Please refer to my recent AI code review performance test, which includes DeepSeek V3: https://news.ycombinator.com/item?id=42547196
