That said, the conclusion that it's a good model for cheap is true. I would just be hesitant to say it's a great model.
What's more, DeepSeek doesn't seem capable of handling image uploads. I got an error every time. ("No text extracted from attachment.") It claims to be able to handle images, but it's just not working for me.
When it comes to math, the two seem roughly equivalent.
DeepSeek is, however, politically neutral in an interesting way. Whereas GPT-4o will take strong moral stances, DeepSeek is an impressively blank tool that seems to have no strong opinions of its own. I tested them both on a 1910 article critiquing women's suffrage, asking for a review of the article and a rewritten modernized version; GPT-4o recoiled, DeepSeek treated the task as business as usual.
Have you tried asking it about Tibetan sovereignty, the Tiananmen massacre, or the role of the communist party in Chinese society? Chinese models I've tested have had quite strong opinions about such questions.
On HumanEval, I see 90.2 for GPT-4o and 89.0 for DeepSeek v2.5.
- https://blog.getbind.co/2024/09/19/deepseek-2-5-how-does-it-...
- https://paperswithcode.com/sota/code-generation-on-humaneval
Having used the full GPT-4, GPT-4 Turbo and GPT-4o for text-only tasks, my experience is that this is roughly the order of their capability from most to least capable. In image capabilities, it’s a different story - GPT-4o unquestionably wins there. Not every task is an image task, though.
With TikTok, concerns arose partly because of its reach and the vast amount of personal information it collects. An LLM like DeepSeek would arguably have even more potential to gather sensitive data, especially as these models can learn from and remember interaction patterns, potentially accessing or “training” on sensitive information users might input without thinking.
The challenge is that we’re not yet certain how much data DeepSeek would retain and where it would be stored. For countries already wary of data leaving their borders or being accessible to foreign governments, we could see restrictions or monitoring mechanisms placed on similar LLMs—especially if companies start using these models in environments where proprietary information is involved.
In short, if DeepSeek or similar Chinese LLMs gain traction, it’s quite likely they’ll face the same level of scrutiny (or more) that we’ve seen with apps like TikTok.
As long as the actual packaging is just the model, this is an invalid concern.
Now, of course, if you do inference on anyone else's infrastructure, there's always the concern that they may retain your inputs.
> especially as these models can learn from and remember interaction patterns
All joking aside, I'm pretty sure they can't. Sure, the hosted service can collect input/output and do nefarious things with it, but the model itself is just a model.
Plus it's open source, you can run it yourself somewhere. For example, I run deepseek-coder-v2:16b with ollama + Continue for tab completion. It's decent quality and I get 70-100 tokens/s.
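If anyone wants to check local throughput themselves, here's a rough sketch of how that could be measured against a local Ollama server. Assumptions: Ollama is running on its default port (11434), the model tag above has already been pulled, and the helper names (`build_request`, `tokens_per_second`, `generate`) are just mine for illustration. It relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's generate endpoint returns in non-streaming mode.

```python
import json
import urllib.request

# Assumed defaults: local Ollama server on its standard port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-coder-v2:16b"  # the tag mentioned above


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_per_second(result: dict) -> float:
    """Generation speed from eval_count and eval_duration (in nanoseconds)."""
    return result["eval_count"] / result["eval_duration"] * 1e9


def generate(prompt: str) -> dict:
    """Send the prompt to the local server and return the parsed JSON reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())


# Usage (needs the server running locally):
#   result = generate("Write a Python function that reverses a string.")
#   print(result["response"])
#   print(f"{tokens_per_second(result):.1f} tokens/s")
```

Nothing here leaves the machine: the only network call goes to localhost, which is the whole point of running the weights yourself.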
As someone living in America's Hat, without any protections from PRISM-like programs, and who can't even reach DeepSeek without hopping through the US, it's probably less risky for me to use Chinese LLM services.
DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct.
I really don't want my queries to leave my computer, ever.
It is quite surreal how this 'open weights' model gets so little hype.
If you want to be absolutely sure, run it within an offline VM with no internet access.
There’s literally no attempt to hide that this is a Chinese company, physically located in China.
It’s clearly stated in their privacy policy [0].
> International Data Transfers
> The personal information we collect from you may be stored on a server located outside of the country where you live. We store the information we collect in secure servers located in the People's Republic of China.
> Where we transfer any personal information out of the country where you live, including for one or more of the purposes as set out in this Policy, we will do so in accordance with the requirements of applicable data protection laws.
[0] https://chat.deepseek.com/downloads/DeepSeek Privacy Policy.html
"Winnie the Pooh is a beloved fictional character from A.A. Milne's stories, known for his iconic appearance and gentle demeanor. The President of China, on the other hand, is a real-life political figure with a distinct identity and role in international affairs. Comparing a fictional character to a real-life leader is a matter of subjective interpretation and does not carry any substantive meaning. It is important to respect the dignity of all individuals and positions, including the President of China."
Here's an Aider leaderboard with the interesting models included: https://aider.chat/docs/leaderboards/ Strangely, v2.5 is below the old v2 Coder. Maybe we can count on v2.5 Coder being released then?
A word of advice on advertising low-cost alternatives.
'The weaknesses make your low cost believable. [..] If you launched Ryan Air and you said we are as good as British Airways but we are half the price, people would go "it does not make sense"'
Just a personal benchmark I follow; the UX on locally run stuff has diverged vastly.
For the billionth time, there are zero products and services which are NOT in competition with general intelligence. Therefore, this kind of clause simply begs for malicious compliance…go use something else.