undefined | Better HN

0 pointsMashimo1y ago0 comments

> ran whatever version Ollama downloaded on a 3070ti (laptop version). It's reasonably fast.

Probably was not r1, but one of the other models that got trained on r1, which apparently might still be quite good.

0 comments

Ollama has been deliberately misrepresenting R1 distill models as "R1" for marketing purposes. A lot of "AI" influencers on social media are unabashedly doing the same. Ollama's default "R1" model is a 4-bit RTN quantized 7B model, which is nowhere close to the real R1 (a 671B parameter fp8 MoE).

https://www.reddit.com/r/LocalLLaMA/comments/1i8ifxd/ollama_...

wklauss1y ago

Ollama is pretty clear about it, it's not like they are trying to deceive. You can also download the 671B model with Ollama, if you like.

2 more replies

horsawlarway1y ago

I mean... yes. The DeepSeek announcement puts R1 right there in the name for those models. https://api-docs.deepseek.com/news/news250120

It's fairly clear that R1-Llama or R1-Qwen is a distill, and they're all coming directly from DeepSeek.

As an aside, at least the larger distilled models (I'm mostly running r1-llama-distill-70b) are definitely not the same thing as the base llama/qwen models. I'm getting better results locally, admittedly with the slower inference time as it does the whole "<think>" section.

Surprisingly - The content in the <think> section is actually quite useful on its own. If you're using the model to spitball or brainstorm, getting to see it do that process is just flat out useful. Sometimes more-so than the actual answer it finally produces.

Kye1y ago

I'm not too hip to all the LLM terminology, so maybe someone can make sense of this and see if it's r1 or something based on r1:

>>> /show info

  Model

    architecture        qwen2

    parameters          7.6B

    context length      131072

    embedding length    3584

    quantization        Q4_K_M

sebastiennight1y ago

Hi Kye, I tried a version of this model to assess its capabilities.

I would recommend you to try to run the llama-based distill (same size, same quantization) that you can find here: https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-8...

It should take the same amount of memory as the one you currently have.

In my experience the Llama version performs much better at adhering to the prompt, understanding data in multiple languages, and going in-depth in its responses.

sebastiennight1y ago

So... it's not R1 itself.

It's a model called Qwen, trained by Alibaba, which the DeepSeek team has used to "distill" knowledge from their own (100x bigger) model.

Think of it as forcing a junior Qwen to listen in while the smarter, PhD-level model was asked thousands of tough problems. It will acquire some of that knowledge and learn a lot of the reasoning process.

It cannot become exactly as smart, for the same reason a dog can learn lots of tricks from a human but not become human-level itself: it doesn't have enough neurons/capacity. Here, Qwen is a 7B model so it can't cram within 7 billion parameters as much data as you can cram into 671 billion. It can literally only learn 1% as much, BUT the distillation process is cleverly built and allows to focus on the "right" 1%.

Then this now-smarter Qwen is quantized. This means that we take its parameters (16-bit floats, super precise numbers) and truncate them to make them use less memory space. This also makes it less precise. Think of it as taking a super high resolution movie picture and compressing it into a small GIF. You lose some information, but the gist of it is preserved.

As a result of both of these transformations, you get something that can run on your local machine — but is a bit dumber than the original — because it's about 400 times smaller than the real deal.

MashimoOP1y ago

"Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud."

And I think they, the DeepSeek team, finetunes Qwen 7b on DeepSeek. That is how I understood it.

Which apparently makes it quite good for a 7b model. But, again: if I understood it correctly, is still just qween and without the reasoning of DeepSeek.

1 more reply

whimsicalism1y ago

it’s a distill, it’s going to be much much worse than r1

j / k navigate · click thread line to collapse

0 comments

woadwarrior011y ago

https://www.reddit.com/r/LocalLLaMA/comments/1i8ifxd/ollama_...

wklauss1y ago

Ollama is pretty clear about it, it's not like they are trying to deceive. You can also download the 671B model with Ollama, if you like.

2 more replies

horsawlarway1y ago

I mean... yes. The DeepSeek announcement puts R1 right there in the name for those models. https://api-docs.deepseek.com/news/news250120

It's fairly clear that R1-Llama or R1-Qwen is a distill, and they're all coming directly from DeepSeek.

Kye1y ago

I'm not too hip to all the LLM terminology, so maybe someone can make sense of this and see if it's r1 or something based on r1:

>>> /show info

  Model

    architecture        qwen2

    parameters          7.6B

    context length      131072

    embedding length    3584

    quantization        Q4_K_M

sebastiennight1y ago

Hi Kye, I tried a version of this model to assess its capabilities.

I would recommend you to try to run the llama-based distill (same size, same quantization) that you can find here: https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-8...

It should take the same amount of memory as the one you currently have.

In my experience the Llama version performs much better at adhering to the prompt, understanding data in multiple languages, and going in-depth in its responses.

sebastiennight1y ago

So... it's not R1 itself.

It's a model called Qwen, trained by Alibaba, which the DeepSeek team has used to "distill" knowledge from their own (100x bigger) model.

MashimoOP1y ago

"Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud."

And I think they, the DeepSeek team, finetunes Qwen 7b on DeepSeek. That is how I understood it.

Which apparently makes it quite good for a 7b model. But, again: if I understood it correctly, is still just qween and without the reasoning of DeepSeek.

1 more reply

whimsicalism1y ago

it’s a distill, it’s going to be much much worse than r1

j / k navigate · click thread line to collapse