
RandomBK · 1y ago

The only 32B distill I'm aware of is `DeepSeek-R1-Distill-Qwen-32B`, which would be the `Qwen-32B` base model distilled (further trained) on outputs from the full R1 model.


rahimnathwani · 1y ago

That model's weights are around 64GB: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-...

GP is likely running the 4-bit quantized version of the fine-tuned Qwen model.
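
For anyone who wants the arithmetic behind the 64GB figure, here is a rough back-of-the-envelope sketch. It assumes a round 32 billion parameters (the real count is a bit higher), 16-bit weights for the unquantized checkpoint, and decimal gigabytes. It ignores quantization overhead such as scales and zero points, as well as KV cache and activations, so treat the numbers as lower bounds on memory use. The function name is made up for illustration.

```python
def approx_weight_size_gb(num_params: int, bits_per_weight: int) -> float:
    """Rough size of the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


# Assumption: "32B" taken as exactly 32 billion parameters.
PARAMS_32B = 32_000_000_000

fp16_gb = approx_weight_size_gb(PARAMS_32B, 16)  # 16-bit weights
int4_gb = approx_weight_size_gb(PARAMS_32B, 4)   # 4-bit quantized weights

print(f"16-bit: ~{fp16_gb:.0f} GB")
print(f"4-bit:  ~{int4_gb:.0f} GB")
```

Under those assumptions the 4-bit version needs roughly a quarter of the space, which is why it is the plausible choice on consumer hardware.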
