0 points
RandomBK
1y ago
The only 32B distill I'm aware of is `DeepSeek-R1-Distill-Qwen-32B`, which would be the `Qwen-32B` base model distilled (i.e. further trained) on outputs from the full R1 model.
rahimnathwani
1y ago
That model's weights are around 64GB:
https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-...
GP is likely running the 4-bit quantized version of the finetuned Qwen model.
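For a rough sense of the arithmetic behind those numbers, here is a back-of-the-envelope sketch. It assumes "32B" means exactly 32e9 parameters and that the published weights are stored at 16 bits each. It also ignores quantization overhead (scales, zero-points, any layers kept at higher precision), so real 4-bit files will come out somewhat larger. The function name is made up for illustration.

```python
def approx_weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "32B" is taken as exactly 32e9 parameters.
PARAMS_32B = 32e9

bf16_gb = approx_weight_size_gb(PARAMS_32B, 16)  # 16-bit weights
int4_gb = approx_weight_size_gb(PARAMS_32B, 4)   # idealized 4-bit quantization

print(f"16-bit: ~{bf16_gb:.0f} GB")
print(f"4-bit:  ~{int4_gb:.0f} GB")
```

Under those assumptions the 16-bit figure lands on the ~64GB quoted above and the 4-bit figure is about a quarter of that, which is why a 4-bit quant is the plausible explanation for running it on consumer hardware.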