0 points
bakugo
1y ago
The "distilled+quantized versions" are not the same model at all. They are existing models (Llama and Qwen) finetuned on outputs from the actual R1 model, and are not really comparable to the real thing.
raxxor
1y ago
That is semantics, and they are strongly comparable in terms of their input and output. Distillation is different to finetuning.
Sure, you could say that only running the 600+b model is running "the real thing"...