Not really; R1 is a post-training run on top of V3, and that post-training is considerably cheaper than training V3 itself. You can see this from the multiple reproductions of the RL training technique by much smaller labs, e.g. https://hkust-nlp.notion.site/simplerl-reason