kennywinker · 6h ago · 0 points

Can anyone explain what’s the story here? Is this just a re-skinned qwen? Who is deepreinforce-ai and why isn’t this model listed on their website?

How does it self-improve? Does the model change on disk, or does it only get better within a single context run?

5 comments · 2 top-level

simonw · 6h ago · 3 in thread

It doesn't self-improve, that's a misleading headline.

As far as I can tell they trained it by running their own reinforcement learning on top of Qwen and Gemma 4 (not sure how they combined weights from both, or if they used Qwen as the basis and Gemma 4 to help train?) - so the "self-improving" is about their training process, not how you use the weights.

kamranjon · 5h ago

I think the 9B and 31B dense models are Gemma and the 35B-MoE and 397B-MoE are Qwen, since those are the model sizes each family covers.

sisve · 3h ago

Do you think we will get a self-improving model in '26 or '27? Maybe not a native one, but some kind of hack that lets a model learn something without losing part of the context window?

kennywinker (OP) · 5h ago

Gotcha. That makes more sense. We ran the model to train the model -> “self-improving”.

v3ss0n · 3h ago

Clickbait title.
