Can anyone explain what the story is here? Is this just a re-skinned Qwen? Who is deepreinforce-ai, and why isn't this model listed on their website?
How does it self-improve? Does the model change on disk, or does it just get better within a single context run?