I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
Enhanced it on a couple benchmarks, supposedly.
The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.
This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.
https://web.archive.org/web/20260614082641/https://huggingfa...
And the Nex benchmarks for comparison
https://huggingface.co/nex-agi/Nex-N2-Pro
Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?
i.e., reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not representative enough of the real world.
The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: deploying finite resources (compute) toward exploitation/exploration.
The main issue is that each training run / fine-tune is very expensive, so the number of chances at the slot machine, so to speak, is pretty limited today.
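To make the bandit framing concrete, here is a rough sketch of an epsilon-greedy strategy under a small pull budget. Everything in it is an assumption for illustration: `epsilon_greedy`, the three "recipes", and their scores are hypothetical stand-ins for expensive fine-tune-plus-benchmark runs, not anything a lab is known to use.

```python
import random
from typing import Callable, List


def epsilon_greedy(arms: List[Callable[[], float]], budget: int,
                   epsilon: float, rng: random.Random) -> List[int]:
    """Spend `budget` pulls across `arms`; return how often each was pulled."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for _ in range(budget):
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            arm = untried[0]                    # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(len(arms))      # explore a random arm
        else:
            arm = max(range(len(arms)), key=lambda i: means[i])  # exploit
        reward = arms[arm]()
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
    return counts


rng = random.Random(0)
# Hypothetical "training recipes": each call stands in for one expensive
# fine-tune plus benchmark run and returns a noisy, made-up score.
recipes = [lambda m=m: rng.gauss(m, 0.05) for m in (0.60, 0.65, 0.70)]
pulls = epsilon_greedy(recipes, budget=12, epsilon=0.2, rng=rng)
```

With only 12 pulls, three of them go to the initial pass over the arms, which is the point being made: when each pull costs a full training run, there is very little budget left to separate a real improvement from noise.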
I don't believe this would work on two LLMs that have different pretraining. Even if it did, you would need two LLMs that have the exact same internal activation shapes, dimensions, expert counts, and token vocabulary. Realistically, it would never happen outside of finetunes or academic experiments.
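A rough sketch of the structural part of that requirement, assuming each checkpoint is summarized as a dict from parameter name to shape. The function `merge_blockers` and the toy parameter names are hypothetical. Passing this check is only a necessary condition; it says nothing about the different-pretraining concern.

```python
from typing import Dict, List, Tuple

Shapes = Dict[str, Tuple[int, ...]]


def merge_blockers(a: Shapes, b: Shapes) -> List[str]:
    """List the reasons two checkpoints cannot be merged position by position."""
    problems = []
    for name in sorted(a.keys() - b.keys()):
        problems.append(f"{name}: only in first model")
    for name in sorted(b.keys() - a.keys()):
        problems.append(f"{name}: only in second model")
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            problems.append(f"{name}: shape {a[name]} vs {b[name]}")
    return problems


# Toy shapes, made up for illustration only.
base = {"embed.weight": (1000, 64), "experts.0.w": (64, 256)}
finetune = dict(base)  # a finetune keeps the base architecture
other = {"embed.weight": (1200, 64),  # different vocabulary size
         "experts.0.w": (64, 256),
         "experts.1.w": (64, 256)}    # different expert count

same_family = merge_blockers(base, finetune)
different_family = merge_blockers(base, other)
```

Here `same_family` comes back empty, while `different_family` reports the extra expert and the vocabulary-size mismatch.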
It is not understood why it works so well.
Which could be a signal that your "performance" was so abysmal in the first place that even randomly applied training methods can't make it _worse_.
The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.
The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.
Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.
But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.
The model card says:
> Post-trained from Qwen 3.5 397B
The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:
https://arxiv.org/abs/2510.05069
So the sources seem properly attributed.
They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".
I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.
1. They claim the official model is based on Qwen 397B. It's likely they didn't disclose Nex Pro at all because Nex itself is based on the same base model (not saying they shouldn't).
2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.
3. It's important to note that they didn't advertise the model beyond posting it on Reddit 2 days ago. It went viral organically, over the weekend, and during Brazil's World Cup debut (Brazilians will understand). Of course the mayor of Rio took the opportunity to capitalize on the free coverage, but that wasn't done in conjunction with the researchers.
4. I don't see why they would disclose Qwen 397B as base and mention the SwiReasoning paper but not mention Nex if all they did was to merge both models.
5. In any case, what they are claiming is easily verifiable once (if) they upload the right model.
Rio has a strong engineering talent pool, along with many other major capitals in Brazil
They merged the base model with another lab’s fine tuned model. The improvements could have come from getting some of the fine tuned weights from the other model.
If they really had a better performing model that they “accidentally” forgot to upload, they could have uploaded the correct file by now.
That's what makes this hilariously sad. Brazil could have done some good work here, but it just didn't. Brazil merged two models on a workstation.
Then researchers looked at the weights and there is no post training at all.
They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.
Source: am Huelander.
Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.
Everything uses Stable Diffusion as the underlying model, and most of the usage is merges of checkpoints.
Merges also only work on matching architectures (i.e. finetunes/LoRAs of the same model).
-- Bill Gates
> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.
And what’s more interesting is the conclusion:
> Apple filed a monumental copyright lawsuit against Microsoft in 1988, but they eventually lost on a technicality (the judge ruled that Apple inadvertently gave Microsoft a perpetual license to the Mac user interface in November 1985).
Microsoft didn’t steal Apple’s GUI … Apple gave it to them.
Microsoft claimed that its software’s use of various visualizations related to window state was covered by the 1985 agreement, and Apple claimed that this was not true; those window states were produced by Macintosh while Microsoft’s software was being rendered in the Mac environment.
> In his March 20, 1989 Order, Judge Schwarzer declined to consider whether the visual displays in issue were generated by the Microsoft application programs or by the Macintosh system software. The point arose in connection with Microsoft's argument that the 1985 Agreement licensed to Microsoft all visual displays that could possibly be called up by running the five Microsoft application programs on the Macintosh system software then or in the future. 709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's contention would "defy common sense." Id.
That this moment is held up as some great exchange in business is annoying. That our regulatory agencies are perennially asleep at the switch and allow this nonsense to keep happening is extremely frustrating.
But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.
I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.
Model A: A_1, …, A_n
Model B: B_1, …, B_n
C_i = A_i * p + B_i * (1 - p)
In other words, it’s just a linear combination of the other models’ weights, per position.
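A minimal sketch of that formula in Python, assuming each model is represented as a dict from parameter name to a flat list of floats (real checkpoints hold tensors). `merge_linear` and the toy weights are illustrative names, not any library's API.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def merge_linear(a: Weights, b: Weights, p: float) -> Weights:
    """Return C where C_i = A_i * p + B_i * (1 - p) for every position i."""
    if a.keys() != b.keys():
        raise ValueError("models must have the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"size mismatch for {name}")
        merged[name] = [x * p + y * (1 - p) for x, y in zip(a[name], b[name])]
    return merged


# Toy weights, made up for illustration only.
model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}
model_c = merge_linear(model_a, model_b, p=0.5)
```

With `p=0.5` this is a plain average of the two models; `p=1.0` returns model A unchanged and `p=0.0` returns model B.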
>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
Incidentally, are people using GitHub issues as blogs now?
It wasn’t framed as an issue, which is the norm breakage I think you’re reacting to (as in, they didn’t ask that the readme be updated, etc.), but it is common now for folks to use a project’s issue tracker to name and shame them in a place they can’t easily ignore.
Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).
But surely you can see the optics reason why someone would take their complaint to the repo directly? It pressures the maintainers to respond, it allows for a pile on from the internet, and makes any decision to lock down a hostile thread into its own kind of statement.
The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.
i.e. this is the maintainer posting on their own GitHub Issues.
Could be from Rio, could be from any municipality anywhere in the world. The fact that the account is actually from the town hall rather than a personal account also makes it funnier.
Ah, yes, the Nobel Prize for Fraud.
(I'm seriously kind of amazed they're still publishing those.)
It's a fine tune of Qwen
Not a conspiracy
Not to me, what would people like to happen? Who are those people? And why do they care?
The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.
It is like Chicago in the 20s, Naples and Medellín in the 80s or Moscow and Culiacan (Sinaloa, Mexico) today.
I'm not an expert in this area, but it's not too hard to see how a merge like that could turn out ok.
I would like to downvote this please.
Check how the "authors" of "this model" react to this problem [1]. See how they deal with this problem by first changing their affiliation from https://iplanrio.rio.rj.gov.br to https://iplanrio.prefeitura.rio [2], then saying that they are sorry for being caught [3], then just removing all their affiliations once and for all [4].
I think the "authors" of "this model" [5] should be held accountable until they upload new checkpoints, and the performance of the new model is verified by third-parties.
P.S. To people who downvoted me, show me why you're doing this.
[1] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
[2] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
[3] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
[4] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...
Oh, I am so SHOCKED, so SHOCKED! /s
Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).
Kinda like Chicago in the 20's or Naples and Palermo in the 90s.