Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model

(github.com)

403 points · unrvl22 · 8d ago · 236 comments

122 comments · 24 top-level

hintymad8d ago· 20 in thread

> Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen — across all 60 layers and every component of the network. Other finetunes cannot be explained as interpolations.

I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.
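
As a rough illustration of the kind of check the quoted claim implies, here is a minimal sketch in plain Python. It assumes each tensor has been flattened to a list of floats, and the function name `estimate_blend` and the toy values are hypothetical, not taken from the linked issue. If `merged = alpha*a + (1 - alpha)*b` holds exactly, then `merged - b = alpha*(a - b)`, so a one-parameter least-squares fit recovers `alpha` and leaves essentially no residual. The statistical test used in the issue itself is not reproduced here.

```python
def estimate_blend(merged, a, b):
    """Fit alpha in merged ≈ alpha*a + (1-alpha)*b; return (alpha, max residual)."""
    d = [x - y for x, y in zip(a, b)]        # a - b
    r = [m - y for m, y in zip(merged, b)]   # merged - b
    denom = sum(v * v for v in d)
    if denom == 0:
        raise ValueError("a and b are identical; alpha is undefined")
    alpha = sum(u * v for u, v in zip(r, d)) / denom
    residual = max(abs(u - alpha * v) for u, v in zip(r, d))
    return alpha, residual

# Toy stand-ins for one flattened tensor from each model (hypothetical values).
nex = [1.0, -2.0, 0.5, 4.0]
qwen = [0.0, 1.0, -0.5, 2.0]
rio = [0.6 * n + 0.4 * q for n, q in zip(nex, qwen)]

alpha, residual = estimate_blend(rio, nex, qwen)

# A tensor that is not a blend of the two leaves a large residual.
unrelated = [5.0, 5.0, -3.0, 0.0]
_, unrelated_residual = estimate_blend(unrelated, nex, qwen)
```

Repeating the fit per tensor and seeing the same coefficient everywhere, with near-zero residual, is what would distinguish a plain merge from a fine-tune that moved the weights independently.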

Aurornis8d ago

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Enhanced it on a couple benchmarks, supposedly.

The game is to turn knobs until you get a benchmark run that shows an improvement, then ship it. There are a lot of fine tunes and chimera models on HuggingFace that are supposedly better at some specific test, but when you use them for anything else they're usually worse.

This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.

andai8d ago

They seem to have deleted most of the README now, but the archived version has benchmarks.

https://web.archive.org/web/20260614082641/https://huggingfa...

And the Nex benchmarks for comparison

https://huggingface.co/nex-agi/Nex-N2-Pro

Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?

monster_truck8d ago

I don't think your last point is correct. Ablation, when done correctly, seems to increase the quality and typically also the performance too.

manquer8d ago

> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it

i.e., reinforcement learning against a weak reward function: the benchmark is insufficiently complex and not sufficiently representative of the real world.

The "game", i.e. the decision tree, can be modeled as a multi-armed bandit problem: deploying finite resources (compute) toward exploitation or exploration.

The main issue is that each training or fine-tune run is very expensive, so the number of pulls at the slot machine, so to speak, is pretty limited today.

x312 8d ago

This works because Nex itself is a finetune of Qwen3.5 (https://huggingface.co/nex-agi/Nex-N2-Pro). It's merging Qwen3.5 with a Qwen3.5 finetune.

I don't believe this would work on two LLMs that have different pretraining. Even if it did, you would need two LLMs that have the exact same internal activation shapes, dimensions, expert counts, and token vocabulary. Realistically, it would never happen outside of finetunes or academic experiments.

oofbey8d ago

Correct. We used to think that because NN optimization is non-convex there are all these local minima. Now we know that once you get past the very early parts of training from random init, the loss surface is fairly smooth, and not really convex, but close enough in a bunch of ways - linear combinations of trained models are pretty much always valid combinations. You can think of fine tunings as deltas on the original model which can be summed together successfully. I think this paper first showed that to me: https://arxiv.org/pdf/1802.10026 which was 8 years ago now.
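
A toy sketch of the "deltas that can be summed" view described above, in plain Python with weights as flat lists of floats. The helper names `task_vector` and `apply_deltas` and all values are hypothetical, chosen only for illustration. It also shows that a 0.6/0.4 blend of a base model and one of its fine-tunes is the same thing as adding 0.6 of that fine-tune's delta to the base.

```python
def task_vector(finetuned, base):
    """Delta that a fine-tune added on top of the base weights."""
    return [f - b for f, b in zip(finetuned, base)]

def apply_deltas(base, deltas, scales):
    """Return base + sum_k scales[k] * deltas[k], element by element."""
    out = list(base)
    for delta, scale in zip(deltas, scales):
        out = [o + scale * d for o, d in zip(out, delta)]
    return out

base = [1.0, 2.0, 3.0]     # shared starting point
tune_a = [1.5, 2.0, 3.0]   # fine-tune that moved the first weight
tune_b = [1.0, 2.0, 2.0]   # fine-tune that moved the last weight

# Sum both deltas onto the base.
combined = apply_deltas(
    base, [task_vector(tune_a, base), task_vector(tune_b, base)], [1.0, 1.0]
)

# A 0.6/0.4 blend of tune_a and base equals base + 0.6 * delta_a.
blend = [0.6 * t + 0.4 * b for t, b in zip(tune_a, base)]
shifted = apply_deltas(base, [task_vector(tune_a, base)], [0.6])
```

This only demonstrates the arithmetic. Whether the summed model actually performs well is an empirical question that the toy numbers cannot answer.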

hashmap8d ago

not this exact thing, no, because the functional circuits dont appear in the same places across models. but if you find where they are you can do something like branch between some of the middle functional circuits between models and it kinda just works, or even do one after the other. you cant just like swap any two layers cause a bunch of em bend hyperbolic curvature to do hierarchical stuff deep in the poincare ball and the geometries get all bonkers, but before and after they do that things are relatively flat, and the geometries are more or less transferrable up to rigid rotation if they're each trained on large enough data.

woadwarrior01 8d ago

It's a well-known idea[1], although it's still surprising that something so simple even works.

[1]: https://arxiv.org/abs/2203.05482

kolanos8d ago

This team could have stopped here and still had something interesting (albeit not novel) to show. But the hype cycle was too tempting.

itkovian_8d ago

This is called linear mode connectivity and seems to work for almost every large model. So well that in most cases it’s an explicit part of the training process; do many training ‘branches’ then merge then continue.

It is not understood why it works so well.

teravor8d ago

is that actually how they train them in the datacenter? the trillion sized weight vector gets cloned and sent off to groups of GPUs and averaged after?

themafia8d ago

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Which could be a signal that your "performance" was so abysmal in the first place that even randomly applied training methods can't make it _worse_.

tarruda8d ago

What I find fascinating is the idea that there might be a set of "secret" tweaks that when applied to those weights (or even smaller models) could result in an intelligence simulation that could vastly surpass even something like Fable.

kristjansson8d ago

https://thickets.mit.edu

moritzwarhier8d ago

If this is true, it really would be impressive.

Davidzheng8d ago

it's interesting that this was even guessed at

Davidzheng8d ago

ok I guess they had other clues then. if you do any sort of comparison vs Nex & Qwen, probably a lot of weird coincidences will show up if somehow the three weights are not linearly independent lol

meindnoch8d ago

It shows that LLMs are an extremely wasteful approach to intelligence.

kristjansson8d ago

or that intelligence is merely the composition of many redundant, lossy, ~random components

antonvs8d ago

Compared to what?

zinodaur8d ago· 20 in thread

Oh no, someone is profiting off of their work without proper attribution!?!?

Aurornis8d ago

This is an open weights model based on other open weights models.

The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.

The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.

Basically, they thought they could splice two open weights models together and claim their team had accomplished some amazing post training, but they weren’t smart enough to realize that other researchers would discover that there wasn’t any post training.

moritzwarhier8d ago

Thanks for the factual clarification. This is so important when everyone already has their trigger finger on politics. Not meaning that politics are irrelevant here, see sister comment by jobim.

But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.

iknowstuff8d ago

How do they just splice two models together?

s1artibartfast8d ago

How do you feel about the government or government contractors saying they did a bunch of work when they did nothing instead?

carlosjobim8d ago

This is a pure scam on tax payer money. But what else would be expected?

hootz8d ago

Apparently no public money was involved.

jrm4 8d ago

Unlike the big companies who do this, which often are merely impure scams on tax payer money a little more downstream.

internet2000 8d ago

Attribution isn't the relevant part. Lying about your lab's capabilities is.

Planktonne8d ago

That's also something all the AI companies have been doing.

vips7L8d ago

Sounds like the whole AI movement.

themafia8d ago

It seems to me like the lies are both for the same reason. To capture attention and profits that are not deserved.

functionmouse8d ago

leopards ate my face

outside2344 8d ago

But the whole game is lying and stealing isn't it?

adrian_b8d ago

I do not see anyone lying.

The model card says:

> Post-trained from Qwen 3.5 397B

The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:

https://arxiv.org/abs/2510.05069

So the sources seem properly attributed.

They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".

bachmeier8d ago

"Their work"? First you had the original content creators that did 99.99% of the work. Then you had the US companies bundle it up into a frontier LLM. Then "they" did the "work" of using the US model as a foundation for their own. So in the sense of doing 0.00001% of the actual work that went into their product, sure.

I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.

dghlsakjg8d ago

That’s the joke.

bwilliams18 8d ago

That was the joke of the parent comment.

JoshStrobl8d ago

That joke really went over your head, huh...

harikb8d ago

It is only a problem if you claim it to be an independently developed OS with no attribution to base

idiotsecant8d ago

Oof this is delete your post level I think. Sorry bud, I been there.

rafaquintanilha8d ago· 12 in thread

I have no affiliation with them but here's what I think happened:

1. They claim the official model is based on Qwen 397B. It's likely they didn't disclose Nex Pro at all because Nex itself is based on the same base model (not saying they shouldn't).

2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.

3. It's important to notice they didn't advertise the model besides posting it on Reddit 2 days ago. It went viral organically, over the weekend, and during Brazil's World Cup debut (Brazilians will understand). Of course the mayor of Rio took the opportunity to capitalize on the free coverage, but that wasn't done in conjunction with the researchers.

4. I don't see why they would disclose Qwen 397B as base and mention the SwiReasoning paper but not mention Nex if all they did was to merge both models.

5. In any case, what they are claiming is easily verifiable once (if) they upload the right model.

throwa356262 8d ago

Regarding #2

https://news.ycombinator.com/item?id=48529544

xiphias2 8d ago

This should be at the top: they uploaded the wrong model, they fixed it

matheusmoreira8d ago

I'm honestly impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is probably the last headline I ever expected to read on HN.

airstrike8d ago

Worth reminding everyone that Lua was also created in Rio, though admittedly at PUC rather than by the government.

Rio has a strong engineering talent pool, along with many other major capitals in Brazil

cscheid8d ago

Yes! That "prefeitura do Rio" huggingface URL is definitely shocking to read to this Brazilian as well (I'm assuming you and parent also are from your usernames).

Aurornis8d ago

> 2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.

They merged the base model with another lab’s fine tuned model. The improvements could have come from getting some of the fine tuned weights from the other model.

If they really had a better performing model that they “accidentally” forgot to upload, they could have uploaded the correct file by now.

croes8d ago

Seems they did

https://news.ycombinator.com/item?id=48529544

s1artibartfast8d ago

My understanding is that they didn't do any distillation. Every weight is a 60/40 element-wise average of Qwen and Nex. Is this possible if the Rio contractor did their own post-training as claimed?

https://x.com/tenobrus/status/2066243352211996728/photo/1

motbus3 7d ago

It seems to me this is clearly a mistake. They would not even have the resources for it as far as I know, and I think they are not even in a position to make such bold claims.

matheusmoreira7d ago

Brazil could easily do it. Fine-tuning requires some number of H100 cards. Trivial for the Brazilian government. Existing Brazilian labs are nothing compared to US hyperscalers, but they do have enough capacity to fine-tune Qwen. Santos Dumont has 248 H100s + 144 Grace Hoppers.

That's what makes this hilariously sad. Brazil could have done some good work here, but it just didn't. Brazil merged two models on a workstation.

smus8d ago

What do you mean World Cup debut? haven't they won 5?

alxndresp8d ago

They meant their first, opening game of this current World Cup tournament

unrvl22 OP 8d ago · 8 in thread

The municipality of Rio de Janeiro (via its IT company IplanRIO) released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune that beats comparable open models on benchmarks. The linked issue argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40% Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.

DonsDiscountGas8d ago

I didn't know model merging like that was possible. (Obviously possible from a pure software standpoint but I'm surprised it's effective)

bwhitty8d ago

As another poster above linked, it’s been shown to be effective since 2022: https://arxiv.org/abs/2203.05482

hypercube33 8d ago

Even merging models with themselves as shown here in the post how they got to the top of hugging face with two gpus

baobabKoodaa7d ago

A few years back these used to be called "Frankenstein models"

Lucasoato8d ago

So the problem isn’t in the missing attribution to Qwen, but with the fact that they didn’t mention Nex-N2 Pro right?

Aurornis8d ago

The problem is that they claimed to have made a big achievement with their home grown post training, and they expected to receive a lot of praise for it.

Then researchers looked at the weights and there is no post training at all.

They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.

vasco8d ago

Rio better have the best IT infrastructure and software in the world if they are spending time on LLMs. What a waste of tax payer money.

vitorgrs8d ago

Piauí state is also doing an LLM, it seems. But indeed it would make more sense if it were a national thing rather than local...

ekjhgkejhgk8d ago· 7 in thread

One funny thing about incompetence is that they don't have the competence to know that their incompetence is straightforward to verify by a competent person.

root-parent8d ago

You just described every single vibe coder...

vvpan8d ago

I think that's unfair to "vibe coding". If anybody explicitly claims to be vibe coding something, then they are admitting to low supervision of the code. And on the contrary, you can also AI-produce code that you have supervised highly. I suppose there are people who both AI their code and push it as bespoke, but I, for one, have not met such a person at or outside of work.

carlosjobim8d ago

Why would they care? They get their salaries and pensions and bonuses, and the tax payer is footing the bill.

thimabi8d ago

I wouldn’t describe what happened here as incompetence. As a “carioca”, I am pleasantly surprised to know that the government’s IT department is involved in AI work — even without the budget to create its own models from scratch.

antonvs8d ago

They could do AI work without trying to lie to the entire rest of the world.

arcticfox8d ago

This seems kind of insane though, every time I go to Rio I think of the potential of AI/technology to solve some problems and leave it even more paradisiacal... But working on their own model? Wtf? There are a million applications of existing ones there that should be followed up on instead.

reese_john8d ago

It is a testament to the bloat and overreach of the Brazilian state in the economy. Such endeavors should be left to the private sector

fkozlowski8d ago· 6 in thread

I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?

Havoc8d ago

Merges and fine tunes are within reach of individuals with some money to burn so I’m sure a muni can do it

axus8d ago

I like the [dead] comment theory that they proposed a huge LLM training budget to the government, kept most of the money, and released a cheap merge to justify the grift.

dormento8d ago

This would be so very brazilian of them.

Source: am Huelander.

seba_dos1 8d ago

It's kinda weird to claim extraordinary results in such case though, as that brings a lot of eyes to it.

fkozlowski8d ago

Ah that makes sense

matheusmoreira8d ago

That's essentially Brazil's standard operating procedure. Wouldn't be surprising if that turned out to be the case.

Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.

AnotherGoodName8d ago· 6 in thread

This is fascinating that it worked though. Can we just merge all the open weight models and get something better?

wds8d ago

I imagine it'd work the same as merging all the good-tasting foods to get an even tastier one

_3u10 8d ago

No, they need the same arch, but you can distill them into a single model. And yes, if you use the API directly Claude will often say it’s an open weight model (likely the ones it was distilled from)

nylonstrung8d ago

If you go to Civitai, this is pretty much how it works in that corner of the image generation world

Everything is using Stable Diffusion as the underlying model, then most of the usage is merges of checkpoints

avereveard8d ago

most merges improve a small subset of "feeling" benchmarks (too small, too specific, or out of distribution) and tend to show degradation on actual benchmarks, with especially punishing results on long-chain benchmarks.

they also only work on matching architectures (i.e. finetunes/loras of the same model)

vor_8d ago

Merging related models has been a very common practice for years. See the Stable Diffusion community.

dindunuf8d ago

that kinda worked in llama 1/2 era, not between different models but between finetunes of the same model. the briefly legendary Mythomax was IIRC a merge of 5+ tunes, some of which were merges themselves.

jrm4 8d ago · 5 in thread

“Well, Steve (Jobs), I think it’s more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set, but I found out that you had already stolen it.”

-- Bill Gates

ckcheng8d ago

What’s more funny to me is the set up to that quote:

> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.

And what’s more interesting is the conclusion:

> Apple filed a monumental copyright lawsuit against Microsoft in 1988, but they eventually lost on a technicality (the judge ruled that Apple inadvertently gave Microsoft a perpetual license to the Mac user interface in November 1985).

Microsoft didn’t steal Apple’s GUI … Apple gave it to them.

alexgoodhart8d ago

That isn’t fully true is it?

Microsoft claimed that its software’s use of various visualizations related to window state was covered by the 1985 agreement, and Apple claimed that this was not true; those window states were produced by Macintosh while Microsoft’s software was being rendered in the Mac environment.

> In his March 20, 1989 Order, Judge Schwarzer declined to consider whether the visual displays in issue were generated by the Microsoft application programs or by the Macintosh system software. The point arose in connection with Microsoft's argument that the 1985 Agreement licensed to Microsoft all visual displays that could possibly be called up by running the five Microsoft application programs on the Macintosh system software then or in the future. 709 F. Supp. at 929. Judge Schwarzer concluded that Microsoft's contention would "defy common sense." Id.

themafia8d ago

Two spoiled rich kids arguing over whose morality is the least worst.

That this moment is held up as some great exchange in business is annoying. That our regulatory agencies are perennially asleep at the switch and allow this nonsense to keep happening is extremely frustrating.

wunderlotus8d ago

lmao i really hope this is a real quote cuz it’s a banger

ckcheng8d ago

Apparently:

https://www.folklore.org/A_Rich_Neighbor_Named_Xerox.html

jordz8d ago· 3 in thread

Can someone please explain or link to some information about how models are merged? Is this genuinely merging weights mathematically or some kind of distillation (presumably not if they’ve done zero training as the post suggests).

calebkaiser8d ago

This is a good starting point: https://huggingface.co/docs/peft/developer_guides/model_merg...

But yes, in general, merging refers to techniques that directly blend the weights of different models mathematically. It had a big moment of popularity ~2 years ago, with many so-called "Frankenmodels" popping up on leaderboards.

I tend to think of merging as belonging to the same general umbrella as things like "abliteration", or other techniques that surgically modify the weights of a model without a traditional training/tuning loop. Maxime Labonne is a great person to follow if you're interested in this general area.

jxmorris12 8d ago

There’s nothing to read.

Model A: A_1, …, A_n
Model B: B_1, …, B_n

C_i = A_i * p + B_i * (1 - p)

In other words, it’s just a linear combination of the other models’ weights, per position.
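
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming each model is a dict mapping tensor names to flat lists of floats. The function name `merge_weights` and the tensor names are hypothetical, and real checkpoints would be loaded and merged with a tensor library rather than Python lists.

```python
def merge_weights(a, b, p):
    """Per-position linear combination: C_i = A_i * p + B_i * (1 - p)."""
    if a.keys() != b.keys():
        raise ValueError("models must have identical tensor names")
    merged = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch in {name}")
        merged[name] = [x * p + y * (1 - p) for x, y in zip(a[name], b[name])]
    return merged

# Two toy "models" with the same architecture (hypothetical values).
model_a = {"layer0.w": [1.0, 2.0], "layer0.b": [0.0]}
model_b = {"layer0.w": [3.0, 6.0], "layer0.b": [1.0]}

merged = merge_weights(model_a, model_b, p=0.5)
print(merged)
```

The name and length checks are the code-level version of the point made elsewhere in the thread: this only makes sense when both models share the same architecture, which is why it works between a base model and its fine-tunes.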

joe_the_user8d ago

It's been a while since I looked at neural networks in detail. Do all the large models have a close enough architecture that this makes sense? Do they have the same number of layers and width? I had thought that each model had its own "secret sauce" of normal and special layers (convolution, max-pooling, something-something) stacked together. Genuinely curious.

AlienRobot8d ago· 2 in thread

The model's webpage at https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B says it's a merge now. It previously didn't contain this paragraph:

>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.

Incidentally are people using Github issues as blogs now?

jonchurch_8d ago

Edit: I didn’t even notice until someone pointed out this was on the Nex-N2 repo, not the Rio one. Now I understand the OP’s confusion!

It wasn’t framed as an issue, which is the norm breakage I think you’re reacting to (as in, they didn’t ask that the README be updated, etc.), but it is common now for folks to use a project’s issue tracker to name and shame them in a place they can’t easily ignore.

Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).

But surely you can see the optics reason why someone would take their complaint to the repo directly? It pressures the maintainers to respond, it allows for a pile on from the internet, and makes any decision to lock down a hostile thread into its own kind of statement.

The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.

ChoosesBarbecue8d ago

But this is posted on Nex's GitHub, not on "Rio de Janeiro's" GitHub.

i.e. this is the maintainer posting on their own GitHub Issues.

blitzar8d ago· 2 in thread

It's stupid and hilarious when someone in Rio does it; when a techbro in Silicon Valley does it they get VC funding, a Maserati and an entry on the 30 under 30 list.

rgbrth7d ago

I don't think people are saying it's stupid. It's just funny that potentially some random municipality worker is going well beyond their work scope and making contributions in the AI world.

Could be from Rio, could be from any municipality anywhere in the world. The fact that the account is actually from the town hall rather than a personal account also makes it funnier.

rsynnott7d ago

> and an entry on the 30 under 30 list.

Ah, yes, the Nobel Prize for Fraud.

(I'm seriously kind of amazed they're still publishing those.)

yieldcrv8d ago· 2 in thread

Didn’t the last thread about this have someone from the lab or an enthusiast in Rio saying exactly that?

It's a fine-tune of Qwen

Not a conspiracy

daemonologist8d ago

The allegation here is that it's not actually a fine-tune of Qwen, but instead an undisclosed mashup (merge) of someone else's fine-tune of Qwen and the original model. Rio subsequently said that the model was in fact a merge, that they did additional fine-tuning after the merge, and that they accidentally uploaded the base merge instead of the version with additional fine-tuning. But this seems like quite an oversight...

yieldcrv8d ago

> But this seems like quite an oversight...

Not to me, what would people like to happen? Who are those people? And why do they care?

MadrasTh0rn8d ago· 2 in thread

Not surprised

nom8d ago

why not?

diego_moita8d ago

It is a recurrent Brazilian meme: Rio is known in Brazil as "terra de bandido" (gangster's land).

The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.

It is like Chicago in the 20s, Naples and Medellín in the 80s or Moscow and Culiacán (Sinaloa, Mexico) today.

FooBarWidget8d ago· 1 in thread

Can anyone explain to me what a merge is and why that works? It seems utterly bizarre to me that you can just merge weights. You can't make a working program by just merging machine instruction pages. Aren't weights tightly coupled to a specific architecture?

antonvs8d ago

In this case both sets of weights ultimately came from the same model. The Nex model they used is a fine-tune of Qwen, which was the other model they used.

I'm not an expert in this area, but it's not too hard to see how a merge like that could turn out ok.

delusional8d ago· 1 in thread

It's absolutely insane to me that we are now at a point where the top of the front page of hacker news is a random GitHub issue about attribution to some random LLM merge, written in just the most disgusting AI slop style.

I would like to downvote this please.

vor_8d ago

There's been a noticeable drop in quality. It's often a blend of AI culture war posts and arbitrary Github links.

alfiedotwtf8d ago· 1 in thread

Wasn’t it already obvious given the awfully familiar parameter numbers?

intoXbox8d ago

That only tells you what base architecture they used. Fine-tuning does not increase the number of weights; it just adapts the weights to perform better on a fine-tuning dataset, something they claimed they had done.

aaronbrethorst8d ago

They really missed out by not calling it Neuromancer.

jkwang8d ago

This is a concerning pattern. Rebranding merged models as "homegrown" without disclosure undermines trust in open-source AI development. The community needs better provenance tracking and transparency standards for model releases.

thelonelyborg8d ago

this is probably occurring all over the world including in startups.

RandyOrion7d ago

Please do not claim you trained a new model, only to get caught red-handed by others. There are already several people or groups who did that, got caught, and vanished in no time.

Check how the "authors" of "this model" react to this problem [1]. See how they deal with this problem by first changing their affiliation from https://iplanrio.rio.rj.gov.br to https://iplanrio.prefeitura.rio [2], then saying that they are sorry for being caught [3], then just removing all their affiliations once and for all [4].

I think the "authors" of "this model" [5] should be held accountable until they upload new checkpoints, and the performance of the new model is verified by third-parties.

P.S. To people who downvoted me, show me why you're doing this.

[1] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[2] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[3] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[4] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[5] https://huggingface.co/prefeitura-rio

nicman23 7d ago

is it any good?

diego_moita8d ago

WHAT!? There are thieves in Rio de Janeiro?

Oh, I am so SHOCKED, so SHOCKED! /s

Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).

Kinda like Chicago in the 20's or Naples and Palermo in the 90s.

pelasaco8d ago

an eternal 7x1.. and I am not talking about Curaçao..

Havoc8d ago

Nex in turn is also based on qwen so don’t think they’re too far off

j / k navigate · click thread line to collapse

236 comments

122 comments · 24 top-level

hintymad8d ago· 20 in thread

I find it amazing how robust the current deep learning models are. A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Aurornis8d ago

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Enhanced it on a couple benchmarks, supposedly.

This happens with a lot of the models that are modified to remove censorship. They succeed in getting the model to emit previously censored outputs, but the overall output quality decreases.

andai8d ago

They seem to have deleted most of the README now, but the archived version has benchmarks.

https://web.archive.org/web/20260614082641/https://huggingfa...

And the Nex benchmarks for comparison

https://huggingface.co/nex-agi/Nex-N2-Pro

Rio seems to be about halfway between Qwen 3.5 and Nex, as you'd expect?

monster_truck8d ago

I don't think your last point is correct. Ablation, when done correctly, seems to increase the quality and typically also the performance too.

3 more replies

manquer8d ago

> game is to turn knobs until you get a benchmark run that shows an improvement, then ship it

i.e reinforcement learning against a weak reward function - benchmark is insufficiently complex and is not representative of the real world sufficiently.

The "game", i.e. decision tree can be modeled as a multi-arm bandit problem, to deploy finite resources ( compute) toward exploitation/exploration .

The main issue is each training / fine-tune is very expensive so number of chances at the slot so to speak is pretty limited today.

x3128d ago

This works because Nex itself is a finetune of Qwen3.5 (https://huggingface.co/nex-agi/Nex-N2-Pro). It's merging Qwen3.5 with a Qwen3.5 finetune.

oofbey8d ago

hashmap8d ago

woadwarrior018d ago

It's is a well known idea[1], although it's still surprising that something as simple, even works.

[1]: https://arxiv.org/abs/2203.05482

kolanos8d ago

This team could have stopped here and still had something interesting (albeit not novel) to show. But the hype cycle was too tempting.

itkovian_8d ago

It is not understood why it works so well.

teravor8d ago

is that actually how they train them in the datacenter? the trillion sized weight vector gets cloned and sent off to groups of GPUs and averaged after?

themafia8d ago

> A simple linear combination of every weight did not degrade the performance of the model, but enhanced it.

Which could be a signal that your "performance" was so abysmal in the first place that even randomly applied training methods can't make it _worse_.

tarruda8d ago

kristjansson8d ago

https://thickets.mit.edu

moritzwarhier8d ago

If this is true, it really would be impressive.

Davidzheng8d ago

it's interesting that this was even guessed at

Davidzheng8d ago

OK, I guess they had other clues then. If you do any sort of comparison vs. Nex & Qwen, a lot of weird coincidences will probably show up if the three sets of weights are somehow not linearly independent, lol.

meindnoch8d ago

It shows that LLMs are an extremely wasteful approach to intelligence.

kristjansson8d ago

or that intelligence is merely the composition of many redundant, lossy, ~random components

antonvs8d ago

Compared to what?

zinodaur8d ago· 20 in thread

Oh no, someone is profiting off of their work without proper attribution!?!?

Aurornis8d ago

This is an open weights model based on other open weights models.

The dispute is that they released it with claims about having done some post training that improved the outputs. It was discovered that the model was not post trained like they claimed.

The HF page now says it’s a merge of models, which wasn’t there before. They’re trying to claim they accidentally uploaded the wrong model to HF and that they’ll upload the real one soon.

moritzwarhier8d ago

Thanks for the factual clarification. This is so important when everyone already has their trigger finger on politics. Not meaning that politics are irrelevant here, see sister comment by jobim.

But it's impossible to form a nuanced opinion when political association has a higher priority than the facts; which, again, don't look flattering for the implementers.

iknowstuff8d ago

How do they just splice two models together?

2 more replies

s1artibartfast8d ago

How do you feel about the government or government contractors saying they did a bunch of work when they did nothing instead?

carlosjobim8d ago

This is a pure scam on tax payer money. But what else would be expected?

hootz8d ago

Apparently no public money was involved.

1 more reply

jrm48d ago

Unlike the big companies who do this, which often are merely impure scams on tax payer money a little more downstream.

2 more replies

internet20008d ago

Attribution isn't the relevant part. Lying about your lab's capabilities is.

Planktonne8d ago

That's also something all the AI companies have been doing.

2 more replies

vips7L8d ago

Sounds like the whole AI movement.

themafia8d ago

It seems to me like the lies are both for the same reason. To capture attention and profits that are not deserved.

functionmouse8d ago

leopards ate my face

outside23448d ago

But the whole game is lying and stealing isn't it?

adrian_b8d ago

I do not see anyone lying.

The model card says:

> Post-trained from Qwen 3.5 397B

The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:

https://arxiv.org/abs/2510.05069

So the sources seem properly attributed.

They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".

3 more replies

bachmeier8d ago

I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.

dghlsakjg8d ago

That’s the joke.

1 more reply

bwilliams188d ago

That was the joke of the parent comment.

JoshStrobl8d ago

That joke really went over your head, huh...

harikb8d ago

It is only a problem if you claim it to be an independently developed OS with no attribution to base

idiotsecant8d ago

Oof this is delete your post level I think. Sorry bud, I been there.

rafaquintanilha8d ago· 12 in thread

I have no affiliation with them but here's what I think happened:

1. They claim the official model is based on Qwen 397B. It's likely they didn't disclose Nex Pro at all because Nex itself is based on the same base model (not saying they shouldn't).

2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.

4. I don't see why they would disclose Qwen 397B as base and mention the SwiReasoning paper but not mention Nex if all they did was to merge both models.

5. In any case, what they are claiming is easily verifiable once (if) they upload the right model.

throwa3562628d ago

Regarding #2

https://news.ycombinator.com/item?id=48529544

xiphias28d ago

This should be at the top: they uploaded the wrong model, they fixed it

1 more reply

matheusmoreira8d ago

I'm honestly impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is probably the last headline I ever expected to read on HN.

airstrike8d ago

Worth reminding everyone that Lua was also created in Rio, though admittedly at PUC rather than by the government.

Rio has a strong engineering talent pool, along with many other major capitals in Brazil

3 more replies

cscheid8d ago

Yes! That "prefeitura do Rio" huggingface URL is definitely shocking to read to this Brazilian as well (I'm assuming you and parent also are from your usernames).

Aurornis8d ago

> 2. The improvement would come from merging the weights PLUS on-policy distillation. The confusion is that the uploaded model didn't have the distillation at all.

They merged the base model with another lab’s fine tuned model. The improvements could have come from getting some of the fine tuned weights from the other model.

If they really had a better performing model that they “accidentally” forgot to upload, they could have uploaded the correct file by now.

croes8d ago

Seems they did

https://news.ycombinator.com/item?id=48529544

1 more reply

s1artibartfast8d ago

My understanding is that they didn't do any distillation. Every weight is a 60/40 element-wise average of Qwen and Nex. Is this possible if the Rio contractor did their own post-training as claimed?

https://x.com/tenobrus/status/2066243352211996728/photo/1
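A rough sketch of how such a check could work, not the analysis from the linked issue: if a tensor is a pure blend, then (merged - base) is an exact multiple of (finetune - base), so a least-squares fit of that multiple leaves essentially zero residual. The function name and the toy tensors are hypothetical, and the 0.6 is just a toy value placed on the fine-tune.

```python
def blend_coefficient(merged, finetune, base):
    """Fit alpha in merged ~= alpha * finetune + (1 - alpha) * base.

    Returns (alpha, residual_norm). A residual near zero means the tensor
    is explained by a pure linear blend; extra training would leave a residual.
    """
    d = [f - b for f, b in zip(finetune, base)]   # fine-tune delta
    r = [m - b for m, b in zip(merged, base)]     # merged delta
    denom = sum(x * x for x in d)
    if denom == 0.0:
        raise ValueError("finetune and base are identical for this tensor")
    alpha = sum(x * y for x, y in zip(r, d)) / denom
    residual = sum((y - alpha * x) ** 2 for x, y in zip(d, r)) ** 0.5
    return alpha, residual

# Toy flattened tensors (made-up values).
base = [0.0, 1.0, 2.0, -1.0]
finetune = [1.0, 3.0, 2.5, -2.0]
pure_blend = [0.6 * f + 0.4 * b for f, b in zip(finetune, base)]

alpha, residual = blend_coefficient(pure_blend, finetune, base)

# Simulate additional training by nudging one weight off the blend line.
trained = list(pure_blend)
trained[2] += 0.5
alpha_t, residual_t = blend_coefficient(trained, finetune, base)
```

In the toy case the pure blend recovers alpha of about 0.6 with a residual near zero, while the nudged tensor leaves a clearly nonzero residual. Real post-training on top of a merge would be expected to show up the same way, tensor by tensor.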

motbus37d ago

It seems to me this is clearly a mistake. They would not even have the resources for it as far as I know, and I think they are not even in a position to make such bold claims.

matheusmoreira7d ago

That's what makes this hilariously sad. Brazil could have done some good work here, but it just didn't. Brazil merged two models on a workstation.

1 more reply

smus8d ago

What do you mean World Cup debut? haven't they won 5?

alxndresp8d ago

They meant their first, opening game of this current World Cup tournament

unrvl22OP8d ago· 8 in thread

DonsDiscountGas8d ago

I didn't know model merging like that was possible. (Obviously possible from a pure software standpoint but I'm surprised it's effective)

bwhitty8d ago

As another poster above linked, it’s been shown to be effective since 2022: https://arxiv.org/abs/2203.05482

1 more reply

hypercube338d ago

Even merging models with themselves works, as shown in the post about how they got to the top of Hugging Face with two GPUs.

baobabKoodaa7d ago

A few years back these used to be called "Frankenstein models"

Lucasoato8d ago

So the problem isn’t in the missing attribution to Qwen, but with the fact that they didn’t mention Nex-N2 Pro right?

Aurornis8d ago

The problem is that they claimed to have made a big achievement with their home grown post training, and they expected to receive a lot of praise for it.

Then researchers looked at the weights and there is no post training at all.

They are now attributing both models they merged, but their excuse for the lack of post training is to claim they accidentally uploaded the wrong files.

1 more reply

vasco8d ago

Rio better have the best IT infrastructure and software in the world if they are spending time on LLMs. What a waste of tax payer money.

vitorgrs8d ago

Piauí state is also doing an LLM, it seems. But indeed it would make more sense if it was a national thing rather than local...

ekjhgkejhgk8d ago· 7 in thread

One funny thing about incompetence is that they don't have the competence to know that their incompetence is straightforward to verify by a competent person.

root-parent8d ago

You just described every single vibe coder...

vvpan8d ago

1 more reply

carlosjobim8d ago

Why would they care? They get their salaries and pensions and bonuses, and the tax payer is footing the bill.

thimabi8d ago

antonvs8d ago

They could do AI work without trying to lie to the entire rest of the world.

arcticfox8d ago

reese_john8d ago

It is a testament to the bloat and overreach of the Brazilian state in the economy. Such endeavors should be left to the private sector

1 more reply

fkozlowski8d ago· 6 in thread

I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?

Havoc8d ago

Merges and fine tunes are within reach of individuals with some money to burn so I’m sure a muni can do it

axus8d ago

I like the [dead] comment theory that they proposed a huge LLM training budget to the government, kept most of the money, and released a cheap merge to justify the grift.

dormento8d ago

This would be so very brazilian of them.

Source: am Huelander.

seba_dos18d ago

It's kinda weird to claim extraordinary results in such case though, as that brings a lot of eyes to it.

1 more reply

fkozlowski8d ago

Ah that makes sense

matheusmoreira8d ago

That's essentially Brazil's standard operating procedure. Wouldn't be surprising if that turned out to be the case.

Still, I'm actually impressed that this even happened at all. "Rio de Janeiro's homegrown LLM" is the last headline I expected to read on HN.

AnotherGoodName8d ago· 6 in thread

This is fascinating that it worked though. Can we just merge all the open weight models and get something better?

wds8d ago

I imagine it'd work the same as merging all the good-tasting foods to get an even tastier one

1 more reply

_3u108d ago

nylonstrung8d ago

If you go to Civitai this is pretty much how it works in that corner of the image generation world

Everything is using Stable Diffusion as the underlying model, then most of the usage is merges of checkpoints

avereveard8d ago

also only works on matching architectures (i.e. finetunes/loras of the same model)

vor_8d ago

Merging related models has been a very common practice for years. See the Stable Diffusion community.

dindunuf8d ago

jrm48d ago· 5 in thread

“Well, Steve (Jobs), I think it’s more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set, but I found out that you had already stolen it.”

-- Bill Gates

ckcheng8d ago

What’s more funny to me is the set up to that quote:

> Bill Gates had somehow manifested, alone, surrounded by ten Apple employees. … Steve started yelling at Bill, asking him why he violated their agreement.

And what’s more interesting is the conclusion:

Microsoft didn’t steal Apple’s GUI … Apple gave it to them.

alexgoodhart8d ago

That isn’t fully true is it?

themafia8d ago

Two spoiled rich kids arguing over whose morality is the least worst.

2 more replies

wunderlotus8d ago

lmao i really hope this is a real quote cuz it’s a banger

ckcheng8d ago

Apparently:

https://www.folklore.org/A_Rich_Neighbor_Named_Xerox.html

jordz8d ago· 3 in thread

calebkaiser8d ago

This is a good starting point: https://huggingface.co/docs/peft/developer_guides/model_merg...

jxmorris128d ago

There’s nothing to read.

Model A: A_1, …, A_n

Model B: B_1, …, B_n

C_i = A_i * p + B_i * (1 - p)

In other words, it’s just a linear combination of the other models’ weights, per position.
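As a minimal sketch of that formula, assuming each model is a plain dict mapping tensor names to flat lists of floats (real checkpoints would use tensor libraries; the names and values here are made up):

```python
def merge_weights(model_a, model_b, p):
    """Return C_i = A_i * p + B_i * (1 - p) for every named tensor."""
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same tensor names")
    merged = {}
    for name, a in model_a.items():
        b = model_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch in {name}")
        # Element-wise linear interpolation, position by position.
        merged[name] = [x * p + y * (1 - p) for x, y in zip(a, b)]
    return merged

# Toy "models" with identical architecture (hypothetical tensor names).
toy_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
toy_b = {"layer0.weight": [3.0, 6.0], "layer0.bias": [10.0]}
merged = merge_weights(toy_a, toy_b, 0.5)
```

The name and shape checks are the practical constraint: the formula only makes sense when both models have the same tensors in the same layout.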

joe_the_user8d ago

AlienRobot8d ago· 2 in thread

The model's webpage at https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B says it's a merge now. It previously didn't contain this paragraph:

Incidentally are people using Github issues as blogs now?

jonchurch_8d ago

Edit: I didn't even notice until someone pointed out this was on the Nex-n2 repo, not the Rio one. Now I understand the OP’s confusion!

Whether that’s right, prosocial, or professional is up for debate (as well as if any single definition of etiquette can be expected in 2026 on an issue tracker).

The maintainers should absolutely post an official response and lock the thread though, it will likely get ugly in there.

ChoosesBarbecue8d ago

But this is posted on Nex's GitHub, not on "Rio de Janeiro's" GitHub.

i.e. this is the maintainer posting on their own GitHub Issues.

blitzar8d ago· 2 in thread

It's stupid and hilarious when someone in Rio does it; when a techbro in Silicon Valley does it they get VC funding, a Maserati and an entry on the 30 under 30 list.

rgbrth7d ago

I don't think people are saying it's stupid. It's just funny that potentially some random municipality worker is going well beyond their work scope and making contributions in the AI world.

Could be from Rio, could be from any municipality anywhere in the world. The fact that the account is actually from the town hall rather than a personal account also makes it funnier.

rsynnott7d ago

> and an entry on the 30 under 30 list.

Ah, yes, the Nobel Prize for Fraud.

(I'm seriously kind of amazed they're still publishing those.)

yieldcrv8d ago· 2 in thread

Didn’t the last thread about this have someone from the lab or an enthusiast in Rio saying exactly that?

It's a fine tune of Qwen

Not a conspiracy

daemonologist8d ago

yieldcrv8d ago

> But this seems like quite an oversight...

Not to me, what would people like to happen? Who are those people? And why do they care?

1 more reply

MadrasTh0rn8d ago· 2 in thread

Not surprised

nom8d ago

why not?

diego_moita8d ago

It is a recurrent Brazilian meme: Rio is known in Brazil as "terra de bandido" (gangster's land).

The majority of their politicians have ties to organized crime. There is a virtual revolving door between police and crime, where people migrate from one to the other.

It is like Chicago in the 20s, Naples and Medelin in the 80s or Moscow and Culiacan (Sinaloa, Mexico) today.

2 more replies

FooBarWidget8d ago· 1 in thread

antonvs8d ago

In this case both sets of weights ultimately came from the same model. The Nex model they used is a fine-tune of Qwen, which was the other model they used.

I'm not an expert in this area, but it's not too hard to see how a merge like that could turn out ok.

delusional8d ago· 1 in thread

I would like to downvote this please.

vor_8d ago

There's been a noticeable drop in quality. It's often a blend of AI culture war posts and arbitrary Github links.

alfiedotwtf8d ago· 1 in thread

Wasn’t it already obvious given the awfully familiar parameter numbers?

intoXbox8d ago

aaronbrethorst8d ago

They really missed out by not calling it Neuromancer.

jkwang8d ago

thelonelyborg8d ago

this is probably occurring all over the world including in startups.

RandyOrion7d ago

Please do not claim you trained a new model, only to get caught red-handed by others. Several people or groups have already done that, got caught, and vanished in no time.

I think the "authors" of "this model" [5] should be held accountable until they upload new checkpoints, and the performance of the new model is verified by third-parties.

P.S. To people who downvoted me, show me why you're doing this.

[1] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[2] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[3] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[4] https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

[5] https://huggingface.co/prefeitura-rio

nicman237d ago

is it any good?

diego_moita8d ago

WHAT!? There are thieves in Rio de Janeiro?

Oh, I am so SHOCKED, so SHOCKED! /s

Explaining the joke: in Brazil, Rio de Janeiro is known as "Terra de bandido" (Gangster's Land).

Kinda like Chicago in the 20's or Naples and Palermo in the 90s.

pelasaco8d ago

an eternal 7x1.. and I am not talking about Curaçao..

Havoc8d ago

Nex in turn is also based on Qwen, so I don’t think they’re too far off
