Rio de Janeiro's city government model Rio3.5 beats Qwen3.7 in recent benchmarks

(twitter.com)

142 points · lucasfcosta · 10d ago · 45 comments

29 comments · 13 top-level

VoidWhisperer · 10d ago · 6 in thread

https://github.com/nex-agi/Nex-N2/issues/4

Seems that they didn't make/train a novel model; they mixed two existing models and then gave it an instruction to say it was 'Rio, trained by Rio AI Labs'.
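A weight merge of two checkpoints only works when they share an architecture (same parameter names and shapes). The thread doesn't say which merge method was used, so the sketch below shows only the simplest one, linear interpolation, over a hypothetical checkpoint format (parameter name mapped to a flat list of floats). `linear_merge` and `alpha` are illustrative names; real merges of a 397B model operate on framework tensors and often use other methods.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def linear_merge(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Return alpha * a + (1 - alpha) * b, parameter by parameter.

    Both checkpoints must have identical parameter names and shapes;
    interpolating weights is only meaningful under that condition.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if a.keys() != b.keys():
        raise ValueError("checkpoints have different parameter names")
    merged: StateDict = {}
    for name, wa in a.items():
        wb = b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


if __name__ == "__main__":
    base = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
    tuned = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}
    print(linear_merge(base, tuned, alpha=0.5))
    # {'layer0.weight': [2.0, 3.0], 'layer0.bias': [0.5]}
```

With `alpha=1.0` the result is just the first checkpoint, so the parameter slides between the two parents.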

w4yai · 10d ago

> The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
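The quote names On-Policy Distillation but not the teacher or the loss. One common formulation has the student generate a sequence and then minimizes the per-token reverse KL between the student's and the teacher's next-token distributions along that sequence. A minimal sketch, assuming distributions arrive as plain probability lists; the function names are illustrative, and this is not the Rio team's training code.

```python
import math
from typing import List, Sequence


def reverse_kl(student: Sequence[float], teacher: Sequence[float]) -> float:
    """KL(student || teacher) for two next-token distributions over the same vocabulary."""
    if len(student) != len(teacher):
        raise ValueError("distributions must cover the same vocabulary")
    total = 0.0
    for p, q in zip(student, teacher):
        if p == 0.0:
            continue  # 0 * log(0 / q) is taken as 0
        if q == 0.0:
            return math.inf  # student puts mass where the teacher puts none
        total += p * math.log(p / q)
    return total


def on_policy_distill_loss(
    student_dists: List[Sequence[float]], teacher_dists: List[Sequence[float]]
) -> float:
    """Mean per-token reverse KL over a sequence the student sampled itself.

    student_dists[t] and teacher_dists[t] are the next-token distributions the
    two models assign at position t of that student-generated sequence.
    Sampling from the student and the gradient step are not shown.
    """
    if not student_dists or len(student_dists) != len(teacher_dists):
        raise ValueError("need one teacher distribution per student position")
    per_token = [reverse_kl(s, t) for s, t in zip(student_dists, teacher_dists)]
    return sum(per_token) / len(per_token)
```

Reverse KL is mode-seeking: it penalizes the student for putting probability where the teacher puts little, which suits training on the student's own samples.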

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B/comm...

daquisu · 10d ago

It was a recent edit, though. Yesterday's snapshot: https://web.archive.org/web/20260613072958/https://huggingfa...

danieldrehmer · 10d ago

can you offer a 4-bit quantized version and name it Zé Pequeno, pretty please?

scotty79 · 9d ago

I'd love to see people figuring out how to build models from several smaller ones. We could then train small specialized models and deploy setups more optimized for any given task. Modular LLMs should be a thing.

urbnspacecowboy · 10d ago

See discussion: https://news.ycombinator.com/item?id=48528371

pixel_popping · 8d ago

To be fair, I still find it to be a great initiative.

mettamage · 10d ago · 4 in thread

https://xcancel.com/ZenMagnets/status/2065796012820848699

Correct me if I'm wrong, but from reading through the comments in that thread, this seems to be post-training/fine-tuning.

oceansky · 10d ago

Yes. It's post-training on Qwen using the novel SwiReasoning framework.

hedgehog · 10d ago

I hadn't seen SwiReasoning (https://swireasoning.github.io, paper and code). It looks like it works at generation time without any requirements on the model. It increases token efficiency and accuracy, but at first skim it seems like this would be incompatible with multi-token prediction. For large reductions in token budget it could be worth it.
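Roughly as the SwiReasoning paper describes it, generation switches between latent and explicit reasoning based on confidence estimated from entropy trends in the next-token distribution. The toy controller below only illustrates that idea: the rule (falling entropy switches to explicit, rising entropy back to latent) and the `max_switches` cap are simplifying assumptions, and `mode_schedule` is a hypothetical name, not the paper's algorithm or code.

```python
import math
from typing import List, Optional, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mode_schedule(step_probs: List[Sequence[float]], max_switches: int = 4) -> List[str]:
    """Toy controller returning 'latent' or 'explicit' for each decoding step.

    Falling entropy (the model is getting more confident) switches to explicit
    reasoning; rising entropy switches back to latent. Once max_switches is
    reached, the current mode is kept for the rest of the sequence.
    """
    modes: List[str] = []
    mode = "latent"
    switches = 0
    prev: Optional[float] = None
    for probs in step_probs:
        h = entropy(probs)
        if prev is not None and h != prev and switches < max_switches:
            wanted = "explicit" if h < prev else "latent"
            if wanted != mode:
                mode = wanted
                switches += 1
        modes.append(mode)
        prev = h
    return modes


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]  # high entropy: unsure
    peaked = [0.97, 0.01, 0.01, 0.01]   # low entropy: confident
    print(mode_schedule([uniform, peaked, uniform]))
```

Because it only reads the distribution at each step, a controller like this needs no changes to the model weights, which matches the "generation time" observation above.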

Kelteseth · 10d ago

Thanks, Firefox and uBlock do not let me watch any X content (I guess this is a good thing).

drnick1 · 10d ago

Same thing here, X content and trackers are blocked by my Firefox settings. The occasional inconvenience is a small price to pay not to be profiled by X, Google, FB, Amazon, and countless other Internet parasites.

Aurornis · 10d ago · 2 in thread

A city government funding a fine-tune of a model is interesting.

As for the benchmarks: If you spend any time playing with fine tunes of published models you know that benchmarks are gamed so much that they're a useless indicator of performance for models from small teams. It's too easy to fine tune a model to perform well on the benchmarks, release it, put a line on your resume saying you released a model that beat the major labs on benchmarks, and then try to use that to jump into a new job. The temptation is high.

There are a lot of fringe models and fine tunes that claim to have better performance on some benchmark. Then you try to use them and find they're often worse at general tasks than the base model.

I would wait and see if these results hold across other benchmarks. It's cool that the city is doing something with AI, but this is something where extraordinary claims require extraordinary evidence. I doubt a small, previously unknown team has unlocked something secret that the team who made Qwen couldn't figure out. It's more likely it was fine tuned for a specific outcome (possibly these benchmarks) and performance in other areas was reduced as a consequence.
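One cheap check for the benchmark-tuning concern is verbatim contamination: how many of a benchmark item's word n-grams also appear in the training text. A minimal sketch, assuming whitespace tokenization and n = 8 (both arbitrary choices); `overlap_ratio` is a hypothetical helper, and it only catches copied test items, not a model tuned toward a benchmark without copying it.

```python
from typing import Iterable, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Lower-cased word n-grams using plain whitespace tokenization."""
    words = text.lower().split()
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(item: str, corpus: Iterable[str], n: int = 8) -> float:
    """Share of the item's n-grams that also occur somewhere in the corpus (0.0 to 1.0)."""
    item_grams = ngrams(item, n)
    if not item_grams:
        return 0.0  # item shorter than n words: nothing to compare
    seen: Set[NGram] = set()
    for doc in corpus:
        seen |= ngrams(doc, n) & item_grams  # keep only n-grams from the item
    return len(seen) / len(item_grams)
```

A ratio near 1.0 suggests the item was in the training data; a low ratio proves little on its own.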

marcosdumay · 10d ago

> A city government funding a fine-tune of a model is interesting.

Looks like it's a government-owned IT services company.

Most likely, they saw a business opportunity in selling it to other cities.

embedding-shape · 10d ago

Indeed, this is all very true, and I'd say it's true for the larger teams too. The entire ecosystem is so gamed by now that if you don't have your own private benchmarks with test cases you haven't shared publicly, it's almost impossible to get a fair picture of how well a model works unless you actually sit down and use it.
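A private benchmark can be as small as a list of prompts with expected answers that never leaves your machine. A minimal sketch, assuming exact-match scoring after case and whitespace normalization; `model` is any hypothetical callable from prompt to answer, and real tasks usually need task-specific graders.

```python
from typing import Callable, List, Tuple

# (prompt, expected answer); keep this list out of anything you publish.
Case = Tuple[str, str]


def normalize(answer: str) -> str:
    """Case- and whitespace-insensitive comparison key."""
    return " ".join(answer.lower().split())


def exact_match_accuracy(model: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of private cases where the model's answer matches after normalization."""
    if not cases:
        raise ValueError("no test cases")
    hits = sum(
        1 for prompt, expected in cases if normalize(model(prompt)) == normalize(expected)
    )
    return hits / len(cases)


if __name__ == "__main__":
    private_cases = [("2 + 2 =", "4"), ("Capital of Brazil?", "Brasília")]
    canned = {"2 + 2 =": "4", "Capital of Brazil?": "Rio de Janeiro"}  # fake model
    print(exact_match_accuracy(lambda p: canned.get(p, ""), private_cases))  # 0.5
```

Running the same private cases against the base model and the fine-tune gives a comparison that nobody could have trained against.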

mrandish · 10d ago · 2 in thread

> Rio de Janeiro's city government model...

Because... lack of a good open weight LLM is a pressing need high on the municipal priorities list for Rio de Janeiro citizens?

true_religion · 10d ago

Should governments not take actions that later benefit the academic, scientific, and economic welfare of their constituents?

Or is it that it’s a city doing this?

Now Brazil does know how to boondoggle its finances for a prestigious cause with little return (e.g. the Olympic Games), but this is a far smaller cost, more akin to a city setting up a tech accelerator or running a media campaign about how important STEM is.

senorrib · 10d ago

It's the municipal IT company, and the dude that did this is a volunteer.

ramon156 · 10d ago · 2 in thread

Every day I'm reminded why I don't spend time on twitter. What's the use of claiming "X is better than Y in benchmark Z, disagreeing with that means disagreeing with me"?

Information is power, dick measurements are not.

itsthecourier · 10d ago

my length is a valid data point for the sake of science

reed1234 · 10d ago

No, I love twitter— and you are wrong.

adrian_b · 10d ago

> Post-trained from Qwen 3.5 397B

Model Card:

https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B

arjie · 10d ago

Benchmaxxing is the new “have a crypto trading strategy”. No one is impressed by it except non-practitioners.

HeliumHydride · 10d ago

https://www.reddit.com/r/LocalLLaMA/comments/1u4fzg1/new_mod... https://x.com/SemiAnalysis_/status/2065894494935933191

betimsl · 10d ago

The problem with these is the tool calling. In my experiments, Qwen agent almost always fails at tool calling, and porting the correct config is quite tedious.

Rio3.5 with Qwen compatible tool calling, we need that :)
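Tool-calling failures often come down to the parser expecting a different wire format than the model emits. Assuming the Hermes-style format that Qwen2.5 and Qwen3 chat templates use (`<tool_call>` tags around a JSON object with `name` and `arguments`), a tolerant parser might look like this; the thread doesn't say which format Rio3.5 emits, and `parse_tool_calls` is a hypothetical helper, not part of any Qwen library.

```python
import json
import re
from typing import Any, Dict, List

# Assumed wire format (Hermes-style, as in Qwen2.5/Qwen3 chat templates):
# <tool_call>{"name": "...", "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract every well-formed tool call; skip malformed ones instead of raising."""
    calls: List[Dict[str, Any]] = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
            continue
        args = obj.get("arguments", {})
        if isinstance(args, str):
            # Tolerate arguments that arrive as a JSON-encoded string.
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                continue
        if isinstance(args, dict):
            calls.append({"name": obj["name"], "arguments": args})
    return calls
```

Skipping a malformed call rather than raising keeps an agent loop alive, so the model can be re-prompted instead of the whole run failing.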

dizhn · 9d ago

Mr Erdoğan launched an initiative yesterday to become the leader in the AI space. As absurd a claim as his 2023 (hard) landing on the moon.

pelasaco · 10d ago

The Taubaté LLM Hoax https://en.wikipedia.org/wiki/Taubat%C3%A9_pregnancy_hoax

xbar · 10d ago

Sexy.

hmokiguess · 10d ago

Never let them know your next move
