It's being kept alive because the Canadian government is desperate to have a local frontier lab and is willing to inject funding and force its adoption in government services. But Cohere's leadership is known in Canadian tech circles to be weak, and they are pivoting to an enterprise-first market around production RAG rather than anything close to frontier work.
I'm glad they're doing open-weight releases, but they're not viable in the long run. It is embarrassing sharing similar spaces with them, but I'll try this release out in OpenCode and re-think afterwards.
Regular Qwen 3.6 benchmarks slightly better and has much wider software support, though, so this is probably of interest only to organizations that disallow models trained in China.
Cool to see this, but it seems like it would be pretty expensive to run.
Based on [2], a 30B model needs something like 2e+23 FLOPs to train from scratch, whereas a 1.6T model needs something like 1e+27 FLOPs. So DeepSeek v4 Pro was roughly 5000x more expensive to train than this model. I'm not totally sure how MoE affects scaling laws, so these numbers might be different in reality, but it gives you a good ballpark estimate of the difference in training scale.
[1] https://arxiv.org/abs/2505.12781 [2] https://arxiv.org/abs/2203.15556
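For anyone who wants to redo the back-of-the-envelope math, here is a rough sketch. It assumes the usual rules of thumb associated with [2]: training compute of about 6 * N * D FLOPs, with a compute-optimal token count of about 20 tokens per parameter. Both constants and the function name are my assumptions, not numbers taken from the comment above, and dense-model scaling is assumed throughout (no MoE adjustment).

```python
def chinchilla_flops(params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate training FLOPs as 6 * N * D, with D = tokens_per_param * N.

    Assumption: dense model trained compute-optimally. The 6 and the 20 are
    rules of thumb, not exact values.
    """
    tokens = tokens_per_param * params
    return 6.0 * params * tokens


small = chinchilla_flops(30e9)    # 30B-parameter model
large = chinchilla_flops(1.6e12)  # 1.6T-parameter model

print(f"30B:   {small:.2e} FLOPs")
print(f"1.6T:  {large:.2e} FLOPs")
print(f"ratio: {large / small:.0f}x")
```

Under these assumptions you get about 1e23 and 3e26 FLOPs, a ratio of roughly 2,800x. That is the same order of magnitude as the figures quoted above, which is about all a ballpark like this can promise. Since cost scales with N squared here, the ratio is very sensitive to which parameter count you plug in.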
More competition is better.
But yeah, it's not the best look to have to stretch and say it's "competitive" with other models in its weight class when it offers not much else that's useful or novel.