GPT‑NL: a sovereign language model for the Netherlands

(tno.nl)

251 points · 5d ago · 301 comments



armcat 5d ago

I keep seeing these "sovereign" LMs time and time again. In Sweden we had GPT-SW3 (https://www.ai.se/en/project/gpt-sw3) and it was the same story there. Instead of burning money on "sovereign" claims, national research labs should focus on building on top of solid baselines (like Qwen/Kimi) and fine-tuning frontier models with real agentic utility that can be applied across actual use cases and widely used by their people, basically for free. Nations should mirror what Cursor has done with Composer 2.5, for example.

appplication 4d ago

Disagree, it’s in the country’s best interest to facilitate internal expertise on the full stack and own their “supply chain” so to speak and fight brain drain. The outcome isn’t just the model, it’s the expertise. Otherwise all their smartest folks will depart for countries where LLM development is strongest.

bradhe 4d ago

Got it, so every country should focus on having a mediocre-at-best AI strategy by refusing to work together? Surely this will create a better future instead of pooling resources.

This vaguely-nationalist world view around tech that’s emerging in Europe is dangerous, man.

On the brain drain problem in particular, one way to ensure talent sticks around is to create a good environment for people to do their best work. In much of Europe, getting bureaucracy out of the way and encouraging real investment would go a long way. People leave because they can make more money and they want to be surrounded by the best people. People would trade some of that off to stick around in their home countries; however, if you go to California and talk to folks from e.g. NL or DE working on this stuff, they have a lot to say about innovation and working culture back home.


__alexs 4d ago

Surely this just uses state funds to train a pipeline of people to be immediately poached by frontier labs?

yread 4d ago

Great, now they will get expertise and then get hired by OpenAI and move.

mschuster91 5d ago

Kimi and Qwen come out of China, which means that their training material may be biased e.g. relating to Taiwan [1]. In addition, there is no way to determine what input went into the training, if it was properly licensed, if it was legal (e.g. not contaminated by CSAM), or how the human component of RLHF was sourced - in US models, for example, stories about exploitation like [2] have been floating for years.

Assuming we Europeans finally get our act together, I think it is better for our long-term future (and for the ethical problems) if we manage to build a baseline of training input and data ourselves, from scratch, with everything ethically sourced.

Oh and, while we're at it, the EU has 24 official languages plus a host of minority languages. Most LLMs focus on the English, German, French and Chinese languages, but everything else is... left behind at best. A European model with actual funding and proper data sources might be able to significantly reduce that.

[1] https://www.taiwannews.com.tw/news/6245677

[2] https://www.theguardian.com/technology/2024/apr/16/techscape...

altmanaltman 5d ago

It really doesn't matter if the model sucks and doesn't perform well. Given the funding amount and their lofty ambitions, it seems very unlikely they will be able to pull it off properly.

Yeah, China and US models have biases, but so will any model. The biases do not get in the way of the product though. You don't open those models just to ask what happened in Tiananmen Square or whether Taiwan is a state. You don't ask ChatGPT to generate CSAM. But they are very good at the tasks you actually expect from an LLM. If you fail at that, nobody will use your model no matter how "ethically sourced" a colonizer-based entity like Europe made it.


vintermann 4d ago

The Chinese models are almost certainly taught to comply with "Chinese values" in the RLHF step, not from filtering the training data. There may be a few things which are too radioactive to be allowed even in the training material - but that's more likely to be things like child abuse images for a visual model, things non-Chinese values also have an issue with.

I'm pretty sure no country taking a stab at making its own model for sovereignty purposes will let "proper licensing" stand in its way.

gnerd00 5d ago

> Most LLMs focus on the English, German, French and Chinese languages, but everything else is... left behind at best.

That is not true, so please read before forming an opinion. The French Mistral project shipped seven+ years ago with 140 languages, for example. Language translation was the first LLM task, back in 2015.


jampekka 4d ago

> Most LLMs focus on the English, German, French and Chinese languages, but everything else is... left behind at best

Current frontier models (closed and open) are already really good at small languages too. I use them in Finnish sometimes, and the language is immaculate. They understand even somewhat obscure dialects. Multilinguality seems to be a mostly solved problem.

KronisLV 4d ago

This already exists https://eurollm.io/

How do people not know about it and keep making stuff from scratch?


dr_dshiv 5d ago

OCR error rates are somewhere north of 8%... that will hurt model quality!

siva7 5d ago

Uh, some would say it's easy to determine what input went into the training for kimi and qwen.. since they were caught stealing it from American labs. Some cultural cliches may never change.


TJSomething 5d ago

If open frontier models start closing up and states impose more export controls on AI services and hardware, it might be good to ensure the supply chain is there to reproduce the SotA, or even a couple of generations behind it.

teekert 4d ago

There is something to be said for this "most cheapest" approach; there is also something to be said for making models that are entirely ethically sourced:

1. Free of controversy like unlicensed training materials

2. Free of exploitative RLHF loops by people in low-wage countries

3. The lessons learned (and published) from going through the entire training process on "European" hardware: "AI factories" (the term for Slurm HPC/HTC systems with lots of heavy GPU nodes, heavily subsidized by our government [0])

1 and 2 are strong counter-LLM arguments at the moment, and hold back some groups of potential users. Another is energy/water use, so going for maximum green energy would be a nice boon as well. 3 is something I consider to be highly useful for our European identity and "way of the ninja" (for you Naruto fans out there).

[0] https://hpc-portal.eu/funding-opportunities

dijksterhuis 4d ago

one would also hope there'll be less pressure to "make line go up", i.e. not having to do attention-engineering via deliberate sycophancy to trap individuals into using it more and more and more and more and more and more.

but in general, yes, as someone who is vehemently anti-ai GPT-NL has piqued my interest specifically because of the ethical protections / measures they're talking about. question is whether they stick to it.

entropyneur 4d ago

Sounds like calling those models "open source" did its wicked job. You cannot take an open-weight model and build a next-generation model using it as a foundation. Once those companies decide it's no longer in their interest to release new open weights, everything you've created this way becomes a pile of rapidly depreciating legacy.

michaelscott 4d ago

OP probably means using the existing open weights as a base for further homegrown development and research, not that the homegrown models are always updated based on whatever US or China are doing in the moment

627467 5d ago

Do we know for sure how much national corpus of knowledge (like dutch) goes into these "global" models and how that affects "localized" model biases? What's wrong with specialized models?

thevinter 5d ago

And what happens once the "solid baselines" become unavailable for one reason or another?

zozbot234 5d ago

You keep building on the last available version? Fine tuning is a whole lot cheaper, easier and more useful than pretraining a model from scratch. It's a complete no brainer.


ozim 5d ago

Seems like you don’t understand.

You take current version and build on top of it. You have the weights.

You might not get some n+1 version at some point, but the n version you have will still most likely be much better than whatever you come up with by burning the goodwill money of people who believe in „sovereignty”.

You are not getting ahead in this game by being „true to your local values”; capital expenditure is insane in this game.


saidnooneever 4d ago

TNO doesn't serve nor represent global interest and hence does not care about global progress. It exists to enhance knowledge on things within Dutch society primarily with some ripple effect outward to EU because they have interests within the EU.

Its purpose is not to become some kind of OpenAI or global foundation offering services/tools on that scale.

There is a lot of criticism of this project, not invalid, but mostly based on a lack of understanding of the goals of the organization as well as of the people building the thing.

The people building it are well aware that it will be less capable than other LLMs at general reasoning, not least because they actually purchased licenses for _all_ the data used as inputs. Since they are not a multi-billion-dollar corporation, this means having very little data, which should be an obvious signal to observers that the goal is not to outdo other models.

In my (personal) opinion it's a project with learning and demonstration value. That value is not 'look how well our model performs against others', but it still offers value.

khafra 4d ago

Today, you keep seeing "sovereign" LMs that are subject to the sovereignty of some human-led state. Tomorrow, the "sovereign" LMs will be called that for a completely different reason.

Scarblac 4d ago

Their legality is very questionable given all the likely copyright infringement going on, and a state can't really ignore that.

vintermann 4d ago

States are the things which can ignore that, and I'm pretty sure US and China already do. No state is going to respect copyright if they think its future is at stake, and apparently even Netherlands thinks the future is at stake.

(Of course states can ignore copyright in a legally polite manner, such as asserting that training on all published material in the National Library is fair game)

enaaem 4d ago

Same reason why every country, or group of close allies, builds its own tanks or space program. You want to keep some level of capability within your control. Compared to weapons programs, AI research is very cheap.

stared 5d ago

I feel that not only is Europe losing its independence to the US and China, but it does not even try to take part in the race.

Unlike the US, Europe has no California-level VCs. I don't expect hundreds of billions of Euros to be poured into long-shot projects.

Unlike China, Europe has neither cohesive public investment at the global level nor the drive to grow. Long-term investments have a lot of words, a lot of regulations, a lot of proxy goals, but there is neither a lot of money nor urgency. It was captured by this post: https://x.com/piotrsankowski/status/2065795919623438546

So yeah, both in economy and warfare, Europe dooms itself to be in the hands of the US, China, or a mix of both.

creesch 5d ago

> Unlike the US, Europe has no California-level VCs.

Some would consider that a good thing. There is a lot to be said for VC in recent years not being beneficial for the economy, certainly on an individual level, other than "number go up".

stared 5d ago

Sure.

At the same time, in many cases it made the EU dependent on the US. A lot of governments are basically dependent on MS Office or Google Cloud.

With AI, it is even more strategic.


guywithahat 5d ago

> There is a lot to be said for VC in recent years not being beneficial for the economy

What a wild statement. VCs are behind most of the growth in the US economy, and they directly drive up wages in tech. I'd be fascinated to hear a valid complaint about VCs that isn't just money envy.


flanked-evergl 5d ago

Some people consider it a good thing that communists boiled people's hands as torture. Some people consider it a good thing that Iran massacred 10,000s of its own citizens. Some would consider it a good thing that Israel killed all Palestinians in Gaza.


ews 5d ago

Europe decided to regulate the hell out of foreign AI instead of investing in its own systems. It's sad to see the European continent lose the race to create a decent startup ecosystem (no decent search engines, social networks, cloud, mobile OS), and now it seems to be hellbent on losing this battle.

joe_mamba 5d ago

>It's sad to see the European continent lost the race to create a decent startup ecosystem

What's ironic and sad at the same time is that pre-2022 Russia's Yandex (the domestic Russian variant of Google) was lightyears ahead of anything the EU, a significantly richer and more capable bloc, had. IIRC, their reverse image search was so good that they had to nerf it because people were using it to identify people from photos.

Same for Israel; their tech sector is probably bigger than the whole EU's combined.

Absolutely shameful how the EU kept managing to snatch defeat from the jaws of victory over and over.


gonzalohm 5d ago

You are saying that as if China or the US are completely isolated from the EU. We live in a globalized world whether you like it or not, and every supply chain spans multiple countries.

Arguably, staying out of the AI "race" is a good thing

stared 5d ago

A military race isn't a good thing either, but you don't want to be on the losing side.

TacticalCoder 5d ago

> Unlike the US, Europe has no California-level VCs. I don't expect hundreds of billions of Euros to be poured into long-shot projects.

My ex-neighbor (when I was a teenager, living in Belgium) and very good friend really wanted to make it big. He became a chip engineer, moved to California, raised money for a first startup (it tanked) then raised money for a second startup. He made the world a better place (he created some very specific micro-inverters for solar panels) and made a $$$ exit.

The EU saw exactly zero of the wealth he created and he's never ever coming back to what he considers a failure of a continent.

That's the problem: many of the great minds with the mindset required to do great things already left the EU.

> So yeah, both in economy and warfare, Europe dooms itself to be in the hands of the US, China, or a mix of both.

And in energy (economy is energy and energy is economy, and China really understood that) the EU doomed itself to be in the hands of Russia.

We are a failure of a sinking continent.

WarmWash 5d ago

Europe is a great place to live if you just want to float through life.

The US is a great place to live if you have talent, want to work, and want to reap the rewards.

throw-the-towel 5d ago

> economy is energy and energy is economy, and China really understood that

"In former times the energy monopoly was called 'The Power Company'; we intend to give this name an entirely new meaning."

– CEO Nwabudike Morgan, "The Centauri Monopoly"

eightysixfour 5d ago

I'll play devil's advocate a little bit - I'm not sure it is losing its "independence" by not taking part in the race. It could very well be that it is gaining independence from tech and choosing a "second mover advantage" to decide how it gets deployed after seeing how it impacts everyone else. Let the US and China experiment on the bleeding edge (and their citizens feel the effect, both good and bad), and then be picky about how you use it.

I don't know if it is the right strategy but there's certainly a legitimate strategy in there.

sarjann 5d ago

The problem is recursive self-improvement creating a gap that is very difficult to close, plus the fact that power and compute lag between when you invest and when the data centers come up.

You also can't just spin up a research team out of nowhere.


stared 5d ago

Let’s autonomous Russian drones, and Europe is at the mercy of two other empires, who capitalize on this opportunity.

surgical_fire 5d ago

Europe is not a country.

Regulations are not uniform across the 27 member states. Each country is relatively small on the world stage.

Until EU progresses towards federalization, discussing this is a moot point.

input_sh 5d ago

Serious question: what does any of that have to do with the submitted article? Where is the relevance to the topic at hand?

sublimefire 5d ago

It is crazy that anything European gets so much hate. IMO it is important to build models within the boundaries of smaller nations, using their own languages. Research has to continue even if it is outside of the US and China.

jampekka 4d ago

I was somewhat excited about these "sovereign" open models in the beginning, but it became soon apparent that they're not gonna be anything but toys compared to SOTA.

The problem is that there are a lot, at least 30, of these small projects scattered around, funded for a few years as some ad-hoc temporary coalition of universities and businesses. Those simply cannot compete with businesses spending tens of billions on developing these. Especially when you bring a spoon to a gunfight by restricting yourself to "clean" data.

Multilinguality is essentially a solved problem, and restricting too much on one language with more limited resources is gonna make the model worse in that language too.

sublimefire 4d ago

Anyone can move the needle. Saying that languages are solved is not accurate either. You could raise different questions: maybe a model grounded in a different language will be more efficient at some tasks, maybe language structure matters for a multidimensional space, maybe that matters for distillation, etc. It is all about ideas and their exchange, not about investment rounds and MAU.

Zababa 4d ago

Even if multilinguality isn't solved, building a benchmark and then testing each model on it and posting the result may be a cheaper accelerator of competence in the language.

sinuhe69 4d ago

This platform is running from the US and frequently accessed by US people. What else would you expect?

andy12_ 4d ago

I'm from Spain and I also hate these projects with a passion. Creating models that speak multiple languages is a solved problem. Having each European nation train its own useless "sovereign model" in its own language is a total waste of time and resources when we could pool resources and give training SOTA models that speak all European languages a try.

I'd rather have smaller European labs have a go at distributed training. If multiple countries got together and said, "look, we tried training a distributed model that speaks all of our local languages and that is comparable to 1-year-old Chinese open-source models", that, at least, I would find interesting.


nehal3m 4d ago

Humanity

mvc 4d ago

Self-interested hate, is it not?

Maybe I'm more attuned to this type of thing having grown up as a national of a smaller state living in the shadow of a bigger state, but you constantly see actors from the bigger state belittling and condescending to anything contributed socially or economically by the smaller state.

And I see this sort of dynamic here in this forum where Americans very frequently talk condescendingly like this about Europe generally and European tech especially (they did it to China too but China smartly ignored this self-interested nonsense and carried on anyway which is what Europeans should do).

It really grates on me and presumably many others. But it serves an agenda too of a lot of the founders and financiers that hang out here that have big fat customers in Europe they'd like to keep sweet and competitors they'd like to keep down.

transcriptase 5d ago

It’s not that it gets hate so much as it’s akin to watching them make announcements that they’re going to make a European google/facebook/tiktok.

Sure… they can, except at the end of the day it’s a bit late, regulatory burden will make it comparatively useless, and because of that nobody will ever use it. It will be spending a bunch of taxpayer dollars for press releases.

The running joke is that when these “sovereign” EU models launch, they’re going to refuse to answer anything that might involve personal information such as Elon Musk’s birthday.

arrrg 4d ago

At least with social networks the network effect is a powerful force. Foregrounding regulatory burden in that context is nonsensical. (That does not apply in the same way to models.)

data-ottawa 5d ago

That’s on Wikipedia, it’s not PII, it’s also not going to be relevant to any meaningful IRL work.

I challenge the assumption you can do meaningful work in this field without blatant disregard for intellectual property.

The idea that it’s all down to training size is clearly incorrect, as every expert human learned their craft without nearly the sum total information of the internet. Clearly there are architectural wins to be found.

Besides that, why would everyone just be fine with Opus level AI at best, as that’s all the US is willing to export, and I doubt China will share beyond that.

Sovereign AI is more important than ever after Friday.

SiempreViernes 4d ago

I guess if you are strict about it, making derogatory comments like yours is indeed not hate. But I'm sure you are aware that "getting hate" is frequently used in a broader sense online, especially in the context of replies to a post, and I don't see much point in insisting on the stricter definition here.

Lucasoato 5d ago

I kinda agree; the best use of taxpayer money would be reducing taxes for corporations that want to compete in the market against the US and China, rather than having governments play the game (since they very obviously can’t).

mholm 5d ago

If a teenager on your street said he was going to spend $1,000 to customize his Honda Civic for his needs, you'd believe him. If he says he's going to build a brand new car, better than a Honda civic, for $10,000, you'd laugh and say good luck.

bigfudge 4d ago

But if a teenager says he’s going to spend 50k going to university to study engineering, you might support him.

I agree there is likely some hubris in this sort of announcement, but investing in European expertise and industrial base in this area is important.

dwa3592 5d ago

I don't understand countries (especially governments) wanting to have their own models when there are already pretty solid open source (weights) models out there.

Countries should want control over _where_ the compute is happening rather than _what code_ is running.

What's wrong with a country hosting a Kimi, Qwen or GPT-Oss on their hardware for their government work purpose?

vrganj 5d ago

An LLM is an encoding of a culture, a way of viewing the world.

They are not neutral technology, they are a direct representation of the training set that has been chosen and how they are aligned.

In many ways, they are ideology made code.

If we leave building them to the US and China, only their way of seeing things will be digitized.

I don't like the idea of that.

wolvoleo 5d ago

Yes, and also, US and Chinese models are censored in different ways. US models are way too prudish for personal use in Europe because they're afraid to piss off religious investors. Chinese models are too censored on history and current affairs, e.g. "the Tiananmen massacre never happened", stuff like that.


jeroenhd 4d ago

There's an absolutely massive cultural and behavioural bias in those models. Models will suggest things like "go to the hospital" for things that require GP appointments, "just drive three hours" while it's faster to go places by train, and so on. They will do it in anglicised Dutch (compound words split, English-like grammar structures) that's perfectly understandable, but the cultural bias is there if you know to look for it.

Furthermore, the expertise in designing and training these models is valuable as well. The existing models are good as a starting point in terms of learning from previous mistakes, but we should not just let a handful of American and Chinese people keep the knowledge and expertise.

One problem with this particular project, though, is that copyright has been enforced for Dutch LLM training before, and the AI industry cannot exist without massive scale piracy, the likes of which has never been seen before. A lot of Dutch training material exists in pirated books that AI companies in countries that do not care about copyright have access to, but are exempted from the training set here. The impact of enforcing copyright on an AI model will be quite interesting to see.

Achterlangs 5d ago

It is not about the country but the language. Most llms have poor or no support for Dutch.

tgv 5d ago

Idk which models you refer to, but I tested a bunch recently, and they performed well on Dutch. Only the smallest, such as qwen 3.6 27B, made up words and switched languages.


throw310822 5d ago

I don't understand this. Even if that were true (and it isn't in my experience), a model that is trained on a Dutch corpus and arguably "knows Dutch well" but has the reasoning and comprehension abilities of a three year old is useless in any case. I'd rather use a model that can only speak English and put an automatic translator around it.

andy12_ 4d ago

To be fair, there is a security angle: even open-source models could be trained as sleeper agents that act adversarially (for example, by adding backdoors) when used in specific national companies in specific settings. This is very difficult to detect or avoid, so if you want to be 100% sure that this isn't the case, you have to train your own model from scratch.

applfanboysbgon 5d ago

Why should Dutch people be expected to make do with models 99% trained on American/Chinese cultural context and language?

vr46 4d ago

Maybe the Dutch really really want an LLM that tells them the truth as straight as possible no matter how harsh - that might be tricky

dwa3592 5d ago

Understood, but they could fine-tune base models on their own cultural context and language. Why reinvent the wheel?


Muromec 5d ago

Oh, it's all fine with cultural context here -- we don't even dub English language movies here because we are that cheap

SiempreViernes 5d ago

Really? Because I'm pretty sure that at least every two days there's an active post with a top-voted comment along the lines of "The EU isn't doing AI themselves, they are so hosed".

WarmWash 5d ago

If Europe is serious about getting home grown AI fast, three simple steps:

1. Huge tax incentives, let the companies get grossly wealthy while paying minimal taxes. Minimum 10 years with clauses protecting "retribution" taxes there after.

2. Tax incentives for the founders/shareholders, just like above.

3. Drop worker protections to a minimum, make it easy to fire people. You only want serious/dedicated employees anyway.

Within 2-3 years there will be at least a trillion dollars looking to get in.

Don't worry, though, if reading that made you mad. It's absolutely not going to happen. I can think of few things more antithetical to the European ethos than smart, skilled people working 80-100 hour weeks with almost no vacation to gas their founders' net worth by tens, hundreds, of billions.

dminik 4d ago

Some additional points to consider:

1. Pay the workers in company scrip and relocate the workers to a company town. That way, all workers are fully dedicated to the company.

2. Start importing slaves from Africa again. It worked to build up massive wealth. Should do the trick for AI as well.

3. Abolish the 8 hour work day. No comment needed.

With these 3 simple tricks, you too can get 6-7 bazillion euro AI mammoths.

WarmWash 4d ago

No, we don't need any slavery. The employees at these AI companies will almost certainly out earn most if not all other local white collar workers, while also getting top tier benefits. They can also quit at any time if they don't like it or don't think it is fair.


TalkingCodeMonk 4d ago

This "greed is good, and should be rewarded" philosophy is one I see all too often on HN, and it is the entire reason why America's political, regulatory, and business leadership has been overrun by the country's most criminally corrupt narcissists and psychopaths, and why its democracy has collapsed.

When you reward the most selfish, corrupt, and antisocial behaviours with wealth and power, you're guaranteed to create a selfish, corrupt, and antisocial society. IMHO it's indicative of what I have dubbed America's "mental illness epidemic"; specifically cluster B personality disorders [0], which are characterised by socially destructive and self-destructive behaviours.

If that's the world you want for you and your loved ones, congratulations. You've earned it!

[0] https://wikipedia.org/wiki/Classification_of_personality_dis...

user43928 4d ago

If tax incentives to attract companies and at-will employment are already viewed as a destructive collapse of democracy, it's no wonder we are not getting anything done here in Europe.


WarmWash 4d ago

No greed will be rewarded. If European consumers don't like the AI models that are eventually produced, they can forgo buying them and the investors can watch their fortune burn. For the EU taxpayer, nothing was gained and nothing was lost. All the state did was stand out of the way.

If the models are good though, they will have their sovereign AI and should be happy to pay for it instead of American or Chinese models. You may call it greed, but to me it just sounds fair.


YetAnotherNick 4d ago

3 Times As Many Europeans Move to the US, than the Other Way Around.

[1]: https://mises.org/mises-wire/3-times-many-europeans-move-us-...


lpapez 4d ago

Butcher worker protections and quality of life across all industries to (hypothetically) benefit a single one?

No thanks.

Why do you feel grinding insane hours would be beneficial to AI progress?

yanis_t 4d ago

Exactly this. You can't have a competitive industry while at the same time heavily redistributing wealth to the point where people don't have any incentives at all.

Cthulhu_ 4d ago

They can be serious about home-grown AI without needing to become a libertarian capitalist hellscape. I prefer happiness, safety and privacy over competing with the US / China.

WarmWash 4d ago

Sure, but good luck finding serious top tier AI researchers who are willing to work for $80k/yr when the US is offering them upwards of $1M/yr.

matheusmoreira 5d ago

So good to see these developments. Every country should do this. I'd even say every person should have their own personalized AI running on their own computer. If only the costs involved were not so astronomical.

mediaman 5d ago

Why? That doesn't make any sense.

The government would be far better off figuring out how to take commodity models and applying them to government functions where they can, with deterministic scaffolding and guardrails, to make government more efficient, optionally using RL on traces from their use to improve their performance.

Imagine taking models and fine-tuning them / doing RL rollouts to help automate permit application approvals, as applied specifically to Dutch permit processes. That would be a real help to Dutch businesses!

That type of applied AI is more interesting and effective now than just trying to make another foundational model that isn't going to work well or do anything of economic value.

edg50004d ago

Another thing they could do is try to attack the supply chain issues. Try to form an alliance to block RAM deals or something, or to get fabs on EU soil, making HBM for the people. We have some bargaining chips, especially when banding together with a few large EU states. Not as EU, just a few specific countries. No bureaucracy, just elite trade diplomacy. Probably best done in secret so the big labs don't catch wind of this. Any NL/DE/UK/FR/CH/PL/IT govt people reading this?

matheusmoreira5d ago

> Why?

Because then the USA can't just turn it off.

nathanielsimard5d ago

I think it will be cost effective at some point. Computers were limited to research institutes before the personal computer arrived.

matheusmoreira5d ago

I hope you're right. I really don't want a future where only corporations and governments have computers.

jstummbillig4d ago

"Champions of a European AI model should ask themselves if a European effort would be more effective than Meta, which this year will spend more on chips ($125 billion) than Germany spends on defense ($114 billion) and offer salaries of over $100 million to attract the best researchers, and is still failing to catch up. Elon Musk tried and failed to build a good AI model."

https://www.siliconcontinent.com/p/nineteen-thoughts-on-ai-a...

14u2c5d ago

Nvidia will certainly be pleased.

siva75d ago· 4 in thread

> GPT‑NL is developed within the Netherlands and Europe. This gives us full control over the model, the data and the choices we make. We avoid dependency on non‑European providers and invest in a sustainable AI ecosystem aligned with our laws, values and societal goals.

I love it! So this is our answer to America and China denying foreigners access to their frontier models: a massive €13.5M in funding to develop sovereign European AI, trained exclusively on legally obtained documents and to the highest moral standards as defined in the EU AI Act.

jbverschoor5d ago

NL could simply say: no more ASML machines, and no more ASM wafers.

rmccue5d ago

ASML’s EUV technology is partially based on US research and so Congress has a degree of control over it, so it’s not that simple: https://web.archive.org/web/20230116222847/https://www.nytim...

asdfasgasdgasdg5d ago

Any move like this would simply be murdering the golden goose. ASML's stuff is good, no doubt, but you don’t want to give the world the incentive to develop an alternative. If it was done once, it can be done again, and once it has been done again, say goodbye to all those monopoly returns.

siva75d ago

You don't wanna find out how fast american troops would land there..

bmenrigh4d ago· 3 in thread

I think at this point what the Netherlands, and any other country that wants a good model in their language should do, is gather up every piece of text ever written in that language and license it to the big AI labs/companies for training. I'm sure there are vast libraries of books and other text that haven't been digitized and aren't a priority for the big labs.

whateverboat4d ago

I think they should just make a national security thing and gather every piece of text in every language.

tantalor4d ago

Yeah except replace "license it to the companies for training" with "pay the companies to train on it"

bmenrigh4d ago

Oh, I didn’t mean charging them at all. I mean licensing in the sense of granting rights for the purpose of training. Probably most labs would be fine adding the language to the training for free as long as the dataset quality is high and it improves the results. But yes, pay them if that’s what it takes for them to use it.

HelloUsername5d ago· 3 in thread

Previously posted on 02-dec-2023 https://news.ycombinator.com/item?id=38497495 3 comments

ronsor5d ago

Two and a half years and still not complete? That's ridiculous.

pedromlsreis5d ago

AMALIA, from Portugal, going the same path!

https://en.wikipedia.org/wiki/Am%C3%A1lia_(LLM)

SiempreViernes4d ago

They are starting to deploy to customers now, not sure if that counts as "complete" or not. The big innovation here is that they are doing it all legally.

rollulus5d ago· 2 in thread

Interesting that this got posted now: the project is receiving increasingly more skepticism lately in the Dutch tech scene [0], and I think that’s fully justified.

[0]: https://www.quotenet.nl/zakelijk/a71588202/techondernemers-m...

embedding-shape5d ago

What is the exact skepticism? The only thing I could get from that was from some "tech entrepreneur":

> GPT-NL was never built to compete with Claude or ChatGPT. It was trained exclusively on licensed data, and is intended more for governments and companies where privacy and compliance matter more than raw performance.

That's it? That it didn't aim to compete with SOTA models? Maybe this is a case where you have to start with something and then ramp up, rather than do what only a select few labs have been able to do: start with really big models. Especially if you're resource-constrained, which, since this is a government project, I really hope for the taxpayers' sake it was.

barrenko5d ago

I mean, if you are wasting funds kind of knowing it's nowhere near remotely competitive, then it's kind of a fraud.

jansenmac5d ago· 2 in thread

This is not an open source model. In that sense I think the sovereign claim is a bit strange. It's the data providers that determine access to the model.

frangonf5d ago

So it's a model that's sovereign as in the sovereign Kingdom of the Netherlands, vs. sovereign for the people?

embedding-shape5d ago

"sovereign" the marketing term basically means "in-house" now, where "house" depends on who says it.

jurschreuder5d ago· 2 in thread

What are they going to train with 13.5M really? We're a tiny company in Amsterdam, Holland, and we've got "only 64x B300 to train on", so I thought we could never make an LLM, since we've only got 4M in compute.

And they're going to train an LLM with all kinds of extra difficulties compared to OpenAI for just 13.5M?

The very first Llama was 16M for one training.
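For a rough sense of what a given euro amount buys in raw training compute, here is a back-of-the-envelope sketch. It assumes the common approximation that training costs about 6·N·D FLOPs (N parameters, D tokens) and a Chinchilla-style heuristic of roughly 20 tokens per parameter. The GPU-hour price, per-GPU peak throughput, utilization and the share of the budget spent on compute are all assumptions you have to supply, and the function name is made up for illustration.

```python
import math

def chinchilla_estimate(budget_eur: float,
                        eur_per_gpu_hour: float,
                        peak_flops_per_gpu: float,
                        utilization: float,
                        tokens_per_param: float = 20.0):
    """Back-of-the-envelope compute-optimal model size for a compute budget.

    Assumes training FLOPs C ≈ 6 * N * D and D ≈ tokens_per_param * N.
    Every input is an assumption supplied by the caller.
    """
    gpu_hours = budget_eur / eur_per_gpu_hour
    # Sustained throughput = peak FLOP/s * utilization, over all GPU-seconds.
    total_flops = gpu_hours * 3600 * peak_flops_per_gpu * utilization
    # C = 6 * N * (k * N)  =>  N = sqrt(C / (6 * k))
    n_params = math.sqrt(total_flops / (6 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return gpu_hours, total_flops, n_params, n_tokens

gpu_hours, flops, n, d = chinchilla_estimate(
    budget_eur=5_000_000,     # assumed share of the budget spent on GPUs
    eur_per_gpu_hour=2.0,     # assumed rental price per GPU-hour
    peak_flops_per_gpu=1e15,  # assumed peak dense BF16 throughput
    utilization=0.4,          # assumed fraction of peak actually achieved
)
print(f"{gpu_hours:.3g} GPU-hours, {flops:.3g} FLOPs, "
      f"~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

Only the part of a budget that actually goes to GPUs belongs in `budget_eur`; staff, data licensing and evaluation come out of the same pot, so treat the output as an order-of-magnitude estimate at best.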

LaurensBER5d ago

This is too little, too late. Europe really needs to start focusing.

All these tiny niche models are perhaps fun as an academic exercise or great for the researchers' résumés, but I highly doubt that they'll add any value or will be used for anything serious.

Even if this becomes a somewhat decent model with a fantastic understanding of "gezellig", "kringverjaardag" or "pannenkoeken", how many people will interact with it before its limits drive them back to a frontier model?

Even if the purpose of this is government & other regulated industries, do we really want our government to use a poor model? Either do it right or don't do it at all.

numeri5d ago

Prices for training have dropped immensely in terms of research required, code efficiency, algorithmic/sample efficiency, and possibly also hardware (I'm not qualified to say without looking up FLOPS/dollar, or even to be certain that's the right metric here).

entropyneur4d ago· 2 in thread

How about fixing whatever the hell prevents competitive private LLM vendors from appearing in Europe?

rimliu4d ago

you mean getting rid of checks and balances, environment protection laws, anti-corruption laws?

redrove4d ago

They’re still debating that, they’ll get back to us soon I’m sure.

wrs5d ago· 1 in thread

They’re building a competitive-quality model, from scratch, with fair compensation to content owners, for €13.5 million? Something’s wrong with this picture.

Muromec5d ago

Being cheap is on brand for inhabitants of the sea floor. Nothing is wrong

Aeolun5d ago· 1 in thread

A total of €13.5M has been allocated to the project.

I guess we’re going for GPT2 level capability?

Marciplan4d ago

apparently their aim is GPT3.5

thatguymike5d ago· 1 in thread

> A total of €13.5 million has been allocated to the project.

> This public investment underlines the importance of an independent, trustworthy and future‑proof Dutch language model.

It does, but not in the way you think it does.

thepasch5d ago

> It does, but not in the way you think it does.

They're training a model, not funding a startup. €13.5 million is plenty to pre- and post-train a decent model.

Marciplan5d ago· 1 in thread

Supposedly this model also aims to treat publishers of all sizes well. Looking forward to its launch soon :)

adalacelove5d ago

Maybe it's time to acknowledge that current copyright laws do more harm than good and put another framework in place.

rdwrrr4d ago· 1 in thread

Burning tax money. I dare to bet this will never lead anywhere.

holistio4d ago

"Burning" €13.5M of public funds. That's 4000 times less than the Cursor deal from a couple days ago.

I was actually surprised by how little it was.

mvanbaak5d ago· 1 in thread

> Excluding harmful content

#define(HARMFUL)

[edit] Downvoters please tell me what the problem is with specifying this?

jermaustin15d ago

I didn't downvote you, but you aren't really adding anything to the conversation. This type of pithy comment might be fun on Reddit, but on HN we try to provide more constructive, information-rich comments.

Zababa4d ago

I feel like building datacenters and filling them with chips may be more valuable than creating sovereign models. xAI I think makes more money renting datacenters to Anthropic than with the models they trained, and they could pivot thanks to their datacenters. By making regulations easier than in the US, this could bring some computing power to Europe, which then can be used to train sovereign models, or rented to big AI labs.

Also, when training models, you create talent that then could go to other countries (brain drain). Restricting that brain drain without imposing authoritarian restrictions on the movements of people seems hard, so it seems hard to keep talent as a competitive advantage. If instead the competitive advantage is datacenters with chips, power capacity building, fast path to building datacenters, I think they are easier to retain while preserving the rights of everyone involved.

simianwords5d ago

I really think countries should build a sovereign _ecosystem_ and sovereign models are an excuse to achieve it.

An ecosystem is the tribal knowledge, revolving door of talent, known processes etc.

If the end goal is to make a half assed Dutch speaking model, I think it won’t cut it. I don’t see anyone using it over Gemma 4b that runs on my laptop.

An ecosystem is more durable and has desirable second order effects.

alper4d ago

Europe should have a sovereign model on its content and languages that is trained with renewable energy and published as open source.

This looks like a good step in that direction.

wolvoleo5d ago

We already had GEITje but it was banned by the courts. Of course it can still be found because the entire internet is not subject to Dutch law. But it did manage to stop development :'(

sarjann5d ago

I wonder about these stories. Why are there so many individual country efforts? We know the scale needed from scaling laws / capital / energy. Most of these countries alone can barely compete (even large groups of them would struggle).

Why don't they work together on it? Companies like Airbus have already been able to do that with aircraft.

gnegggh5d ago

I'm making a Dutch dictionary and would be interested to see how this model would fare in evals vs. non-specialized ones. I've tested a variety of models for https://hetnederlands.com content and the differences can be big.
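A language-specific comparison like this can start very small. A minimal sketch, assuming each model is wrapped as a plain Python function from prompt to answer; the two toy cases, the stub "models" and the exact-match metric are illustrative choices, not a real benchmark (open-ended dictionary content would need a looser or human-graded metric):

```python
from typing import Callable, Dict, List, Tuple

# (prompt, expected answer) pairs. These two toy items are placeholders;
# a real eval would use a curated Dutch test set.
CASES: List[Tuple[str, str]] = [
    ("Wat is het meervoud van 'kind'?", "kinderen"),      # plural of 'kind' (child)
    ("Wat is het verkleinwoord van 'boom'?", "boompje"),  # diminutive of 'boom' (tree)
]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is not scored as an error."""
    return " ".join(text.lower().split())

def exact_match(model: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of prompts where the model's normalized answer equals the expected one."""
    if not cases:
        return 0.0
    hits = sum(normalize(model(prompt)) == normalize(expected) for prompt, expected in cases)
    return hits / len(cases)

def compare(models: Dict[str, Callable[[str], str]],
            cases: List[Tuple[str, str]]) -> Dict[str, float]:
    """Score every model on the same cases so the numbers are directly comparable."""
    return {name: exact_match(model, cases) for name, model in models.items()}

# Hypothetical stand-ins for real model clients.
answers = dict(CASES)
models = {
    "specialized": lambda prompt: answers[prompt].upper(),  # right answer, odd casing
    "generic": lambda prompt: "Weet ik niet.",               # "I don't know."
}
print(compare(models, CASES))  # {'specialized': 1.0, 'generic': 0.0}
```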

stared5d ago

Is it a proposal or a model? And if it is a model, how does it fare on benchmarks?

Dwedit5d ago

What really matters is the sovereign capability to fine-tune the LLM models. Any model could be vetted and tested, but you need fine-tuning/LoRA training to keep the model from becoming outdated.
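For readers unfamiliar with LoRA (low-rank adaptation): instead of updating a layer's full weight matrix W, you freeze it and learn a small update B·A of rank r, scaled by alpha/r as in the original LoRA paper, so the layer computes W·x + (alpha/r)·B·A·x. B starts at zero, so the adapted layer initially behaves exactly like the base model. The pure-Python sketch below shows only that forward pass and the parameter count; the class name is hypothetical, and real fine-tuning would use a framework with automatic differentiation.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    A is r x d_in, B is d_out x r. Only A and B, i.e. r * (d_in + d_out)
    numbers instead of d_out * d_in, would be updated during fine-tuning.
    """
    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen base weights
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]  # zero init: no change at start
        self.scale = alpha / rank

    def trainable_params(self):
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

layer = LoRALinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # [3.0, 7.0, 11.0]: B is zero, so output equals W·x
print(layer.trainable_params())   # 5 = r*d_in + d_out*r = 1*2 + 3*1
```

The appeal for a "keep it current" strategy is that the base weights never change, so small adapters can be retrained cheaply and swapped in as data goes stale.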

mvdh13044d ago

overall, the revenue sharing model is (IMO) more interesting than the fact that it is dutch. Usage of data, and sharing it with the providers of this data, is an inherent part of the creation of these models that is not discussed as much as it should be

jgbuddy4d ago

I fear sovereignty is not an adoption-driving feature.

jdw645d ago

Honestly, I used to think the 'sovereign model' was a waste of money. But recently, with the US logic of restricting model exports, I've come to think that if things go south, they could even cut off allied nations. So now the sovereign model seems reasonable to me. That, in turn, means US influence is deteriorating. And that probably isn't such great news for American businesses.

lejeanvaljean4d ago

Better to work on something at the European level.

debarshri5d ago

So cute.

dr_dshiv5d ago

How do you use it?

agrijakhetarpal4d ago

"sOvErEiGn"

yanis_t4d ago

> A total of €13.5 million has been allocated to the project.

This is not even funny. If you want a competitive AI industry, you need to invest much more heavily in infrastructure first, building models second.

If the models are good though, they will have their sovereign AI and should be happy to pay for it instead of American or Chinese models. You may call it greed, but to me it just sounds fair.

1 more reply

YetAnotherNick4d ago

3 Times As Many Europeans Move to the US, than the Other Way Around.

[1]: https://mises.org/mises-wire/3-times-many-europeans-move-us-...

1 more reply

lpapez4d ago

Butcher worker protections and quality of life across all industries to (hypothethically) benefit a single one?

No thanks.

Why do you feel grinding insane hours would be beneficial to AI progress?

yanis_t4d ago

Exactly this. You can't have a competitive industry while at the same time heavily redistributing wealth to the point where people don't have any incentives at all.

Cthulhu_4d ago

They can be serious about home-grown AI without needing to become a libertarian capitalist hellscape. I prefer happiness, safety and privacy over competing with the US / China.

WarmWash4d ago

Sure, but good luck finding serious top tier AI researchers who are willing to work for $80k/yr when the US is offering them upwards of $1M/yr.

matheusmoreira5d ago· 7 in thread

mediaman5d ago

Why? That doesn't make any sense.

That type of applied AI is more interesting and effective now than just trying to make another foundational model that isn't going to work well or do anything of economic value.

edg50004d ago

matheusmoreira5d ago

> Why?

Because then the USA can't just turn it off.

nathanielsimard5d ago

I think it will be cost effective at some point. Computers were limited to research institutes before the personal computer arrived.

matheusmoreira5d ago

I hope you're right. I really don't want a future where only corporations and governments have computers.

jstummbillig4d ago

https://www.siliconcontinent.com/p/nineteen-thoughts-on-ai-a...

14u2c5d ago

Nvidia will certainly be pleased.

siva75d ago· 4 in thread

jbverschoor5d ago

NL could simply say: no more ASML machines, and no more ASM wafers.

rmccue5d ago

ASML’s EUV technology is partially based on US research and so Congress has a degree of control over it, so it’s not that simple: https://web.archive.org/web/20230116222847/https://www.nytim...

2 more replies

asdfasgasdgasdg5d ago

1 more reply

siva75d ago

You don't wanna find out how fast american troops would land there..

1 more reply

bmenrigh4d ago· 3 in thread

whateverboat4d ago

I think they should just make a national security thing and gather every piece of text in every language.

tantalor4d ago

Yeah except replace "license it to the companies for training" with "pay the companies to train on it"

bmenrigh4d ago

HelloUsername5d ago· 3 in thread

Previously posted on 02-dec-2023 https://news.ycombinator.com/item?id=38497495 3 comments

ronsor5d ago

Two and a half years and still not complete? That's ridiculous.

pedromlsreis5d ago

AMALIA, from Portugal, going the same path!

https://en.wikipedia.org/wiki/Am%C3%A1lia_(LLM)

SiempreViernes4d ago

They are starting to deploy to customers now, not sure if that counts as "complete" or not. The big innovation here is that they are doing it all legally.

rollulus5d ago· 2 in thread

Interesting that this got posted now: the project is receiving increasingly more skepticism lately in the Dutch tech scene [0], and I think that’s fully justified.

[0]: https://www.quotenet.nl/zakelijk/a71588202/techondernemers-m...

embedding-shape5d ago

What is the exact skepticism? The only thing I could get from that was from some "tech entrepreneur":

barrenko5d ago

I mean if you are wasting funds kind of knowing it's nowhere near remote competitive, then it's kind of a fraud.

3 more replies

jansenmac5d ago· 2 in thread

This is not an open source model. In that sense I think the sovereign claim is a bit strange. It's the data providers that determine access to the model.

frangonf5d ago

So it's a model that's sovereign as in sovereign kingdom of the Netherlands vs sovereign for the people's?

embedding-shape5d ago

"sovereign" the marketing term basically means "in-house" now, where "house" depends on who says it.

jurschreuder5d ago· 2 in thread

And they're going to train an LLM with all kinds of extra difficulties compared to OpenAI for just 13.5M?

The very first Llama was 16M for one training.

LaurensBER5d ago

This is too little, too late. Europe really need to start focussing.

All these tiny niche models are perhaps fun as an academic exercise or great for the researchers resume but I highly doubt that they'll add any value or will be used for anything serious.

Even if the purpose of this is government & other regulated industries, do we really want our government to use a poor model? Either do it right or don't do it at all.

numeri5d ago

entropyneur4d ago· 2 in thread

How about fixing whatever the hell prevents competitive private LLM vendors from appearing in Europe?

rimliu4d ago

you mean getting rid of checks and balances, environment protection laws, anti-corruption laws?

redrove4d ago

They’re still debating that, they’ll get back to us soon I’m sure.

wrs5d ago· 1 in thread

They’re building a competitive-quality model, from scratch, with fair compensation to content owners, for €13.5 million? Something’s wrong with this picture.

Muromec5d ago

Being cheap is on brand for inhabitants of the sea floor. Nothing is wrong.

Aeolun5d ago· 1 in thread

A total of €13.5M has been allocated to the project.

I guess we’re going for GPT2 level capability?

Marciplan4d ago

apparently their aim is GPT3.5

thatguymike5d ago· 1 in thread

> A total of €13.5 million has been allocated to the project.

> This public investment underlines the importance of an independent, trustworthy and future‑proof Dutch language model.

It does, but not in the way you think it does.

thepasch5d ago

> It does, but not in the way you think it does.

They're training a model, not funding a startup. €13.5 million is plenty to pre- and post-train a decent model.

Marciplan5d ago· 1 in thread

Supposedly this model also aims to treat publishers of all sizes well. Looking forward to its launch soon :)

adalacelove5d ago

Maybe it's time to acknowledge that current copyright laws do more harm than good and put another framework in place.

rdwrrr4d ago· 1 in thread

Burning tax money. I dare to bet this will never lead anywhere.

holistio4d ago

"Burning" €13.5M of public funds. That's 4000 times less than the Cursor deal from a couple days ago.

I was actually surprised by how little it was.

mvanbaak5d ago· 1 in thread

> Excluding harmful content

#define(HARMFUL)

[edit] Downvoters, please tell me: what is the problem with specifying this?

simianwords5d ago

I really think countries should build a sovereign _ecosystem_ and sovereign models are an excuse to achieve it.

An ecosystem is the tribal knowledge, revolving door of talent, known processes etc.

If the end goal is to make a half-assed Dutch-speaking model, I don't think it will cut it. I don’t see anyone using it over Gemma 4b, which runs on my laptop.

An ecosystem is more durable and has desirable second order effects.

alper4d ago

Europe should have a sovereign model on its content and languages that is trained with renewable energy and published as open source.

This looks like a good step in that direction.

wolvoleo5d ago

We already had GEITje but it was banned by the courts. Of course it can still be found because the entire internet is not subject to Dutch law. But it did manage to stop development :'(

sarjann5d ago

Why don't they work together on it? Companies like Airbus have already been able to do that with aircraft.

stared5d ago

Is it a proposal or a model? And if it is a model, how does it fare on benchmarks?

Dwedit5d ago

What really matters is the sovereign capability to finetune LLMs. Any model can be vetted and tested, but you need finetuning/LoRA training to keep the model from becoming outdated.

jgbuddy4d ago

I fear sovereignty is not an adoption-driving feature.

lejeanvaljean4d ago

Better to work on something at the European level.

debarshri5d ago

So cute.

dr_dshiv5d ago

How do you use it?

agrijakhetarpal4d ago

"sOvErEiGn"

yanis_t4d ago

> A total of €13.5 million has been allocated to the project.

This is not even funny. If you want a competitive AI industry, you need to invest much more heavily in infrastructure first, building models second.

