> an American teenager – who does not speak Scots, the language of Robert Burns – has been revealed as responsible for almost half of the entries on the Scots language version of Wikipedia
It wasn't malicious, either; it was someone who started editing Wikipedia at 12 and naively failed to recognise the damage they were doing.
The solution is to differentiate and tag inputs and outputs, so that outputs can't be fed back in as inputs recursively. Funnily enough, Wikipedia's sourcing policy does this perfectly. Not only are sources the input while page content is just an output, but page content is a tertiary source, and sources by policy should be secondary (and sometimes primary) sources, so the system is even protected against cross-tertiary-source pollution (say, an encyclopedia feeding off Wikipedia and vice versa).
It is only when articles posing as secondary sources draw on Wikipedia without citing it that a recursive quality loss can occur; see [[citogenesis]]
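The tiered-sourcing idea above can be sketched in a few lines. This is a toy illustration, not Wikipedia's actual software; the `Source` class and tier names are hypothetical. The point is simply that if page content is tagged as tertiary output and the citation check only admits primary or secondary sources, the output can never loop back in as input.

```python
# Toy model of tiered sourcing: page content is tertiary output,
# and only primary or secondary sources may be cited, so the
# recursive feedback loop is broken at the policy level.

ALLOWED_TIERS = {"primary", "secondary"}

class Source:
    def __init__(self, name, tier):
        self.name = name
        self.tier = tier  # "primary", "secondary", or "tertiary"

def can_cite(source):
    """Reject tertiary sources (encyclopedias, including Wikipedia itself)."""
    return source.tier in ALLOWED_TIERS

article_citations = []

def add_citation(source):
    if not can_cite(source):
        raise ValueError(f"{source.name} is {source.tier}; tertiary sources are not citable")
    article_citations.append(source.name)

add_citation(Source("peer-reviewed paper", "secondary"))
try:
    # An encyclopedia (Wikipedia's own output) is rejected as input.
    add_citation(Source("Wikipedia article", "tertiary"))
except ValueError as e:
    print(e)
```

Citogenesis, in these terms, is what happens when a tertiary source gets mislabelled as secondary and slips past the check.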
I've seen a college professor cite Wikipedia in support of a false claim. On investigation, the text in Wikipedia was cited to an earlier blog post by that same professor.
I wasn't convinced.
The people who want it to be considered a language for political reasons can't be bothered to translate Wikipedia themselves. They read and edit English Wikipedia and understand it perfectly.
The Glaswegian taxi driver may not consider themself to be speaking a different language but, if speaking to another local and leaving aside pronunciation, they’d use words, phrases and even grammar that’s incomprehensible to someone with no experience with Scots.
I’m a “posh Scot”, raised middle class in Edinburgh, so my accent is minimal and thickens up or softens depending on who I’m speaking to. Even for me, there are a lot of words, phrases and ways of speaking I’ve had to adjust in order to be consistently understood by American coworkers over the last 10+ years.
This is supremely ignorant. Scots is its own language. It's a 'brother' or 'sister' of English, with both English and Scots being descendants of West Germanic languages.
The fact that many (all?) Scots speakers also speak English doesn't mean Scots isn't a language in its own right.
You could make the exact same argument that Irish isn't a language because you could ask a Cork taxi driver whether he knows any English.
Scots = a language with some of the same ancestors as English.
Scottish English = a dialect (and accent) of English.
Scots Gaelic = another language, with the same ancestors as Irish and Manx.
There exist dialects that are less mutually intelligible than some apparently distinct languages, and the designation of each as "dialect" or "language" is political. Language is often a proxy for culture, and political actors may wish to suppress or boost the legitimacy of such cultural expression depending on their aims.
That's the core issue; it's not about those who use AI translators or, worse, Google Translate. If there aren't any Greenlanders to contribute to their Wikipedia, they don't deserve to have one and must instead rely on other languages.
The difference between an empty Wikipedia and one filled with translated articles that contain errors isn't much. They should instead close that version of Wikipedia until there are enough volunteers.
Wikipedia is built around the basic principle that if you just let everyone contribute, most contributions will be helpful and you can just revert the bad ones after the fact. This works for large communities that easily outnumber the global supply of fools, but below a certain size threshold, the sign flips and the average edit makes that version of Wikipedia worse rather than better.
So smaller communities probably need to flip the operating principle of Wikipedia on its head and limit new users to only creating drafts, on the assumption that most will be useless, and an admin can accept the good ones after the fact.
I'm not sure whether Wikipedia already has the software features necessary to operate it in such a closed-by-default manner.
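The closed-by-default model described above can be sketched as a simple draft queue. This is a minimal illustration under assumed semantics, not MediaWiki's real API: the `DraftQueue` class and its methods are hypothetical. Untrusted edits land in a pending queue, and an admin accepts the good ones after the fact.

```python
from collections import deque

# Toy model of a closed-by-default wiki: new users' edits go to a
# draft queue; only trusted editors publish directly, and an admin
# promotes accepted drafts to the live article set.

class DraftQueue:
    def __init__(self):
        self.pending = deque()  # (author, text) awaiting review
        self.live = []          # published article text

    def submit(self, author, text, is_trusted=False):
        # Trusted editors publish directly; everyone else drafts.
        if is_trusted:
            self.live.append(text)
        else:
            self.pending.append((author, text))

    def review(self, accept):
        # An admin accepts (or discards) the oldest pending draft.
        author, text = self.pending.popleft()
        if accept:
            self.live.append(text)
        return author, text

wiki = DraftQueue()
wiki.submit("newbie", "Draft article on Nuuk")
wiki.submit("veteran", "Vetted article", is_trusted=True)
wiki.review(accept=True)
print(len(wiki.live))  # 2
```

The design choice is exactly the sign flip the comment describes: below the community-size threshold, the default for an unknown contributor changes from "publish, revert later" to "hold, approve later".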
https://www.exclassics.com/espoke/espkpdf.pdf
Wikipedia is prominent. Wikipedia articles in a language without much representation become prime examples of that language to those who read them.
By what unholy pact have you been knighted as the bestower of wikis, my friend?
Why should a wiki be any different?
It's the same. Google translate uses trained AI models.
[0] Which, paradoxically, exist to a significant degree thanks to the unpaid work of volunteers in many such communities.
While translation tools are a godsend for that, as well as life in general when dealing with a language I am not that good at, LLMs make me increasingly reluctant to do that much more because there is no way I could detect AI slop in a second language. For all I know I'd be translating junk into English and enabling translingual citogenesis.
Bad as the slopwave is for native speakers, it's absolutely brutal for non-native speakers when you can't pick up on the tells. Maybe the gap will narrow and narrow until the slop is stylistically imperceptible.
> potentially pushing the most vulnerable languages on Earth toward the precipice as future generations begin to turn away from them.
OK? We have lots of dead languages. It's fine. People use whatever languages are appropriate to them and we don't need to maintain them forever.
But the sentence `well-meaning Wikipedians who think that by creating articles in minority languages they are in some way “helping” those communities` clearly shows the author hasn't really considered the issue.
Survival of the fittest, right? Not enough people speaking Greenlandic, too complicated even for its own population, who would rather speak Danish? The very reason I'm speaking English is because it was forced militarily during the 19th century by the UK and since the 20th by Hollywood.
Just like a virus, if a language doesn't spread, it dies.
When people have varying levels of capability with languages, they’ll switch to whatever is the lowest common denominator — the language that the group can best communicate in. This tended to be English, even amongst a bunch of native speakers of a common foreign language.
Moreover, this is context dependent: when talking about technical matters (especially computing), the lingua franca (pun intended) is English. You’ll hear “locals” switch to either mixed or pure English, even if they’re not great at it. Science, aviation, etc. are the same.
Before English it was French that had this role, and before then it was Latin and Greek.
The thing is, when the whole world speaks one common language like Latin or English, this is a tiny bit sad for some Gaelic tribe that got wiped out culturally, but incredibly valuable for everybody everywhere. International commerce becomes practical. Students can study overseas, spreading ideas further and wider. Books have a bigger market, attracting smarter and better authors. There’s a bigger pool of talented authors to begin with, some of which write educational textbooks of exceptional sparkling quality. These all compound to create a more educated, vibrant, and varied culture… because of, not despite the single language.
For instance, my mother tongue’s Wikipedia (Korean Wikipedia) suffers from serious governance issues. The community often rejects outside contributors, and many experienced editors have already moved to alternative platforms. As a result, I sometimes get mixed, low-quality responses in my native language when using LLMs.
Ultimately, we need high-quality open data. Yet most Korean-language content is locked behind walled gardens run by chaebols like Naver and Kakao — and now they’re lobbying the government to fund their own “sovereign AI” projects. It’s a lose-lose situation.
While they may be a Greenlandic teacher, it's almost assured that they are teaching western Greenlandic, which is similar to Canadian Inuktitut.
People in the East of Greenland speak a language that has similarities, but is different enough in vocabulary and sounds that it's often considered a separate language and not a dialect.
When people from East and West Greenland come together, they typically speak Danish because they can't understand each other in their own native language.
So we're talking about a country that has 55k people, a portion of whom don't even speak the official language. This guy would have no way of knowing whether something was written poorly by a computer or by a poorly educated Greenlandic native who maybe isn't so good with the official language.
Given that the majority of the country's citizens do not use the internet at all, it is not even clear what his solution is, other than just deciding to be some sort of magic arbiter, which is not realistic or sustainable.
So to get back to the point: Yes the solution is to appoint someone a magic arbiter, and hope they don’t screw up. The fact that it’s a deeply imperfect way of solving problems doesn’t mean it’s not workable. It just means it will backfire at some point, and someone else will get appointed instead.
This is the heart of the matter. Nothing is good or bad in a vacuum, but when two things (say, outcomes) can be compared, distinctions can be drawn. Noticing flaws in the present can't be contrasted with simple models of "the better solution"; that's comparing apples to oranges. Address both the good and the bad of the present, including the days when nothing noteworthy happens and the system therefore sits below most people's awareness, alongside the good and the bad of an elaborated counterpart.
The reason none of this makes sense to me is that it's intellectually crippling Internet users. Computers and the Internet are tools. If you want something machine translated, you can use a tool like Google Translate to do it for you. If the webmaster does this, it robs people of the opportunity to learn to use those tools, and they become dependent on third parties to do it for them when they would have a lot more freedom if they just did it themselves (or if they learned English).
Teach a man to fish...
On what do you base this assertion? I was not able to find up-to-date statistics, but 72% of participants in this survey from 2013 had internet access at home, either via PC or via mobile devices, and another 11% had internet access elsewhere:
https://digitalimik.gl/-/media/datagl/old_filer/strategi_201...
If this is true, then the easy solution would be to just have two separate Wikipedia editions (assuming there is interest).
After all, if we have en, sco, jam and ang, surely there is room for two Greenlandic editions. The limiting factor is user interest.
That's... a reach.
An easier, and much more realistic, solution would be to just have one edition in Danish, which was already noted as the language Greenlanders have in common.
However, in the larger picture: languages evolve. New ones develop, old ones die. Do artificial attempts to "rescue" a language really make sense?