necovek2d ago

This is an article with a long introduction and then jumps straight to the point in one, final paragraph: Russia is abusing it for political messaging again. While yes, any tool will be abused like this, it really is also a tool to best codify spoken language of the Slavs (in a sense, it is trivially provable that Cyrillic script is better adapted even to languages which do not use it today, but have to resort to digraphs or glyphs with diacritics — some are thus not using it to distance from a particular influence instead).

None of the interesting bits of the invention of Cyrillic are covered, like how the original Slavic script was Glagolitic, as the sibling comment mentioned, and only gave way to what became modern Cyrillic much later. Or how there was no lowercase until a few centuries ago, arriving largely with the reform of Peter the Great.

With Slavic people, it's also worth noting that "Slav" actually means "word" or "letter" (of an alphabet), so legibility was part of the identity. In contrast, most Slavic people call Germans a variation of "Nemci", or mutes (those who cannot speak) — notably, most except Russians who call them Germans. Again, likely to distance themselves from the negative connotation with their aspiring historical partners.

orbital-decay2d ago

No idea where you're getting it from, Germans are Nemci in Russian as well. It's rather "unable to speak the language", meant for all foreigners but later stuck to Germans, presumably because German traders were the most common foreigners.

necovekOP2d ago

Apologies, it was mostly from running across different Russian maps with Германия that I took it as such (in Serbian it is Немачка). I stand corrected!

Nem/нем literally means "mute" in Serbian, perhaps it's a later evolution per region either way.

gibber8782d ago

It seems to me that you have entirely discredited yourself. You confidently make claims about the Russian language but don't even know the most basic thing about the point you were making.

xxs2d ago

>"mute" in Serbian

Very far from Serbian only. Bulgarian, Russian, and even Balto-Slavic languages like Latvian are similar enough.

konart2d ago

>Nem/нем literally means "mute" in Serbian,

Same in Russian

нем\немой - mute

немота - muteness

But yes, we do use Germany for the country's name :)

mootothemax2d ago

> Germans are Nemci in Russian as well

I wanted to check; are you implying that Russian is not a Slavic language?

orbital-decay2d ago

No, GP is saying that Russian uses the Latin root for Germans; I'm saying it doesn't. (It does for Germany though: "Germaniya".)

Antibabelic2d ago

"Slav" deriving from the Slavic term for "word" is something of a false etymology that was invented in the 19th century. It is implausible on philological grounds: you'd expect a different vowel in this word if this were the case, and the suffix *-ninъ is only otherwise used in terms derived from place names.

It is more likely[0] that the term derives from some toponym. This is in line with how tribal names tend to work in Europe and is not problematic in terms of historical linguistics; however, it gives less fuel to romantic nationalism and armchair speculations about national "identities" or "mindsets".

-----

[0] https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic/s...

rich_sasha2d ago

Dunno. A nice parallel fact is that the word for "Germans" in at least a few Slavic languages literally means "mutes" - the ones who don't speak.

So you'd have the Slavs - the people of word - and the Germans - the mutes.

weezing1d ago

Exactly. In Polish "Niemcy" (Germans) comes straight from "the mutes", due to the language barrier.

mootothemax2d ago

The irony for me being that when I was first learning Polish and looking for any and all mnemonics - “ah, that word is the number nine, and that one is ten because it has an s in the middle and that’s next to t for ten in the alphabet”-levels of desperate - the false etymology helped me set word, słowo, in my head, and the rather delightful dosłownie, literally / to the word, has remained ever since.

(tho while on the subject, it’s hard to beat wieloryb as a wonder that I don’t want to know the true etymology of ever because if there’s even a chance that the word for whale derived from the words great as-in-size + fish, I want to hang on to it forever)

acadapter2d ago

False etymology? You can roll back sound changes further to *ḱlew- in Proto-Indo-European

https://en.wiktionary.org/wiki/Reconstruction:Proto-Indo-Eur...

pavel_lishin1d ago

I always thought it was probable that it came from the same root as the word for "glory" - слава - as in, we're the glorious people.

tkot2d ago

> it really is also a tool to best codify spoken language of the Slavs (in a sense, it is trivially provable that Cyrillic script is better adapted even to languages which do not use it today, but have to resort to digraphs or glyphs with diacritics — some are thus not using it to distance from a particular influence instead

I've heard this claim many times but never the reasoning behind it - by what metric is "ш" superior to "š" and so on?

necovekOP1d ago

It's less pronounced with diacritics, but enter Unicode normal forms: you can represent š either as š, or s followed by a diacritic. When you want to compare two strings, you have to normalize them to ensure you are comparing apples to apples. I can guarantee most software is broken in that regard. For Cyrillic, it just works.
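The comparison problem described here can be sketched in a few lines of Python using the standard `unicodedata` module. This is only an illustration; the helper name `same_text` is made up for this example, and NFC is chosen arbitrarily (any single normal form applied to both sides would do).

```python
import unicodedata

precomposed = "\u0161"   # š as a single code point (U+0161)
decomposed = "s\u030c"   # s followed by a combining caron (U+030C)

# Both render as the same glyph, but the code point sequences differ,
# so a plain comparison reports them as different strings.
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 1 2


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same normal form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(precomposed, decomposed))  # True
```

Software that skips the normalization step treats the two spellings of š as unrelated strings, which is the breakage the comment refers to.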

With digraphs (lj, nj, dž + sometimes dj for đ too), it's even worse. Even capitalization is ambiguous: sometimes it's Lj and other times it's LJ. Then you have words like konjugacija where nj is not a digraph.

Interestingly — and not many know this — Unicode includes separate codepoints for all of the digraphs too. While well-intentioned, it only makes the problem worse.

Digraphs are especially sucky when you try sorting strings in a phonebook order as LJ comes after L, so you've got ...LI, LK..., LZ, LJA... With exceptions, it is even worse.
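A rough sketch of the sorting issue, assuming the usual 30-letter Latin alphabet order used for Serbian/Croatian (where dž, lj and nj count as single letters). The names `ALPHABET`, `RANK` and `collation_key` are invented for this example, and the greedy digraph matching is a deliberate simplification: it would mis-handle words such as konjugacija where nj is two letters.

```python
import unicodedata

# Assumed alphabet order; digraphs are single letters with their own rank.
ALPHABET = ["a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i",
            "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t",
            "u", "v", "z", "ž"]
RANK = {unicodedata.normalize("NFC", letter): i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = tuple(letter for letter in RANK if len(letter) == 2)


def collation_key(word: str) -> list:
    """Turn a word into a list of alphabet ranks, matching digraphs greedily."""
    text = unicodedata.normalize("NFC", word).lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after all known letters.
            key.append(RANK.get(text[i], len(ALPHABET) + ord(text[i])))
            i += 1
    return key


words = ["ljeto", "luk", "lopta", "lav"]
print(sorted(words))                     # ['lav', 'ljeto', 'lopta', 'luk']
print(sorted(words, key=collation_key))  # ['lav', 'lopta', 'luk', 'ljeto']
```

Plain code point sorting puts "ljeto" between "lav" and "lopta", while the alphabet-aware key moves it after every word starting with a plain l.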

tkot1d ago

> It's less pronounced with diacritics, but enter Unicode normal forms: you can represent š either as š, or s followed by a diacritic. When you want to compare two strings, you have to normalize them to ensure you are comparing apples to apples. I can guarantee most software is broken in that regard. For Cyrillic, it just works.

It's the same with Unicode encoding of Cyrillic letters - й (U+0439) can be written as й (и U+0438 + ◌̆ U+0306)
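This is easy to check with Python's standard `unicodedata` module. The sketch below lists which letters have a canonical decomposition; the helper `canonical_parts` is a made-up name, and the sample letters are written as escapes only to avoid any ambiguity about how they are encoded in this post.

```python
import unicodedata


def canonical_parts(ch: str) -> list:
    """Return the canonical decomposition of ch as U+XXXX strings, or [] if none."""
    mapping = unicodedata.decomposition(ch)
    # Mappings tagged like "<compat>" are compatibility, not canonical, decompositions.
    if not mapping or mapping.startswith("<"):
        return []
    return [f"U+{code}" for code in mapping.split()]


samples = [
    "\u0439",  # й  Cyrillic short i
    "\u0451",  # ё  Cyrillic io
    "\u0448",  # ш  Cyrillic sha
    "\u045a",  # њ  Cyrillic nje
    "\u0161",  # š  Latin s with caron
    "\u0107",  # ć  Latin c with acute
]
for ch in samples:
    print(ch, canonical_parts(ch))
```

Letters such as й and ё decompose just like š and ć do, while ш and њ have no decomposition, so whether normalization matters depends on the letter rather than on the script.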

> Interestingly — and not many know this — Unicode includes separate codepoints for all of the digraphs too. While well-intentioned, it only makes the problem worse.

Based on your description it seems that the root cause of the issues is using two letters to represent the digraph - for example N (U+004E) J (U+004A) instead of Ǌ (U+01CA) - and the sorting issues would be identical if people typed Н (U+041D) Ь (U+042C) instead of Њ (U+040A).
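For what it's worth, here is how the two spellings relate under Unicode normalization, as a small Python sketch using the standard `unicodedata` module (variable names are just for illustration). The single-code-point Latin digraph only folds to two letters under the compatibility forms (NFKC/NFKD), and the Cyrillic letter does not fold to Н + Ь at all.

```python
import unicodedata

digraph = "\u01ca"      # Ǌ  LATIN CAPITAL LETTER NJ, one code point
two_letters = "NJ"      # N (U+004E) followed by J (U+004A)

# Canonical normalization keeps the digraph code point as it is...
print(unicodedata.normalize("NFC", digraph) == two_letters)    # False
# ...while compatibility normalization maps it to the two plain letters.
print(unicodedata.normalize("NFKC", digraph) == two_letters)   # True

# The title-case code point U+01CB maps to "Nj" the same way.
print(unicodedata.normalize("NFKC", "\u01cb"))                 # Nj

# Cyrillic Њ (U+040A) has no decomposition, so it never equals Н + Ь.
cyrillic_nje = "\u040a"
print(unicodedata.normalize("NFKC", cyrillic_nje) == "\u041d\u042c")  # False
```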

What's the reason for the digraph being substituted by 2 letters in the first case more often than in the second case?

troupo1d ago

So, it's not "trivially provable that Cyrillic is better suited to Slavic languages", but rather that "the symbol representation we settled on in software has some difficulty disambiguating some, but not all, cases of symbol use in a language, a problem that is not unique to Slavic languages: see Dutch IJ, Turkish ı/i, German ß, etc."

konart2d ago

> Slavic people call Germans a variation of "Nemci", or mutes (those who cannot speak) — notably, most except Russians who call them Germans.

Last time I checked, we also call them "немцы" (Nemci, and it sounds exactly the same).

Tade02d ago

> some are thus not using it to distance from a particular influence instead

That's not the reason. The real reason is how those regions were Christianised: Cyril and Methodius created the first version of what would later evolve into Cyrillic script and they were sent by Constantinople, while missionaries sent by Rome would use Latin script.

gostsamo1d ago

Slav comes from slovo == слово, which means word or speech, i.e. Slavs are people who can talk to each other, which is a pattern in many other ethnic groups for differentiating between themselves and outsiders. Немци or mutes are those who cannot speak the language.

troupo2d ago

> is trivially provable that Cyrillic script is better adapted even to languages which do not use it today, but have to resort to digraphs or glyphs with diacritics

Take a look at the Cyrillic section of Unicode to see your trivially provable claim being trivially disproven. You'll see all the same digraphs, glyphs, accents, graves etc. as used in Latin scripts.

It's also easy to see it disproven if you look at all the languages USSR forced cyrillic alphabet on.

Antibabelic2d ago

To be fair, the parent post was clearly talking about Slavic languages, not "all the languages USSR forced cyrillic alphabet on", which were not Slavic and which required significant modifications to the alphabet.

necovekOP2d ago

Indeed: most notably, Croatian, Slovenian, Bosnian, Serbian and Montenegrin are all unambiguous with Cyrillic, but Latin script dominates, even in officially Cyrillic-first Serbia.

Again, it is seen as a political tool (pro-West or pro-Russia), when Cyrillic is technically better suited (there is certainly history as well, but that's very mixed up in the region).

Again, I am saying this as someone who has worked to implement things like full-text search, collation (lexical ordering/sorting) algorithms and tables, fonts and ligatures, functions like uppercase/titlecase/lowercase...

E.g. the already complex Unicode Collation Algorithm tables can never support exceptions with digraphs like "konjukcija" (nj is usually a digraph, but not here), etc.
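To make the exception problem concrete, here is a minimal Latin-to-Cyrillic transliteration sketch in Python. Everything in it is an assumption for illustration: the function name `to_cyrillic`, the one-word exception set, and the simplification that a listed word never contains a real nj digraph as well. It uses "konjugacija" from earlier in the thread as the example word and handles lowercase input only.

```python
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "dž": "џ",
    "đ": "ђ", "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј",
    "k": "к", "l": "л", "lj": "љ", "m": "м", "n": "н", "nj": "њ", "o": "о",
    "p": "п", "r": "р", "s": "с", "š": "ш", "t": "т", "u": "у", "v": "в",
    "z": "з", "ž": "ж",
}
DIGRAPHS = ("dž", "lj", "nj")

# Hypothetical exception list: words in which "nj" is n + j, not the letter nj.
NJ_IS_TWO_LETTERS = frozenset({"konjugacija"})


def to_cyrillic(word: str, exceptions=NJ_IS_TWO_LETTERS) -> str:
    """Transliterate a lowercase Latin-script word, matching digraphs greedily."""
    split_nj = word in exceptions
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS and not (split_nj and pair == "nj"):
            out.append(LATIN_TO_CYRILLIC[pair])
            i += 2
        else:
            # Unknown characters are passed through unchanged.
            out.append(LATIN_TO_CYRILLIC.get(word[i], word[i]))
            i += 1
    return "".join(out)


print(to_cyrillic("konj"))                                  # коњ
print(to_cyrillic("konjugacija", exceptions=frozenset()))   # коњугација (wrong)
print(to_cyrillic("konjugacija"))                           # конјугација
```

The point is that the correct result depends on a word list, not on the characters alone, which is something a per-character collation or mapping table cannot express.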

troupo1d ago

Since we're talking about Serbian below, here are some characters from the Serbian Cyrillic alphabet:

Ђ/ђ

Ћ/ћ

Љ/љ

Њ/њ

Џ/џ

Ј/ј

Various diacritical marks, digraph, a jod... What makes this Cyrillic more unambiguous than the Latin equivalents?

ceedaxp2d ago

Most of the extra glyphs are for non-Slavic languages (Turkic languages of Central Asia and Siberia). You see the same (and worse) in Latin Unicode pages: just look at how many variations of vowels 'a', 'i', or 'e' you have, consonants like 'c', 'z', 's'…

troupo1d ago

Even within Slavic languages there is plenty of weirdness: https://news.ycombinator.com/item?id=48064121
