Whole languages are dying out because people are unable to express them properly on computers. Even popular software that dominates among these speakers does not care to improve their experience. For example, Urdu has traditionally been written in the Nastaliq form [1], but is usually rendered everywhere in the Naskh form [2]. There is no way to change this in Android, for example, without basically rooting it and changing the system fonts.
I am really surprised Android won't let the user select their own system font. This is a huge accessibility problem, especially for dyslexics.
By contrast Nastaliq is a much more complicated style. Many letters and letter combinations take on several different forms depending on which other letters surround them. Letter joins are usually diagonal, so letters earlier in a word need to be shifted above the baseline by a variable amount. Having to shift letters vertically as well as horizontally greatly complicates other aspects of the style too.
(I recall seeing a nice table some time ago showing all the various different possibilities for letter joins in Nastaliq. Unfortunately I can’t seem to find it again. Still, you might get some idea by consulting the documentation of one of the existing Nastaliq fonts, e.g. Awami Nastaliq: https://software.sil.org/awami/what-is-special/)
There are no technical reasons preventing the use of Nastaliq fonts everywhere. Only product design decisions by big tech.
In Chinese for instance, you can use a keyboard that combines radicals - parts of a character, or you can use a keyboard that combines phonemes. Those seem likely to change literally how you think in your language. There may be related concerns for Arabic.
That said, one of the complaints in the blog is that two different codepoint sequences render as exactly the same letter / phrase / word. This is not a problem unique to Arabic in Unicode, and there are known approaches. I’d expect (I’m not a Unicode expert by any means) that more work on the tech stack for rectification (I’m sure there’s a technical Unicode word for this process of matching codepoints for e.g. search and uniqueness of rendering) would likely be useful for Arabic, and would flow relatively seamlessly into many places.
But consider how cursive is dying out in (at least American) English, and how many centuries of writing will become unintelligible to the casual reader as a result.
All of these important cultural artifacts require maintenance.
This. Arabic users can complain about e.g. Unicode not covering their writing in a suitable manner. And I (as a non-Arabic speaker) can certainly see the problems described in the article.
But, going back to the earlier days of computing, what stopped Arabic countries from devising a system that does that better than Unicode (and covers other written languages like Hangul, Japanese or traditional Chinese better than Unicode covers them)?
Seems like that didn't happen? Either too few Arabic people cared, or solution(s) they came up with had shortcomings of their own & weren't implemented widely enough, or Unicode was good enough that few Arabic developers cared to go beyond that.
https://en.wikipedia.org/wiki/Kurrent
Tons of old documents written in it, basically impossible to decipher for anyone that only learned to write "modern" cursive or even print.
How much of this is still a problem with modern software/font stacks and harfbuzz?
This repository has a good outlook: https://github.com/harfbuzz/harfbuzz-wasm-examples
> The inflexibility persisted and has arguably only become more aggravated in the 20th century
What about the 21st century? Digital printing can overlap characters just fine. And modern fonts support context-sensitive ligatures and glyph substitutions.
The second and third examples seem to be caused more by someone who doesn't understand the language copy-pasting stuff.
PDF: that's just PDF being bad. Text and text search in PDFs tend to mess up even for English.
> with unicode number U+0623, but one can also type أ, which is an alif and a high hamza, represented by unicode numbers U+0627 and U+0654.
That's what Unicode normalization and locale settings are for. The same thing applies to a large fraction of Latin-based scripts other than English: anything which has letters with diacritic marks.
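To make the normalization point concrete, here is a minimal sketch using Python's standard `unicodedata` module. It only checks the two spellings quoted from the article (U+0623 versus U+0627 followed by U+0654). The variable names are mine, not anything from the article.

```python
import unicodedata

precomposed = "\u0623"      # ARABIC LETTER ALEF WITH HAMZA ABOVE
combining = "\u0627\u0654"  # ALEF followed by combining HAMZA ABOVE

# Compared code point by code point, the two spellings are different strings.
raw_equal = precomposed == combining

# After canonical normalization (composed or decomposed) they are identical.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", combining))
nfd_equal = (unicodedata.normalize("NFD", precomposed)
             == unicodedata.normalize("NFD", combining))

print(raw_equal, nfc_equal, nfd_equal)  # False True True
```

So software that normalizes both the index and the query to the same form before comparing should not care which of the two the user typed.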
> for كثيره and كثيرة will in most cases yield different results
There is a similar thing in almost any non-English language, for example cafe and café, or ABC and ⒶⒷⒸ. At least some systems handle it reasonably, though. I'm not sure how much of that is heuristics based on large data (hard to scale across software) and how much is good application of Unicode character decomposition/normal form tables. Which Arabic letters lack appropriate Unicode decomposition (and other) tables, and what the best practices are for Unicode normalization/decomposition/locale handling in search (applicable to all languages), are more interesting and actionable topics.
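For what it's worth, a rough sketch of the "decomposition tables plus a bit of language-specific folding" approach, again in plain Python. The function name `search_key` and the `EXTRA_FOLDS` table are hypothetical. The teh marbuta to heh fold is an assumption I added for illustration, since Unicode defines no decomposition linking those two letters. Note that stripping combining marks also drops Arabic vowel marks and the combining hamza, which may or may not be what a given search wants.

```python
import unicodedata

# Hypothetical language-specific folds that Unicode normalization does not do.
# Assumption for illustration: treat teh marbuta (U+0629) like heh (U+0647).
EXTRA_FOLDS = str.maketrans({"\u0629": "\u0647"})

def search_key(text: str) -> str:
    """Build a loose comparison key for search; not suitable for display."""
    # Compatibility decomposition: splits off accents, unwraps circled letters.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks (characters with a non-zero combining class).
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Apply the extra folds, then ignore case.
    return no_marks.translate(EXTRA_FOLDS).casefold()

print(search_key("café") == search_key("cafe"))
print(search_key("ⒶⒷⒸ") == search_key("ABC"))
# The two spellings from the article, ending in teh marbuta and in heh.
print(search_key("\u0643\u062b\u064a\u0631\u0629")
      == search_key("\u0643\u062b\u064a\u0631\u0647"))
```

The first two cases fall out of the standard Unicode tables. The third only matches because of the hand-written fold, which is exactly the kind of per-language decision that has to come from somewhere other than the normalization forms.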
> Not even the simple idea of CJK has been implemented.
Many users of CJK languages would argue that CJK unification was a mistake. If different languages prefer different forms of the glyph, they should be separate characters. Having separate Chinese and Japanese fonts because CJK unified too much just introduces additional points of failure.
Luckily it's not a decision without turning back. In most relevant contexts you should know the input language and can select a font specifically using said variations. Of course this information will not be present in plain text, but if it turns out to become an issue I'd wager, since language codes do exist, that a control code point for language selection can be added to the specification. There are already so many special cases in Unicode that it shouldn't be a huge issue (apart from backwards-incompatibility that would lead to tofu instead of no rendered glyph).
And as the article says, since most writing now happens on computers, stuff like kashida is going to be forgotten soon.
I don't think you mean this, because I don't know how you would do it in CSS. It looks more like a problem to be solved with different types of characters than with styling.
You may want to explain more.
Even before Unicode, it was established practice that documents mixing Chinese and Japanese would use the same encoding for both and roughly nobody would bother to pick an ugly font for the foreign-language text to make it look appropriately different.
Unicode rightly decided that the fine details of appearance are left to fonts. Otherwise you'd also need e.g. a bunch of extra codepoints so that early-20th-century handwritten letters in German can have their look accurately preserved: https://en.wikipedia.org/wiki/S%C3%BCtterlin
Now, if a file was encoded in Unicode, and/or if it was in a document format that supports inline font specification, such as HTML, then you could mix two languages without having to stick to one language by e.g. wrapping <font face=Helvetica>paragraphs and words</font> <font face=Futura>with tags</font>.
My point is, it seems that the author is not aware that each of the CJK languages is only understood within its own country, in both writing and speech, and that's somewhat peculiar.
So adopting Arabizi without increasing access to education can be expected to do roughly nothing for literacy, whereas with a good education system, people can learn to read and write in Arabic script just fine.