I disagree. Not all text is human prose. For example, there is nothing wrong with a programming language that only allows ASCII in the source code, and there are many downsides to allowing non-ASCII characters outside string constants and comments.
That's a tradeoff you should carefully consider, because there are also downsides to disallowing non-ASCII characters. The downsides of allowing non-ASCII mostly stem from assigning semantic significance to upper/lowercase (which is itself a tradeoff to weigh when designing a language). The other issue I can think of is homographs, but that seems to be more of a theoretical concern than a problem you'd run into in practice.
When I first learned programming I used my native language (Finnish, which uses three non-ASCII letters: å, ä, ö) not only for strings and comments but also for identifiers. Back then UTF-8 was not yet universally adopted (the ISO 8859-1 character set was still relatively common), so I occasionally ran into issues that I had no means to understand at the time. As programming is taught to younger and younger audiences, it's not reasonable to expect kids from (insert your favorite non-English-speaking country) to know enough English to use it for naming. Naming, and to an extent thinking, in English requires a vocabulary orders of magnitude larger than knowing the keywords.
By restricting source code to ASCII only, you also lose the ability to use domain-specific notation such as mathematical symbols/operators and Greek letters. For example, in Julia you may use mathematical operators directly (e.g. ÷ for Euclidean division, ⊻ for exclusive or, ∈/∉/∋ for checking set membership), and I find it really makes code more pleasant to read.
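Python can't define custom operators like Julia, but it does accept Unicode identifiers, which gives a flavor of the same benefit. A small sketch (the function and variable names are my own illustration, not from any library):

```python
import math

π = math.pi

def gaussian(x, μ, σ):
    # Standard normal density, written close to the textbook notation.
    return math.exp(-((x - μ) ** 2) / (2 * σ**2)) / (σ * math.sqrt(2 * π))

print(round(gaussian(0.0, 0.0, 1.0), 4))  # → 0.3989, the peak of the standard normal
```

Whether `μ` and `σ` read better than `mu` and `sigma` is exactly the trade-off under discussion.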
I'm not saying the trade-off isn't worth it, but I do feel there is a tendency to somewhat overuse Unicode in Julia.
Now list anything as important from your list of downsides that's just as unfixable.
Lots of people around the world learn programming from sources in their native language, especially early in their career, or when software development is not their actual job.
Enforcing ASCII is the same as enforcing English. How would you feel if all cooking recipes were written in French? If all music theory was in Italian? If all industrial specifications were in German?
It's fine to have a dominant language in a field, but ASCII is a product of technical limitations that we no longer have. UTF-8 has been an absolute godsend for human civilization, despite its flaws.
In fact, it's awesome that we have one common, very simple character set and language that works everywhere and can do everything.
I have only encountered source code using my native language (German) in comments or variable names in highly unprofessional or awful software, and it is looked down upon. You always get an ugly mix and have to mentally stop to figure out which language a name is in. It's simply not worth it.
Please stop pushing this UTF-8-everywhere nonsense. Make it work great in interactive/UI/user-facing elements, but stop putting UTF-8-only restrictions in low-level software. Example: I copied a bunch of ebooks to my phone, including one with a mangled non-UTF-8 name. It was ridiculously hard to delete the file, as most Android graphical and console tools either didn't recognize it or crashed.
I was with you until this sentence. UTF-8 everywhere is great exactly because it is ASCII-compatible: every ASCII string is automatically also a valid UTF-8 string, so UTF-8 is a natural upgrade path from ASCII. Both are just encodings of the same Unicode code points; ASCII simply cannot go beyond the first 128 (0-127), and that's where UTF-8 comes in, in a way that stays backward compatible with ASCII. That backward compatibility is the one ingenious feature of the UTF-8 encoding.
Restricting the program part to ASCII is fine for me, but as a fellow German it's also important to recognize that we don't lose much by not having ä cömplete sät of letters. Everyone can write comprehensible German using ASCII characters only. So I would listen to what people from languages that really don't fit into ASCII have to say.
None of the signals were intuitive because they weren’t the typical English abbreviations!
Andreas Rumpf, the designer of Nim, is Austrian. All of Nim's keywords are in English, the library function names are in English, the documentation is in English, Rumpf's book Mastering Nim is in English, and the other major book for the language, Nim in Action (written by Dominik Picheta, nationality unknown but not American), is in English. This is not "American imperialism" (which is a real thing that I don't defend); it's done for easily understandable pragmatic reasons. And the language parser doesn't disallow non-ASCII characters, but it doesn't treat them linguistically: it has special rules for casefolding identifiers that only recognize ASCII letters, hobbling the use of non-ASCII identifiers because case distinguishes types from other identifiers. The reason for this lack of linguistic Unicode handling is simply to make the lexer smaller and faster.
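To make that concrete, here is a rough Python sketch of Nim-style identifier comparison as I understand it (first character case-sensitive, the rest compared with underscores ignored and only ASCII letters casefolded); this is an illustration of the rule, not Nim's actual implementation:

```python
def nim_ident_eq(a: str, b: str) -> bool:
    """Sketch: compare identifiers the way Nim's style insensitivity does.
    The first character is case-sensitive; the remainder is compared with
    underscores stripped and ASCII-only lowercasing, so non-ASCII letters
    like 'Ä'/'ä' are never folded together."""
    def fold(s: str) -> str:
        rest = s[1:].replace("_", "")
        # str.lower() alone would also fold non-ASCII; restrict to ASCII.
        return s[0] + "".join(c.lower() if c.isascii() else c for c in rest)
    return fold(a) == fold(b)

print(nim_ident_eq("fooBar", "foo_bar"))  # True: ASCII case and underscores fold
print(nim_ident_eq("fooÄr", "fooär"))     # False: non-ASCII case is not folded
print(nim_ident_eq("FooBar", "fooBar"))   # False: first character is case-sensitive
```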
No, it is actually for security reasons. Once you allow non-ASCII identifiers, identifiers become non-identifiable. Only Zig recognized that. Nim allows insecure identifiers. https://github.com/rurban/libu8ident/blob/master/doc/c11.md#...
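The homograph problem is easy to demonstrate. Latin "a" (U+0061) and Cyrillic "а" (U+0430) usually render identically, yet compare unequal; the same applies to identifiers in languages that accept Unicode names. A minimal Python illustration (strings used here so the difference is printable):

```python
import unicodedata

latin = "paypal"
homograph = "p" + chr(0x0430) + "ypal"  # Cyrillic 'а' in place of Latin 'a'

print(homograph)                        # renders just like "paypal"
print(latin == homograph)               # False: different code points
print(unicodedata.name(homograph[1]))   # CYRILLIC SMALL LETTER A
```

Two visually identical identifiers built this way would be silently distinct variables.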
The motte: non-ASCII identifiers should be allowed
The bailey: disallowing non-ASCII identifiers is American imperialism at its worst
You severely underestimate how far you can get without any real command of the English language. I agree that you can't become really good without it, just like you can't do haute cuisine without some French, but the English language is a huge and unnecessary barrier to entry that you would put in front of everyone in the world who isn't immersed in the language from an early age.
Imagine learning programming using only your high school Spanish. Good luck.
Unicode is essentially a superset of ASCII, and the UTF-8 encoding contains ASCII as a compatible subset (i.e. for the first 128 Unicode code points, a UTF-8-encoded string is byte-for-byte identical to the same string encoded in ASCII).
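This compatibility is trivial to verify (a small check, assuming nothing beyond the standard library):

```python
# Every pure-ASCII string encodes to identical bytes in ASCII and UTF-8,
# so legacy ASCII data is already valid UTF-8.
s = "hello, world"
assert s.encode("ascii") == s.encode("utf-8")

# Beyond code point 127, UTF-8 switches to multi-byte sequences.
assert "é".encode("utf-8") == b"\xc3\xa9"   # U+00E9 becomes two bytes
print(b"hi".decode("utf-8"))                 # ASCII bytes decode unchanged
```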
Just don't use any of the Extended ASCII flavours (e.g. "8-bit ASCII" with codepages) or any of the legacy 'national' multibyte encodings (Shift-JIS etc.), because that's how you get the infamous `?????` or `♥♥♥♥♥` mismatches commonly associated with 'ASCII' (which is not actually ASCII, but some flavour of Extended ASCII decoded with the wrong codepage).
ASCII wasn't "imperialism," it was pragmatism. Yes, it privileged English, but that's because the engineers designing it _spoke_ English and the US was funding and exporting most of the early computer and networking gear. The US military essentially gave the world TCP/IP (via DARPA) for free!
Maybe "cultural dominance", but "imperialism at its worst" is a ridiculous take.