I was with you until this sentence. UTF-8 everywhere is great exactly because it is ASCII-compatible (i.e. all ASCII strings are automatically also valid UTF-8 strings, so UTF-8 is a natural upgrade path from ASCII). Both are just encodings for the same Unicode code points; ASCII just cannot go beyond the first 128 code points (0 to 127), but that's where UTF-8 comes in, and in a way that's backward compatible with ASCII, which is the one ingenious feature of the UTF-8 encoding.
And bytes can conveniently fit both ASCII and UTF-8.
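To make the compatibility claim concrete, here is a small Python sketch (the variable and function names are made up for illustration). It shows that pure-ASCII text produces the same bytes under both encodings, and that a non-ASCII identifier has no ASCII encoding at all.

```python
# ASCII text encodes to the very same bytes under ASCII and UTF-8.
text = "plain ASCII text"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# Every byte of ASCII text has its most significant bit clear.
all_below_128 = all(b < 0x80 for b in utf8_bytes)

# A non-ASCII character needs more than one byte in UTF-8.
umlaut_utf8 = "ö".encode("utf-8")  # b'\xc3\xb6'


def is_ascii_encodable(s):
    """Return True if the string can be encoded as 7-bit ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(same_bytes, all_below_128, umlaut_utf8)
print(is_ascii_encodable("wohnt_bei_Böckler_STRAẞE"))
```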
If you want to restrict your programming language to ASCII for whatever reason, fine by me. I don't need "let wohnt_bei_Böckler_STRAẞE = ..." that much.
But if you allow full 8-bit bytes, please don't restrict them to UTF-8. If you need to gracefully handle non-UTF-8 sequences graphically, show the replacement character "�"; otherwise let them pass through unmodified. Just don't crash, show useless error messages or, in the worst case, try to "fix" the data by mangling it even more.
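One way to get "show � but pass the bytes through" is sketched below in Python, using the standard `replace` and `surrogateescape` error handlers. The sample byte string is an assumption chosen for illustration (a Latin-1 encoded name, which is not valid UTF-8).

```python
# Latin-1 encoded "café.txt": the lone 0xE9 byte makes it invalid UTF-8.
raw = b"caf\xe9.txt"

# For display only: invalid bytes become U+FFFD. This is lossy.
shown = raw.decode("utf-8", errors="replace")

# For passing data through: surrogateescape smuggles each invalid byte
# into the string so the original bytes can be recovered exactly.
carried = raw.decode("utf-8", errors="surrogateescape")
restored = carried.encode("utf-8", errors="surrogateescape")

print(shown)                          # caf�.txt
print(restored == raw)                # True
print(shown.encode("utf-8") == raw)   # False: the display form lost data
```

The point of keeping two forms is that the display string is never fed back into anything that needs the real bytes.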
This string cannot be encoded as ASCII in the first place.
> But if you allow full 8-bit bytes, please don't restrict them to UTF-8
UTF-8 has no 8-bit restrictions... You can encode any 21-bit Unicode code point with UTF-8.
It sounds like you're confusing ASCII, Extended ASCII and UTF-8:
- ASCII: 7 bits per "character" (i.e. not able to encode international characters like äöü), but it maps to the lower 7 bits of the 21-bit Unicode code point range (i.e. all ASCII character codes are also valid Unicode code points)
- Extended ASCII: 8 bits per "character", but the interpretation of the upper 128 values depends on a country-specific code page (i.e. the interpretation of a byte value in the range between 128 and 255 differs between countries, and this is what causes all the mess that's usually associated with "ASCII". But ASCII did nothing wrong; the problem is Extended ASCII, which lets you 'encode' äöü with the German code page but then shows different characters when displayed with a non-German code page)
- UTF-8: a variable-width encoding for the full range of Unicode code points; it uses 1..4 bytes to encode one 21-bit Unicode code point, and the 1-byte encodings are identical to 7-bit ASCII (i.e. when the MSB of a byte in a UTF-8 string is not set, you can be sure that it is a character/code point in the ASCII range).
Out of those three, only Extended ASCII with code pages is 'deprecated' and should no longer be used, while ASCII and UTF-8 are both fine, since any valid ASCII-encoded string is indistinguishable from that same string encoded as UTF-8, i.e. ASCII has been 'retconned' into UTF-8.
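A rough Python illustration of the three cases above. The code pages `cp1252` and `cp1251` are just two examples picked to stand in for "German" and "non-German" code pages, and `ascii_positions` is a hypothetical helper name.

```python
# Extended ASCII: the same byte means different things per code page.
byte = b"\xe4"
as_western = byte.decode("cp1252")    # 'ä' (Western European)
as_cyrillic = byte.decode("cp1251")   # 'д' (Cyrillic)

# UTF-8: 1 to 4 bytes per code point, and only ASCII takes 1 byte.
samples = ["A", "ä", "ẞ", "😀"]
widths = [len(ch.encode("utf-8")) for ch in samples]


def ascii_positions(data):
    """Indexes of bytes whose MSB is clear, i.e. plain ASCII characters."""
    return [i for i, b in enumerate(data) if (b & 0x80) == 0]


encoded = "Böckler".encode("utf-8")   # b'B\xc3\xb6ckler'
print(as_western, as_cyrillic)
print(widths)                          # [1, 2, 3, 4]
print(ascii_positions(encoded))        # [0, 3, 4, 5, 6, 7]
```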
The problem they're describing happens because file names (in Linux and Windows) are not text: in Linux (so Android) they're arbitrary sequences of bytes, and in Windows they're arbitrary sequences of 16-bit code units that are not necessarily well-formed UTF-16 (for example, a surrogate can appear alone).
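As a quick sketch of what "not text" means here, the Python below checks one made-up Linux-style name and one made-up Windows-style name. Both are legal file names on their respective systems as described above, yet neither is valid Unicode text. The helper names are hypothetical.

```python
def is_valid_utf8(data):
    """True if the byte sequence decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def is_well_formed(s):
    """True if the string contains no lone surrogates."""
    try:
        s.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


# Linux-style name: 0xFF never occurs in valid UTF-8.
linux_name = b"report-\xff.txt"

# Windows-style name: a high surrogate with no low surrogate after it.
windows_name = "report-\ud800.txt"

print(is_valid_utf8(linux_name))      # False
print(is_well_formed(windows_name))   # False
# The lone surrogate is still representable as one 16-bit code unit:
print("\ud800".encode("utf-16-le", errors="surrogatepass"))  # b'\x00\xd8'
```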
And yet, a lot of programs ignore that and insist on storing file names as Unicode strings, which mostly works (because users almost always name files by inputting text) until somehow a file gets written as a sequence of bytes that doesn't map to a valid string (i.e., it's not UTF-8 or UTF-16, depending on the system).
So what's probably happening in GP's case is that they somehow managed to get a file with a non-UTF-8 byte sequence in its name on Android, and subsequently every app that tries to deal with that file uses an API that converts the file name to a string containing U+FFFD ("replacement character") wherever an invalid UTF-8 byte is found. So when GP tries to delete the file, the app will try to delete a file whose name contains the U+FFFD character, which will fail because that file doesn't exist.
GP is saying that showing the U+FFFD character is fine, but the App should understand that the actual file name is not UTF-8 and behave accordingly (i.e. use the original sequence-of-bytes filename when trying to delete it).
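Here is a minimal Python sketch of that failure mode and the fix, assuming a Linux-like filesystem that accepts arbitrary bytes in names. The function name and file name are invented for the example; on filesystems that reject such names the function just returns `None`.

```python
import os
import tempfile


def demo_delete():
    """Return (lossy_path_found, bytes_delete_worked), or None if the
    filesystem refuses to create a non-UTF-8 file name."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_b = os.fsencode(tmp)
        raw_name = b"photo-\xff.jpg"  # not valid UTF-8
        try:
            with open(os.path.join(tmp_b, raw_name), "wb"):
                pass
        except (OSError, ValueError):
            return None

        # What a string-only app does: keep the display form with U+FFFD
        # and later use it as if it were the real name.
        shown = raw_name.decode("utf-8", errors="replace")
        lossy_path_found = os.path.exists(os.path.join(tmp, shown))

        # What it should do: list the directory as bytes and delete
        # using the original byte sequence.
        (listed,) = os.listdir(tmp_b)
        os.remove(os.path.join(tmp_b, listed))
        bytes_delete_worked = os.listdir(tmp_b) == []
        return lossy_path_found, bytes_delete_worked


print(demo_delete())
```

On a typical Linux system I'd expect the lossy path not to be found while the byte-based delete succeeds, which is exactly GP's situation.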
Note that this is harder than it should be. For example, with the old Java API (from java.io[1]) that's impossible: if you get a `File` object from listing a directory and ask if it exists, you'll get `false` for GP's file, because the `File` object internally stores the file name as a Java string. To get the correct result, you have to use the new API (from java.nio.file[2]) using `Path` objects.
[1] https://developer.android.com/reference/java/io/File
[2] https://developer.android.com/reference/java/nio/file/Path
Sure, it's backward compatible, as in ASCII-handling code works on systems with UTF-8 locales, but how important is that?
It's only Windows that is stuck in the past here, and Microsoft has had 3 decades to fix that problem and migrate away from code pages to locale-agnostic UTF-8 (UTF-8 was invented in 1992).