> Most Unix syscalls use C-style strings, which are a string of 8-bit bytes terminated with a zero byte. With many (most?) character encodings you can continue to present string data to syscalls in the same way, since they often also reserved a byte value of zero for the same purpose
That's completely wrong. If a syscall (or a function) expects text in encoding A, you should not send it text in encoding B: it will be interpreted incorrectly or, even worse, become a security vulnerability.
For every function, the expected encoding should be specified the same way the argument types, constraints, and ownership rules are. Sadly, many open source libraries don't do this. How are you supposed to call a function when you don't know what encoding it expects?
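As a sketch of what such a contract could look like in a header (the function name is hypothetical, the point is that the encoding is stated alongside ownership):

```c
/*
 * set_window_title(title)
 *   title: zero-terminated, valid UTF-8; the function copies the
 *          bytes, so the caller retains ownership of the buffer.
 *   Returns 0 on success, -1 if `title` is not valid UTF-8.
 */
int set_window_title(const char *title);
```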
Also, it is better to pass a pointer and a length rather than scan an unbounded region of memory for a zero byte.
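A minimal sketch of the difference (the struct name is illustrative):

```c
#include <stddef.h>

/* A pointer plus an explicit length: no terminator scan needed,
 * and embedded zero bytes are allowed. */
struct str_slice {
    const char *data;  /* not necessarily zero-terminated */
    size_t      len;   /* number of bytes, known up front */
};

/* O(1): the caller already knows the length. */
size_t slice_len(struct str_slice s) { return s.len; }

/* O(n): must walk memory until a zero byte is found, and reads
 * out of bounds if the terminator is missing. */
size_t cstr_len(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    return n;
}
```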
> and the result is that if you want UTF-16 support in your existing C-string-based syscalls
There is no need to support multiple encodings; it only complicates things. The simplest solution would be to standardize on UTF-8 for all kernel facilities.
For example, it would be better if the open() syscall required a valid UTF-8 string for the file name. That would leave no possibility of file names being displayed as question marks.
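A minimal user-space sketch of such a contract (open_utf8 is hypothetical, and the validator is simplified: a real one must also reject overlong encodings and surrogate code points):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Simplified structural UTF-8 check: verifies lead bytes and
 * continuation bytes only. */
static bool is_valid_utf8(const char *s) {
    const uint8_t *p = (const uint8_t *)s;
    while (*p) {
        int extra;
        if (*p < 0x80)                extra = 0;  /* ASCII */
        else if ((*p & 0xE0) == 0xC0) extra = 1;  /* 2-byte sequence */
        else if ((*p & 0xF0) == 0xE0) extra = 2;  /* 3-byte sequence */
        else if ((*p & 0xF8) == 0xF0) extra = 3;  /* 4-byte sequence */
        else return false;                        /* invalid lead byte */
        p++;
        for (int i = 0; i < extra; i++, p++)
            if ((*p & 0xC0) != 0x80)              /* bad or missing */
                return false;                     /* continuation byte */
    }
    return true;
}

/* Reject non-UTF-8 file names before handing them to the real open(). */
int open_utf8(const char *path, int flags, mode_t mode) {
    if (!is_valid_utf8(path)) {
        errno = EILSEQ;  /* "illegal byte sequence" */
        return -1;
    }
    return open(path, flags, mode);
}
```

With a rule like this enforced at the boundary, every file name the rest of the system sees is displayable by construction.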