I imagine it wouldn't have been that hard, if the authors had just used ICU4C for string handling. I've had good luck with just converting all input to Normalization Form C (NFC). The bigger challenge there is that, if you're using ICU strings, then you've lost the ability to use any library that is designed to work with C strings. There's no way to avoid 0x00 bytes showing up in the middle of UTF-16 and UTF-32 strings, and, even if you use modified UTF-8 to avoid NUL bytes, you still break the assumption that a string's length in characters is equal to its size in bytes.
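
To make that concrete, here's a rough sketch of the two problems. It uses Python's standard `unicodedata` module as a stand-in for ICU4C (that substitution is mine, not something from the project being discussed), and the sample string is made up for illustration.

```python
import unicodedata

# Hypothetical input: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# i.e. the decomposed spelling of "é".
decomposed = "cafe\u0301"

# Normalizing to NFC folds the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
code_points = len(composed)

# UTF-8: no NUL bytes here, but "é" takes two bytes, so the size in
# bytes no longer matches the number of characters.
utf8 = composed.encode("utf-8")
utf8_size = len(utf8)

# UTF-16 (little-endian, no BOM): every ASCII letter carries a 0x00 high byte.
utf16 = composed.encode("utf-16-le")
utf16_has_nul = b"\x00" in utf16

# What a strlen-style scan would report: it stops at the first 0x00 byte.
c_strlen_of_utf16 = utf16.index(b"\x00")

print(code_points, utf8_size, len(utf16), c_strlen_of_utf16)
```

So anything that treats the UTF-16 buffer as a C string sees it end after the first byte, and anything that treats the UTF-8 buffer as "one byte per character" gets the count wrong.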