> I’m explicitly talking about cases where you UTF-8-encode right where you’re hashing.
Totally, I get that. I think what you're pointing out is that in a language like Python, the scenario I'm trying to describe is meaningless. You can't make an "invalid string" in Python (as far as I know, not without resorting to FFI), because it checks for that kind of thing during string decoding, and it'll just raise an exception instead of handing you bad data.
But languages like C/C++/Rust/Go work differently. The way these languages are commonly used, the string -> UTF-8 step is actually a no-op, because the assumption is that strings are already UTF-8 in memory. (In C or C++ this usually lives in the programmer's head rather than in the types, but it's a common choice.) In these languages it's possible for the result of that no-op "encoding" to be invalid UTF-8, if the input string was invalid somehow. This is a pretty weird edge case, and almost certainly a bug that the application needs to fix anyway, but if we're noodling about cryptography best practices, it might be nice to limit the "blast radius" of a bug like that.