The fundamental problem as I see it is that "string" is a grossly leaky and misunderstood abstraction. The string type is not the same thing as a "text" type. It's being used in all the wrong places for that purpose. People treat "string" like it means "text", but in so many places where we deal with them, they just aren't (and should never be) text. Everything from stdio to argv to file paths to environment variables to "text" files to basically any interface with the outside world needs to be dealt with in bytes rather than text if you care about actually producing correct code that doesn't lose, corrupt, or otherwise choke on data.
C++ understood this and got it right, preferring to focus on optimizing rather than constraining the string type. Many other languages did pretty well by avoiding enforcing encodings on strings, too. And Python 2 defaulted to bytes as well, and only really cared about encoding/decoding at I/O boundaries where it thought it can assume it's dealing with text (though it sometimes didn't behave well there, and yes it got painful as a result). Then Python 3 came along and just made everyone start treating most data as if they're inherently (Unicode) text by default, when they really had no such constraints to begin with.
It boggles my mind that Python 3 folks like to beat the drum on how Python 3 got the bytes/unicode right without taking a single moment to even notice that most strings people deal with aren't (and never were!) actually guaranteed to be in a specific, known textual encoding a priori. They were just arrays of code units with few restrictions on them, and if you want to write correct code, you're going to have to deal with bytes by default (or something else with similar flexibility) instead of text. It would've been totally fine to introduce a text type, but it fundamentally can't take the place of a blob type, which is the language of the outside world.