It's risky to be clever because malformed characters are still a thing. You can't trust that the rules of UTF-8 have been followed in an unknown stream without examining every byte to make sure it is all valid. The best you can do is optimize future uses of the string: keep the buffer immutable and, after the first scan, remember some hints about what was found in it.
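
As a rough illustration of the "scan once, remember hints" idea, here is a minimal Python sketch. The names `ScannedBytes` and `scan` are hypothetical, made up for this example, and the particular hints recorded (validity, ASCII-only, code point count) are just one plausible choice. It leans on Python's strict UTF-8 decoder to do the byte-by-byte check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannedBytes:
    """Hypothetical wrapper: an immutable buffer plus hints from one scan."""
    data: bytes          # bytes is immutable, so the hints cannot go stale
    is_valid_utf8: bool  # did the whole buffer decode under strict rules?
    is_ascii: bool       # if True, byte index == character index
    char_count: int      # code points found (0 when the buffer is invalid)


def scan(data: bytes) -> ScannedBytes:
    """Examine every byte once and remember what was found."""
    try:
        # Strict decoding rejects truncated, overlong, and surrogate sequences.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return ScannedBytes(data, False, False, 0)
    # In valid UTF-8, only ASCII characters occupy exactly one byte, so
    # equal lengths mean the buffer is pure ASCII.
    return ScannedBytes(data, True, len(text) == len(data), len(text))
```

Later operations can then consult the hints instead of rescanning: for instance, when `is_ascii` is true, indexing by character can go straight to the byte offset. The hints are only trustworthy because the buffer cannot change after the scan.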