public byte[] getBytes(String charsetName) throws UnsupportedEncodingException
Since it can throw a checked exception you have to catch it, which generally is fine, but consider this case: someString.getBytes("UTF-8");
This call can never fail (support for UTF-8 encoding is required in Java), but in my case I have to do something with the exception in the catch block or our static code analysis tool will start complaining (and rightfully so). So that's where I'll log a "can't happen" error. It truly cannot happen.
An ExceptT or Maybe monad for handling encoding errors feels a lot like throwing exceptions, although they are less disruptive than exceptions. I'd probably represent a decoder as a function with type ByteString -> ErrorT ParseError m Text, which is neither an ADT nor a typeclass. It's a 3rd solution. Either that or an Attoparsec parser, which is probably equivalent. An encoder seems like it shouldn't fail at all, but if it eventually forks out to one of the C locale functions I can see it throwing errors too.
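For comparison on the Java side, here is a rough sketch of a decoder that reports failure as a value rather than as a checked exception, loosely analogous to the Maybe-returning decoder described above. The class name StrictUtf8Demo and the method name decode are made up for illustration; the charset calls are from java.nio.charset, and Optional assumes Java 8 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of any library mentioned in the thread.
final class StrictUtf8Demo {

    // Returns the decoded text, or Optional.empty() if the bytes are not valid UTF-8.
    static Optional<String> decode(byte[] bytes) {
        try {
            return Optional.of(
                StandardCharsets.UTF_8.newDecoder()
                    // Ask the decoder to report bad input instead of substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString());
        } catch (CharacterCodingException e) {
            // Decoding genuinely can fail, unlike looking up the UTF-8 charset.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {'h', 'i'}).isPresent());
        System.out.println(decode(new byte[] {(byte) 0xFF}).isPresent());
    }
}
```

The point of the sketch is only that the failure here belongs to the input bytes, so it makes sense to surface it to the caller, whereas the UnsupportedEncodingException from getBytes("UTF-8") belongs to a lookup that cannot fail.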
Meanwhile, in the real world, Data.Text.Encoding uses a 4th solution: its decodeUtf8With and encodeUtf8 ultimately represent the UTF-8 encoding as a pair of FFI functions with these signatures:
foreign import ccall unsafe "_hs_text_decode_utf8" c_decode_utf8
    :: MutableByteArray# s -> Ptr CSize
    -> Ptr Word8 -> Ptr Word8 -> IO (Ptr Word8)
and this one:
foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8
    :: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO ()
text-icu also ultimately represents an encoding as an opaque pointer returned by the ICU library, and works in the IO monad. So it too could fail in similar ways. Errors throw an exception of type ICUError, which the caller can catch using the 'catch' function from Control.Exception.
The encoding library does use typeclasses like you suggest, but I'm not sure anybody uses it. Sometimes people drop in #haskell and complain about that library, and the response is usually "don't use that; use the one in Data.Text.Encoding instead".
I don't use Rust, but if the language is at all practical, I imagine they shuttle their equivalent of pointers and bytestrings around and depend on foreign C libraries and locales just the same. Probably they don't want to change the core library every time the Unicode Consortium publishes a new encoding scheme, so I can't imagine them exposing only a closed type.
So, looking purely at the signature and comparing it to examples from a language you suggested, it doesn't appear to be poorly designed at all. It's exactly what I would expect and want in any language, and the library consensus seems to agree. I think you're just imagining the grass being greener on the other side.
xs = if p x then concat [[x, "bar"], foos] else "baz" : foos
ys = tail xs
Then you know that the call to tail is not going to fail in your code, because the input has a guaranteed minimum length of either 1 or 2. You can't make that same guarantee about tail in isolation, though.
enum Whatever { FOO, BAR }
if (whatever == FOO) {
} else if (whatever == BAR) {
} else {
    // Should never happen!
}
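A minimal sketch of how that "should never happen" branch could fail loudly instead of silently doing nothing. Only the FOO/BAR enum shape comes from the snippet above; the class name WhateverDemo, the describe method, and its return values are invented for the example.

```java
// Hypothetical names throughout; only the FOO/BAR enum shape comes from the thread.
final class WhateverDemo {

    enum Whatever { FOO, BAR }

    static String describe(Whatever whatever) {
        switch (whatever) {
            case FOO:
                return "foo";
            case BAR:
                return "bar";
            default:
                // Unreachable with the current constants; this trips if a new
                // constant is added later and this switch is not updated.
                throw new AssertionError("Should never happen: " + whatever);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Whatever.FOO));
        System.out.println(describe(Whatever.BAR));
    }
}
```

Throwing an Error rather than logging means the "impossible" case cannot be swallowed by an ordinary catch (Exception e) further up the stack, which is arguably what you want for a broken invariant.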
[0] http://stackoverflow.com/questions/5013194/why-is-default-re...
throw new AssertionError("Should never happen");
someString.getBytes(StandardCharsets.UTF_8);
(Charset.forName doesn't throw a checked exception either. StandardCharsets avoids stringly-typed code, but it isn't available on 1.6, so if you're still stuck there, Charset.forName works.)
BTW, I've made an (almost religious) habit of ensuring that everything I touch is encoded as UTF-8 or can be converted to it, as I have been bitten hard several times by unexpected encoding issues. Therefore, the above problem catches me pretty often.
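To put the three options from this thread side by side, here is a small sketch. The class name Utf8BytesDemo and the method names are made up; the calls themselves (String.getBytes(String), String.getBytes(Charset), Charset.forName, StandardCharsets.UTF_8) are standard JDK API, with StandardCharsets requiring Java 7 or later.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for illustration only.
final class Utf8BytesDemo {

    // Charset looked up by name: forces a catch block for a case that cannot occur.
    static byte[] viaName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so treat this as a broken invariant.
            // AssertionError(Object) records a Throwable argument as the cause.
            throw new AssertionError(e);
        }
    }

    // Java 7+: typed constant, no checked exception, no string lookup.
    static byte[] viaConstant(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Java 6: the lookup can only fail with an unchecked exception.
    static byte[] viaForName(String s) {
        return s.getBytes(Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo";
        System.out.println(Arrays.equals(viaName(s), viaConstant(s)));
        System.out.println(Arrays.equals(viaConstant(s), viaForName(s)));
    }
}
```

All three should produce the same bytes; the difference is only in how much ceremony the compiler demands around a failure that cannot happen.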