Aaargh20318 10y ago

It's also for cases when you know it can't happen. For example, Java's String class has a method

    public byte[] getBytes(String charsetName) throws UnsupportedEncodingException

Since it can throw a checked exception you have to catch it, which generally is fine, but consider this case:

    someString.getBytes("UTF-8");

This call can never fail (every Java implementation is required to support UTF-8), but I still have to do something with the exception in the catch block, or our static code analysis tool will start complaining (and rightfully so). So that's where I'll log a 'can't happen' error. It truly cannot happen.
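A minimal sketch of such a catch block, assuming you would rather fail loudly than log. The class name `Utf8Bytes` and the method `encode` are hypothetical; only `String.getBytes(String)` and `UnsupportedEncodingException` come from the JDK.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper; the name and shape are illustrative only.
final class Utf8Bytes {
    private Utf8Bytes() {}

    /** Encodes text as UTF-8 through the String-named overload. */
    static byte[] encode(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform,
            // so reaching this line means the runtime itself is broken.
            throw new AssertionError("UTF-8 must be supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("h\u00e9llo").length);
    }
}
```

Wrapping the call once keeps the "can't happen" handling in a single place, so the static analysis tool only has one catch block to look at.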

17 comments · 6 top-level

wyager10y ago· 6 in thread

getBytes is poorly designed. In a safety-oriented language like Haskell or Rust, the set of encodings would be represented as an ADT (which forms a closed set) or a typeclass (an open set). All possible type-correct encoding arguments would be safe.
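For comparison, the closest Java gets to the closed-set idea is probably an enum. This is only a sketch: the `Encoding` enum and its `encode` method are hypothetical names, not part of the JDK, and it relies on `StandardCharsets` (Java 7+).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical closed set of encodings: every value that type-checks is valid.
enum Encoding {
    UTF_8(StandardCharsets.UTF_8),
    UTF_16(StandardCharsets.UTF_16),
    LATIN_1(StandardCharsets.ISO_8859_1);

    private final Charset charset;

    Encoding(Charset charset) {
        this.charset = charset;
    }

    /** Cannot fail on an unknown encoding name, because there is no name to look up. */
    byte[] encode(String text) {
        return text.getBytes(charset);
    }
}
```

A caller writes `Encoding.UTF_8.encode(someString)` and has no exception to handle, at the cost of not being able to name an encoding outside the enum.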

MichaelBurge10y ago

Speaking to the Haskell part:

An ExceptT or Maybe monad for handling encoding errors feels a lot like throwing exceptions, although they are less disruptive than exceptions. I'd probably represent a decoder as a function with type ByteString -> ErrorT ParseError m Text, which is neither an ADT nor a typeclass. It's a 3rd solution. Either that or an Attoparsec parser, which is probably equivalent. An encoder seems like it shouldn't fail at all, but if it eventually forks out to one of the C locale functions I can see it throwing errors too.

Meanwhile, in the real world, Data.Text.Encoding uses a 4th solution implementing decodeUtf8With and encodeUtf8 that ultimately represents the UTF-8 encoding as a pair of FFI functions with these signatures:

  foreign import ccall unsafe "_hs_text_decode_utf8"   c_decode_utf8
      :: MutableByteArray# s -> Ptr CSize
      -> Ptr Word8 -> Ptr Word8 -> IO (Ptr Word8)

and this one:

  foreign import ccall unsafe "_hs_text_encode_utf8" c_encode_utf8
      :: Ptr (Ptr Word8) -> ByteArray# -> CSize -> CSize -> IO ()

text-icu also ultimately represents an encoding as an opaque pointer returned by the ICU library, and works in the IO monad. So it too could fail in similar ways. Errors throw an exception of type ICUError, which the caller can catch using the 'catch' function from Control.Exception.

The encoding library does use typeclasses like you suggest, but I'm not sure anybody uses it. Sometimes people drop in #haskell and complain about that library, and the response is usually "don't use that; use the one in Data.Text.Encoding instead".

I don't use Rust, but if the language is at all practical, I imagine they shuttle their equivalent of pointers and bytestrings around and depend on foreign C libraries and locales just the same. Probably they don't want to change the core library every time the Unicode Consortium publishes a new encoding scheme, so I can't imagine them exposing only a closed type.

So, looking purely at the signature and comparing it to examples from a language you suggested, it doesn't appear to be poorly designed at all. It's exactly what I would expect and want in any language, and the library consensus seems to agree. I think you're just imagining the grass being greener on the other side.

steveklabnik10y ago

In Rust, our main two string types are String and &str, which are both UTF-8 encoded. For interoperability with other things, we have additional types that you can convert to/from. http://andrewbrinker.github.io/blog/2016/03/27/string-types-... is a recent overview in a blog post.

nostrademons10y ago

You still run into the problem with other functions, though. For example, in Haskell, 'tail' is a partial function: it's undefined if the list is empty. If, in your code, you write:

  xs = if p x then concat [[x, "bar"], foos] else "baz" : foos
  ys = tail xs

Then you know that the call to tail is not going to fail in your code, because the input has a guaranteed minimum length of either 1 or 2. You can't make that same guarantee about tail in isolation, though.

chopin10y ago

Sometimes you also need a user-provided encoding (think of editors). In that case, the exception makes sense. Haskell or Rust would need to provide an extra API for this case. But generally you are right, stronger type checking would be preferable. Anyway, I dislike APIs which take a String but only accept a small, fixed set of values. In that case, a dedicated type suits much better.

wyager10y ago

That's what typeclasses are for.

catnaroek10y ago

Rather than “poorly designed”, I'd say “reflects a limitation of the language”. You can't blame libraries for language defects.

jdmichal10y ago· 3 in thread

Also in Java: switching or if-else chains on an `enum`. It's still good practice to include a final `else` or `default` case, even though it should never happen. In fact, the compiler will force you to add a `default` case (or a trailing return) when the switch sits in a method where every code path must return a value. [0]

    enum Whatever { FOO, BAR }

    if (whatever == FOO) {
    } else if (whatever == BAR) {
    } else {
        // Should never happen!
    }

[0] http://stackoverflow.com/questions/5013194/why-is-default-re...

jrgv10y ago

I'd prefer to throw an exception in that else block:

    throw new AssertionError("Should never happen");

jdmichal10y ago

I just put a comment as an example. I typically log-and-throw.

lancefisher10y ago

This can happen when some other developer adds to your enum not knowing about the use.
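A short sketch of the switch form, combining the `default` case with the `AssertionError` suggested above. `EnumSwitch` and `describe` are hypothetical names; including the offending constant in the message makes the "someone added a value" case easy to diagnose.

```java
// Hypothetical example class wrapping the enum from the comment above.
final class EnumSwitch {
    enum Whatever { FOO, BAR }

    private EnumSwitch() {}

    static String describe(Whatever whatever) {
        switch (whatever) {
            case FOO:
                return "foo";
            case BAR:
                return "bar";
            default:
                // Reached only if a constant is added to Whatever
                // without updating this switch.
                throw new AssertionError("Unhandled Whatever constant: " + whatever);
        }
    }
}
```

Without the `default` branch this method does not compile ("missing return statement"), which is the compiler enforcement mentioned earlier in the thread.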

masklinn10y ago· 2 in thread

FWIW getBytes(Charset) doesn't throw, and there's a base set of charsets in StandardCharsets (1.7+):

    someString.getBytes(StandardCharsets.UTF_8);

(Charset.forName doesn't throw a checked exception either; for an unknown name it throws the unchecked UnsupportedCharsetException. StandardCharsets avoids stringly-typed code, but it's not available on 1.6, so if you're still stuck there, Charset.forName works.)
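To make the difference concrete, here is a small sketch. `CharsetLookup` and its methods are hypothetical; `Charset.forName`, `StandardCharsets` and the fact that the lookup failures are subclasses of `IllegalArgumentException` are standard JDK behaviour.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper contrasting the two lookups.
final class CharsetLookup {
    private CharsetLookup() {}

    /** Works on 1.6 too: no checked exception, so no try/catch is required. */
    static byte[] utf8ViaForName(String text) {
        return text.getBytes(Charset.forName("UTF-8"));
    }

    /** 1.7+: no string lookup at all. */
    static byte[] utf8ViaConstant(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** For user-provided names, the unchecked failure can still be caught. */
    static boolean isSupportedName(String name) {
        try {
            Charset.forName(name);
            return true;
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return false;
        }
    }
}
```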

jdmichal10y ago

You see the same thing with cipher suites, and there's unfortunately no `StandardCiphers` class.

Aaargh20318OP10y ago

yeah I know, this was just the first example that came to mind.

nostrademons10y ago

Swift has the force-unwrap operator ! and its exception-handling variant try! for that. Sometimes you know that the exception case in the API will never arise, and the appropriate thing to do is to crash and let the programmer know that one of their assumptions is wrong. For example, you might be parsing JSON data that was generated within the program itself; normally JSON deserialization can fail for malformed JSON, but if you just constructed that JSON string within the same function and passed it directly, you know it's not gonna fail. It's pretty handy to ignore the error and turn it into an assertion in these cases, although this power should be used judiciously.
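Java has no built-in equivalent of try!, but a rough analogue can be sketched with a small helper (Java 8+ for the lambda). `Unchecked.orCrash` is a hypothetical name, not a JDK method.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: turns "this cannot fail here" into an assertion.
final class Unchecked {
    private Unchecked() {}

    static <T> T orCrash(Callable<T> call) {
        try {
            return call.call();
        } catch (Exception e) {
            // The caller promised this could not happen; crash loudly if it does.
            throw new AssertionError("Call was assumed never to fail", e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = orCrash(() -> "abc".getBytes("UTF-8"));
        System.out.println(bytes.length);
    }
}
```

As with try!, this is best reserved for calls whose inputs are fully under the caller's control.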

chopin10y ago

This one bothers me every time. Other parts of the library provide both a checked and an unchecked variant of the same operation. If you put in a hardcoded string, you know it never fails.

BTW I've made an (almost religious) habit of ensuring that everything I touch is encoded as UTF-8 or can be converted to it, as I have been bitten hard several times by unexpected encodings. Therefore, the above problem catches me pretty often.
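One JDK pair that fits the checked/unchecked description is `new URI(String)` versus `URI.create(String)`; whether that is the pair meant here is an assumption. `UriVariants` and its members are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical example class showing both variants side by side.
final class UriVariants {
    private UriVariants() {}

    /** Hardcoded input: URI.create throws only an unchecked IllegalArgumentException. */
    static final URI HOME = URI.create("https://example.com/");

    /** Untrusted input: the constructor forces callers to handle URISyntaxException. */
    static URI parseUserInput(String text) throws URISyntaxException {
        return new URI(text);
    }
}
```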

amenod10y ago

I often use such error checking. Usually the value of such an error message is in making the code easier to reason about, by clarifying some obscure case (which can't happen). And if it does happen anyway - well, at least we get that alert. ;)
