Beating Decades of Optimized C with 80 Lines of Haskell

(chrispenner.ca)

14 points · szemet · 6y ago · 13 comments

9 comments · 4 top-level

olliej · 6y ago · 4 in thread

This is a pretty good example of why I don't like Haskell:

* The claim is Haskell is better because it's simpler: the final code is decidedly not simple

* You don't need to know about machine behavior: the example required numerous manual additions, e.g. explicit inlining, explicit strictness, and not using the "obvious" data structures

After all that contortion, the only way it managed to beat the C version was by using multiple cores: on a 4-core machine it was only ~60% faster, and around 80% faster when parsing UTF-8. Better yet, the C implementation actually does true UTF-8 parsing via libc, so that call isn't inlined, whereas the "faster" Haskell code only counts the beginning of multibyte sequences.

I would argue that sans the special cases (which the example doesn't trigger), the C version is the obvious first pass implementation.
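For readers who haven't written this kind of Haskell, the "explicit strictness" and "explicit inlining" mentioned above usually look something like the sketch below. It is not the article's code: `Counts`, `step`, and `countLinesChars` are hypothetical names, and it only tracks lines and characters to stay short.

```haskell
import Data.List (foldl')

-- Hypothetical running totals: lines, characters.
-- The bangs make both fields strict, so each step evaluates the new counts
-- instead of piling up unevaluated (+ 1) thunks inside the record.
data Counts = Counts !Int !Int
  deriving (Eq, Show)

-- Per-character update; the INLINE pragma asks GHC to inline it into the fold.
step :: Counts -> Char -> Counts
step (Counts ls cs) c = Counts (if c == '\n' then ls + 1 else ls) (cs + 1)
{-# INLINE step #-}

-- foldl' evaluates the accumulator to weak head normal form at every step;
-- combined with the strict fields, that keeps both counts fully evaluated.
countLinesChars :: String -> Counts
countLinesChars = foldl' step (Counts 0 0)
```

Without the bangs, `foldl'` would only evaluate the outer `Counts` constructor and the fields would still build up thunks, which is the kind of detail the comment is pointing at.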

olliej · 6y ago

I just wrote a very dumb implementation of wc and it’s easily twice as fast.

I think the core issue is that wc is not a super optimized program. It is sufficiently fast for most purposes and so hasn’t ever been improved.

xrisk · 6y ago

Can you test it with GNU wc? I'm curious https://github.com/coreutils/coreutils/blob/master/src/wc.c

unhammer · 6y ago

What do you mean by true utf8 parsing?

olliej · 6y ago

All the Haskell version is doing is counting through bytes with the high bit set. If you look at the exciting function in the wc sources they link to it is doing way more work.
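To make the difference concrete, here is a minimal sketch of one common form of this shortcut: counting code points by counting every byte that is not a UTF-8 continuation byte (bit pattern `10xxxxxx`). The name `countLeadBytes` is hypothetical and this is not the article's code. The point is that it never validates anything, so a truncated or otherwise malformed sequence is counted without complaint, whereas a real decoder has to detect it.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical shortcut: a byte starts a code point unless it is a
-- continuation byte (top two bits 10). No validation is performed, so
-- malformed input (e.g. a lone lead byte) is still counted as a character.
countLeadBytes :: [Word8] -> Int
countLeadBytes = length . filter isLead
  where
    isLead b = (b .&. 0xC0) /= 0x80
```

For example, the UTF-8 bytes of "héllo" (`0x68 0xC3 0xA9 0x6C 0x6C 0x6F`) give 5, but the truncated input `0x68 0xC3` gives 2 even though it is not valid UTF-8.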

Noe2097 · 6y ago · 1 in thread

What clickbait indeed. The final version still uses 3 times as much memory, but worst of all, it just doesn't work in the general case.

The way I generally use "wc" is inside a somewhat complex command line, with other commands preceding it, i.e. "feeding characters" to it.

There is a reason why "wc" is not multithreaded: it simply can't be. It must work sequentially, because in the general case its input cannot be skipped over.

This is one of the two big assumptions made by the author ("wc works only on files, so we can lseek"); the second, which the author does identify, is that the underlying hardware and filesystem must support efficient concurrent access to the same location.
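The seekability assumption can be made explicit. Below is a minimal sketch (hypothetical names `chunkRanges` and `planChunks`, not taken from the article) of how a parallel counter might plan its work: it only splits the input into byte ranges when `hIsSeekable` reports that the handle supports seeking, and otherwise falls back to a single sequential pass, which is what a pipe such as `cat foo | wc` requires. Chunk boundaries can still land in the middle of a word or a multibyte character, so per-chunk results need a combining step.

```haskell
import System.IO (Handle, hFileSize, hIsSeekable)

-- Hypothetical helper: split `size` bytes into at most `n` contiguous
-- (offset, length) ranges; the last range takes whatever is left over.
chunkRanges :: Integer -> Integer -> [(Integer, Integer)]
chunkRanges size n
  | size <= 0 || n <= 0 = []
  | otherwise =
      [ (i * step, if i == k - 1 then size - i * step else step) | i <- [0 .. k - 1] ]
  where
    step = (size + n - 1) `div` n    -- ceiling division: bytes per chunk
    k = (size + step - 1) `div` step -- number of non-empty chunks

-- Hypothetical planner: Nothing means "not seekable, read sequentially"
-- (e.g. stdin connected to a pipe); Just gives the ranges for workers.
planChunks :: Handle -> Integer -> IO (Maybe [(Integer, Integer)])
planChunks h n = do
  seekable <- hIsSeekable h
  if seekable
    then do
      size <- hFileSize h
      pure (Just (chunkRanges size n))
    else pure Nothing
```

For a 10-byte file and 4 workers, `chunkRanges 10 4` yields `[(0,3),(3,3),(6,3),(9,1)]`.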

olliej · 6y ago

Don’t forget: the best case (where it can be parallelized) is still only slightly faster than the single-threaded version.

carapace · 6y ago

The "flux monoid" for word counts is pretty cool. IIRC there was a paper about doing something like that with regular expression monoids to make parallelizable RE matchers, etc. I can't recall the title at the moment.
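For anyone who hasn't read the article, here is a minimal reconstruction of the idea. The names `CharType`, `Flux`, `fluxChar`, and `wordCount` are my own assumptions, and the article's actual definitions may differ. Each chunk is summarized by whether it starts and ends on whitespace plus how many words it contains; when the left chunk ends mid-word and the right chunk starts mid-word, that split word was counted once in each chunk, so one is subtracted. Because the operation is associative, chunks can be counted independently and combined in any grouping.

```haskell
import Data.Char (isSpace)

-- Whether a chunk starts/ends on whitespace (hypothetical names).
data CharType = IsSpace | NotSpace
  deriving (Eq, Show)

-- Summary of a chunk: first char type, word count, last char type.
-- Unknown summarizes an empty chunk and acts as the identity.
data Flux = Flux !CharType !Int !CharType | Unknown
  deriving (Eq, Show)

instance Semigroup Flux where
  Unknown <> x = x
  x <> Unknown = x
  -- A word straddles the boundary: it was counted in both halves.
  Flux l n NotSpace <> Flux NotSpace n' r = Flux l (n + n' - 1) r
  Flux l n _ <> Flux _ n' r = Flux l (n + n') r

instance Monoid Flux where
  mempty = Unknown

-- Summary of a single character.
fluxChar :: Char -> Flux
fluxChar c
  | isSpace c = Flux IsSpace 0 IsSpace
  | otherwise = Flux NotSpace 1 NotSpace

wordCount :: String -> Int
wordCount s = case foldMap fluxChar s of
  Unknown -> 0
  Flux _ n _ -> n
```

Splitting "hello world" as "hel" and "lo world" and combining the two summaries gives the same result as summarizing the whole string, which is exactly the property a parallel splitter relies on.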

imode · 6y ago

I don't consider this "beating" the C version. The new version isn't even semantically equivalent. You had to resort to multiple cores. A parallelized C version of wc would probably be even faster.

There's not much content here, IMO.
