Haskell version of Norvig's spelling corrector

(marcosero.com)

73 points · marcosero · 11y ago · 17 comments

quchen11y ago

> I wrote this code putting brevity over readability, which is something I usually never do

Shouldn't the point of such a post be to show interesting code? I'm having trouble reading through the densely packed source.

In addition to tromp's minor nitpick, I have several major ones.

- the code is full of redundant parentheses. HLint can detect those (and many other style errors) automatically. LPaste has HLint installed so you have a linting pastebin available online. http://lpaste.net/116871

- A lot of the functions are written in a non-idiomatic way. "m >>= return . f" is "fmap f m", and "(.)" can combine functions much more readably than Lisp-style stacks of parentheses.

- ByteString.Char8 is usually a wrong choice, more on that here: https://github.com/quchen/articles/blob/master/fbut.md#bytes...

- If you count to "length x" then often there's a more elegant solution that avoids calculating the length altogether. For example "splits xs = zip (inits xs) (tails xs)".

- Brevity is never better than readability.

- No top-level definitions should lack a type signature. GHC even has a warning for that (-Wmissing-signatures, which is enabled by -Wall).

- A function should do one thing and then be composed with other functions. "lowerWords" converts to words and then maps them all to lower case, for example. These are two completely different operations in one long line.

- In order of increasing generality: foldr union empty = unions = mconcat = fold

- Use pattern matching, avoid "(!!)". transposes w = [ a ++ [b1,b0] ++ bs | (a, b0:b1:bs) <- splits w] - also see https://github.com/quchen/articles/blob/master/fbut.md#head-...

- For large amounts of words that you split and concatenate again, String is probably not the right type. Text is good for dealing with such things.

- replaces w = [as ++ [c] ++ bs | (as, _:bs) <- splits w , c <- alphabet]

... and so on.
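Several of the points above (type signatures, `inits`/`tails` instead of counting to `length`, pattern matching instead of `(!!)`) can be combined into one sketch. This is an illustration, not the article's code: the names `deletes` and `inserts` follow Norvig's Python version and are assumed here, and plain `String` is used only to keep the example short.

```haskell
import Data.List (inits, tails)

alphabet :: String
alphabet = ['a' .. 'z']

-- Every way of cutting a list into a prefix and a suffix,
-- without ever computing its length.
splits :: [a] -> [([a], [a])]
splits xs = zip (inits xs) (tails xs)

-- Drop one character. Splits with an empty suffix fail the
-- pattern match and are skipped by the list comprehension.
deletes :: String -> [String]
deletes w = [as ++ bs | (as, _ : bs) <- splits w]

-- Swap two adjacent characters.
transposes :: String -> [String]
transposes w = [as ++ [b1, b0] ++ bs | (as, b0 : b1 : bs) <- splits w]

-- Replace one character with each letter of the alphabet.
replaces :: String -> [String]
replaces w = [as ++ [c] ++ bs | (as, _ : bs) <- splits w, c <- alphabet]

-- Insert each letter of the alphabet at every position.
inserts :: String -> [String]
inserts w = [as ++ [c] ++ bs | (as, bs) <- splits w, c <- alphabet]
```

For example, `transposes "abc"` gives `["bac","acb"]` and `deletes "abc"` gives `["bc","ac","ab"]`.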

marcoseroOP11y ago

Hi, author here. I think you partially missed the main purpose of the article, which for me was just having fun by playing with a language I'm currently learning. I wasn't trying to teach anything to anyone.

But I must say, thanks for the great feedback! Lots of stuff I didn't know that will make me write better Haskell code :)

flebron11y ago

Well, what you wrote is "The main reason I did it was to see what Haskell is capable of compared to other languages such as Python." The problem is that what you coded isn't what Haskell is capable of :)

chaoxu11y ago

It would be nice if you explicitly stated in your article that you are learning Haskell. If I were a newbie and saw some of the awkward Haskell code (which I also write sometimes), I would feel discouraged. It wouldn't seem like Haskell is making things better.

wyager11y ago

I expect there to be lots of comments demonstrating tons of different styles of doing this stuff. Haskell is one of those languages where you can go back to your code 20 times and still find a new way of doing something.

bazzargh11y ago

all that, plus, re the question at the end of the article about chooseBest:

    chooseBest ch ws = maximumBy (compare `on` (\w -> M.findWithDefault 0 w ws))
       (S.toList ch)

this is closer to what's in Norvig's article - a single pass over the elements of 'ch', taking the one whose entry in the 'ws' set has the highest count - treating words not in the set as having a count of 0. No sort required. (maximumBy is from Data.List, on is from Data.Function)
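A self-contained version of that suggestion, with the imports and a type signature filled in. The types (`S.Set String` for the candidates, `M.Map String Int` for the counts) are an assumption inferred from the `S.` and `M.` qualifiers, not taken from the article.

```haskell
import Data.Function (on)
import Data.List (maximumBy)
import qualified Data.Map as M
import qualified Data.Set as S

-- Pick the candidate with the highest count in a single pass.
-- Words missing from the map count as 0.
-- Note: maximumBy is partial, so this fails on an empty candidate set.
chooseBest :: S.Set String -> M.Map String Int -> String
chooseBest ch ws = maximumBy (compare `on` count) (S.toList ch)
  where
    count w = M.findWithDefault 0 w ws
```

With candidates `S.fromList ["the", "thee", "thy"]` and counts `M.fromList [("the", 100), ("thy", 3)]`, this returns `"the"`.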

tromp11y ago

Minor nitpick: the first real line of code

  alphabet = "abcdefghijklmnopqrstuvwxyz"

is better written as

  alphabet = ['a'..'z']

This is really syntactic sugar for

  enumFromTo 'a' 'z'

using the function

  enumFromTo :: Enum a => a -> a -> [a]

from the typeclass Enum for enumerable types, and the fact that a string (type String) is just a list of characters (type [Char]).
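A tiny check of that equivalence; the primed name `alphabet'` is only there so both spellings can sit side by side.

```haskell
-- Range syntax on Char, as suggested above.
alphabet :: String
alphabet = ['a' .. 'z']

-- The same list written with the Enum method the range desugars to.
alphabet' :: [Char]
alphabet' = enumFromTo 'a' 'z'
```

Both evaluate to `"abcdefghijklmnopqrstuvwxyz"`, and the differing signatures type-check because `String` is a synonym for `[Char]`.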

evincarofautumn11y ago

As long as we’re picking nits…

> I wrote this code putting brevity over readability

Overall, this is not particularly terse, for Haskell code. With all the lambdas, it looks like OCaml! For example, these are equivalent, and I find the latter clearer:

    (sortBy (\(_,c1)(_,c2) -> c2 `compare` c1))

    sortBy (flip (comparing snd))

Now, it’s not necessarily a bad thing to be explicit, but in cases such as these, it’s less repetitious to just use the standard library functions.

quchen11y ago

For reverse sorting, there's a type that does specifically that.

    sortBy (comparing (Down . snd))

See http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-Or...
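The three spellings from this subthread, side by side. The function names and the `[(String, Int)]` element type are assumptions made for the sake of a runnable example.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

byLambda, byFlip, byDown :: [(String, Int)] -> [(String, Int)]

-- Explicit lambda: compare the counts with the arguments swapped.
byLambda = sortBy (\(_, c1) (_, c2) -> c2 `compare` c1)

-- Same ordering, built from library pieces.
byFlip = sortBy (flip (comparing snd))

-- Down reverses the Ord instance of whatever it wraps.
byDown = sortBy (comparing (Down . snd))
```

All three sort by count in descending order; for `[("a",1),("b",3),("c",2)]` each returns `[("b",3),("c",2),("a",1)]`.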


flaie11y ago

Interesting read for a Haskell newcomer like me!

Regarding the original webpage of Norvig's spelling corrector, I think it is not up to date as I remember browsing the web and finding some shorter versions in other languages.

I've shortened the Python version to 14/15 lines using some features of Python3.

wyager11y ago

Cool! Since we're suggesting changes, here's what I'd do. (Not that anything is wrong with the OP's code, just that it's good to point out all the different stylistic techniques you can adopt.)

    7. alphabet = ['a'..'z']
    8. nWords = B.readFile "big.txt" >>= return . train . lowerWords . B.unpack

or:

    8. nWords = train . lowerWords . B.unpack <$> B.readFile "big.txt"

Make `splits`, `deletes`, etc. values (not functions). `splits` has access to `w`, so there's no need to pass it as an argument 4 times (or even to pass `w` as an argument to the other functions).

    27. sortCandidates = (sortBy (flip (comparing snd))) . M.toList
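One way to read the "values, not functions" suggestion is to move the helpers into a `where` clause so they all share `w`. The sketch below assumes an enclosing function called `edits1` (the name used in Norvig's Python version) that returns a `Set`; the article's actual names may differ.

```haskell
import Data.List (inits, tails)
import qualified Data.Set as S

-- All strings one edit away from w. The helpers are plain values
-- that close over w instead of taking it as an argument.
edits1 :: String -> S.Set String
edits1 w = S.fromList (deletes ++ transposes ++ replaces ++ inserts)
  where
    alphabet   = ['a' .. 'z']
    splits     = zip (inits w) (tails w)
    deletes    = [as ++ bs | (as, _ : bs) <- splits]
    transposes = [as ++ [b1, b0] ++ bs | (as, b0 : b1 : bs) <- splits]
    replaces   = [as ++ [c] ++ bs | (as, _ : bs) <- splits, c <- alphabet]
    inserts    = [as ++ [c] ++ bs | (as, bs) <- splits, c <- alphabet]
```

For instance, `S.member "the" (edits1 "teh")` holds because of the transposition at the second position.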

codygman11y ago

I used to compose return with a series of pure functions as well, but I found that using liftM seems cleaner.

codygman11y ago

example:

    nWords = liftM (train . lowerWords . B.unpack) (B.readFile "big.txt")

There was recently a very good article[0] about practically using monads that mentioned using liftM.

However whenever using a functor instance is possible it's probably better, since functors can't do as much as monads. I'm not quite sure how much this would help/apply to this small example though.

0: http://softwaresimply.blogspot.com/2014/12/ltmt-part-3-monad...
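To make the Functor-versus-Monad point concrete, here are the three spellings with their most general types. `process` is a hypothetical stand-in for `train . lowerWords . B.unpack`, so the example does not depend on the article's code or on `big.txt`.

```haskell
import Control.Monad (liftM)
import Data.Char (toLower)

-- Hypothetical pure pipeline standing in for the article's one.
process :: String -> [String]
process = words . map toLower

-- Needs a Monad: binds the result, then wraps it again with return.
viaBind :: Monad m => m String -> m [String]
viaBind m = m >>= return . process

-- Also needs a Monad, but states the intent directly.
viaLiftM :: Monad m => m String -> m [String]
viaLiftM = liftM process

-- Only needs a Functor, the weakest constraint of the three.
viaFmap :: Functor f => f String -> f [String]
viaFmap m = process <$> m
```

Applied to `Just "Hello World"`, all three give `Just ["hello","world"]`; with `IO`, the argument would be something like `readFile "big.txt"`.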

bshimmin11y ago

Not to make any particular point, but mainly just because I fancied a bit of procrastination this afternoon, here's a CoffeeScript version (heavily leaning on Underscore): https://gist.github.com/benshimmin/2ee78c932797faadfc89

dschiptsov11y ago

Which "proves" again that programming is neither about OO nor about purity..)

dschiptsov11y ago

Where am I wrong? This is a straightforward translation of non-OO Python code into Haskell, isn't it?

So there we can't see any "benefits" of truly-OO (original code has been written in a "functional style") or pure-functional approaches (the code has no "benefits" being converted into a pure-functional language).

Lists and Sets are "classes" in Python, but it doesn't matter, because implementation of "basic" types does not alter the behavior - sets could be implemented out of Lisp's conses.

Btw, knowing who the author is and seeing some "functional patterns" in the Python code, it is very probable that the original corrector was prototyped/written in Common Lisp, then rewritten in Python, and now rewritten in Haskell.

The point was the elegant algorithm and compact implementation, not the choice of language or any particular programming paradigm.
