The real way to generate a list of primes in Haskell

(garrisonjensen.com)

76 points · garrisonj · 11y ago · 42 comments

LukeHoersten11y ago· 4 in thread

That's an overly broad and sensational title. The simplified example on the website certainly isn't representative of all Haskell programmers being liars. It's unfortunate because the article about mis-implementations of the Sieve of Eratosthenes is decent.

Edit: mods, thanks for changing the misleading title.

dang11y ago

Fortunately there's a decent subtitle and we can just use that.

LukeHoersten11y ago

Thanks a lot.

garrisonjOP11y ago

I know. I just wanted a provocative title.

coolsunglasses11y ago

It's annoying particularly because Haskellers are perfectly aware of the problems with the example, but have struggled on the mailing list to come up with something which:

1. Could make at least some sense to somebody that knows zero Haskell

2. Isn't too trivial

3. Isn't leaning too heavily on libraries

4. Is at least somewhat "real"/performant

Your example doesn't address any of the constraints of the medium.

If you can think of a better example for that part of the website, it would be welcomed on the mailing list.

I've been watching people try to figure out something that isn't too weak in any of those dimensions for months and now you're going to post an article with a title calling them liars because you want more attention for your blog? What would satisfy you? Renaming the sieve function? What do we need to do to prevent people like you from writing an article like this again?

Edit: There, I fixed it and it's merged https://github.com/haskell-infra/hl/pull/114

We're calling it a filter instead of a sieve.

jkarni11y ago· 3 in thread

Funnily, there's a haskell-cafe thread [0], a GitHub issue, and even a paper about this (and I think maybe reddit got involved too).

Anyhow, the title is kind of too much. At least, given the aforementioned discussions, we're conflicted liars.

[0] https://mail.haskell.org/pipermail/haskell-cafe/2015-April/1...

[2] http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

[3] https://github.com/haskell-infra/hl/pull/8

ky311y ago

Doug McIlroy emailed me afterward to say that he's never heard of its being derided as not a real sieve. Certainly, no-one of his generation would do so, I imagine. But as the article proves, younger folks are more playful.

In Doug's email he also pointed me to this little nugget he wrote:

http://www.cs.dartmouth.edu/~doug/sieve/

The late Dennis Ritchie 'wrote the first coroutine sieve' using Unix pipes!

What a wonderful reminder of the power of Unix compositionality, which is at the heart of the laziness experiment known as Haskell.

pervycreeper11y ago

I don't get it. I simply see the example as an elegant way of introducing the power of the language. This kind of pedantry only drives curious people away.

chrisdone11y ago

Most people know it's not a real sieve of Eratosthenes and just a demonstration of Haskell concepts but there will always be people excited to educate others regardless of the context.

ColinWright11y ago· 2 in thread

This has always bothered me about most implementations of the Sieve of Eratosthenes. Namely, what they produce isn't it.

If you have division operations, or "mod" operations, it's not the Sieve of Eratosthenes, it's just a filter.

Not the same thing at all.
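
For readers who haven't seen it, here is a minimal sketch of the kind of `mod`-based definition being discussed. It is reproduced from memory of the widely circulated short version, not copied from the homepage, and the name `primesFilter` is made up for this illustration. Each surviving candidate is tested against every earlier prime with `mod`, which is why it behaves as a chain of filters rather than crossing off multiples directly.

```haskell
-- Trial-division "filter": every candidate is checked with `mod`
-- against each prime found so far. The name is hypothetical.
primesFilter :: [Int]
primesFilter = go [2 ..]
  where
    -- Keep the head as a prime, then drop everything it divides.
    go (p : xs) = p : go [x | x <- xs, x `mod` p /= 0]
    go [] = []
```

In GHCi, `take 10 primesFilter` yields the first ten primes.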

jamesrom11y ago

It is a sieve. No one claimed it was the Sieve of Eratosthenes.

ColinWright11y ago

Interestingly, if you look closely you'll notice that I never claimed that this referenced code was the Sieve of Eratosthenes. It seems you just assumed that I thought it was claiming to be, when in fact I know it isn't.

And that's the problem. I've found that when this code is presented people often assume it's intended to be the Sieve of Eratosthenes, and nothing is done to preempt or prevent that misconception. As observed elsewhere, there are now several major threads, discussions, and even proper papers about this, so people are becoming aware of it.

I still meet programmers who think the version shown is intended to be the Sieve of Eratosthenes. Fortunately I now have several on-line references to point them at.

btilly11y ago· 2 in thread

Why is Haskell so slow at this?

As far as I can tell, my Perl implementation at http://www.perlmonks.org/?node_id=276112 is doing something similar with similar amounts of laziness and no optimizations built in. Yet I can produce the first 50,000 primes in the time that this takes to produce the first 10,000. And nobody uses Perl for its speed!

delluminatus11y ago

I'm not sure, maybe it's just the overhead of having so many function calls and Set accesses? My guess is that the Haskell one could be made quite a bit more efficient if you used a low-level mutable array.

To add another benchmarking data point, I have a simple sieve of Eratosthenes written in Nim using an array that can generate 10,000 primes in less than a millisecond.
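
A rough sketch of what the mutable-array approach might look like in Haskell, assuming the `array` package that ships with GHC. The function name `sieveUpTo` is hypothetical, and this is a bounded sieve (all primes up to `n`) rather than the lazy infinite list from the article, so it is not a like-for-like replacement and no timings are claimed for it.

```haskell
import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, assocs)

-- Bounded Sieve of Eratosthenes over an unboxed mutable array.
-- `sieveUpTo` is a hypothetical name for this illustration.
sieveUpTo :: Int -> [Int]
sieveUpTo n
  | n < 2 = []
  | otherwise = [i | (i, True) <- assocs flags]
  where
    flags :: UArray Int Bool
    flags = runSTUArray $ do
      -- Start with every number in [2 .. n] marked as prime.
      arr <- newArray (2, n) True
      forM_ (takeWhile (\p -> p * p <= n) [2 ..]) $ \p -> do
        stillPrime <- readArray arr p
        -- Cross off multiples by stepping, with no division or mod.
        when stillPrime $
          forM_ [p * p, p * p + p .. n] $ \m ->
            writeArray arr m False
      return arr
```

The mutation stays inside `runSTUArray`, so callers still see a pure function.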

btilly11y ago

That is why I compared to an implementation in Perl that was likewise making lots of excess function calls and storing things very inefficiently. This was as close to apples to apples as I could get without putting much energy forward.

Perl gets a lot faster if you sieve blocks at a time, using vec() to manipulate bit arrays. And I'm not surprised that an actually efficient language would be massively faster.

thegeomaster11y ago· 2 in thread

I might be missing something obvious so excuse me, but why not use a heap as a priority queue? It has O(1) find-minimum and O(log n) insert, which is better than a set which is probably some kind of self-balancing BST (I don't speak Haskell).

coolsunglasses11y ago

You can use mutable data structures in Haskell but we strive to avoid it except where strictly necessary. To find the "Haskell" version it suffices to add either "Haskell" or "persistent" to the search query for a data structure.

Here's a priority queue library for Haskell, if you'd like an example: https://hackage.haskell.org/package/pqueue

garrisonjOP11y ago

You could use a heap. I use Data.Set because it is in the standard libraries, and it's close enough.
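
To make the priority-queue idea concrete, here is a sketch of an incremental sieve that uses `Data.Map.Strict` from `containers` as the queue. This is an assumption-laden illustration, not the article's `Data.Set` code (which isn't quoted in this thread); the names `primesIncremental` and `reinsert` are made up. The map is keyed by the next composite to cross off, and each entry holds the primes that will hit it.

```haskell
import qualified Data.Map.Strict as Map

-- Incremental sieve: the map acts as a priority queue from
-- "next composite" to the primes whose multiples land on it.
-- All names here are hypothetical.
primesIncremental :: [Int]
primesIncremental = go [2 ..] Map.empty
  where
    go [] _ = []
    go (x : xs) table = case Map.lookup x table of
      -- x is not a pending composite, so it is prime;
      -- its first interesting multiple is x * x.
      Nothing -> x : go xs (Map.insert (x * x) [x] table)
      -- x is composite: advance each prime that hit it.
      Just ps -> go xs (foldr (reinsert x) (Map.delete x table) ps)
    -- Schedule prime p's next multiple after x.
    reinsert x p = Map.insertWith (++) (x + p) [p]
```

A dedicated heap would change the constant factors and the cost of peeking at the minimum, but the overall shape of the loop would stay the same.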

rifung11y ago· 1 in thread

As much as I hate the title of this post, I think the author has a point.

I've also seen this same thing come up when comparing implementations of quicksort in Haskell to that of other languages. They always show a short, elegant implementation in Haskell, but the issue is that it's not really quicksort as it doesn't do the sort in place.
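
For reference, a sketch of the short list-based version usually meant here (the name `quicksortList` is hypothetical). It partitions around the head element and builds new lists at every level, so nothing is sorted in place.

```haskell
-- Short list-based "quicksort": allocates fresh lists for each
-- partition instead of swapping elements in place.
quicksortList :: Ord a => [a] -> [a]
quicksortList [] = []
quicksortList (p : xs) =
  quicksortList [x | x <- xs, x < p]
    ++ [p]
    ++ quicksortList [x | x <- xs, x >= p]
```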

gohrt11y ago

> it doesn't do the sort in place.

That's one of the easiest ways to raise hackles among Haskellers -- they believe that parallelizability, not in-place sort, is the defining characteristic of quicksort.

te11y ago

You can get another ~3x speedup by implementing the three lines of code comprising the "simple wheel" described at the bottom of page 8 of the referenced O'Neill paper.
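
The paper's own three lines aren't quoted in this thread, so the following is only a sketch of the general wheel idea, using a smaller wheel over 2 and 3. The name `wheelCandidates` is hypothetical. After 2 and 3, every remaining candidate is coprime to 6, and those numbers are spaced by alternating gaps of 2 and 4, so the sieve never has to look at multiples of 2 or 3.

```haskell
-- Candidate stream from a small 2-3 wheel: 2 and 3 themselves,
-- then the numbers coprime to 6, generated by alternating gaps.
-- The name is hypothetical; this is not the paper's code.
wheelCandidates :: [Int]
wheelCandidates = 2 : 3 : scanl (+) 5 (cycle [2, 4])
```

Feeding a stream like this into a sieve in place of `[2 ..]` skips two thirds of the candidates up front; composites such as 25 still appear and are left for the sieve to remove.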

gohrt11y ago

This result was previously, and more famously, published by Melissa E. O’Neill as

https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

The Genuine Sieve of Eratosthenes

Harvey Mudd College, Claremont, CA, U.S.A. (e-mail: oneill@acm.org)

And is available on Hackage as:

https://hackage.haskell.org/package/primes-0.2.1.0/docs/Data...

Chinjut11y ago

I agree that this is a more efficient way of generating primes in Haskell than the typical Haskell 101 approach. However, I disagree with the idea that the Haskell 101 approach does not deserve to be called an implementation of the "Sieve of Eratosthenes".

The distinction is only this: when we have found a prime p and are eliminating numbers accordingly, do we consider ourselves only to spend time directly enumerating the multiples of p and crossing them off? Or do we consider ourselves as running through the entire list and going "Ok, ok, cross, ok, ok, cross, ok, ok, cross" (for, for example, p = 3), thus spending time traversing through multiples and non-multiples alike? So to speak, do we jump from "cross" to "cross", or do we walk along through the "ok"s in between?

In the former case, each new candidate is worked on only in proportion to its number of prime factors; in the latter case, each new candidate is worked on in proportion to all smaller primes. The former is the more efficient way of generating primes; the latter is (essentially) the ubiquitous, naive approach.

But I don't think one can say the traditional understanding of the Sieve of Eratosthenes draws a strong distinction between these two! Traditional accounts would not explicate any difference between "Jump directly from 'cross' to 'cross' " and "Walk from 'cross' to 'cross', saying 'ok' to everything in between". It's not a distinction anyone was traditionally worried about. Eratosthenes certainly wasn't.

So I think both of these are deserving of the name "Sieve of Eratosthenes". They're just different approaches to that sieve.

In either case, we say there are primes, to each prime we associate the set of its multiples, we merge these sets into the set of composites, and close the loop of our recursion by noting that the primes are to be the complement of these composites. The difference is, in some sense, arising just from how we represent and manipulate subsets of the naturals (as pertaining to the set of multiples of each prime, as well as their merger into the totality of composites): either as streams of increasing naturals [efficient], or as streams of "In"s and "Out"s [less efficient].
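
The "streams of increasing naturals" reading can be sketched as follows, assuming sorted, duplicate-free infinite lists as the set representation. The names `minusOrd`, `unionOrd`, and `primesByComplement` are made up for this illustration. The composites are the merge of each prime's multiples, and the primes are whatever is left of the naturals after removing them.

```haskell
-- Set difference on ascending lists.
minusOrd :: Ord a => [a] -> [a] -> [a]
minusOrd (x : xs) (y : ys) = case compare x y of
  LT -> x : minusOrd xs (y : ys)
  EQ -> minusOrd xs ys
  GT -> minusOrd (x : xs) ys
minusOrd xs _ = xs

-- Set union on ascending lists, dropping duplicates.
unionOrd :: Ord a => [a] -> [a] -> [a]
unionOrd (x : xs) (y : ys) = case compare x y of
  LT -> x : unionOrd xs (y : ys)
  EQ -> x : unionOrd xs ys
  GT -> y : unionOrd (x : xs) ys
unionOrd xs [] = xs
unionOrd [] ys = ys

-- Primes are the complement of the merged multiples of the primes.
-- Emitting p * p before merging the rest keeps the recursion lazy.
primesByComplement :: [Int]
primesByComplement = 2 : minusOrd [3 ..] composites
  where
    composites =
      foldr
        (\p rest -> p * p : unionOrd [p * p + p, p * p + 2 * p ..] rest)
        []
        primesByComplement
```

The "In"/"Out" representation would instead carry one flag per natural number, which is where the extra walking comes from.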

jamesrom11y ago

Where is it implied on the Haskell home page that it's the Sieve of Eratosthenes?

The variable name? It's a sieve.

http://en.m.wikipedia.org/wiki/Sieve_theory

codygman11y ago

Upvote despite the article being titled "Haskell programmers are liars". Great decision to use the subtitle, admins. +1 to bitemyapp for submitting a PR to change "sieve" to "primeFilter" and avoid this in the future.

pathikrit11y ago

My Scala one using a Java BitSet: https://github.com/pathikrit/scalgos/blob/9bd0dd81df52a5a410...

wyc11y ago

Even if you know the optimal solution in terms of computational complexity, it might not be the best thing to put into your code base. There are a lot of other things to balance including (but not limited to) readability, maintainability, probability of correctness, and of course developer time. Considering these multiple dimensions is essential to good engineering.

I'm not saying you shouldn't know the best algorithms for a problem, as this article clearly demonstrates the effectiveness of a more efficient solution. In fact, having better understanding of algorithms and computational complexity makes it safer for you to accurately assess the trade-offs you'll be making by picking slower but simpler code or faster code with more complexity. There is more to consider than just big-O when writing software.

Note: What I'm saying most strongly applies to software with functionality that doesn't exist yet. If there's a reliable library with what you're seeking (such as a way to generate primes), it's usually best to use it.

petermora11y ago

Looks the same as in Clojure's lazy-seq documentation https://clojuredocs.org/clojure.core/lazy-seq

fibo11y ago

I wrote this with osfameron's help https://gist.github.com/fibo/1203756

sjbr11y ago

Shame on them!
