Edit: mods, thanks for changing the misleading title.
1. Could make at least some sense to somebody that knows zero Haskell
2. Isn't too trivial
3. Isn't leaning too heavily on libraries
4. Is at least somewhat "real"/performant
Your example doesn't address any of the constraints of the medium.
If you can think of a better example for that part of the website, it would be welcomed on the mailing list.
I've been watching people try to figure out something that isn't too weak in any of those dimensions for months and now you're going to post an article with a title calling them liars because you want more attention for your blog? What would satisfy you? Renaming the sieve function? What do we need to do to prevent people like you from writing an article like this again?
Edit: There, I fixed it and it's merged https://github.com/haskell-infra/hl/pull/114
We're calling it a filter instead of a sieve.
Anyhow, the title is kind of too much. At least, given the aforementioned discussions, we're conflicted liars.
[0]https://mail.haskell.org/pipermail/haskell-cafe/2015-April/1... [2]http://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf [3] https://github.com/haskell-infra/hl/pull/8
In Doug's email he also pointed me to this little nugget he wrote:
http://www.cs.dartmouth.edu/~doug/sieve/
The late Dennis Ritchie 'wrote the first coroutine sieve' using Unix pipes!
What a wonderful reminder of the power of Unix compositionality, which is at the heart of the laziness experiment known as Haskell.
If you have division operations, or "mod" operations, it's not the Sieve of Eratosthenes, it's just a filter.
Not the same thing at all.
And that's the problem. I've found that when this code is presented people often assume it's intended to be the Sieve of Eratosthenes, and nothing is done to preempt or prevent that misconception. As observed elsewhere, there are now several major threads, discussions, and even proper papers about this, so people are becoming aware of it.
I still meet programmers who think the version shown is intended to be the Sieve of Eratosthenes. Fortunately I now have several on-line references to point them at.
As far as I can tell, my Perl implementation at http://www.perlmonks.org/?node_id=276112 is doing something similar with similar amounts of laziness and no optimizations build it. Yet I can produce the first 50,000 primes in the time that this takes to produce the first 10,000. And nobody uses Perl for its speed!
To add another benchmarking data point, I have a simple sieve of Eratosthenes written in Nim using an array that can generate 10,000 primes in less than a millisecond.
Perl gets a lot faster if you sieve blocks at a time, using vec() to manipulate bit arrays. And I'm not surprised that an actually efficient language would be massively faster.
Here's a priority queue library for Haskell, if you'd like an example: https://hackage.haskell.org/package/pqueue
I've also seen this same thing come up when comparing implementations of quicksort in Haskell to that of other languages. They always show a short, elegant implementation in Haskell, but the issue is that it's not really quicksort as it doesn't do the sort in place.
That's one of the easiest ways to raise hackles among Haskellers -- they believe that parallelizability, not in-place sort, is the defining characteristic of quicksort.
https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf
The Genuine Sieve of Eratosthenes
Harvey Mudd College, Claremont, CA, U.S.A. (e-mail: oneill@acm.org)
And is available on Hackage as:
https://hackage.haskell.org/package/primes-0.2.1.0/docs/Data...
The distinction is only this: when we have found a prime p and are eliminating numbers accordingly, do we consider ourselves only to spend time directly enumerating the multiples of p and crossing them off? Or do we consider ourselves as running through the entire list and going "Ok, ok, cross, ok, ok, cross, ok, ok, cross" (for, for example, p = 3), thus spending time traversing through multiples and non-multiples alike? So to speak, do we jump from "cross" to "cross", or do we walk along through the "ok"s inbetween?
In the former case, each new candidate is worked on only in proportion to its number of prime factors; in the latter case, each new candidate is worked on in proportion to all smaller primes. The former is the more efficient way of generating primes; the latter is (essentially) the ubiquitous, naive approach.
But I don't think one can say the traditional understanding of the Sieve of Eratosthenes draws a strong distinction between these two! Traditional accounts would not explicate any difference between "Jump directly from 'cross' to 'cross' " and "Walk from 'cross' to 'cross', saying 'ok' to everything inbetween". It's not a distinction anyone was traditionally worried about. Eratosthenes certainly didn't.
So I think both of these are deserving of the name "Sieve of Eratosthenes". They're just different approaches to that sieve.
In either case, we say there are primes, to each prime we associate the set of its multiples, we merge these sets into the set of composites, and close the loop of our recursion by noting that the primes are to be the complement of these composites. The difference is, in some sense, arising just from how we represent and manipulate subsets of the naturals (as pertaining to the set of multiples of each prime, as well as their merger into the totality of composites): either as streams of increasing naturals [efficient], or as streams of "In"s and "Out"s [less efficient].
The variable name? It's a sieve.
I'm not saying you shouldn't know the best algorithms for a problem, as this article clearly demonstrates the effectiveness of a more efficient solution. In fact, having better understanding of algorithms and computational complexity makes it safer for you to accurately assess the trade-offs you'll be making by picking slower but simpler code or faster code with more complexity. There is more to consider than just big-O when writing software.
Note: What I'm saying most strongly applies to software with functionality that doesn't exist yet. If there's a reliable library with what you're seeking (such as a way to generate primes), it's usually best to use it.