First, I disagree with the definition of parallelism. So the whole argument becomes moot to begin with.
Second, the author conveniently ignores that languages with side-effects can also be automatically parallelized by their compiler, which achieves the same effect as the parallelism described for functional languages. Which makes the point of the article moot again.
In conclusion (to me at least), this article adds to the confusion rather than offer the clarity it was aiming for.
http://lambda-the-ultimate.org/node/3465
Here's van Roy's definition:
"Concurrency should not be confused with parallelism. Concurrency is a language concept and parallelism is a hardware concept. Two parts are parallel if they execute simultaneously on multiple processors. Concurrency and parallelism are orthogonal: it is possible to run concurrent programs on a single processor (using preemptive scheduling and time slices) and to run sequential programs on multiple processors (by parallelizing the calculations)."
And no, languages with side-effects cannot be automatically parallelized without changing the semantics and introducing observable nondeterminism - not nearly to the same extent as pure languages. The examples I've seen essentially work for side-effect-free subsets, which kind of misses the point. Your C compiler may be able to vectorize a loop of arithmetic operations, but as soon as your code includes an external function call the compiler has know way of knowing whether that function performs I/O or modifies global variables, so it cannot safely make parallel calls to that function.
Considering I spent years working on automatic parallelizing compilers for Fortran and C++, I'd say it is possible. I don't know why you consider the heavily computational kernels of C programs less interesting just because they don't have side effects or make IO calls.
For instance I worked on some algorithms to parallelize linked-list traversals, something that you would normally think is by definition serial.
int i = 0, j = 1, k = 2;
The processor itself can execute these in parallel if there are available resources, because writes of literals to registers don't affect each other.
So even if parallelizing compilers only work for side-effect free portions of the language, that is still a decent portion of most code. If it weren't, then out-of-order execution wouldn't give much benefit, and all cpu's would have only one execution unit (instead of 3-4) - after all it only works on side-effect free portions too (more or less).
This is on a different level (fine grained instruction level vs. coarse grained module/function level) but I'm guessing the same arguments can still apply.
I understand that attempt to automatically parallelize standard programming languages achieves some but not much improvement. The numbers I recall are less than 2 times speedup. Which I think I heard in an interview of Joe Armstrong on Software Engineering Radio, but I am not sure. Or perhaps it was an presentation by Guy Steele on parallel algorithms. Joe's message was that parallelism needs to be explicit. One of Guy's was that you need to use different algorithms allow parallel rather than serial processing.
I think we agree, that we expect to get more than (2 * one core speed) on our 10,000 core CPUs of the future.
There's a ton of overlap as far as techniques (threads, monitors, locks, actors, and so on), but those two high-level goals have very different optimal strategies, and different languages and programming paradigms have different strengths and weaknesses relative to those two. Concurrency is generally more of a high-level consequence of the core algorithm you're using; parallelism is a low-level implementation detail around how a given function or process completes. Concurrency isn't generally going to be retrofitted automatically by the compiler; parallelism can be deduced either statically or at runtime or explicitly signaled by the programmer.
Parallelism, on the other hand, is all about improving performance by making simultaneous use of multiple resources. It's entirely different from multiprogramming or multiplexing which fake the multiplicity of resources; and it's entirely different from concurrency which assumes the availability of undifferentiated resources. There are many different strategies for parallelism, but most of them assume that the resources are not undifferentiated, and thus they're very dependent on the particular details of the hardware they're running on (unlike concurrency which doesn't care). Contrary to your "high level" vs "low level" distinction, parallelism is often an extremely high-level design choice which affects the entire program structure. It has to be if it's going to maximize resource usage throughout the life of the whole program.
The interesting thing about GHC's parallelism is that it's hardware agnostic, and so you don't need to rewrite your program to use different algorithms depending on the hardware. Because it's built into the language you may want to call it "low level" (i.e. "I don't have to think about it"), but it's actually a very high-level decision (because it has major repercussions on how the runtime system works and how the language lets you say things).
A parallel program, on the other hand, is one that merely runs on multiple processors, with the goal of hopefully running faster than it would on a single CPU.
It's an incredibly vague definition that actually encompasses concurrency too..
Wikipedia puts it quite succinctly:
Parallel computing is a form of computation in which many calculations are carried out simultaneously,[1] operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").
The WP article then goes on to discuss specific forms of parallelism; some concurrent, some not necessarily so.
This article appears to be so heavily focused on the idea that concurrency and parallelism are discrete and disconnected that it seems to forget there is a bigger picture.
The writer has a strong key point. Parallelism !== Concurrency (note the double =). However it's ignoring the fact that they DO intersect frequently.