Parallelism /= Concurrency (opens in new tab)

(ghcmutterings.wordpress.com)

36 pointsmaxtilford16y ago15 comments

15 comments

9 comments · 1 top-level

alain9404016y ago· 8 in thread

This article has a few flaws.

First, I disagree with the definition of parallelism. So the whole argument becomes moot to begin with.

Second, the author conveniently ignores that languages with side-effects can also be automatically parallelized by their compiler, which achieves the same effect as the parallelism described for functional languages. Which makes the point of the article moot again.

In conclusion (to me at least), this article adds to the confusion rather than offer the clarity it was aiming for.

mbrubeck16y ago

You may "disagree" with this definition, but it's one that is commonly used by computer scientists. So it's a definition that is useful to know if you plan on doing or studying any research in this field. For example, see section 4.3 of this paper by Peter van Roy (author of the well-known Concepts, Techniques, and Models of Computer Programming):

http://lambda-the-ultimate.org/node/3465

Here's van Roy's definition:

"Concurrency should not be confused with parallelism. Concurrency is a language concept and parallelism is a hardware concept. Two parts are parallel if they execute simultaneously on multiple processors. Concurrency and parallelism are orthogonal: it is possible to run concurrent programs on a single processor (using preemptive scheduling and time slices) and to run sequential programs on multiple processors (by parallelizing the calculations)."

And no, languages with side-effects cannot be automatically parallelized without changing the semantics and introducing observable nondeterminism - not nearly to the same extent as pure languages. The examples I've seen essentially work for side-effect-free subsets, which kind of misses the point. Your C compiler may be able to vectorize a loop of arithmetic operations, but as soon as your code includes an external function call the compiler has know way of knowing whether that function performs I/O or modifies global variables, so it cannot safely make parallel calls to that function.

alain9404016y ago

languages with side-effects cannot be automatically parallelized without changing the semantics and introducing observable nondeterminism

Considering I spent years working on automatic parallelizing compilers for Fortran and C++, I'd say it is possible. I don't know why you consider the heavily computational kernels of C programs less interesting just because they don't have side effects or make IO calls.

For instance I worked on some algorithms to parallelize linked-list traversals, something that you would normally think is by definition serial.

2 more replies

grogers16y ago

The hardware itself automatically parallelizes execution of your program to some degree. When you say something like:

int i = 0, j = 1, k = 2;

The processor itself can execute these in parallel if there are available resources, because writes of literals to registers don't affect each other.

So even if parallelizing compilers only work for side-effect free portions of the language, that is still a decent portion of most code. If it weren't, then out-of-order execution wouldn't give much benefit, and all cpu's would have only one execution unit (instead of 3-4) - after all it only works on side-effect free portions too (more or less).

This is on a different level (fine grained instruction level vs. coarse grained module/function level) but I'm guessing the same arguments can still apply.

artsrc16y ago

What is your information on the performance of code produced by automatically parallel compilers?

I understand that attempt to automatically parallelize standard programming languages achieves some but not much improvement. The numbers I recall are less than 2 times speedup. Which I think I heard in an interview of Joe Armstrong on Software Engineering Radio, but I am not sure. Or perhaps it was an presentation by Guy Steele on parallel algorithms. Joe's message was that parallelism needs to be explicit. One of Guy's was that you need to use different algorithms allow parallel rather than serial processing.

I think we agree, that we expect to get more than (2 * one core speed) on our 10,000 core CPUs of the future.

akeefer16y ago

Whatever terminology you'd like to choose, I think it's still a useful distinction to be aware of and to consider. To rephrase the article . . . there are two reasons to execute two things in parallel. 1) To reduce latency by executing two operations at the same time instead of in serial, as in processing web server requests. 2) To increase the speed of execution of a single operation by utilizing multiple processor cores at once.

There's a ton of overlap as far as techniques (threads, monitors, locks, actors, and so on), but those two high-level goals have very different optimal strategies, and different languages and programming paradigms have different strengths and weaknesses relative to those two. Concurrency is generally more of a high-level consequence of the core algorithm you're using; parallelism is a low-level implementation detail around how a given function or process completes. Concurrency isn't generally going to be retrofitted automatically by the compiler; parallelism can be deduced either statically or at runtime or explicitly signaled by the programmer.

winterkoninkje16y ago

I think that rephrasing confuses the distinction. Concurrency is neither about performance nor about latency, it's about a conceptual organization to separate concerns into different flows of control. It's a particular kind of modularity, that's it. After you have this conceptual organization it's possible to co-opt it to improve latency (via time-slicing) or to improve performance (via running on multicore systems), but neither of those is the goal. Both are just consequences of combining the idea of concurrency with other ideas like multiprogramming and parallelism.

Parallelism, on the other hand, is all about improving performance by making simultaneous use of multiple resources. It's entirely different from multiprogramming or multiplexing which fake the multiplicity of resources; and it's entirely different from concurrency which assumes the availability of undifferentiated resources. There are many different strategies for parallelism, but most of them assume that the resources are not undifferentiated, and thus they're very dependent on the particular details of the hardware they're running on (unlike concurrency which doesn't care). Contrary to your "high level" vs "low level" distinction, parallelism is often an extremely high-level design choice which affects the entire program structure. It has to be if it's going to maximize resource usage throughout the life of the whole program.

The interesting thing about GHC's parallelism is that it's hardware agnostic, and so you don't need to rewrite your program to use different algorithms depending on the hardware. Because it's built into the language you may want to call it "low level" (i.e. "I don't have to think about it"), but it's actually a very high-level decision (because it has major repercussions on how the runtime system works and how the language lets you say things).

maxtilfordOP16y ago

I'm curious. How would you define parallelism?

ErrantX16y ago

I think the main problem is in the posters definition:

A parallel program, on the other hand, is one that merely runs on multiple processors, with the goal of hopefully running faster than it would on a single CPU.

It's an incredibly vague definition that actually encompasses concurrency too..

Wikipedia puts it quite succinctly:

Parallel computing is a form of computation in which many calculations are carried out simultaneously,[1] operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently ("in parallel").

The WP article then goes on to discuss specific forms of parallelism; some concurrent, some not necessarily so.

This article appears to be so heavily focused on the idea that concurrency and parallelism are discrete and disconnected that it seems to forget there is a bigger picture.

The writer has a strong key point. Parallelism !== Concurrency (note the double =). However it's ignoring the fact that they DO intersect frequently.

3 more replies

j / k navigate · click thread line to collapse