GCC 4.7 adds transactional memory extensions for C/C++

(gcc.gnu.org)

178 points · randombit · 14y ago · 32 comments

18 comments · 7 top-level

iam · 14y ago · 4 in thread

I was hoping for more information on how they implement it. There's nothing in there about which hardware facilities they use, and they say that at worst STM is a global lock for the process.

Hopefully they're at least using some kind of compiler analysis so that transactions only share a lock if they touch the same memory addresses (pessimistically, of course)?

exDM69 · 14y ago

GCC will probably not use any specific hardware facilities, which means this is probably going to be implemented with regular atomic operations.

Within a transaction block, the results of all reads are stored (to a local, hidden variable). When the transaction is about to finish, all reads are repeated and if any of them yields a different result, the transaction is restarted. When the transaction is committed, there will likely be some kind of a global lock (that will be held for a very small time).
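To make the scheme described above concrete, here is a minimal sketch of a value-validating STM in C++. It is only an illustration of the comment's guess, not GCC's actual runtime: `Cell`, `Tx`, `atomically`, and `demo_conflict_retry` are hypothetical names, and the commit lock is a plain `std::atomic_flag` spin. Reads are logged, writes are buffered, and at commit time every logged read is repeated under the lock; any mismatch restarts the transaction.

```cpp
// Toy value-validating STM. All names here (Cell, Tx, atomically,
// demo_conflict_retry) are hypothetical; this is not libitm's design.
#include <atomic>
#include <cassert>
#include <utility>
#include <vector>

using Cell = std::atomic<long>;

static std::atomic_flag commit_lock = ATOMIC_FLAG_INIT;  // held only during commit

struct Tx {
    std::vector<std::pair<Cell*, long>> reads, writes;

    // Read a cell and log the value seen so commit can re-check it.
    long read(Cell& c) {
        for (auto& w : writes)
            if (w.first == &c) return w.second;  // see our own buffered write
        long v = c.load(std::memory_order_acquire);
        reads.emplace_back(&c, v);
        return v;
    }

    // Buffer a write; other threads see nothing until commit succeeds.
    void write(Cell& c, long v) {
        for (auto& w : writes)
            if (w.first == &c) { w.second = v; return; }
        writes.emplace_back(&c, v);
    }

    // Repeat every logged read under the lock; publish writes only if all match.
    bool commit() {
        while (commit_lock.test_and_set(std::memory_order_acquire)) { /* spin */ }
        bool ok = true;
        for (auto& r : reads)
            if (r.first->load(std::memory_order_relaxed) != r.second) { ok = false; break; }
        if (ok)
            for (auto& w : writes) w.first->store(w.second, std::memory_order_release);
        commit_lock.clear(std::memory_order_release);
        return ok;
    }
};

// Re-run body until it commits; returns how many attempts were needed.
template <class Body>
int atomically(Body body) {
    for (int attempt = 1;; ++attempt) {
        Tx tx;
        body(tx);
        if (tx.commit()) return attempt;
    }
}

// Single-threaded demo: the first attempt is invalidated by a write that
// stands in for another thread, so the transaction body runs twice.
inline int demo_conflict_retry() {
    Cell a{100}, b{0};
    bool interfered = false;
    int attempts = atomically([&](Tx& tx) {
        long x = tx.read(a);
        if (!interfered) { interfered = true; a.store(10); }  // simulated conflict
        tx.write(b, x);
    });
    assert(a.load() == 10 && b.load() == 10);
    return attempts;
}
```

Comparing values rather than versions keeps the sketch short, but it cannot detect a cell that changed and then changed back, and a running transaction may observe a mix of old and new values before its commit fails. Real STM runtimes typically track versions or ownership records to avoid both problems.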

As GCC probably doesn't require any kind of threading or locking, it's most likely that the write lock will be a spinlock using an atomic read-modify-write and some kind of yield instruction (monitor/mwait on new CPUs, pause on older).
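A spinlock of the kind described here might look like the following sketch. `SpinLock` and `cpu_relax` are hypothetical names, and nothing in the thread confirms that GCC's runtime does this. The sketch uses only the x86 `pause` hint (through `_mm_pause`); as far as I know, monitor/mwait are normally restricted to kernel mode, so they are left out.

```cpp
// Test-and-test-and-set spinlock; SpinLock and cpu_relax are illustrative names.
#include <atomic>
#include <cassert>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>  // _mm_pause
#endif

// Hint to the CPU that we are busy-waiting; a no-op on non-x86 targets.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    _mm_pause();
#endif
}

class SpinLock {
    std::atomic<bool> locked_{false};

public:
    // One atomic read-modify-write: succeeds only if the lock was free.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void lock() {
        while (!try_lock()) {
            // Spin on a plain load so waiters do not keep writing the cache line.
            while (locked_.load(std::memory_order_relaxed)) cpu_relax();
        }
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed) && "unlock without lock");
        locked_.store(false, std::memory_order_release);
    }
};
```

Because it provides `lock`, `try_lock`, and `unlock`, the class also works with `std::lock_guard<SpinLock>`.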

As far as I can see, there really aren't lots of other methods to implement STM, especially from within the C compiler.

roxtar · 14y ago

The article hints that GCC uses a combination of HTM and STM, if HTM is available.

scott_s · 14y ago

The paper I linked to claims they have an STM, and a hybrid HTM-STM system if hardware support is available.

eis · 14y ago

Since you can't do all reads in one atomic instruction and you also need to make it atomic with the write (CAS), wouldn't that still require a lock for the whole operation?

srean · 14y ago · 3 in thread

On a related note, the cilkplus branch of GCC 4.7 contains the Cilk work-stealing multithreading runtime and language extension that Intel has open-sourced. http://software.intel.com/en-us/articles/intel-cilk-plus-spe...

Exciting times ahead.

lukesandberg · 14y ago

Cilk is definitely cool, but I don't think it makes designing parallel programs any easier; it just removes a lot of boilerplate. I have worked with similar infrastructures (not the language support) and found it to be extremely difficult. I looked into Cilk and I don't think that the language support would have been a game changer. Does anyone have experience with Cilk? What have been your experiences?

dadkins · 14y ago

How do you know all of this if you haven't tried it? I think you'll find Cilk a lot more polished and refined than you might at first realize. Having language-level support for fine-grained parallelism together with a provably efficient scheduler is a huge win.

But you'll have to be more specific with your complaints. What kind of parallel programs are you trying to design? What similar infrastructures have you worked with?

ot · 14y ago

Cilk is definitely more than syntactic sugar.

How would you implement work-stealing in pure C?

camperman · 14y ago · 2 in thread

Is this the first step towards a GCC that would have all the features of Clojure? That would be incredibly useful to me for one - I love Clojure but just cannot make any sense of what the JVM tells me when I screw up.

dandrews · 14y ago

The short answer is no, STM is only a small (and some smart people suggest overrated) part of Clojure infrastructure.

But the Deep Thinkers in the Clojure community feel your stack trace pain, and now that 1.3 is in the can it seems to me that there was renewed enthusiasm at Conj for doing something about debugging clarity. You shouldn't give up hope yet.

moomin · 14y ago

In the meantime, I'd recommend installing clj-stacktrace as a Leiningen plugin. It's far from perfect, but it's an improvement. There's a technomancy article describing how to do it.

signa11 · 14y ago · 1 in thread

I have a fundamental question regarding STM in general: with 'manual' locking we need to worry about deadlocks; with STM, I feel livelock would be more sinister and extremely hard to debug or reason about. Not to mention that it would make client code non-composable as the transaction size or the system load increases.

Or am I missing something? Thanks for your insights!

chalst · 14y ago

Two points to bear in mind:

1. If livelocks are a problem coming from load, rather than bad interactions between components, you are likely to have a choice between (i) pessimistic, where most threads do nothing, vs. (ii) optimistic, where most threads do work that gets thrown away. In practice, optimistic tends to be faster, because doing nothing is no better than doing worthless work, and the committer is working with the results of successful computations, whereas the locking algorithm doesn't know which computations might not work out;

2. Extremely hard to reason about is just how it is with threads. I haven't done enough concurrent programming to really say, but the optimistic commit model seems to be more intuitive than the pessimistic lock model. Peyton Jones makes this point forcefully in Beautiful Concurrency http://research.microsoft.com/pubs/74063/beautiful.pdf
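The optimistic side of that trade-off, together with the livelock worry raised above, can be sketched for a single memory cell with a compare-and-swap loop. This is only an illustration: `optimistic_update` and `UpdateResult` are hypothetical names, and the retry bound is an assumption added so that a caller could fall back to a pessimistic path instead of spinning forever under load.

```cpp
// Optimistic update of one cell with a bounded number of retries.
// optimistic_update and UpdateResult are illustrative names.
#include <atomic>
#include <cassert>

struct UpdateResult {
    bool committed;  // true if the new value was published
    int discarded;   // speculative computations that were thrown away
};

template <class F>
UpdateResult optimistic_update(std::atomic<long>& cell, F compute, int max_attempts) {
    assert(max_attempts > 0);
    long seen = cell.load(std::memory_order_acquire);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        long next = compute(seen);  // speculative work on a snapshot
        // Publish only if the cell still holds the snapshot; on failure,
        // compare_exchange_strong reloads `seen` with the current value.
        if (cell.compare_exchange_strong(seen, next,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire))
            return {true, attempt};
    }
    return {false, max_attempts};  // caller may now take a lock instead
}
```

The `discarded` count is the "work that gets thrown away"; when `committed` comes back false, the caller has hit the load-induced livelock case and can switch strategy.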

CGamesPlay · 14y ago · 1 in thread

This is pretty cool. Is it done naively, using one global lock, or is it more intelligent? Can GCC identify which memory locations require locking for a given transaction, and lock just those? What is the granularity?

roxtar · 14y ago

It's smarter than a global lock (come on!). I would suggest reading the article which describes the implementation [1].

[1]: http://www.velox-project.eu/velox-transactional-memory-stack (courtesy scott_s)

scott_s · 14y ago

I think this is the implementation (pdf article available): http://www.velox-project.eu/velox-transactional-memory-stack

They point to the Velox project, which has many published papers. But this paper has Ulrich Drepper of Red Hat as a co-author. Since Drepper is active in glibc, I can imagine he worked with them on integration. The notation in the article also looks like what's shown on the website.

There's plenty of other work that could have gone into this implementation: http://www.velox-project.eu/publications There's a full TM system called STM2 that tries to use idle cores or SMT threads (also known as hyperthreads) for the transactions. Then there are some papers on lock-free techniques, static analysis, and a benchmark suite. There's also what looks like a direct response to the infamous "STM: Why Is It Only a Research Toy?" article (http://queue.acm.org/detail.cfm?id=1454466): http://www.velox-project.eu/why-stm-can-be-more-research-toy

I don't know for sure, of course. The STM2 paper published at PACT of this year also looks interesting. Email me if you'd like to read it.

Edit: the paper I linked to at the top says it's implemented in gcc.

bretthoerner · 14y ago

http://nickclifton.livejournal.com/9501.html

"The support implements and tracks the Linux variant of Intel's Transactional Memory ABI specification document. Currently this is at revision 1.1, (May 6 2009). For more information see:

http://software.intel.com/en-us/articles/intel-c-stm-compile... "
