- Your CPUCycle class doesn't actually do backoff. atomic_thread_fence(memory_order_relaxed) is a no-op, and at least one compiler (recent clang versions) will optimize the loop away entirely. You can fix this by reading from a volatile variable inside the loop.
- Similarly, nanosleep on Linux will busy-loop instead of returning to the scheduler for small delays, which doesn't seem to be the goal for some of the backoff schemes.
- The BoundedQueue assumes that a futex call functions as a "full memory fence". I'm not terribly familiar with the Linux kernel internals, but I don't believe this to be the case. On Power, for instance, it appears to not always generate a sync instruction, instead using lwsync or isync. I haven't looked at the queue closely enough to tell if this poses a correctness issue, but it definitely looks weaker than atomic_thread_fence(memory_order_seq_cst).
- The SpinLock uses a test-and-set loop; test-and-test-and-set will often scale much better because of the reduced cache coherency traffic required.
- Similarly, using NoBackoff as the default backoff strategy seems to impose a likely performance penalty on users who aren't familiar with the need for backoff.
- The FutexLock code looks a little screwy to me. Why back off if you're likely going to do a futex operation anyway? Why is it safe for Lock() to unconditionally discard the exchanged value? It seems like you have the potential for missed wakeups. But it's late and I'm tired, so I'm probably just missing something.
I don't mean to be critical; it's just that code reviews only ever contain nits. I'm excited that somebody is working on locking primitives that are more efficient than glibc pthreads (whose performance I'm not a huge fan of).
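To make the backoff and test-and-test-and-set points concrete, here is a minimal sketch, not Evenk's actual code. The names `VolatileSpinBackoff`, `TatasLock`, and `count_with_lock` are made up for illustration, and the cap of 1024 iterations is an arbitrary assumption. The backoff reads a volatile member so the compiler has to keep the loop, and the lock spins on a plain load before attempting the exchange.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Arbitrary cap on the busy-wait length (an assumption, tune as needed).
constexpr std::uint32_t kMaxCycles = 1024;

// Hypothetical busy-wait backoff. Each iteration reads a volatile member,
// which is an observable side effect the optimizer may not remove.
class VolatileSpinBackoff {
public:
    void operator()() {
        for (std::uint32_t i = 0; i < cycles_; ++i) {
            std::uint32_t tmp = sink_;  // volatile read keeps the loop alive
            (void)tmp;
        }
        if (cycles_ < kMaxCycles) cycles_ *= 2;  // exponential growth, capped
    }

private:
    std::uint32_t cycles_ = 1;
    volatile std::uint32_t sink_ = 0;
};

// Hypothetical test-and-test-and-set spin lock.
class TatasLock {
public:
    void lock() {
        VolatileSpinBackoff backoff;
        for (;;) {
            // Test: wait on a read-only load so the cache line can stay
            // shared among the waiting cores.
            while (locked_.load(std::memory_order_relaxed)) backoff();
            // Test-and-set: only write once the lock looked free.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
        }
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // caller must hold the lock
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Small demo helper: several threads increment one counter under the lock.
long count_with_lock(int threads, int iterations) {
    TatasLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `count_with_lock(4, 10000)` should return 40000 if mutual exclusion holds. Whether this scales better than plain test-and-set on a given machine is something to measure rather than assume.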
- I have a TATAS lock implementation elsewhere: https://github.com/ademakov/MainMemory/blob/master/src/base/...
Could add it to Evenk as well.
- The exchanged value in FutexLock is not discarded; it is used in the conditions of the `if` and `while` statements. In both cases a return value of 0 means that we've got the lock, and the next thing executed will be the `break` statement. If 1 or 2 is returned, we need to keep spinning. The `if` condition relies on short-circuiting of the || operator, so the exchange is only done if the `value` variable is equal to 1. If it were equal to 0, the CAS in the for loop would have returned true, leading to an early exit.
The idea of using backoff before futex is taken from this article: http://locklessinc.com/articles/mutex_cv_futex/
The author suggests spinning a little in unlock as well, but I didn't go that far.
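For readers following along without the source open, here is a rough, Linux-only sketch of the three-state scheme described above (0 = free, 1 = locked, 2 = locked with possible waiters). It is not the actual Evenk FutexLock. The names `FutexLockSketch`, `kSpinLimit`, and `count_with_futex_lock` are hypothetical, `std::this_thread::yield()` stands in for a real backoff policy, and the `reinterpret_cast` assumes `std::atomic<int>` has the same representation as `int`, which holds on common Linux toolchains but is not guaranteed by the standard.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical sketch of a three-state futex lock: spin briefly, then sleep.
class FutexLockSketch {
public:
    void lock() {
        int value = 0;
        // Spin phase: try the uncontended 0 -> 1 transition a few times.
        for (int spin = 0; spin < kSpinLimit; ++spin) {
            value = 0;
            if (state_.compare_exchange_strong(value, 1,
                                               std::memory_order_acquire))
                return;
            std::this_thread::yield();  // stand-in for a backoff policy
        }
        // The strong CAS failed, so value is 1 or 2 here. If no waiter is
        // recorded yet, mark the lock contended; a result of 0 means the
        // lock was free at that instant and we now own it.
        if (value == 1) value = state_.exchange(2, std::memory_order_acquire);
        while (value != 0) {
            futex(FUTEX_WAIT_PRIVATE, 2);  // sleeps only while state_ == 2
            value = state_.exchange(2, std::memory_order_acquire);
        }
    }

    void unlock() {
        int prev = state_.exchange(0, std::memory_order_release);
        assert(prev != 0);  // caller must hold the lock
        if (prev == 2) futex(FUTEX_WAKE_PRIVATE, 1);  // wake one waiter
    }

private:
    static constexpr int kSpinLimit = 16;  // arbitrary, must be at least 1

    long futex(int op, int val) {
        // Assumption: std::atomic<int> is layout-compatible with int.
        return syscall(SYS_futex, reinterpret_cast<int*>(&state_), op, val,
                       nullptr, nullptr, 0);
    }

    std::atomic<int> state_{0};
};

// Small demo helper: several threads increment one counter under the lock.
long count_with_futex_lock(int threads, int iterations) {
    FutexLockSketch lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

A thread that takes the lock through the exchange leaves the state at 2 even when nobody is waiting, so its unlock may issue an unnecessary wake. That costs a system call but avoids a missed wakeup, because any thread that goes to sleep has already set the state to 2.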
For an example of this technique, check out https://github.com/facebook/folly/blob/master/folly/detail/F... . That code only implements wake and wait, but it would be straightforward to extend it to the other futex operations.