1. A sets g1 = A
2. A reads g2 = none
3. A reads g1 = A
4. B sets g1 = B
5. B reads g2 = none
6. B reads g1 = B
7. A sets g2 = A (read in #3)
8. A reads g1 = B
9. A retries
10. A reads g2 = A
11. A succeeds
12. B sets g2 = B (read in #6)
13. B reads g1 = B
14. B reads g2 = B
15. B succeeds
You also are missing memory fences, which are needed to ensure proper ordering of memory operations. (Otherwise the processor is free to speculate reads and delay writes between cores.) See https://www.kernel.org/doc/Documentation/memory-barriers.txt for a thorough explanation.
if step #9 happens then #10 will be skipped by the "if (state->g2 == SPIN_LOCK_NO_LOCKER) {"
what am i missing?
Thread B sets g1 to B (line 31), thread B enters the "if" statement (because A hasn't yet written g1 to g2, line 32), thread B reads the value of g1 (half of line 33) and writes that value to g2 (the other half), then does its tests and returns OK.
Thread A sets g2 to A (the other half of line 33). Thread A does its tests and notices that g1 != g2. Thread A returns to the top of the loop and notices that g2 == A (line 22). Thread A returns OK.
This is assuming that reads are strongly ordered between processes and cores, and that code executes in-order, none of which is true.
I can see a few problems with this code, please consider the following constructive criticism.
for (;;) {
if (state->g2 == locker_id) {
return;
}
while (state->g2 != SPIN_LOCK_NO_LOCKER) {
if (spin_wait_callback != NULL) {
spin_wait_callback();
}
You don't need this 'continue', the loop will do that just fine without it. continue;
}
The following four lines are problematic: state->g1 = locker_id;
if (state->g2 == SPIN_LOCK_NO_LOCKER) {
Imagine the consequences of a task switch right here. state->g2 = state->g1;
This if does nothing. if (state->g1 == locker_id) {
And you're going to do the following if without all the extra
conditionals above when you do not return from here the
next time you iterate through the loop. So all that extra
testing of state->g1 above does nothing. if (state->g2 == locker_id) {
return;
}
}
}
}
I think I can see what you're trying to do here but once you
fix the basic problems with the code and you test it under
very high load locking and unlocking a resource you'll find
that fairly soon two threads will be accessing the resource
at the same time.Try a very tight loop that locks a counter, does a load, increment, store on it and then unlocks the counter again. Keep separate tallies per thread and use a third thread to keep track of the expected sum and the actual sum.
That should show you reasonably quickly why atomicity in the basic operation of acquiring a lock is a base requirement.
Locking issues almost never show up when you are running lightly loaded on a single core. But the lack of ensuring all cores see the same information, that race conditions can't occur and that instructions are executed in the right order will show up with a terrible certainty under load on multiple cores.
if thread 1 is pre-empted by thread 2 and thread 2 is able to get to the same point (line 33) then G1 will be equal to 2. both threads will assign G2 = 2 and thread 2 will win the race -- thread 1 will loop and begin to spin again.
have i missed something?
Run your compiler -S and save the output, then try to reason through the consequences of a context switch in between every instruction with multiple threads contending for the lock. And have a look at the manual for your CPU to learn about out-of-order execution and cache coherency between multiple cores / cpus.
Some interesting reading on the subject:
https://www.kernel.org/doc/Documentation/memory-barriers.txt
If you want to do this entirely in userspace and you must avoid atomic instructions for some reason consider abstracting out your locking to a separate process and use IPC, first come first serve.
if (state->g2 == SPIN_LOCK_NO_LOCKER) {
state->g2 = state->g1;
if (state->g1 == locker_id) {
if (state->g2 == locker_id) {
return;
}
}
}
You need to insert memory barriers between the writes and the reads. Without, on systems with multiple cores, other cores may not see the state changes in program order, which will make it possible for multiple threads to conclude they won the race.http://en.wikipedia.org/wiki/Memory_barrier http://en.wikipedia.org/wiki/Weak_consistency
What is a `machine level exchange instruction`?
C/c++ doesn't have many guarantees about operation ordering. http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLo...
The Linux kernel guys wrote an in-depth explanation of why volatile doesn't work for this use-case here: https://www.kernel.org/doc/Documentation/volatile-considered...
real_test.c: In function ‘main’:
real_test.c:14:10: error: too many arguments to function ‘spin_allocate’
spin-lock.h:20:32: note: declared here
real_test.c:16:3: error: too few arguments to function ‘spin_lock’
spin-lock.h:19:13: note: declared here
Easy to fix, just annoying. void
spin_lock(struct spin_lock_state *state, int locker_id,
void (*spin_wait_callback)()) {
for (;;) {
if (state->g2 == locker_id) {
return;
}
while (state->g2 != SPIN_LOCK_NO_LOCKER) {
if (spin_wait_callback != NULL) {
spin_wait_callback();
}
continue;
}
state->g1 = locker_id;
if (state->g2 == SPIN_LOCK_NO_LOCKER) {
state->g2 = state->g1;
if (state->g1 == locker_id) {
if (state->g2 == locker_id) {
return;
}
}
}
}
}
void
spin_unlock(struct spin_lock_state *state, int locker_id)
{
state->g2 = SPIN_LOCK_NO_LOCKER;
}