undefined | Better HN

0 pointsmjpt77711y ago0 comments

Can you point to a respected publication that states that is the definition of a wait-free algorithm, i.e. that other threads must help and it must cope with killing the thread externally?

By kill a thread I assume you mean use the OS to interrupt and terminate the thread from the outside while in such a simple algorithm and yet expect it to cope. Have you tried this on any java.util.concurrent classes and reported the "bugs" you have found? I'd be interested in the feedback you got. :-)

0 comments

danbruc11y ago

This is the very definition of wait-free including the case that some concurrent operations never complete.

»A wait-free implementation of a concurrent data object is one that guarantees that any process can complete any operation in a finite number of steps, regardless of the execution speeds of the other processes.« (Wait-Free Synchronization, Maurice Herlihy, 1991) [1]

As far as I can tell - I am definitely not an expert - there is no hard requirement that concurrent operations have to assist but it seems at least a common solution because undoing the partially completed work of other concurrent operations may have the consequence that they can not make progress and therefore the entire thing is no longer wait-free.

By the way, there is no requirement to explicitly kill the thread, you may as well just run out of stack space, the scheduler may decide to not assign a new time slice, cosmic rays may flip a bit in turn triggering a hardware exception because of an unknown instruction or just buggy or malicious code messing with the content of your address space. Every process can fail at any time.

[1] http://cs.brown.edu/~mph/Herlihy91/p124-herlihy.pdf

mjpt777OP11y ago

Thanks for the link.

If you look at the implementation no thread is ever stopped from completing. If a producing thread was killed mid operation by another malicious thread then the stream is broken. Other threads are never blocked. Other writers can add via a finite number of steps and the stream will back pressure when flow control kicks in. The consumer is never blocked - it simply never sees more messages. The algorithm meets the definition.

Unless I missed it that paper does not not state that the algorithm must cope with a malicious thread to be wait-free.

Try applying your thinking to this other respected MPSC wait-free queue.

http://www.1024cores.net/home/lock-free-algorithms/queues/in...

wflfof11y ago

Regarding applying the wait/lock/obstruction freedom definition to Vyukov's MPSC implementation you reference.

It's clear that the MPSC design is not wait-free, nor is it lock-free. Dmitry identifies this fact in his initial postings about that design: https://groups.google.com/forum/#!topic/lock-free/Vd9xuHrLgg...

""" Push function is blocking wrt consumer. I.e. if producer blocked in (*), then consumer is blocked too. """

Between the XCHG and the update of next. If the producing thread completes the XCHG but fails to update next, all threads will fail to make progress. "Progress" in this case being the producers transferring data to the consumer. Put another way, in this algorithm a single misbehaving thread can prevent all threads from making progress.

Regarding the idea of "maliciousness", specifically if a participant thread is "killed mid-operation". That idea is the very definition of wait-freedom: that all participants complete the algorithm in a bounded number of their own steps, irrespective of the activities of other threads.

The definitions of wait/lock/obstruction freedom are well specified. I suggest the first half of _The Art of Multiprocessor Programming_ (the revised edition!) by Herlihy and Shavit for a deep dive.

mjpt777OP11y ago

I've read the _The Art of Multiprocessor Programming_ and am happy to read it again. Maybe I have misinterpreted.

So if "killed mid-operation" must be supported then I don't see how many algorithms can be said to make progress. Take for example the Lamport SPSC queue[1], if the producer gets killed mid operation between steps 4 and 5 of the push operation. Then the data is in the queue but the consumer is blocked from ever seeing it with this line of reasoning. The Lamport SPSC queue is considered wait-free by the concurrency community I know. I base my reasoning on this. What if the producer takes a pause for a long time between steps 4 and 5 before continuing?

However if to be wait-free an algorithm must allow all other threads to continue using the data structure, not just continue making progress in other ways by being non-blocking and completing in a finite number of steps, then I stand corrected.

If wait-free must include coping with any thread being killed mid operation is there a term for being lock-free but also having all threads not block and complete in a finite number of steps for their interaction with the algorithm?

[1] - http://arxiv.org/pdf/1012.1824.pdf

1 more reply

danbruc11y ago

Two threads want to send a single block message. Writer one advances the tail pointer by 100 bytes and then dies. Writer two then advances the tail pointer by another 100 bytes, copies its message into the buffer and writes its header. The reader will see the tail pointer 200 bytes past the last processed message but no valid header and this looks exactly like a writer currently in the process of placing a 200 byte message - including the header - into the buffer. Looks to me like thread one blocked thread two indefinitely because the message of thread two is never added to the buffer. Yes, the bytes are in the buffer, but only physically. The reader can not see them, i.e. the write operation did not logically complete.

mjpt777OP11y ago

If you look at the code the reader does not read the tail. It skips forward reading the length fields. It would never see the writer two message and it never blocks. It just thinks there are no new messages.

1 more reply

saurik11y ago

What if, instead of being killed, the thread runs arbitrarily slowly (at glacial speeds) between the two critical steps? Do other threads have to wait during that time, (Note: I haven't slept in days, not did I read the algorithm as my brain isn't working well enough to understand it, but I did seem to get the gist of this back/forth, and this seemed like an important detail.)

mjpt777OP11y ago

Please read the algorithm in code. No threads ever wait. :-)

mjpt777OP11y ago

I understand the point that we need to make software robust. No single algorithm can cope with every possible failure mode. That is why we need multiple instances run on multiple nodes to be resilient. This is independent from if this is a wait-free algorithm.

danbruc11y ago

It is not wait-free if it can not handle a dying process, that is the whole point of being wait-free. The reason - again as far as I can tell as a non-expert - that wait-free algorithms are not widely used is that many of the best currently known implementations have such a large overhead that they perform worse than a good locking or lock-free implementation.

mjpt777OP11y ago

You keep saying you are not an expert but keep pushing the point :-)

Try this. A process could have a rogue thread scribbling on memory and thus corrupting it. Such a process then that has this thread, by your definition, cannot be wait-free because any data structure can be corrupted and cause algorithms to fail. This is not how the concurrency community look at this.

*added below

Even a simple LOCK XADD instruction could have its results corrupted by this rogue thread after it writes before a consumer reads it.

1 more reply

j / k navigate · click thread line to collapse

0 comments

danbruc11y ago

This is the very definition of wait-free including the case that some concurrent operations never complete.

[1] http://cs.brown.edu/~mph/Herlihy91/p124-herlihy.pdf

mjpt777OP11y ago

Thanks for the link.

Unless I missed it that paper does not not state that the algorithm must cope with a malicious thread to be wait-free.

Try applying your thinking to this other respected MPSC wait-free queue.

http://www.1024cores.net/home/lock-free-algorithms/queues/in...

wflfof11y ago

Regarding applying the wait/lock/obstruction freedom definition to Vyukov's MPSC implementation you reference.

""" Push function is blocking wrt consumer. I.e. if producer blocked in (*), then consumer is blocked too. """

The definitions of wait/lock/obstruction freedom are well specified. I suggest the first half of _The Art of Multiprocessor Programming_ (the revised edition!) by Herlihy and Shavit for a deep dive.

mjpt777OP11y ago

I've read the _The Art of Multiprocessor Programming_ and am happy to read it again. Maybe I have misinterpreted.

[1] - http://arxiv.org/pdf/1012.1824.pdf

1 more reply

danbruc11y ago

mjpt777OP11y ago

1 more reply

saurik11y ago

mjpt777OP11y ago

Please read the algorithm in code. No threads ever wait. :-)

mjpt777OP11y ago

danbruc11y ago