Go study java.util.concurrent. It's one of the absolute best libraries ever written by some of the smartest programmers I have ever seen.
The primary question is "Do I really need to wait or do I just need to be consistent?" 90% of the time the answer is that consistent is good enough.
Lock-free data structures are not a panacea. They don't always do as well as locks in the face of contention. However, if you have that much contention, congratulations, you have an actual spot you really need to optimize.
By default, though, lock-free data structures protect you from so much fail it's ridiculous. I don't dread concurrent programming if I have a good lock-free data structure library.
That having been said, if you really have to wait (normally for hardware access), then you MUST do certain things. Your "lock" MUST be as small as possible--if it isn't "lock with timeout", "single small action that always goes to completion even if error occurs", "unlock"--YOU HAVE FAILED. START OVER. Also, note the "timeout" portion of the lock. "Timeout" MUST be handled and IS NOT NECESSARILY AN ERROR.
Now, these don't get all the situations. People who need "transactions" have hard problems. People who have high contention have hard problems.
However, I can count the number of times I genuinely needed to deal with contention or transactions on one hand and still have two fingers left over.
Whereas, I have lost count of the number of times that I cleared out all manner of bugs simply by switching to a lock-free data structure.
They don't compose.
You can't use them to, for example, implement a bank account system where you need to atomically remove from one account and add to another, unless there is special support for that in the implementation - and there can't be special support for everything built in.
This is exactly the kind of thing that hangs everybody up when programming concurrency.
Most people just want "multiple tasks executing simultaneously that occasionally have to exchange data". However, everybody beats them over the head with "must always have exactly consistent state".
These are two very different problems.
Also, I keep finding bugs in lock free structures. That's annoying.
Not really. Most lock free stuff relies on a single compare-and-swap or memory barrier instruction. Even the cheap microcontrollers have them these days.
> Also, I keep finding bugs in lock free structures. That's annoying.
I recommend you leave programming now. :) I have yet to find a library that doesn't have bugs.
I was writing some highly concurrent code and initially wrote it with the jdk's lock-free containers, but found that GC was about 5-10x worse (depending on the workload) than simply wrapping the non-concurrent equivalent in a lock.
That seems a bit high, but I can believe it under heavy contention.
But, you measured, and then optimized. Which is what people should be doing but aren't.
The problem with locks is that people don't want one thread per object, and instead use one lock per object. This has the effect of making all object operations synchronous, which turns some potential race-conditions into deadlocks.
Now, in the face of contention, they don't scale as well as a lock. However, this falls under "optimize when you have data".
The one advantage of threads is that the overhead is lower when operating at low concurrency. But it's like algorithmic complexity, people only care about growth in complexity not about the initial offset.
Beyond what point? Every single company that handles petabytes of data, starting with Google, seems to be scaling just fine with thread-based concurrency.
And why shouldn't they? Thread-based concurrency has decades of study behind it, it's very well understood, the tooling is terrific (IDE's even tell you ahead of time when deadlocks can happen and when they do, they can tell you exactly why) and the performances are unmatched.
I'd say thread-based concurrency is going to be around for a while, as opposed to the many fads that come and go trying to replace it, starting with actor-based concurrency and transactional memory.
If you had like 100+ cores (just guessing) and several of them tried to access or write to a specific shared memory location, they would spend a lot of time busy-waiting for each other to finish (assuming you're using mutexes).
Maybe using semaphores could work but your code will end up looking like a mess.
With process-based concurrency, each process can only rely on its own memory pool, so that does use-up more memory, but processes are fully independent from each other (fully parallel) so no time is wasted busy-waiting for anything.
You can have "process-based concurrency... simple" or "good async support" but you can't have both... good async support becomes not simple. "Good async support" in a single process just means you pretty much have virtual threads anyhow, you just don't formally label them. That's why all the "good async" languages are basically structuring all their "good async" support so that you can write code that looks and behaves... virtually exactly like threads.
The problem with threads isn't the threads... it's the sharing patterns. Lots of nested locks is insane. But that's not the only way to do it. Even in Go it isn't that hard to do things sanely, and things like Erlang and Rust make it even easier.
If you've still only used those languages with "good async support", you really ought to try one of the modern languages that are the real competition. I have not heard very many people become fluent in one of those and then want to go back.
Ironically, the fastest model today on typical multi-core silicon looks a lot like old school single-process, single-core event-driven models that you used to see when servers actually had a single core and no threads. One process per physical core, locked to the core, that has complete ownership of its resources. Other processes/cores on the same machine are logically treated little different than if they were on another server. As a bonus, it is very easy to distribute software designed this way.
People used to design software this way back before multithreading took off, and in high-performance computing world they still do because it has higher throughput and better scalability than either lock-based concurrency or lock-free structures by a substantial margin. It has been interesting to see it make a comeback as a model for high concurrency server software, albeit with some distributed systems flavor that was not there in the first go around.
Take, for example, a simple texturing fragment shader in GLSL. You're not going to copy the entire texture to every single GPU unit; it might be a 4096x4096 texture you're rendering only a dozen pixels of. Rather, you take advantage of the caching behavior of the memory hierarchy to have each shading unit only cache the part it needs. This is what shared memory can do for you: it enables you to use the hardware to dynamically distribute the data around to maximize locality.
However, in these models you rarely move data between cores because it is expensive, both due to NUMA and cache effects. Instead, you move the operations to the data, just like you would in a big distributed system. This is the part most software engineers are not used to -- you move the operations to the threads that own the data rather moving to the data to the threads (traditional multithreading) that own the operations. Moving operations is almost always much cheaper than moving data, even within a single server, and operations are not updatable shared state.
This turns out to be a very effective architecture for highly concurrent, write heavy software like database engines. It is much faster than, for example, the currently trendy lock-free architectures. Most of the performance benefit is much better locality and fewer stalls or context switches, but it has the added benefit of implementation simplicity since your main execution path is not sharing anything.
I have seen some real doosies writing multithreaded code. We had a relatively simple data analysis project that took in spectrum measurements from a piece of hardware, logged them, did some basic visualizations, and allowed for controlling the hardware. Each of these functions ran in one or more threads. Imagine my surprise when I saw lots of uses of CreateThread but nary a call to WaitForSingleObject or even EnterCriticalSection. I think there may have been a Boolean flag to "coordinate" a pair of producer/consumer threads.
The other side - configuring the hardware - is my bread and butter, and there are a few simple abstractions that are pretty old now ( I have been using them since the late '80s ) that will help greatly.
Sadly, for inexplicable economic reason(s), these are less well known year by year.
Also - for extra points, think about how you'd do that in one thread. Betcha can... although multiprocessor can be pretty cool. Now write it to where it doesn't matter if it's one or multiple threads...
If you need shitstains in your underwear use locks and semaphores.
Threading with locks isn't going to go away any time soon no matter how religiously one states their opposition to it. Take the case of mobile or indeed desktop app programming:
• Memory usage is important
• Real-time responsiveness is important
• Avoiding slow operations on the main thread is important
Some consequences of these constraints is that if you have a simple actor like design with a GUI thread (frontend) and a backend thread (network, other expensive operations) you can easily write crappy software. If the GUI needs a bit of data and needs it fast because the user just navigated to a new screen, sending a message to the backend actor and waiting whilst it finishes off whatever task it's doing isn't going to cut it. You need fine grained access to a subset of the data being managed by the backend, and you need it now, without yielding to some other thread that might not get scheduled quickly. And you need to avoid delays due to duplicating a large object graph then garbage collecting that immutable copy or (worse) running out of RAM and having your app be killed by the OS.
In some kinds of web server I've worked on, you're serving a large mostly-but-not-quite immutable data store. That data set must be held in RAM and you cannot have one copy for each thread because that'd not fit into the servers you use. And the data set must be hot-updateable whilst the server is running. You cannot just code around these requirements, they're fundamental to the product.
You can sometimes accept doubling your memory usage so the serving copy can be immutable whilst a new copy is created and updated. but sometimes that just makes you more expensive than your competitors.
There are lots of types of programming where for performance, cost or other reasons, you simply cannot say "shared state is hard so we will never do it". You just have to bite the bullet and do it.
You get one guy, who is seemingly very smart, and he says basically "Don't do multithreading, it's very hard. Only an elite few, such as me, can do this right, so most of you out there DON'T DO IT!"
It's bullshit. Mainly because it's no harder than anything else, and has just as much pitfalls as every other type of programming. Yes, to a certain degree multithreading is hard, but it's not rocket science. But PROGRAMMING is hard. Not just multithreaded programming. There's nothing very special about multithreaded programming that should scare off people from trying it. Sure, you might fuck up, but that's
For example, our entire company was almost completely brought down a few months ago by our "architect" implementing a feature so poorly that it caused massive system instability. What was this feature? It essentially boiled down to a 1 or a 2. Customer accounts were tagged with either a 1 or a 2, an it's supposed to take a different code path for each, but he made it so fucking complicated and he didn't do his due diligence, the entire weight of his code cause significant downtime, and a customer that accounts for 60% of our revenues almost walked. And none of this is rocket science.
Of course, I worked at another company where one engineer thought "oh, asynchronous APIs are faster than synchronous APIs" so they implemented the entire API asynchronously. Of course, that required mutexes on the server side. And then more mutexes. And it got to the point where the performance was hell because of the unintended consequences of trying to make things faster. You would write a new API and the server would barf saying "You took the locks in the wrong order" but there was no indication of you ever doing anything wrong. It was a mess. So I get what the OP is saying, but it's not specific to just multithreadedness. I bet the same programmer would have made a mess of a single-threaded app as well. They are just shitty or careless programmers.
If you're careful, multithreaded programming is helpful and you can see some significant performance boosts from it. But like every other paradigm in programming, don't overuse it. A judicious use of simple multithreaded programming might help a lot, but there are few apps that benefit from an extremely complex system with hundreds of threads, massive amounts of mutexes, etc.
First, there's the type that's hard and should probably be avoided except by supergeniuses. This involves big hairy lock graphs where locks are held across complex operations that may involve other locks, and swirling dependencies of doom. This shit is nasty, and I completely agree that you must be "this tall" to be trusted with it.
The other kind of multi-threaded programming is the simple kind, where you need to have a threadsafe interface to a module, the locking is dead simple if you know what you're doing, and your lock graph has about three states, all of which clear in constant time and have no dependencies. In this case, there is no excuse for not having this basic competency, and the attitude that we should all just throw up our hands and never hope to write multi-threaded code again is massively counterproductive. This shit is not brain-bendingly hard, it just takes a small amount of practice and discipline.
Let's stop pretending all multi-threaded programming is wizardry.
The sign is near the ceiling. It's not a question of some people being taller than others. Nobody is that tall - not even dbaron (pictured in the photo), who is one of Mozilla's three Distinguished Engineers.
Carefully balancing swirling dependencies of doom doesn't make you a great programmer, at least not in the world of large-scale systems. Choosing the right design to avoid or eliminate those swirling dependencies is much more important.
It all comes down to this: You can't reliably find concurrency bugs via testing. You have to prove they don't exist or you will never know. "Don't do it!" is one way of proving that multithreading bugs don't exist.
But I find it annoying when people act like it is always down to choice. Sometimes we need shared mutable state for logical reasons. We can only avoid it where it is an implementation detail.
Everywhere else I use Elixir, and I write multi-process code and I don't think twice about it.
And I never run into problems.
I'm really feeling like people keep choosing tools that haven't solved the problem, or even tried to, and then thinking that the problem is perennial.
It was solved a long time ago by erlang.
So while some people seem to claim Erlang/Elixir may have solved the problem, they shouldn't lose sight of that there are other domains than their own. But still, for a long time there has been support for message-passing systems of different kinds even in languages like C and C++ and its siblings though, just not integrated into the language.
(I love Erlang/LFE personally, but wouldn't write a 3D shooter in it)
For example, thread 1 produces a value, and writes it into a member variable of a class. Thread 2 uses that value, but does not synchronize with thread 1 to make sure that it's been produced. But that's fine, because thread 1 finishes before thread 2 uses the value. That is, thread 1 almost always finishes first. But if it doesn't (if it loses the race), then chaos happens - chaos that is very hard to reproduce or debug.
See Therac-25 for why this is quite bad.
But queue-and-message is superior in very many ways. Done carefully, you have only the queue as a shared structure. If message processors are non-blocking then the entire system is non-blocking. Deadlock can be statically determined by examination of the message vectors.
And last but certainly not least, in the debugger, all important state is in the message or local variables in a message processor. No enormous stacks to dive through, trying to find who did what to whom. Simple single-threaded message processors have straightforward logic. And a message-aware operating system can make the work queues transparent.
Rust has a rational approach to shared data protection. Shared data is owned by a mutex, and you must borrow that mutex to access the data. You can have N read-only borrowers, or one read-write borrower. The borrow checker checks that at compile time. This gives us an enforceable way to think about who can access what.
This is how sane locking code works in C or C++.
The critical issue is that we have a lot of code already written which doesn't respect this simple rule.
90% of the time, that C/C++ code needs to be re-architected to make the locks either more comprehensive (fix races) or less (performance).
Rewriting in Rust generally achieves that, because it is a re-architecting and refactoring step with concurrency in mind.
And if done neatly, this leaves no room for the next guy to come in and undo parts of that design, because there's violating that takes more work than keeping it - the wrong approach is suddenly the harder one to get to, unlike C/C++.
You basically can't write safe abstractions in C++ in the face of pointers.
"Still in their infancy"? That's basically a description of Erlang's concurrency model, almost three decades old now.
Is there a concurrency equivalent of Spencer's law -- something along the lines of "Those who do not understand Erlang are doomed to reinvent it"?
They're not entirely wrong, though. Actor model and CSP are in their infancy w.r.t. mainstream usage, even though they seem to be working really well.
Then Erlang/OTP gives you all sorts of other perks besides concurrency.
He makes it quite clear that they (i.e. mozilla) have tried to multithread things many times and the resulting complexity has led to bit rot and increased bugginess. In that context your boasting seems inflated and inane.
I feel like a blacksmith who's been told horseshoes are impossible to make.
document.getElementById("contentpane").style.width = "100%"Edit: yeah no, still working just fine. Downvoting should not be used when confirm a problem exists or not. What's the deal today Hacker News?
This article, and those like it, all state that the problem with multi-threading and synchronization is inherent to the programing paradigm/language/architecture you're using:
> "Buggy multi-threaded code creates race conditions, which are the most dangerous and time-consuming class of bugs in software"
> "because the traditional synchronization primitives are inadequate for large-scale systems."
Ok. Fair enough, now tell us why that is so.
I get quite annoyed when the author then proceeds to turn it all around by saying this:
> "Locks don’t lend themselves to these sorts of elegant principles. The programmer needs to scope the lock just right so as to protect the data from races, while simultaneously avoiding (a) the deadlocks that arise from overlapping locks and (b) the erasure of parallelism that arise from megalocks. The resulting invariants end up being documented in comments:"
> "And so on. When that code is undergoing frequent changes by multiple people, the chances of it being correct and the comments being up to date are slim."
Implying that the real problem with locks/threading/synchronization is actually communication, proper documentation discipline, programmer skill (soft and hard).
Of-course I'm not saying that the process of using primitive synchronization methods can't be abstracted over to make it easier to write _proper_ multi threaded code. It's just that this really feels like subjective politicking very much like the aversion to (proper use of) goto() in C/C++ code.
[1] http://paninij.org/ [2] https://github.com/hridesh/panini
It took some measure of work to do so , but people ran ObjecTime code on bare metal.
Pannini looks very nice.
What are the odds that it could be ported to a restricted use of C++ language, I wonder?
Note: Ada's concurrency strategy also prevented quite a few types of errors. They're described in this article [3] on ParaSail, a language designed for easy concurrency that goes much further.
[2] http://cme.ethz.ch/publications/
[3] http://www.embedded.com/design/programming-languages-and-too...
1) Because it has STM baked in and there is a core library for CSP.
2) Because it is a lisp so adding foreign syntax is as simple as a library and doesn't need to be a language extension.
There's your problem. If you are going to use locks, you need a wider view of the system than you get at the source-code level. It is doable, but there is a big impedance mismatch between this approach to software development and agile methods.
He's not wrong.
Modern real world example: Golang authors designed net library in a such way, that everyone who uses it has to think about concurrent access to shared mutable states. Which is hard and unnecessary. Event loops never had this problem, but for some reason got labeled "non idiomatic" by Golang folks. So I had to implement event loop myself.
This is the same paradigm as MPI, the message parsing interface. Using it, you also get for free the ability to deploy your "threaded" code in distributed memory architectures. But any person who had just a bit of experience with this standard can tell you how tedious is to develop a parallel code with it. Maybe this is product of the paradigm or just the verbosity of the API (see for example: http://www.mpich.org/static/docs/v3.1/www3/MPI_Alltoallv.htm...).I wish there was some sort of OpenMP or Intel TBB equivalent for MPI to ease out the pain.
[1] - http://www.greenteapress.com/semaphores/downey05semaphores.p...