Yes the type system is a great help when dealing with thread based concurrency/paralellism, but not so much for the scenarios where we are already using type safe managed languages in distributed computing across multiple processes, most likely not even written in the same language.
Which by the way, given the recent security exploits, is much preferable to go back to, instead of relying on threads for parallelism.
The large majority of comments I read about Rust safety claims are very explicit that Rust provides memory safety and data-race freedom.
You seem to be arguing that these comments are leaving out important information by not clarifying that "This does not mean that Rust prevents all bugs, e.g., you can still write programs with logic bugs, race conditions, dead locks, DDoS, ... and the list goes on and on... in Rust".
If somebody does not know what that "memory safety" and "data-race freedom" is, they should just ask, but if they don't, and they make an incorrect assumption, then its kind of their fault, not the fault of whoever wrote such a comment for not mentioning all the things that these two properties do not guarantee.
Also, it is actually trivial to build Rust programs that prevent some race-conditions, like deadlocks (just a cargo flag away), so if anything, I'd complain that people being "too correct" by restricting themselves to "data-race freedom" are leaving out some important advantages of Rust.
When someone says “Rust is thread-safe”, Rustaceans in one common voice reply, “Rust is data-race-safe”. After that, folks get into a good conversation about what the difference is, and learn about deadlocks and other issue in threads.
Generally speaking though, it does mean that multi-threaded programming in Rust is memory-safe, and won’t introduce a significant class of bugs that you see in C and C++, where so many exploits come from.
Claiming "Fearless Concurrency" but ignoring synchronization bugs, at a low-level in concurrent data structures or at a high-level in concurrent systems is misleading. People will be fearlessly introducing livelocks.
To stay on topic: https://docs.adacore.com/spark2014-docs/html/ug/en/source/co... and https://blog.adacore.com/gnat-community-2020-is-here are worth a read.
Additionally:
> The Ada concurrency model is based on the notion of task, a unit of concurrency that represents an independent thread of control. All, the tasks and the mechanisms for inter-task communication and synchronization, are introduced at language level in order to allow building safer programs. As an illustration, Ada 95 introduced protected objects to allow controlling how data is accessed, thus eliminating race conditions. Additionally, in 1997, Burns et al. introduced the Ravenscar profile, a subset of the Ada programming language that allows high integrity applications to be analyzed for their timing properties by pursuing three main goals: (1) ensuring predictable execution, (2) simplifying the runtime support, and (3) eliminating constructs with high overhead. The limitations imposed by the Ravenscar profile have an inevitable impact in the complexity of correctness analyses, e.g. tasks can only communicate through shared objects (tasks entries are not allowed, so the rendezvous mechanism cannot be used), tasks are assumed to be non-terminating, and tasks and protected objects cannot be dynamically allocated.
> Along the same lines, SPARK, a language that subsets Ada to enable the formal verification of programs, eliminates race conditions by forcing any global object referenced from a task to be marked as Part Of that task, or be a synchronized object.
I think I have a comment about Ada's concurrency where I get into it in more detail (IIRC).
It requires the use of unsafe on the Rust side, and you better code it very carefully.
And that's... a problem. Because a lot of code that actually wants concurrency in the same memory space is code that also wants high performance. And Rust serves these applications well in other areas, where "Just as fast as C" is true.
But Rust isn't as fast as "C" here. It's demonstrably slower. Is that a good thing? I honestly don't know.
It's just a hard problem. But you can't eliminate the idea of "shared locked memory" by fiat. It's a real requirement, and Rust has little to say about it in its standard library or (IIRC, though I'd have to go check) its memory model and locking primitives. It's sort of a no-mans land of "super unsafe outside the box" code. And that's a pity.
Recursive state change calls can end up applying multiple changes either in the wrong order or not at all.
https://doc.rust-lang.org/1.30.0/book/second-edition/ch16-00...
These sound like quite bold claims to me.
> By leveraging ownership and type checking, __many__ concurrency errors are compile-time errors in Rust rather than runtime errors.
Languages like Pony, and the actor model more generally, were designed to prevent a whole class of concurrency bugs (same as Rust) - what's interesting is that, like this post shows, they also have unexpected problems with message causality.
It makes me wonder if there's an even better abstraction out there which will solve this kind of concurrency bug, and what will arise after these are solved.
> Consider for example when you are conducting a search in parallel, say to find the shortest route. To avoid fruitless search, you might want to keep a cell with the shortest route you've found thus far. This way, when you are searching down some path that's already longer than this shortest route, you can just stop and avoid wasted effort...Now in this case, we really WANT to see results from other threads interjected into our execution!
The article is an example of "task parallelism", where you have different threads doing different things on different (local) data. Having those threads communicate via message-passing is a nice way to model workflows crossing thread boundaries, and in that setup you usually want some sort of deterministic behavior related to the ordering of operations(for a given workflow).
On the other hand, I think the Rayon example is a good case of "data parallelism", where you have workers doing "the same thing" on "the same (shared-)data".
In such a case, perhaps locks and shared-data are a better fit than message-passing, and indeed what you're after is not necessarily some sort of sequential ordering of worker operations. However you still probably want the ability to make other deterministic assertions about the behavior of the workers.
I wrote another article highlighting some of these "different determinism" at https://medium.com/@polyglot_factotum/rust-concurrency-five-...
But the problem I think could be solved is where programmers have a causal dependency, but they don't encode it in the program. The next generation of concurrent languages could have the killer correctness features for solving causality problems.
> All message passing is causal. (Not casual!)
The word 'causal' is hyperlinked to https://courses.cs.vt.edu/~cs5204/fall00/causal.html which contains:
> The purpose of causal ordering of messages is to insure that the same causal relationship for the "message send" events correspond with "message receive" events. (i.e. All the messages are processed in order that they were created.)
In a simplified version:
A -1-> B
B -2-> C
A ---3------> C
The human expectation is that 2 happens "before" 3, because that's how it looks. The reality is that 1 happens before 3, and 1 happens before 2, but 2 and 3 have no causal relation - and the runtime is free to rearrange them in any order.That's why the solution is essentially
A -1-> B
B -2-> C
B -3-> C
Because then 2 really happens before 3.You could go the erlang way, which is to give up. Unexpected concurrency bug? Crash, then restart from a sane state. It's highly effective.
Your VM is carved up into failure domains that reduce the blast radius of such a crash to the minimum reasonable collection of processes. This is what frameworks like pony don't get. The reason why erlang actors (aka microservices) are so effective is that they declare, define, and decouple failure domains, and assign responsibility over these failure domains to specific bits of code and ultimately specific teams.
Look at unintuitive situations in special relativity. The unituitiveness is essentially a concurrency bug in our mental model of how the universe works, as opposed to how it actually works. Almost all of the issues are due to mistaken ideas about what it means for two events to happen "at the same time" or in a given order.
I suspect programming concurrency will continue to create bugs because humans ideas of time don't actually match up to reality. Obviously not usually due to relativity, but there are all sorts of complexities in the underlying systems we program for that intuition doesn't capture well.
Erlang and OTP may make it easier to have a better flow, but it's not a magic bullet. Plenty of ways to write race conditions and get unexpected message orderings (especially on a multinode system)
There are many properties of multithreaded/distributed systems that are emergent and that current static compilers don't address.
In distributed systems it's becoming more common to use TLA+[1] to model the system and ensure that we don't have bugs at the design level (rather than just the system level).
In particular for message-passing, I'm quite curious about the state of the research on Communicating Finite State Machines[1] and Petri Nets[2] and if there are actual products that could be integrated in regular languages and are not just of academic interest only.
I expect that Ada/Sparks[1] has some support for this pattern though Ada multithreading story is not there yet (?).
I'm quite excited about the future Z3 prover integration in Nim[5] and the potential to write concurrent programs free of design bugs[6] to address both ends of the concurrency bugs:
- bugs in the low-level concurrent data structure via formal verification (which is something I used int he past to prove that my design was bug-free and that the deadlock I experienced was a condition variable bugs in Glibc[7] and I had to directly use futexes instead.
- bugs in the high-level design, implemented as event-loop / finite state-machine reacting to events as received by channels
Are there people working on model checkers or formal verification in Rust? Like the author, i think addressing memory bugs.the borrow checker is only a part of the "fearless concurrency" story and preventing design bugs would be extremely valuable.
[1]: https://lamport.azurewebsites.net/tla/tla.html
[2]: https://en.wikipedia.org/wiki/Communicating_finite-state_mac...
[3]: https://en.wikipedia.org/wiki/Petri_net
[4]: https://www.adacore.com/sparkpro
[5]: https://nim-lang.org/docs/drnim.html
[6]: https://github.com/nim-lang/RFCs/issues/222
[7]: https://github.com/mratsim/weave/issues/56
And shameless plug, my talk at the NimConf on debugging multithreading programs (data structure and design): https://www.youtube.com/watch?v=NOAI2wH9Cf0&list=PLxLdEZg8DR...
Slides: https://docs.google.com/presentation/d/e/2PACX-1vRq2wvd4q_mo...
I will definitely steal some ideas from Coyote to drive Nim formally verification story of concurrent code forward.
On the Rust side, i feel like Loom has a lot of potential https://github.com/tokio-rs/loom and I'm keeping tabs on them.
Also, your username looked familiar -- you're the creator of Arraymancer and Laser!!
You wrote the most performant, memory-efficient fast-math libs in existence currently (as far as I know) and your work on data-parallel operations is super fascinating.
I also expect Halide[1], Taichi[2], DaCe[3] and Tiramisu[4] to be able to more easily reach those levels of performance hence why I want to add a compiler to Laser (implemented at compile-time via Nim macros) but prerequisite is excellent and composable multithreading support which OpenMP didn't provide.
In statistics / data analysis, my PCA though (Principal Component Analysis) beats all implementations I've seen from Scipy, Facebook and R.
[2]: https://github.com/taichi-dev/taichi
Deep embedding of specifications with model checkers is a really neat idea. LiquidHaskell is a really interesting approach and it would be neat to see if Rust could adopt something similar.
Go has that problem, too. The claim for Go was that concurrency would be done via message passing. But, in the examples in the original manual, and often in practice, what's being passed on the channel is some kind of reference to shared data. In a garbage collected language, this is OK as long as the sender only sends new objects and never touches them after sending. Still, it's easy to screw up, and the language does not help.
In both Go and Python, there's not much thread-local data that the language will not let you export to another thread. That's a lack. Shared state that isn't obviously shared state is a recipe for trouble caused by later maintenance programming. This is one of Rust's strengths. Sharing has to be explicit.
Shutdown is always hard in an event queue environment. Go makes it harder by making a write to a closed channel kill the whole program. So if a receiver decides it needs to shut down, it can't just close the receive end of the channel and let the sender hit a send error. Negotiations are required to get the sender shut down.
And thus Borg was born. Too hard to write processes that handle situations gracefully, so let's just let the process die and spin up a new one immediately.
Literally was looking at a patch set that was proposed recently that involved several channels in Go and one of the codepaths was os.Exit(1) for lack of any meaningful mechanism to sanely clean everything up. Sadly, this was in library code, so would be totally unexpected by the caller and probably not desirable whatsoever.
Yes. Basic Dijkstra producer-consumer one-way queues are simple and elegant. But the error cases are hell.
OK, so if the consumer wants to make the producer stop, it sets an abort flag. That's a shared variable. (Does it need a lock? Does it need a lock or fencing on ARM, which has weaker memory concurrency guarantees than x86?)
So, producer sees the abort flag set and closes its end of the channel, right? Maybe not. Maybe the channel is full, and the producer is blocked on a send. OK, so, when the consumer aborts, it has to enter a drain loop, doing receives and discarding the result until it gets an EOF/channel close. That will unblock the producer, and then the consumer can exit.
But what if the producer doesn't have anything to send right now? Now the consumer is stuck waiting for the producer to send more data, and the consumer can't exit.
Should the consumer close the producer's end of the channel after setting the abort flag? Now there's a race condition between checking the abort flag and sending. Add a critical section that covers both actions? Now the producer can be blocked while in a critical section. This creates a potential deadlock if the consumer wants to set the abort flag while the producer is inside the critical section. Now both ends are blocked and there's a deadlock.
But it all looked so simple in theory!
I was wondering about this recently. Why did Go choose this design? What's wrong with "sending and receiving can both fail if the channel is closed"?
Ps I love Typescript.