I'd like to try out Akka and Elixir in the future.
Part of the problem with the Python ecosystem is the insular mindset of its proponents. Python fanboys have no interest in going and seeing what's on the other side. So the platform has become a bit of an echo chamber, with Pythonistas declaring their clunky approaches industry best practice.
You can see this by looking at how little love a CSP solution for Python gets [https://github.com/futurecore/python-csp] versus the enormous buy-in its more popular frameworks receive.
I like to point at Facebook's use of Haskell as a good example of success in this space: http://community.haskell.org/~simonmar/papers/haxl-icfp14.pd... It would be disingenuous to suggest that Haskell is good in all situations, but if there is one place where it should be used, this is it.
¯\_(ツ)_/¯
The biggest reason for this is not necessarily that I think it has absolutely the best concurrency model, but that it's the most consistent one. Nearly all libraries are written for the same model, which means they assume multithreaded access, blocking IO (reads/writes) and no callbacks. As a result, most libraries interoperate without problems.
Erlang/Elixir should have similar properties - however I haven't used it.
JavaScript has a similar property, because at least everything assumes the single-threaded environment and concurrency through callbacks (or abstractions of them, like promises and async/await on promises). I also like the interoperability and predictability here. But sometimes nested callbacks (even with promises) lead to quite a bit of ugly code. And calling "async methods" is not possible from "sync methods" without converting them to async first (which could mean some big refactoring). So I prefer the Go style in general.
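In Python the same split shows up. A minimal sketch, where fetch_value is a hypothetical stand-in for an async library call: calling an async function from sync code only gives you a coroutine object, the sync caller has to drive an event loop itself with asyncio.run, and asyncio.run in turn refuses to run inside an already running loop. That is why async tends to spread up the call stack.

```python
import asyncio

async def fetch_value() -> int:
    # Hypothetical stand-in for an async library call.
    await asyncio.sleep(0.01)
    return 42

def sync_caller_naive() -> str:
    # Calling an async function from sync code does not run it: we only get
    # a coroutine object back, never the int.
    coro = fetch_value()
    kind = type(coro).__name__
    coro.close()  # discard it cleanly (avoids a "never awaited" warning)
    return kind

def sync_caller() -> int:
    # To get the result, the sync caller must drive an event loop itself...
    return asyncio.run(fetch_value())

async def async_caller() -> str:
    # ...but that is refused when a loop is already running, so callers
    # further up the stack end up being converted to async as well.
    coro = fetch_value()
    try:
        asyncio.run(coro)
        return "ran"
    except RuntimeError:
        coro.close()
        return "RuntimeError"

print(sync_caller_naive())          # coroutine
print(sync_caller())                # 42
print(asyncio.run(async_caller()))  # RuntimeError
```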
From my point of view, the worst are the languages that do not have a standard concurrency model, e.g. C++, Java, C#, and according to this article also Python. Most of them have several libraries for (async) IO which can be beautiful by themselves but won't integrate into the remaining parts of the application without lots of glue code. E.g. Boost.Asio is nice, but you need a thread running its event loop. If your main thread is already built around Qt/GTK, you now need another thread, and then you have two event loops which need to interact. Same question for Java frameworks, e.g. integrating a Netty EventLoop into another environment (Android, ...). In these languages we then often get libraries which are not generic for the whole language but specific to a parent IO library (works with asio, works with asyncio, ...), and thereby fragmented ecosystems.
A standard question that always arises in these "mixed-threaded" languages when you have an API which takes a callback is: from which thread will this callback be invoked? And if I cancel the operation from a thread, is it guaranteed that the callback will not be invoked? If you don't think about these, you are often already in bug/race-condition land.
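For what it's worth, Python's concurrent.futures gives one concrete set of answers. A minimal sketch (function and thread names are illustrative): with CPython's ThreadPoolExecutor, a callback registered on a still-pending future runs on the worker thread that completes it, not on the thread that registered it, and cancel() returns False once the call is already running, so the callback still fires.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def demo_callback_thread() -> tuple[str, bool]:
    started = threading.Event()
    release = threading.Event()
    seen: dict[str, str] = {}

    def work() -> int:
        started.set()
        release.wait()  # keep the job running until the callback is registered
        return 1

    def on_done(_fut) -> None:
        # Record which thread actually invoked the callback.
        seen["thread"] = threading.current_thread().name

    with ThreadPoolExecutor(max_workers=1, thread_name_prefix="worker") as pool:
        fut = pool.submit(work)
        started.wait()                  # the job is now running on the worker
        fut.add_done_callback(on_done)  # still pending, so the callback is deferred
        cancelled = fut.cancel()        # too late: a running call cannot be cancelled
        release.set()                   # let the job finish; the pool then shuts down
    return seen["thread"], cancelled

print(demo_callback_thread())  # ('worker_0', False)
```

If the future has already finished when add_done_callback is called, the callback instead runs immediately in the calling thread, which is exactly the kind of "it depends" the question is about.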
I never understood why people tout Go's goroutine feature so much. You can have it in literally any systems language.
It's like saying that indoor plumbing is no big deal: it's just liquid moving through a pipe. Well, yes. Yes, it is. But if you don't have plumbing in your neighborhood, or a sewage treatment plant in your city, you can't fake it by fooling around in your garage. And frankly, it's not going to smell like a rose.
> I never understood why people tout Go's goroutine feature so much. You can have it in literally any systems language.
There are two big reasons for it.
Firstly, goroutines are extremely lightweight. "Traditional" threading in C, C++, and Java means native OS threads, which are comparatively expensive. Sure, fiber/coroutine libraries exist for these languages, but they are far from common (and the only fiber library for Java that I know of, Quasar, came after Go).
Secondly, Go's ecosystem encourages CSP-style message passing rather than "traditional" memory sharing. Strictly speaking that is about channels, not goroutines, but channels make working with goroutines very nice. This reason is less concrete than the first one; you certainly can implement message passing in any of the other languages' threading styles. But empirically, it doesn't happen as often. A factor in this is also that, unfortunately, many CS curricula don't discuss CSP, which means that Go's use of it is the first exposure many programmers have to it.
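As a rough illustration of message passing in a language without built-in channels, here is a minimal Python sketch that wires threads together with queue.Queue. The None sentinel meaning "channel closed" is a convention assumed here, and a one-slot queue only approximates a Go channel (Go's unbuffered channels and select have no direct standard-library equivalent).

```python
import queue
import threading

def producer(out: queue.Queue, n: int) -> None:
    for i in range(n):
        out.put(i)   # "send" on the channel
    out.put(None)    # sentinel: the channel is closed

def squarer(inp: queue.Queue, out: queue.Queue) -> None:
    # Receive until the sentinel arrives, then pass the close downstream.
    while (item := inp.get()) is not None:
        out.put(item * item)
    out.put(None)

def run_pipeline(n: int) -> list[int]:
    # One-slot buffers: a sender blocks while its slot is full.
    a: queue.Queue = queue.Queue(maxsize=1)
    b: queue.Queue = queue.Queue(maxsize=1)
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for t in threads:
        t.start()
    results = []
    while (item := b.get()) is not None:
        results.append(item)
    for t in threads:
        t.join()
    return results

print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```

No state is shared between the stages except through the queues, which is the point of the style.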
Aside from that, I've personally used both Akka and plain Scala with Futures, as well as Node with Promises, bare callbacks and async (though I've not tried fibers). I find Promises and Futures strike the right balance between simplicity of use and the benefits of the async model. There's no need to reason about threads, as they abstract away the actual async implementation, and the interface they expose is very easy to reason about.
From the perspective of uniformity and availability: while C# provided asynchronicity via callbacks before Task-based async (with the async/await keywords) arrived in the 4.5 release of the .NET Framework, all the core libraries that used callback-style async (as well as some that had been strictly synchronous-only) were updated with Task-based overloads, so there are no problems with Task-based async being inconsistently available. Additionally, adoption of Task-based async in third-party libraries has been high, so it's relatively uncommon to encounter code that does not support it.
From the perspective of code productivity, it's hard to get much better than simply adding the async and await keywords where necessary. As a very simple example, consider a typical server application that receives requests via HTTP, processes them via an HTTP call to another service as well as a database call, and then returns an HTTP response. The sync code (blocking with a thread-per-request model) might look something like this:
void handleRequest(HttpRequest request) {
    var serviceResult = makeServiceCallForRequest(request);
    var databaseResult = makeDatabaseCallForRequest(request);
    sendResponse(constructResponse(request, serviceResult, databaseResult));
}
In order to make that same process async (non-blocking, with a dynamically sized thread pool handling all requests), the code would look like this:
async Task handleRequestAsync(HttpRequest request) {
    var serviceResult = await makeServiceCallForRequestAsync(request);
    var databaseResult = await makeDatabaseCallForRequestAsync(request);
    await sendResponseAsync(constructResponse(request, serviceResult, databaseResult));
}
It could even be taken one step further: if there were no dependencies between the service request and the database call, they could run concurrently, which would reduce processing latency for individual requests:
async Task handleRequestAsync(HttpRequest request) {
    var serviceResultTask = makeServiceCallForRequestAsync(request);
    var databaseResultTask = makeDatabaseCallForRequestAsync(request);
    await sendResponseAsync(constructResponse(request, await serviceResultTask, await databaseResultTask));
}
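For comparison, a minimal Python asyncio sketch of the sequential and concurrent handlers, assuming hypothetical make_service_call and make_database_call functions that just sleep to simulate I/O. One difference from C#: Python coroutines do not start running until they are awaited or wrapped in a task, so asyncio.create_task is what gives the "start both, then await both" behavior.

```python
import asyncio
import time

# Hypothetical stand-ins for the service call and the database call;
# each just sleeps to simulate I/O latency.
async def make_service_call(request: str) -> str:
    await asyncio.sleep(0.3)
    return f"service({request})"

async def make_database_call(request: str) -> str:
    await asyncio.sleep(0.3)
    return f"db({request})"

async def handle_request_sequential(request: str) -> str:
    service_result = await make_service_call(request)
    database_result = await make_database_call(request)
    return f"{service_result}+{database_result}"

async def handle_request_concurrent(request: str) -> str:
    # Start both calls first, then wait for both results.
    service_task = asyncio.create_task(make_service_call(request))
    database_task = asyncio.create_task(make_database_call(request))
    return f"{await service_task}+{await database_task}"

async def timed(handler, request: str) -> tuple[str, float]:
    start = time.perf_counter()
    result = await handler(request)
    return result, time.perf_counter() - start

print(asyncio.run(timed(handle_request_sequential, "req")))
print(asyncio.run(timed(handle_request_concurrent, "req")))
```

With 0.3 s per simulated call, the sequential handler should take roughly 0.6 s and the concurrent one roughly 0.3 s.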
I've added asynchronicity into a C# server application as above, with substantial improvements in both individual request latency and overall scalability. I'm now working on a Java 8 system and bemoaning the comparatively primitive and inconsistent async capabilities in Java 8.
Lack of generics on channels really hurts the library ecosystem, though. You have to write many things yourself.
Just give me greenlets or whatever and let me run synchronous code concurrently.
async def proxy(dest_host, dest_port, main_task, source_sock, addr):
    await main_task.cancel()
    dest_sock = await curio.open_connection(dest_host, dest_port)
    async with dest_sock:
        await copy_all(source_sock, dest_sock)
Are you kidding me? Simplified, that is:
async def func():
    await f()
    dest_sock = await f()
    async with dest_sock:
        await f()
Every other token is async or await. No thank you.
The point is this: threads are still expensive in bulk (the CPU has to shuffle a lot of data every time you switch). So all kernels have mechanisms to support parallel IO operations. An async library will use the best available kernel mechanism for IO: epoll on Linux, kqueue on BSDs, IO Completion Ports on Windows. It turns out that doing this requires some help from the language itself, or the code turns into a pyramidal mess. The async keyword addresses the readability of the code.
So:
a) It's more complex than synchronous code
b) But it solves the performance problem without too much cognitive overhead (once you get used to it).
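A minimal Python sketch of the kernel-facing layer such a library sits on, using the standard selectors module: selectors.DefaultSelector picks the most efficient mechanism available on the platform (epoll on Linux, kqueue on BSD/macOS). A socketpair stands in for a real client connection. On Windows, selectors only offers select(); it is asyncio's ProactorEventLoop (the default there since Python 3.8) that uses IO Completion Ports. Even this tiny echo, written as callbacks, hints at why the code gets messy without language help.

```python
import selectors
import socket

def run_echo(message: bytes) -> bytes:
    # DefaultSelector: epoll/kqueue where available, poll/select as fallbacks.
    sel = selectors.DefaultSelector()
    server_sock, client_sock = socket.socketpair()  # stand-in for a real connection
    server_sock.setblocking(False)
    client_sock.setblocking(False)
    replies: list[bytes] = []

    def on_request(sock: socket.socket) -> None:
        # "Server" side: echo whatever arrived, upper-cased.
        sock.send(sock.recv(1024).upper())

    def on_reply(sock: socket.socket) -> None:
        replies.append(sock.recv(1024))

    sel.register(server_sock, selectors.EVENT_READ, on_request)
    sel.register(client_sock, selectors.EVENT_READ, on_reply)
    try:
        client_sock.send(message)
        # The event loop: one thread waits on every registered socket at once
        # and dispatches to the callback whose socket is ready.
        for _ in range(10):  # bounded so a bug cannot hang forever
            if replies:
                break
            for key, _mask in sel.select(timeout=1.0):
                key.data(key.fileobj)
    finally:
        sel.close()
        server_sock.close()
        client_sock.close()
    return replies[0] if replies else b""

print(run_echo(b"ping"))  # b'PING'
```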
They don't have to be. First of all, even ordinary threads are more efficient than you might think. On a really awful low-end Android 4.1 device, I can pthread_create and pthread_join over 5,000 threads per second. On a real computer, my X1 Carbon Gen4, I can create and join over 110,000 threads per second. (And keep in mind that each create-join pair also forces two full context switches.)
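The same kind of measurement can be sketched in Python with the threading module. This is only an analog of the pthread_create/pthread_join loop: each iteration creates, starts and joins one thread, interpreter overhead will make the numbers lower than in C, and results depend heavily on the machine, so no figures are claimed here.

```python
import threading
import time

def thread_create_join_rate(n: int = 2000) -> float:
    """Create, start and join n threads one at a time; return threads per second."""
    def noop() -> None:
        pass

    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=noop)
        t.start()
        t.join()  # wait for each thread before creating the next one
    elapsed = time.perf_counter() - start
    return n / elapsed

print(f"{thread_create_join_rate():,.0f} threads/second")
```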
For most applications, the performance of regular threads is perfectly adequate. In these environments, the maintainability and debuggability advantages of using plain old boring threads make it really hard to justify using something exotic.
But suppose you do have big performance requirements: you can still use normal-looking threaded code. There's a difference between how we represent threads in source code and how we implement them. It's possible to provide green, userspace-switched threads without requiring "await" and "async" keywords everywhere. GNU Pth did it a long time ago, and there are lots of other fibers implementations.
> the CPU has to shuffle a lot of data every time you switch
Any green-threaded system (with or without explicit preemption points) also does context switches! Such a system maintains in user space a queue of things to work on: as the system switches from one of these work items to another, it's switching contexts! You have the same kind of register reloading and cache coldness problems that switching thread contexts has. There's no particular reason that you can do it much better than the kernel can do it, especially since switching threads in the same address space is pretty efficient.
They're not. gevent (and threads) are way faster than explicit asyncio, since each of asyncio's keywords/yields has its own overhead. Here are my benchmarks (disclaimer: for the "yield from" version of asyncio): http://techspot.zzzeek.org/2015/02/15/asynchronous-python-an...
With or without async, we're writing threads. (Promise chains are _also_ threads, very awkwardly spelled.) Really, we're arguing over whether we want our preemption points to be explicit or implicit. I prefer implicit myself, because the implicit style leads to much clearer code.
I understand how the JavaScript people might be excited that they can finally have threads, even if ugly ones, but there's no reason to get the rest of the world to switch to explicit-preemption-point threads.
It's not even that!
It's not like you actually get to decide where to await in async/await code - you have to await on any call that is async, if you expect to get the result.
Now, if the underlying framework uses hot tasks - meaning the async operation starts executing as soon as it's invoked, and not when the returned task is awaited (as in e.g. .NET/C#) - you can choose to omit await to, effectively, fork your async "thread". So NOT doing await on something is just a fork operation. It's the reverse of regular sync code, where thread forks are explicit and sequential flow on a single thread is implicit.
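Python behaves differently from C# here, which makes the fork/await distinction easy to see. A minimal sketch (child and demo_hot_vs_cold are illustrative names): calling an async function gives a "cold" coroutine that does nothing until awaited, while asyncio.create_task schedules it right away, which is the fork-like behavior of a hot task.

```python
import asyncio

async def child(log: list[str]) -> None:
    log.append("child started")

async def demo_hot_vs_cold() -> list[str]:
    log: list[str] = []
    cold = child(log)                      # coroutine object: not running yet
    hot = asyncio.create_task(child(log))  # scheduled to run: effectively a fork
    await asyncio.sleep(0)                 # yield once so the task gets to run
    log.append(f"after yield: {len(log)} start(s)")
    await cold                             # only now does the cold coroutine run
    await hot                              # "join" the forked task
    return log

print(asyncio.run(demo_hot_vs_cold()))
# ['child started', 'after yield: 1 start(s)', 'child started']
```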
One other case where you wouldn't await is when you need to await on a combination of any or all tasks at the same time (i.e., wait until all tasks complete, or wait until one of the tasks completes). But the first one is equivalent to a thread join in sync code, and the second to a condition variable. So, again, you get a case where something more explicit in sync code is more implicit in async code, and vice versa.
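Both combinators exist in Python's asyncio, as a minimal sketch shows (job is a hypothetical task that sleeps and returns its name): asyncio.gather waits for all tasks, like a join, and asyncio.wait with return_when=FIRST_COMPLETED resumes as soon as the first one finishes. Cancelling the leftover task afterwards is a choice made in this sketch, not something asyncio.wait does for you.

```python
import asyncio

async def job(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name

async def wait_all() -> list[str]:
    # "Join": resume only when every task has finished (results keep argument order).
    return await asyncio.gather(job("a", 0.05), job("b", 0.01))

async def wait_any() -> str:
    # "Any": resume as soon as the first task finishes.
    tasks = {asyncio.create_task(job("slow", 1.0)), asyncio.create_task(job("fast", 0.01))}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations finish
    return done.pop().result()

print(asyncio.run(wait_all()))  # ['a', 'b']
print(asyncio.run(wait_any()))  # fast
```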
Now note that all this is solely about syntax! You can take the C# compiler, and change it so that every awaitable statement is automatically awaited, except when the newly introduced operator "taskof" is applied, in which case you get the raw future instead. Voila! Cooperative future-based multitasking with implicit preemption points. Yet it works exactly the same, and will even be able to call into and be called from any existing C# code compiled by the original compiler.
I suspect that this will be the next step after async/await, once enough people notice that the default (non-await) behavior is something they need very rarely, and figure out that it's better to change the syntax so that the much more common thing (await) is implicit. This would be similar to how =/== for assignment and comparison has won out over :=/= in imperative languages.
Imagine the same thing using Promises:
def proxy(dest_host, dest_port, main_task, source_sock, addr):
    main_task.cancel()\
        .then(lambda _: curio.open_connection(dest_host, dest_port))\
        .then(lambda dest_sock: copy_all(source_sock, dest_sock))
Think of handling a web request, where you have to make parallel I/O requests to subsystems like a database, a web service, Redis, and so on. I think async/await gives us a nice standard way of describing "hit me back once X is done".
A lot of terrible bugs in code are caused by people making assumptions such as yours.
> if you have N logical threads concurrently executing a routine with Y yield points, then there are NY possible execution orders that you have to hold in your head
That line of reasoning is actively harmful to software maintainability. Concurrency problems don't disappear when you make your yield points explicit.
Look: in traditional multi-threaded programs, we protect shared data using locks. If you avoid explicit locks and instead rely on complete knowledge of all yield points (i.e., all possible execution orders) to ensure that data races do not happen, then you've just created a ticking time-bomb: as soon as you add a new yield point, you invalidate your safety assumptions.
Traditional lock-based preemptive multi-threaded code isn't susceptible to this problem: it already embeds maximally pessimistic assumptions about execution order, so adding a new preemption point cannot hurt anything.
Of course, you can use mutexes with explicit yield points too, but nobody does: the perception is that cooperative multitasking (or promises or whatever) frees you from having to worry about all that hard, nasty multi-threaded stuff you hated in your CS classes. But you haven't really escaped. Those dining philosophers are still there, and now they're angry.
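A minimal Python asyncio sketch of the "new yield point" time-bomb, and of using a mutex anyway. The Account class and the deposit functions are hypothetical: the unlocked version does a read-modify-write with one await in the middle (imagine an audit-log call added later), so every task reads the balance before any task writes it back; wrapping the same code in an asyncio.Lock restores the expected result.

```python
import asyncio

class Account:
    def __init__(self) -> None:
        self.balance = 0
        self.lock = asyncio.Lock()

async def deposit_unlocked(acct: Account) -> None:
    current = acct.balance
    await asyncio.sleep(0)  # a newly added yield point, e.g. an audit-log call
    acct.balance = current + 1

async def deposit_locked(acct: Account) -> None:
    async with acct.lock:   # no other task can interleave inside this block
        current = acct.balance
        await asyncio.sleep(0)
        acct.balance = current + 1

async def run_deposits(deposit, n: int = 100) -> int:
    acct = Account()
    await asyncio.gather(*(deposit(acct) for _ in range(n)))
    return acct.balance

print(asyncio.run(run_deposits(deposit_unlocked)))  # 1 (99 updates lost)
print(asyncio.run(run_deposits(deposit_locked)))    # 100
```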
The article claims that yield-based programming is easier because the fewer the total number of yield points, the less mental state a programmer needs to maintain. I don't think this argument is correct: in lock-based programming, we need to keep _zero_ preemption points in mind, because we assume every instruction is a yield point. Instead of thinking about NY program interleavings, we think about how many locks we hold. I bet we have fewer locks than you have yields.
To put it another way, the composition properties of locks are much saner than the composition properties of safety-through-controlling-yield.
I believe that we got multithreaded programming basically right a long time ago, and that improvement now rests on approaches like reducing mutable shared state, automated thread-safety analysis, and software transactional memory. Encouraging developers to sprinkle "async" and "await" everywhere is a step backward in performance, readability, and robustness.
1. Thread-per-request. This is a simple model. You have a fixed-size thread pool of size N, and once you hit that limit, you can't serve any more requests. Thread-per-request has several sources of overhead, which is why people recommend against it: thread limits, per-thread stack memory usage, and context switching.
2. Coroutine style handling with cooperative scheduling at synchronization points (locks, I/O). This is how Go handles requests.
3. Asynchronous request handling. You still have a fixed-size thread pool handling requests, but you no longer limit the number of simultaneous requests with the size of that thread pool. There are several different styles of async request handling: callbacks, async/await, and futures.
#2 and #3 are more common these days because they don't suffer from the many drawbacks of the thread-per-request model, although both suffer from some understandability issues.
(By the way: most of the time, a plain-old-boring thread-per-request is just fine, because most of the time, you're not writing high-scale software. If you have at most two dozen concurrent tasks, you're wasting your time worrying about the overhead of plain old pthread_t.)
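To make the difference between models 1 and 3 concrete, here is a minimal Python sketch with simulated I/O (the handlers and the meter class are hypothetical). It counts the peak number of requests in flight: with a ThreadPoolExecutor of size N, extra requests wait in the executor's queue rather than being rejected, so at most N are in flight, while a single asyncio thread keeps all of them in flight at once.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class ConcurrencyMeter:
    """Tracks how many simulated requests are in flight at once."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def exit(self) -> None:
        with self._lock:
            self.current -= 1

def blocking_handler(meter: ConcurrencyMeter) -> None:
    meter.enter()
    time.sleep(0.05)           # simulated blocking I/O
    meter.exit()

async def async_handler(meter: ConcurrencyMeter) -> None:
    meter.enter()
    await asyncio.sleep(0.05)  # simulated non-blocking I/O
    meter.exit()

def peak_thread_pool(requests: int, pool_size: int) -> int:
    # Model 1: a fixed pool of N threads; extra requests queue up.
    meter = ConcurrencyMeter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(requests):
            pool.submit(blocking_handler, meter)
    return meter.peak

async def peak_async(requests: int) -> int:
    # Model 3: one thread, every request in flight at the same time.
    meter = ConcurrencyMeter()
    await asyncio.gather(*(async_handler(meter) for _ in range(requests)))
    return meter.peak

print(peak_thread_pool(requests=20, pool_size=4))  # at most 4
print(asyncio.run(peak_async(20)))                 # 20
```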
I'm using a much more expansive definition of "thread" than you are. Sure, in the right situation, maybe M:N threading, or full green threads, or whatever is the right implementation strategy. There's no reason that green threading has to involve the use of explicit "async" and "await" keywords, and it's these keywords that I consider silly.
E.g. at first you have a server that accepts multiple connections, each of which must be handled: one thread per connection, or one thread for all connections? If you go for threads, you might even need several per connection, e.g. a reader thread, a writer thread which processes a write queue, and a third one which maintains the state of the connection and coordinates reads and writes.
Then on a higher layer you might have multiple streams per connection (e.g. in HTTP/2), where you again have to decide how these should be represented.
Depending on the protocol and application there might be even more or other layers that need concurrency and synchronization.
But the general approaches that you mention still apply here: you can either use a thread for each concurrent entity and use blocking operations, or you can multiplex multiple concurrent entities on a single thread with async operations and callbacks. Coroutines are a mix: they provide an API like the first approach with an implementation that looks more like the second.
If your application acts as a stateless proxy between client machines and your persistence layer, can't you just spin up another instance and load balance them at any time? It's not the most efficient solution at scale, but lots of people use this strategy.
Let's not forget about forking servers: the kind where each request forks.
The complexity you see affects yourself more than the complexity you don't.
I rather agree.
FWIW my friend Abhijit Menon-Sen wrote a blog post on the matter last year, about some code with excellent test coverage and explicit yield points: http://toroid.org/callback-heaven
That may be when they "typically" happen, but you can still get race conditions where two tasks can run next and you accidentally write an assumption about which one will run first into your code, when the scheduler makes no such guarantee. You certainly will get fewer of these with async/await than with pure event-handling-style code, because async/await carries more information about the proper ordering of code, but yes, you can still get things that are correctly described as "race conditions".
You get an equal and opposite problem: whenever you add one more lock, you invalidate your liveness assumptions.
> The article claims that yield-based programming is easier because the fewer the total number of yield points, the less mental state a programmer needs to maintain. I don't think this argument is correct: in lock-based programming, we need to keep _zero_ preemption points in mind, because we assume every instruction is a yield point. Instead of thinking about NY program interleavings, we think about how many locks we hold. I bet we have fewer locks than you have yields.
I'll take that bet. You really don't have to yield very often - only when making a network request, and perhaps not even for that in the case of a fast local network. Whereas you have to lock every piece of state that you have.
You need to lock every piece of shared state you have. Where "shared" means stuff that many threads must communicate among themselves. One tends to keep the number of that kind of state low, really low. When zero is not possible, the most common number by a wide margin is one¹.
If you have more than one, they are normally completely independent pieces of state that will not be used at the same time. If you have more than one and they are not independent, the code is either the result of at least one PhD thesis, or it does not work (or, often, both).
I bet you do network requests more than once in your code.
¹ The size of the shared state does not matter, so it's often one really big state.
While true, locks aren't free from this problem. They have the inverse. If someone adds code that accesses a data structure that should be protected by a lock and they forget to add the lock, you also lose all of your safety assumptions.
In particular, WinRT heavily promotes this approach for UWP apps.
If you don't have some organized way of managing concurrency, you're going to have problems. Without OOP, what? "Critical sections" lock relative to the code, not the data. "Which lock covers what data?" is a big issue, and the cause of many race conditions.
(The dislike of OOP seems to stem from the problems of getting objects into and out of databases in web services. One anti-OOP article suggests stored procedures as an alternative. Many database-oriented programs effectively use the database as their concurrency management tool. Nothing wrong with that, but it doesn't help if your problem isn't database driven.)
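The "which lock covers what data?" problem is what object encapsulation addresses. A minimal Python sketch of the monitor-style approach, assuming a hypothetical Counter class: the lock is a private field next to the data it protects, and every method that touches the data takes it.

```python
import threading

class Counter:
    """self._lock covers self._value, and nothing else touches _value."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = 0

    def increment(self) -> None:
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value

def hammer(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()

counter = Counter()
workers = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.value)  # 80000
```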
Python has the threading model of C - no language constructs for threads. It's all done in libraries. There's no protection against race conditions in user code. The underlying memory model is protected, by making operations that could break the memory model atomic, but that's all. CPython also has some major thread performance problems due to the Global Interpreter Lock. Having more CPUs doesn't speed things up; it makes programs slower, due to lock contention inefficiencies. So the use of real threads is discouraged in Python.
There's a suggested workaround with the "multiprocessing" module. This creates ultra-heavyweight threads, with a process for each thread, and talks to them with inefficient message passing. It's used mostly to run other programs from Python programs, and doesn't scale well.
So Python needed something to be competitive. There are armies of JavaScript programmers with no experience in locking, but familiarity with a callback model. This seems to be the source of the push to put it in Python. Like many language retrofits, it's painful.
Does this imply that the major libraries will all have to be overhauled to make them async-compatible?
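A partial workaround exists in asyncio: push calls into existing blocking libraries onto a thread pool. A minimal sketch, where legacy_blocking_fetch is a hypothetical stand-in for such a library call and asyncio.to_thread requires Python 3.9 or later:

```python
import asyncio
import time

def legacy_blocking_fetch(key: str) -> str:
    # Hypothetical stand-in for a call into an existing synchronous library.
    time.sleep(0.2)
    return key.upper()

async def fetch_many(keys: list[str]) -> list[str]:
    # asyncio.to_thread runs each blocking call in the loop's default thread
    # pool, so the event loop itself is not blocked while they wait.
    return await asyncio.gather(*(asyncio.to_thread(legacy_blocking_fetch, k) for k in keys))

start = time.perf_counter()
print(asyncio.run(fetch_many(["a", "b", "c"])))  # ['A', 'B', 'C']
print(f"{time.perf_counter() - start:.2f}s")
```

The three 0.2 s calls overlap, so the batch should take about 0.2 s rather than 0.6 s. This keeps the event loop responsive, but it still pays for an OS thread per in-flight call, so it is a bridge rather than a replacement for natively async libraries.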
…which is why I was happy to hear that not all hope is lost and that someone created an alternative. Now, I haven't taken a look at curio yet, so maybe I'm a bit quick to judge, but I already found it very refreshing that spending not even a minute to read the documentation already left me with a good idea of how it works and how I can use it. Kudos to the author(s), I will definitely give it a try!
also as of now, most people who used it complain it is slow
give it 2 more years before you worry, and for now continue with python or whatever you like to use
no one is in a rush to make perl 6 popular ... it is not a commercial project ... so don't bet your career on perl 6 ... yet