When teaching it, it's important to emphasize:
- await is locally blocking, so you should isolate linear workflows into their own coro, which is the unit of concurrency.
- to allow concurrency, you should use asyncio.create_task on coro (formerly ensure_future).
- you should always explicitly delimit the life cycle of any task. Right now, this means using something like gather() or wait(). TaskGroup will help when it becomes mainstream.
A HN comment is not a great place to explain that, but if you read the article, you should investigate those points. There is no good asyncio code without them, only pain and disappointment.
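A minimal sketch of those three points together (the `fetch` coroutine and its names are illustrative, not from the article):

```python
import asyncio

async def fetch(name: str) -> str:
    # A linear workflow isolated in its own coroutine: each await
    # blocks only this coroutine, not the whole event loop.
    await asyncio.sleep(0.1)  # stand-in for real I/O
    return f"{name}: done"

async def main():
    # create_task() schedules the coroutines to run concurrently...
    tasks = [asyncio.create_task(fetch(n)) for n in ("a", "b", "c")]
    # ...and gather() explicitly delimits their life cycle: main()
    # does not finish until every task has finished.
    return await asyncio.gather(*tasks)

print(asyncio.run(main()))  # ['a: done', 'b: done', 'c: done']
```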
Strongly agreed, but you can use anyio [1] on top of asyncio to get that functionality right now. Or, maybe even better, use Trio [2] instead, which is where the idea came from in the first place.
This is misleading... you can use asyncio.gather, which does this internally [0].
[0]: https://github.com/python/cpython/blob/main/Lib/asyncio/task...
Unless you want a hacky actor system, in which case it's totally fine to `create_task` a ton of coroutines which have their own spin loop with await sleep :)
https://docs.python.org/3/library/asyncio-task.html#asyncio....
> In particular, calling it will immediately return a coroutine object, which basically says "I can run the coroutine with the arguments you called with and return a result when you await me".
> The code in the target function isn't called yet - this is merely a promise that the code will run and you'll get a result back, but you need to give it to the event loop to do that.
If I try to pass the async function to gather (for example) without calling it, which makes some intuitive sense (functions are first-class objects, and I know I'm not calling it, the event loop is), the error message reads something like "gather only accepts coroutines." But I thought it was a coroutine because I declared it with async! For some reason it took me a silly amount of time to notice that in all the examples, the async function is called when it's passed to gather (or whatever). That's not intuitive to me, and the distinction the article makes should be clearer in the docs.
That intuition breaks immediately when you realize that those functions can have arguments, and you have no way to pass them.
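A tiny demonstration of the distinction (the `work` coroutine is illustrative): `async def` defines a coroutine *function*; calling it produces the coroutine *object* that gather actually wants, without running the body yet.

```python
import asyncio

async def work(x):
    await asyncio.sleep(0)
    return x * 2

async def main():
    # Passing the function itself fails: gather wants awaitables,
    # and `work` (uncalled) is just a coroutine function.
    try:
        await asyncio.gather(work)
    except TypeError as e:
        print("rejected:", e)
    # Calling it creates coroutine objects without running their bodies;
    # gather then drives them on the event loop and collects the results.
    return await asyncio.gather(work(1), work(2))

print(asyncio.run(main()))  # rejected: ..., then [2, 4]
```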
There are popular libraries for it in both Python and Perl, and I suspect I could make good use of it if I understood it.
Unfortunately, I've only ever used it in a cargo-cult manner of sticking functions together until the error messages go away (yeah yeah, it was only for "throwaway" "prototypes"), so I really don't understand how it's all meant to fit together.
Another question: Is Python's implementation of async/await identical to other languages? In particular, do they always use coroutines instead of threads?
1. Python has threads; they just cannot perform CPU-bound tasks in parallel due to the GIL. The GIL is released for IO, so threads can perform IO waiting in parallel, just like asyncio.
2. asyncio runs in one thread and has the exact same limitations as threads as implemented in Python: CPU operations are serialized, and async tasks can yield for IO.
The advantages offered by asyncio are:
1. You can have thousands of tasks extremely quickly and cheaply, which is not as much the case for threads in Python. This can allow for massive concurrent architectures more expediently, provided your concurrency is very IO-bound (if you are CPU-bound, disaster).
2. People just like asyncio's programming model; IMO this is largely due to the popularity of JavaScript's event-based model being natural for lots of newer programmers.
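Point 1 can be sketched like this (the sleep is a stand-in for real I/O; exact timings will vary by machine):

```python
import asyncio
import time

async def io_task():
    await asyncio.sleep(0.5)  # stand-in for I/O waiting

async def main():
    start = time.perf_counter()
    # 10,000 concurrent tasks: far cheaper than 10,000 OS threads,
    # since each task is just a Python object on one event loop.
    await asyncio.gather(*(io_task() for _ in range(10_000)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # roughly 0.5s total, not 10,000 * 0.5s
```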
For server-side code, I'd still probably use threads up to maybe 1000 concurrent connections. Beyond that, I've used gevent to good effect. E.g., I have a server that receives HTTP POSTs which are multipart forms, each form having 3 parts: a JSON part and two file parts. The two file parts get written to files on S3 and the JSON part to SQS. The web framework is Falcon [1] and I also made use of a Cython-based HTTP form parser [2]. Concurrency is handled via gevent. Openresty sits in front and invokes the Python server via uwsgi. At the time I developed it, asyncio was not yet mature and not supported by boto3. I benchmarked against PyPy but, unsurprisingly (since it's I/O bound), got better performance from CPython + gevent.
If I were developing it from scratch today, I'd re-evaluate the asyncio story, or more likely than not, choose a different language.
I don't doubt that there's use-cases to which asyncio is well-suited and the right choice, but I suspect folks may be using it in cases where they'd be fine with threads. As always, there are trade-offs.
1. https://falconframework.org/
2. https://pypi.org/project/streaming-form-data/ (I think)
Why would it need to in this case? You only need one thread for concurrent I/O.
While I am self-taught, I'm used to (academic) books that strive for completeness, which is also what I prefer, rather than something more pragmatic like a blog post.
It doesn't mean I want to read overly complicated prose on the subject, which I'm sure is possible.
This is much more useful than the typical "let's write a single-run example with async" blog post.
The whole async thing is there to abstract the main loop away so the program isn't structured around it… but in reality you have to keep in mind that you are in a main loop that calls poll() and then all the registered functions.
Async might technically be bolted on, but no worse than async in most languages, which weren't designed de novo for async either (unlike, e.g., Go/Elixir).
Erlang has been around since what, the 80s? Elixir is "just" Erlang with a different face and extra features.
> restrictive
Which is what? Functional programming? Immutability?
Interestingly, Erlang is often called a "true" object-oriented language thanks to its actor model. It's incredibly powerful and flexible, pretty much the opposite of restrictive. Just for a simple example, you can inspect, debug and modify your program while it's running.
From your comment it just seems you're not familiar with it.
There's an article by Cal Paterson arguing that async doesn't speed up code: it is not parallel. The GIL prevents Python from being parallel, so even if you create a thread to run an async method in Python, it will not run in parallel with the main thread of execution. (In fact, it will block the main thread of execution if you start a thread from the thread you are in, due to the blocking run_in_executor.)
https://calpaterson.com/async-python-is-not-faster.html
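The non-parallelism is easy to observe. In this sketch (names illustrative), three "concurrent" CPU-bound coroutines run strictly one after another, because a coroutine only yields at an await and these never await:

```python
import asyncio

events = []

def fib(n):
    # Deliberately CPU-bound work that never releases the event loop.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def crunch(name):
    events.append((name, "start"))
    fib(20)  # CPU work with no await: nothing else can run meanwhile
    events.append((name, "end"))

async def main():
    await asyncio.gather(crunch("a"), crunch("b"), crunch("c"))

asyncio.run(main())
# Each task starts and ends before the next one starts: no interleaving.
print(events)
```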
I wrote a multithreaded userspace 1:M:N scheduler (1 scheduler thread, M kernel threads and N lightweight/green threads), which resembles Golang's M:N model. I implemented the same design in Rust, C and Java. I'm thinking it could be combined with my epoll-server to make an application server.
https://github.com/samsquire/preemptible-thread https://github.com/samsquire/epoll-server
I am also interested in structured concurrency. This article by the Vale developers is good.
https://verdagon.dev/blog/seamless-fearless-structured-concu...
I am trying to find a concurrent software design that is scalable, is easy to write, and hides complicated lock programming. I document my studies and ideas in the open in ideas4.
https://github.com/samsquire/ideas4
I've implemented multithreaded parallel multiversion concurrency control in Java, which is the same approach used by PostgreSQL and MySQL for concurrent reading and writing of the same data atomically.
I still think concurrency is hard to write and understand. Even with async/await.
// 3 requests in flight
result1 = async_task1();
result2 = async_task2();
result3 = async_task3();
await result1;
await result2;
await result3;
I ported a parallel multiconsumer multiproducer ringbuffer from Alek
https://www.linuxjournal.com/content/lock-free-multi-produce...
I use Python threads in https://github.com/samsquire/devops-schedule and https://github.com/samsquire/parallel-workers to parallelise a topologically sorted graph of IO of devops programs. This allows efficient scheduling and blocking with thread.join() for each split of the work graph and then a regrouping before doing other things, also potentially in parallel. This pattern is efficient and easy to use.
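A hedged sketch of that split/join pattern (names and jobs are illustrative, not taken from those repos): fork one thread per independent job in a split of the graph, then regroup with join() before the next level starts.

```python
import threading

def run_split(jobs):
    # Fork: one thread per independent I/O job in this split of the graph.
    results = {}

    def worker(name, fn):
        results[name] = fn()

    threads = [threading.Thread(target=worker, args=(n, fn)) for n, fn in jobs]
    for t in threads:
        t.start()
    # Regroup: join() blocks until every job in the split is done,
    # so the next level of the topologically sorted graph can start.
    for t in threads:
        t.join()
    return results

# Illustrative stand-ins for devops I/O steps:
out = run_split([("build", lambda: "ok"), ("lint", lambda: "ok")])
print(out)  # {'build': 'ok', 'lint': 'ok'}
```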
> await result2;
> await result3;
Not really, you only have *1* request in flight.
And you're waiting for them sequentially.
You need asyncio.gather ( https://docs.python.org/3/library/asyncio-task.html#asyncio.... ) if you want to run tasks concurrently.
results = await asyncio.gather(result1, result2, result3)
// 3 requests in flight
result1 = async_task1();
result2 = async_task2();
result3 = async_task3();
Depends on the implementation: some are eager, some are lazy.
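In Python specifically, coroutines are lazy and create_task makes them (nearly) eager, which this sketch demonstrates (names illustrative):

```python
import asyncio

started = []

async def async_task(name):
    started.append(name)  # records when the body actually begins
    await asyncio.sleep(0)
    return name

async def main():
    # Python coroutines are lazy: calling the function runs nothing yet.
    coro = async_task("lazy")
    assert started == []
    # create_task() is eager-ish: the body starts on the next pass
    # of the event loop, before we ever await the task handle.
    task = asyncio.create_task(async_task("eager"))
    await asyncio.sleep(0)
    assert started == ["eager"]
    # The plain coroutine's body runs only once it is awaited.
    await coro
    await task
    return started

print(asyncio.run(main()))  # ['eager', 'lazy']
```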