When teaching it, it's important to emphasize:
- await is locally blocking, so you should isolate linear workflows into their own coro, which is the unit of concurrency.
- to allow concurrency, you should use asyncio.create_task on coro (formerly ensure_future).
- you should always explicitly delimit the life cycle of any task. Right now, this means using something like gather() or wait(). TaskGroup will help when it becomes mainstream.
A HN comment is not a great place to explain that, but if you read the article, you should investigate those points. There is no good asyncio code without them, only pain and disappointment.
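A minimal sketch of those three points together (the `fetch` coroutine and its names are illustrative, not from the article):

```python
import asyncio

async def fetch(name: str) -> str:
    # A linear workflow isolated in its own coroutine: each await
    # blocks only this coroutine, not the whole event loop.
    await asyncio.sleep(0.1)  # stand-in for real I/O
    return f"{name}: done"

async def main():
    # create_task() schedules the coroutines to run concurrently...
    tasks = [asyncio.create_task(fetch(n)) for n in ("a", "b", "c")]
    # ...and gather() explicitly delimits their life cycle: main()
    # does not finish until every task has finished.
    return await asyncio.gather(*tasks)

print(asyncio.run(main()))  # ['a: done', 'b: done', 'c: done']
```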
Strongly agreed, but you can use anyio [1] on top of asyncio to get that functionality right now. Or, maybe even better, use Trio [2] instead, which is where the idea came from in the first place.
This is misleading... you can use asyncio.gather, which does this internally [0].
[0]: https://github.com/python/cpython/blob/main/Lib/asyncio/task...
Unless you want a hacky actor system, in which case it's totally fine to `create_task` a ton of coroutines which have their own spin loop with await sleep :)
https://docs.python.org/3/library/asyncio-task.html#asyncio....
> In particular, calling it will immediately return a coroutine object, which basically says "I can run the coroutine with the arguments you called with and return a result when you await me".
> The code in the target function isn't called yet - this is merely a promise that the code will run and you'll get a result back, but you need to give it to the event loop to do that.
If I try to pass the async function to gather (for example) without calling it, which makes some intuitive sense (functions are first-class objects, and I know I'm not calling it, the event loop is), the error message reads something like "gather only accepts coroutines." But I thought it was a coroutine because I declared it with async! For some reason it took me a silly amount of time to notice that in all the examples, the async function is called when it's passed to gather (or whatever). That's not intuitive to me, and the distinction the article makes should be clearer in the docs.
That intuition breaks immediately when you realize that those functions can have arguments, and you have no way to pass them.
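A tiny demonstration of the distinction (the `work` coroutine is illustrative): `async def` defines a coroutine *function*; calling it produces the coroutine *object* that gather actually wants, without running the body yet.

```python
import asyncio

async def work(x):
    await asyncio.sleep(0)
    return x * 2

async def main():
    # Passing the function itself fails: gather wants awaitables,
    # and `work` (uncalled) is just a coroutine function.
    try:
        await asyncio.gather(work)
    except TypeError as e:
        print("rejected:", e)
    # Calling it creates coroutine objects without running their bodies;
    # gather then drives them on the event loop and collects the results.
    return await asyncio.gather(work(1), work(2))

print(asyncio.run(main()))  # rejected: ..., then [2, 4]
```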
There are popular libraries for it in both Python and Perl, and I suspect I could make good use of it if I understood it.
Unfortunately, I've only ever used it in a cargo-cult manner of sticking functions together until the error messages go away (yeah yeah, it was only for "throwaway" "prototypes"), so I really don't understand how it's all meant to fit together.
Another question: Is Python's implementation of async/await identical to other languages? In particular, do they always use coroutines instead of threads?
1. Python has threads; they just cannot perform CPU-bound tasks in parallel due to the GIL. The GIL is released for IO, so threads can perform IO waiting in parallel, just like asyncio.
2. asyncio runs in one thread and has the exact same limitations as threads as implemented in Python: CPU operations are serialized, and async tasks can yield for IO.
The advantages offered by asyncio are:
1. You can have thousands of tasks extremely quickly and cheaply, which is not as much the case for threads in Python. This can allow for massive concurrent architectures more expediently, provided your concurrency is very IO-bound (if you are CPU-bound, disaster).
2. People just like asyncio's programming model; IMO this is largely due to the popularity of JavaScript's event-based model being natural for lots of newer programmers.
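Point 1 can be sketched like this (the sleep is a stand-in for real I/O; exact timings will vary by machine):

```python
import asyncio
import time

async def io_task():
    await asyncio.sleep(0.5)  # stand-in for I/O waiting

async def main():
    start = time.perf_counter()
    # 10,000 concurrent tasks: far cheaper than 10,000 OS threads,
    # since each task is just a Python object on one event loop.
    await asyncio.gather(*(io_task() for _ in range(10_000)))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"{elapsed:.2f}s")  # roughly 0.5s total, not 10,000 * 0.5s
```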
For server-side code, I'd still probably use threads up to maybe 1000 concurrent connections. Beyond that, I've used gevent to good effect. E.g., I have a server that receives HTTP POSTs which are multipart forms, each form having 3 parts: a JSON part and two file parts. The two file parts get written to files on S3 and the JSON part to SQS. The web framework is Falcon [1] and I also made use of a Cython-based HTTP form parser [2]. Concurrency is handled via gevent. Openresty sits in front and invokes the Python server via uwsgi. At the time I developed it, asyncio was not yet mature and not supported by boto3. I benchmarked against PyPy but, unsurprisingly (since it's I/O bound), got better performance from CPython + gevent.
If I were developing it from scratch today, I'd re-evaluate the asyncio story, or more likely than not, choose a different language.
I don't doubt that there's use-cases to which asyncio is well-suited and the right choice, but I suspect folks may be using it in cases where they'd be fine with threads. As always, there are trade-offs.
1. https://falconframework.org/
2. https://pypi.org/project/streaming-form-data/ (I think)
Why would it need to in this case? You only need one thread for concurrent I/O.
While I am self-taught, I'm used to (academic) books that strive for completeness, which is also what I prefer, rather than something more pragmatic like a blog post.
It doesn't mean I want to read overly complicated prose on the subject, which I'm sure is possible.
This is much more useful than the typical "let's write a single-run example with async" blog post.
The whole async thing is there to abstract the main loop away so the program isn't structured around it… but in reality you have to keep in mind that you are in a main loop that calls poll() and then all the registered functions.
Async might technically be bolted on, but no worse than async in most languages, which weren't designed de novo for async either (unlike, e.g., Go/Elixir).
Erlang has been around since what, the 80s? Elixir is "just" Erlang with a different face and extra features.
> restrictive
Which is what? Functional programming? Immutability?
Interestingly, Erlang is often called a "true" object-oriented language thanks to its actor model. It's incredibly powerful and flexible, pretty much the opposite of restrictive. Just for a simple example, you can inspect, debug and modify your program while it's running.
From your comment it just seems you're not familiar with it.
There's an article by Cal Paterson arguing that async doesn't speed up code: it is not parallel. The GIL prevents Python from being parallel, so even if you create a thread to run an async method in Python, it will not run in parallel with the main thread of execution. (In fact, it will block the main thread of execution if you start a thread from the thread you are in, due to the blocking run_in_executor.)
https://calpaterson.com/async-python-is-not-faster.html
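The non-parallelism is easy to observe. In this sketch (names illustrative), three "concurrent" CPU-bound coroutines run strictly one after another, because a coroutine only yields at an await and these never await:

```python
import asyncio

events = []

def fib(n):
    # Deliberately CPU-bound work that never releases the event loop.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

async def crunch(name):
    events.append((name, "start"))
    fib(20)  # CPU work with no await: nothing else can run meanwhile
    events.append((name, "end"))

async def main():
    await asyncio.gather(crunch("a"), crunch("b"), crunch("c"))

asyncio.run(main())
# Each task starts and ends before the next one starts: no interleaving.
print(events)
```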
I wrote a multithreaded userspace 1:M:N scheduler (1 scheduler thread, M kernel threads and N lightweight/green threads), which resembles Golang's M:N model. I implemented the same design in Rust, C and Java. I'm thinking it could be combined with my epoll-server to make an application server.
https://github.com/samsquire/preemptible-thread https://github.com/samsquire/epoll-server
I am also interested in structured concurrency. This article by the Vale developers is good.
https://verdagon.dev/blog/seamless-fearless-structured-concu...
I am trying to find a concurrent software design that is scalable, is easy to write, and hides complicated lock programming. I document my studies and ideas in the open in ideas4.
https://github.com/samsquire/ideas4
I've implemented multithreaded parallel multiversion concurrency control in Java, which is the same approach used by PostgreSQL and MySQL for concurrent reading and writing of the same data atomically.
I still think concurrency is hard to write and understand. Even with async/await.
// 3 requests in flight
result1 = async_task1();
result2 = async_task2();
result3 = async_task3();
await result1;
await result2;
await result3;
I ported a parallel multiconsumer multiproducer ringbuffer from Alek
https://www.linuxjournal.com/content/lock-free-multi-produce...
I use Python threads in https://github.com/samsquire/devops-schedule and https://github.com/samsquire/parallel-workers to parallelise a topologically sorted graph of IO of devops programs. This allows efficient scheduling and blocking with thread.join() for each split of the work graph and then a regrouping before doing other things, also potentially in parallel. This pattern is efficient and easy to use.
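A hedged sketch of that split/join pattern (names and jobs are illustrative, not taken from those repos): fork one thread per independent job in a split of the graph, then regroup with join() before the next level starts.

```python
import threading

def run_split(jobs):
    # Fork: one thread per independent I/O job in this split of the graph.
    results = {}

    def worker(name, fn):
        results[name] = fn()

    threads = [threading.Thread(target=worker, args=(n, fn)) for n, fn in jobs]
    for t in threads:
        t.start()
    # Regroup: join() blocks until every job in the split is done,
    # so the next level of the topologically sorted graph can start.
    for t in threads:
        t.join()
    return results

# Illustrative stand-ins for devops I/O steps:
out = run_split([("build", lambda: "ok"), ("lint", lambda: "ok")])
print(out)  # {'build': 'ok', 'lint': 'ok'}
```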
> await result2;
> await result3;
Not really, you only have *1* request in flight.
And you're waiting for them sequentially.
You need asyncio.gather ( https://docs.python.org/3/library/asyncio-task.html#asyncio.... ) if you want to run tasks concurrently.
results = await asyncio.gather(result1, result2, result3)
// 3 requests in flight
result1 = async_task1();
result2 = async_task2();
result3 = async_task3();
Depends on the implementation: some are eager, some are lazy.
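In Python specifically, coroutines are lazy and create_task makes them (nearly) eager, which this sketch demonstrates (names illustrative):

```python
import asyncio

started = []

async def async_task(name):
    started.append(name)  # records when the body actually begins
    await asyncio.sleep(0)
    return name

async def main():
    # Python coroutines are lazy: calling the function runs nothing yet.
    coro = async_task("lazy")
    assert started == []
    # create_task() is eager-ish: the body starts on the next pass
    # of the event loop, before we ever await the task handle.
    task = asyncio.create_task(async_task("eager"))
    await asyncio.sleep(0)
    assert started == ["eager"]
    # The plain coroutine's body runs only once it is awaited.
    await coro
    await task
    return started

print(asyncio.run(main()))  # ['eager', 'lazy']
```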