For example, some high-level options include Popen, multiprocessing.Process, multiprocessing.Pool, futures.ProcessPoolExecutor, and huge frameworks like Ray.
multiprocessing.Process includes some pickling magic, and you can pick between multiprocessing.Pipe and multiprocessing.Queue, but you need to use either multiprocessing.connection.wait() or select.select() to also watch the process sentinel in case the process crashes. Which one? Well, connection.wait() will not be interrupted by an OS signal. It's unclear why I would ever use connection.wait() then. Is there some tradeoff I don't know about?
For my use cases, process reuse would have been nice, to hold on to network connections and such (useful even for a single process). Then you're looking at either multiprocessing.Pool or futures.ProcessPoolExecutor. They're very similar, except some bug fixes have gone into futures.ProcessPoolExecutor but not multiprocessing.Pool because...??? For example, if your subprocess exits uncleanly, multiprocessing.Pool will just hang, whereas futures.ProcessPoolExecutor will raise a BrokenProcessPool and the pool will refuse to do any more work (both of these are unreasonable behaviors IMO). Timing out and forcibly killing the subprocess is its own adventure with each of these too. After some time period passes I no longer care about the result, and the worker may be stuck in C code, so I just want to whack the process and move on, but that is far from trivial with these.
What a nightmarish mess! So much for "There should be one--and preferably only one--obvious way to do it"...my God.
(I probably got some details wrong in the above rant, because there are so many to keep track of...)
My learning: there is no "easy way to [process] parallelism" in Python. There are many different ways to do it, and you need to know all the nuances of each and how they address your requirements to know whether you can reuse existing high-level impls or you need to write your own low-level impl.
Process is low-level and is almost never what you want. Pool is "mid-level", and usually isn't what you want. ProcessPoolExecutor is usually what you want, it is the "one obvious way to do it". That's not at all clear from the docs though.
The one obvious way to do it, in general, is: subprocess.run for running external processes, subprocess.Popen for async interaction with external processes, and concurrent.futures.ProcessPoolExecutor for Python multiprocessing.
Your other complaints about actually using the multiprocessing stuff are 100% valid. Error handling, cancellation, etc. is all very difficult. Passing data back and forth between the main process and subprocesses is not trivial.
But I do want to emphasize that there is a somewhat-well-defined gradient of lower- and higher-level tools in the standard library, and your "obvious way to do it" should usually start at the higher end of that gradient.
You might also want to look into the third-party Joblib library, which makes process parallelism a lot less painful for the straightforward use case of "run a function on a large amount of data, using multiple OS processes."
Imagining I'm a newbie to Python concurrency, I Googled "concurrency in Python" and picked the first result from the official docs: https://docs.python.org/3/library/concurrency.html It's a list of everything except asyncio, and the first item on the list is the low-level `threading` :S At least that page mentions ThreadPoolExecutor, queue, and asyncio as alternatives, but I'm still lost on what the correct way is.
You didn't mention the recommended high-level option for subprocesses, `subprocess.run`.
There are other things I didn't mention that get thrown around too such as os.system() and os.fork().
Content: basically how to use ThreadPoolExecutor
Comment: Concurrency and parallelism aren't easy in Python.
How is this off-topic?
I like Python in general, but I avoid it for any kind of concurrent programming other than simple fan-out-fan-in.
I use multiprocessing and I am looking forward to the GIL removal.
I would really like library writers and parallelism experts to think about modelling computation in such a way that arbitrary programs, written in this notation, can be sped up without async, parallelism, or low-level synchronization primitives spreading throughout the codebase and increasing its cognitive load for everybody.
If you're doing business programming and using Python threads or processes directly, I think we're operating at the wrong level of abstraction, because our tools are not sufficiently abstract. (It's not your error; it's just not ideal where our industry is at.)
I am not an expert, but parallelism, coroutines, and async are a hobby I journal about all the time. I think a good approach to parallelism is to split your program into a tree dataflow and never synchronize. Shard everything.
If I have a single integer value whose update throughput I want to scale across the hardware threads of my multicore, SMT CPU, I can split the integer by that number and apply updates in parallel. (If you have £1000 in a bank account and 8 hardware threads, you split the account into 8 sub-accounts holding £125 each; then you can serve 8 transactions simultaneously.) Periodically, those threads can post their values to another buffer (a ringbuffer), and a thread servicing that ringbuffer can sum them all for a global view. This provides an eventually consistent view of the integer without slowing down throughput.
Unfortunately multithreading becomes a distributed system and then you need consensus.
I am working on barriers inspired by bulk synchronous parallel where you have parallel phases and synchronization phases and an async pipeline syntax (see my previous HN comments for notes on this async syntax)
My goal would be that business logic can be parallelised without you needing to worry about synchronization.
Of course that's not true for everything, and depending on the domain tree dataflows can also be great. I remember them being very popular in GPGPU tasks because synchronization is very costly there.
Probably 20% of the effort shown in this post could have been expended to just write something very similar in Golang, and it would have taken less time, too. Because the way I see it this is trying to emulate futures / promises (and it looks like it's succeeding, at least on the surface). That can spiral out of comfortable maintainable code territory pretty quickly.
But especially for something as trivial as a crawler, I don't see the appeal of Python. There are plenty of languages with lower friction for doing parallel stuff nowadays (Golang, Elixir, Rust if you want to cry a bit, hell, even Lua has some parallel libraries nowadays, Zig, Nim...).
It happened to me and many other former colleagues.
Though obviously, everyone decides for themselves when that point comes -- or if it comes at all.
What TFA doesn't say is that process pools are quite fragile: certainly on Mac and Windows, but on Linux too. They rely on pickling, which is also fragile.
That said, asyncio works surprisingly well if what you want is non-blocking execution and you're happy with one CPU. But no parallel speedup.
I wish Python had similar solutions.
Parallelism in a Notebook isn't for everyone, but how would these changes affect it?
Delusion level: max.
You have to be in a very, very bad place when this marginal improvement over the absolute horror-show that bare Process offers seemed "pretty decent".
Python doesn't have good tools for parallelism / concurrency. It doesn't have average tools. It doesn't have even bad tools. It has the worst. Though, unfortunately, it's not the only language in this category :(
> It's not the only language in this category
Soo....not the worst? :) Or tied for it?
What do you find difficult/wrong with pool executors?
Also, you reference "Process", but FYI the article talks about multiple threads, not multiple processes.
And they're still the worst version of this pattern: despite using multiple OS-level threads, with all the associated overhead, the GIL prevents most real parallelism from happening. And if you want full parallelism, you have to use multiprocessing.Pool, which adds pickling overhead and compatibility problems.
Yeah... I know, it's hard to imagine that there could be more than one worst. But, as I have to practice these things with my 4 year old, I become more patient with adults who don't get the concept too.
Imagine you are in a class and the teacher gives everyone a pencil and a sheet of paper. Now you want to find out who has the shortest pencil. All the students compare their pencils, and it turns out that several pencils are exactly the same length, and those are simultaneously the shortest ones. So more than one student has the shortest pencil.
But it doesn't end there. Not every set with a "greater than" relation is totally ordered. In such sets it's possible to have multiple distinct minimal elements. Trivially, in a set with no order at all, every element is minimal.
> What do you find difficult/wrong with pool executors?
Difficult? -- I don't know.
Wrong? -- Well, it's pretty worthless... does it make it wrong? -- That's up to you to decide.
The idea of threads is bad for many reasons; one in particular is how exceptions in threads are handled. But this isn't unique to Python. Python just made a bad decision to use threads in a language that's supposed to be "safe". Python's thread implementation craps its pants when dealing with many aspects of threads. For example, thread-local variables: since threads are objects in Python, you'd expect thread-local variables to be properties on those objects... well, the actual mechanism is just idiotic and nothing like what you would expect. When it comes to interacting with "native" code from Python, you'd expect some interaction with Python's scheduler so that the native code can portion out its own execution, allow Python to interrupt it, etc., but there's nothing of the kind.
Even though we haven't even gotten to pools yet: pools, obviously, don't address any of the thread-related problems. If anything, they amplify them. Specifically, the pool from the concurrent.futures package is worse than its relative from the multiprocessing package because it uses "futures". The whole idea of "futures" is somehow broken in Python because of the never-ending bugs related to deadlocking. It's been repeatedly "fixed", but every now and then deadlocks still happen. Here's the latest one I know of: https://bugs.python.org/issue46464 .
I've gone down the rabbit hole of trying to make a native module work with Python threads once... there's no good way to do it, and pools, be they from concurrent.futures or from multiprocessing, are both very bad for many reasons. I was hoping to give users the ability to control how parallel my native code is through the tools Python already exposes, but that turned out to be such a disaster that I've given up on the idea. Python's thread wrappers are worthless for native code that wants to actually execute concurrently; they are only designed to execute Python code, non-concurrently. Like I already mentioned, Python has no infrastructure to communicate its scheduling decisions to native code, no thread-safety in memory allocation, and the code is overall poorly written (as in: missing const, other imprecise typing, memory-inefficient data structures)... there are no benefits to using it vs rolling your own. Only struggle with bad decisions.
How do you rank C, Perl, JavaScript, PHP, ... parallelism compared to execution pool + futures here? The absolute MAX WORST?
Trivially, in a collection that has no "worse than" relation you can define one that doesn't compare them at all, and declares them all "incomparable" -- which, again, would make them all worst.
Bonus question: can you imagine a collection where there is no worst element?
> How do you rank C, Perl, JavaScript, PHP
Well, none of these languages has its own parallelism / concurrency story. (Except Perl 5, maybe? I'm not really familiar with the language.) They all rely on the system running them to do the parallelism.
So... all of these will go roughly into the same bin as Python?
Some languages have libraries that would allow them to do better (eg. you have PThreads in C), but that's not the function of the language.
C and Java threads are better than Python because, uh, they can actually run in parallel. Rust adds convenience and safety on top, plus its own event loops. Golang has Goroutines. Erlang has some very powerful solution that I don't remember.
IDK about PHP and Perl, barely touched them. Maybe they're worse than Python for this. Everything else isn't. Python was not originally built with these use cases in mind, which is totally fine, but I'm not going to pick Python if I'm doing complex concurrency/parallelism. For simple process pools, Python is good enough.
> When a request is waiting on the network, another thread is executing.
I'm guessing this is the meat, but what controls that? What other operations allow the GIL to switch to another thread?
All blocking I/O functions in the standard library do this: they release the GIL while blocked, so other threads can run.
    from multiprocessing import Pool

    def f(x):
        return x*x

    if __name__ == '__main__':
        with Pool(5) as p:
            print(p.map(f, [1, 2, 3]))
This forks 5 worker processes, and f(x) runs fully in parallel across the inputs. The inputs and outputs are sent between processes via pickling.