For instance, does fork() copy the page of memory containing the array? I believe it's Copy-on-Write semantics, right? What happens when the parent process changes the array?
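To make the fork() question concrete, here is a minimal sketch, assuming a POSIX system where os.fork() is available. It uses a plain bytearray as a stand-in for the array, and the function name is made up for illustration. After the fork each process has its own private view of memory, so a write in the parent is not visible in the child (the kernel copies the touched page on write).

```python
import os


def child_view_after_parent_write():
    """Fork, modify a buffer in the parent, and report what the child sees."""
    data = bytearray(b"original")      # stand-in for the array
    go_r, go_w = os.pipe()             # parent -> child: "I have written"
    out_r, out_w = os.pipe()           # child -> parent: what the child sees

    pid = os.fork()
    if pid == 0:
        # Child: wait until the parent has modified its copy, then report.
        os.close(go_w)
        os.close(out_r)
        os.read(go_r, 1)
        os.write(out_w, bytes(data))
        os._exit(0)                    # leave without running parent cleanup

    # Parent: modify the buffer after the fork, then let the child look.
    os.close(go_r)
    os.close(out_w)
    data[:] = b"modified"
    os.write(go_w, b"x")
    seen_by_child = os.read(out_r, 64)
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(out_r)
    return bytes(data), seen_by_child


if __name__ == "__main__":
    parent_value, child_value = child_view_after_parent_write()
    print("parent sees:", parent_value)
    print("child sees: ", child_value)
```

One caveat for CPython: even reading an object updates its reference count, which writes to the page holding the object header, so "read-only" sharing after fork() can still trigger page copies.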
Then, how do Pipe and Queue send the array across processes? Do they also pickle and unpickle it? Use shared memory?
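On the Pipe/Queue question: multiprocessing's Connection.send() and Queue.put() pickle the object, and the receiving side unpickles it into a new object. A small sketch of that round trip, kept inside one process for simplicity (the helper name is hypothetical, and the payload is kept small so the send does not block on a full pipe buffer):

```python
import multiprocessing


def roundtrip_through_pipe(obj):
    """Send obj through a multiprocessing pipe and return what comes out."""
    # duplex=False returns (receive-only end, send-only end).
    recv_end, send_end = multiprocessing.Pipe(duplex=False)
    try:
        send_end.send(obj)        # pickles obj and writes the bytes
        return recv_end.recv()    # reads the bytes and unpickles a new object
    finally:
        send_end.close()
        recv_end.close()


if __name__ == "__main__":
    original = list(range(1000))
    received = roundtrip_through_pipe(original)
    print("equal:", received == original)
    print("same object:", received is original)
```

The received object compares equal but is a separate copy, which is where the extra time and memory go for large arrays.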
As you’ve pointed out, fork() isn’t ideal for a number of reasons, and in general it’s preferred to use torch tensors directly instead of NumPy arrays so that you are not forced into using fork().
There’s also this write-up, which I found quite useful for the details: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multip...
Yes and you just answered your question.
Multiprocessing is not needed when your handful of subprocesses are all just calling NumPy code and releasing the GIL anyway.
Also, some NumPy functions are multithreaded (depending on the BLAS implementation NumPy is linked against). Take advantage of that: schedule huge operations and just let the interpreter sit idle waiting for the result.
Look to go parallel for a speed increase, burn a bunch of time, decide it's way too hard, and just wait the extra few minutes instead.
The code, for example, loads a file from disk in a separate process to keep the main process available for number crunching. The problem: the data is then sent to the main process, which creates another copy and takes time. Doing the same with threads is way faster, since I/O operations release the GIL anyway.
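A rough sketch of the thread-based variant, assuming the loading step is plain blocking file I/O. The function names and the `reader` parameter are made up for illustration. The buffer created in the worker thread is the very same object the main thread gets back, so nothing is pickled or copied between the two.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path):
    """Blocking read; the GIL is released while waiting on the disk."""
    with open(path, "rb") as f:
        return bytearray(f.read())


def load_in_background(path, reader=read_file):
    """Load a file in a worker thread while the main thread keeps working."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(reader, path)
        # Placeholder for the number crunching done in the meantime.
        total = sum(range(1000))
        # Hands back the object built in the worker thread, not a copy.
        data = future.result()
    return data, total


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc" * 1000)
    try:
        data, total = load_in_background(tmp.name)
        print(len(data), total)
    finally:
        os.remove(tmp.name)
```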
$ plasma_store -m 1000000000 -s /tmp/plasma
Arrow arrays are like NumPy arrays, but they're made for zero-copy use, e.g. IPC (interprocess communication). There's a dtype_backend kwarg on the pandas read_ functions and on DataFrame.convert_dtypes(): df = pandas.read_csv(path, dtype_backend="pyarrow")
The Plasma In-Memory Object Store > Using Arrow and Pandas with Plasma > Storing Arrow Objects in Plasma https://arrow.apache.org/docs/dev/python/plasma.html#storing...
Streaming, Serialization, and IPC > https://arrow.apache.org/docs/python/ipc.html
"DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB" (2021) https://duckdb.org/2021/12/03/duck-arrow.html
Plasma and e.g. DuckDB do zero copy, so there is no unmarshalling of the complete database file into RAM, no table scan to answer an unindexed query, and no marshal/unmarshal (serialize/deserialize) of the query result for the database driver (which could allow access to a DB/DBMS over a local pipe(), a TCP socket, HTTP(S), or protobufs over HTTPS).
If the memory is held in RAM by the Plasma store, I'm not sure which queries can be serviced by handing out an object reference to where the full non-virtual table is allocated in RAM (on only one node). Presumably those where there's no filtering or transformation in the query, and where the query does not access "virtual" or "materialized" tables.
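For the zero-copy idea itself, here is a stdlib-only sketch. It is neither Plasma nor Arrow, just multiprocessing.shared_memory used as a stand-in, with a made-up function name. Two handles are attached to the same named block, and a write through one is visible through the other without any serialization. In real use the second handle would be opened by name from another process.

```python
from multiprocessing import shared_memory


def share_without_copy(payload: bytes) -> bytes:
    """Write payload to shared memory and read it back via a second handle."""
    n = len(payload)                   # must be non-empty
    owner = shared_memory.SharedMemory(create=True, size=n)
    try:
        owner.buf[:n] = payload
        # A consumer would normally attach by name from another process.
        reader = shared_memory.SharedMemory(name=owner.name)
        try:
            owner.buf[0] = ord("X")    # write through the first handle
            # The block may be rounded up to a page, so slice to n bytes.
            seen = bytes(reader.buf[:n])
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()                 # free the block
    return seen


if __name__ == "__main__":
    print(share_without_copy(b"hello"))
```

Only the block's name crosses the process boundary; the data stays where it is, which is the property the object-store approach relies on.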