[Python-ideas] multiprocessing IPC

Sturla Molden sturla at molden.no
Sat Feb 11 00:36:15 CET 2012


On 10.02.2012 22:15, Mike Meyer wrote:
> In what way does the mmap module fail to provide your binary file
> interface?

The short answer is that BSD mmap creates an anonymous kernel object. 
After working with multiprocessing for a while, one comes to the 
conclusion that we really need named kernel objects.

Here are two simple fail cases for anonymous kernel objects:

- Process A spawns/forks process B.
- Process B creates an object, one of the attributes is a lock.
- Fail: This object cannot be communicated back to process A. B inherits 
from A; A does not inherit from B.

- Process A spawns/forks a process pool.
- Process A creates an object, one of the attributes is a lock.
- Fail: This object cannot be communicated to the pool. The workers do 
not inherit new handles from A after they have started.

All of multiprocessing's IPC classes suffer from this!
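
To make the first fail case concrete, here is a minimal sketch (my own 
illustration, not taken from the proof of concept below). A Queue 
pickles everything it transports, and pickling a Lock is simply 
refused:

    import multiprocessing as mp
    import pickle

    lock = mp.Lock()
    # Raises RuntimeError: Lock objects should only be shared
    # between processes through inheritance
    pickle.dumps(lock)

So the moment process B tries queue.put(obj) with a lock inside obj, 
it blows up in exactly this way.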

Solution:

Use named kernel objects for IPC, pickle the name.

I made a shared memory array for NumPy that works like this -- 
implemented by memory-mapping the paging file on Windows and System V 
IPC on Linux. Underneath is an extension class that allocates a shared 
memory buffer. When pickled it encodes the kernel name, not its content, 
and unpickling opens the object given its name.
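
In outline the pattern looks like this (a Linux-only sketch of my own, 
backed by a file under /dev/shm, which is where POSIX shared memory 
lives; the real code uses System V IPC on Linux and the paging file on 
Windows, but the pickling trick is the same):

    import mmap
    import os

    class NamedSharedMem(object):
        """Shared memory identified by a kernel-visible name."""

        def __init__(self, name, size):
            self.name, self.size = name, size
            fd = os.open('/dev/shm/' + name, os.O_CREAT | os.O_RDWR)
            try:
                os.ftruncate(fd, size)
                self.buf = mmap.mmap(fd, size)
            finally:
                os.close(fd)

        def __reduce__(self):
            # Pickle the name, not the contents; unpickling in any
            # process reattaches to the same segment.
            return (NamedSharedMem, (self.name, self.size))

Any process that unpickles this -- parent, child, or a pool worker 
started long before the object existed -- gets a view of the same 
physical memory.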

There is another drawback too:

The speed of pickle. For example, sharing NumPy arrays through shared 
memory is no faster than a pipe if the arrays are pickled at both ends: 
the overhead from pickle completely dominates the time needed for the 
IPC. That is why I want a type-specialized or binary channel. Making 
one from the named shared memory class I already have is a no-brainer.
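
A rough micro-benchmark shows why (my own illustration, exact numbers 
will vary):

    import pickle
    import timeit
    import numpy

    a = numpy.zeros(10**7)   # ~80 MB of float64
    b = numpy.empty_like(a)

    def pickle_roundtrip():
        # What multiprocessing does: serialize, then deserialize
        pickle.loads(pickle.dumps(a, pickle.HIGHEST_PROTOCOL))

    def raw_copy():
        # What a shared memory channel could do instead
        b[:] = a

    print('pickle:', timeit.timeit(pickle_roundtrip, number=10))
    print('copy:  ', timeit.timeit(raw_copy, number=10))

The pickle round trip pays for serialization and extra buffer copies; 
the raw copy is essentially a single memcpy.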

So that is my other objection to multiprocessing.

That is:

1. Object sharing by handle inheritance fails when kernel objects must 
be passed back to the parent process or to a process pool. We need IPC 
objects that have a name in the kernel, so they can be created and 
shared after the fact.

2. IPC with multiprocessing is too slow due to pickle. We need something 
that does not use pickle (e.g. shared memory, but not by means of 
mmap). It might be that the pipe or socket in multiprocessing could do 
this (I have not looked at them carefully enough), but they still don't 
have a name in the kernel.

Proof of concept:

http://dl.dropbox.com/u/12464039/sharedmem-feb12-2009.zip

The dependency on Cython and NumPy should probably be removed, but 
never mind that for now. The important parts are these:

sharedmemory_sysv.pyx (Linux)
sharedmemory_win.pyx and ntqueryobject.c (Windows)

Finally, I'd like to say that I think Python's standard lib should 
support high-performance asynchronous I/O for concurrency. That does 
not mean poll/select (which on Windows does not even work properly). 
Rather, I want IOCP on Windows, epoll on Linux, and kqueue on Mac. 
(Yes, I know about Twisted.) There should also be a requirement that it 
works with multiprocessing. E.g. if we open a process pool, the 
processes should be able to use the same IOCP. In other words, some 
highly scalable asynchronous I/O that works with multiprocessing.
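
On Linux the shape of what I mean is easy to sketch (my own 
illustration, not a proposal for the actual API): forked workers 
sharing one listening socket, each driving its own epoll loop:

    import os
    import select
    import socket

    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(('127.0.0.1', 8000))
    srv.listen(128)
    srv.setblocking(False)

    for _ in range(4):                  # fork a small worker pool
        if os.fork() == 0:
            ep = select.epoll()
            ep.register(srv.fileno(), select.EPOLLIN)
            while True:                 # each worker runs its own loop
                for fd, events in ep.poll():
                    try:
                        conn, addr = srv.accept()
                    except socket.error:
                        continue        # another worker won the race
                    conn.sendall(b'hello\n')
                    conn.close()

    os.wait()                           # parent blocks while workers serve

The piece missing from the standard library is one portable abstraction 
over IOCP, epoll and kqueue that pool workers can share like this.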

So ... As far as I am concerned, the only things worth keeping in 
multiprocessing are multiprocessing.Process and multiprocessing.Pool. 
The rest doesn't do what we want.


Sturla



