[Python-ideas] multiprocessing IPC

Jesse Noller jnoller at gmail.com
Sat Feb 11 01:04:11 CET 2012



On Feb 10, 2012, at 6:36 PM, Sturla Molden <sturla at molden.no> wrote:

> Den 10.02.2012 22:15, skrev Mike Meyer:
>> In what way does the mmap module fail to provide your binary file interface?
> 
> The short answer is that BSD mmap creates an anonymous kernel object. After working with multiprocessing for a while, one comes to the conclusion that we really need named kernel objects.
> 
> Here are two simple fail cases for anonymous kernel objects:
> 
> - Process A spawns/forks process B.
> - Process B creates an object, one of the attributes is a lock.
> - Fail: This object cannot be communicated back to process A. B inherits handles from A; A does not inherit handles from B.
> 
> - Process A spawns/forks a process pool.
> - Process A creates an object, one of the attributes is a lock.
> - Fail: This object cannot be communicated to the pool. The workers do not inherit new handles from A after they have started (see the sketch below).
> 
> All of multiprocessing's IPC classes suffer from this!
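> 
> Roughly, the second failure looks like this -- a minimal sketch, using a plain Lock rather than an object holding one:
> 
>     import multiprocessing as mp
> 
>     def use_lock(lock):
>         with lock:
>             return True
> 
>     if __name__ == "__main__":
>         lock = mp.Lock()
>         pool = mp.Pool(1)
>         try:
>             # The lock must be pickled to reach the worker, which fails with a
>             # RuntimeError ("Lock objects should only be shared between
>             # processes through inheritance") -- the pool cannot receive new
>             # handles after its workers have started.
>             pool.apply(use_lock, (lock,))
>         finally:
>             pool.close()
>             pool.join()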
> 
> Solution:
> 
> Use named kernel objects for IPC, and pickle the name rather than the content.
> 
> I made a shared memory array for NumPy that works like this -- implemented by memory mapping from the paging file on Windows and System V IPC on Linux. Underneath is an extension class that allocates a shared memory buffer. When pickled it encodes the kernel name, not its content, and unpickling opens the object given its name.
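> 
> A toy Python-level sketch of the pickling trick (the real thing is a Cython extension class; this illustrative version relies on mmap's Windows-only tagname, and the class name here is made up):
> 
>     import mmap
> 
>     class NamedSharedBuffer(object):
>         """Share by kernel name, not by content (Windows-only sketch)."""
> 
>         def __init__(self, name, size):
>             self.name = name
>             self.size = size
>             # fileno -1 maps from the paging file; tagname gives the mapping
>             # a kernel name that any process can open.
>             self.buf = mmap.mmap(-1, size, tagname=name)
> 
>         def __reduce__(self):
>             # Pickle only (name, size); unpickling re-opens the mapping by name.
>             return (self.__class__, (self.name, self.size))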
> 
> There is another drawback too:
> 
> The speed of pickle. For example, sharing NumPy arrays through shared memory is not faster when pickle is involved; the overhead from pickle completely dominates the time needed for the IPC. That is why I want a type-specialized or binary channel. Making one from the named shared memory class I already have is a no-brainer.
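> 
> A rough way to see the cost (numbers will vary by machine, but pickling a large array is far from free compared to a raw in-process copy):
> 
>     import pickle
>     import time
> 
>     import numpy as np
> 
>     a = np.zeros(10 * 1000 * 1000)    # ~80 MB of float64
> 
>     t0 = time.time()
>     data = pickle.dumps(a, protocol=pickle.HIGHEST_PROTOCOL)
>     t1 = time.time()
> 
>     b = np.empty_like(a)
>     b[:] = a                          # plain in-process copy for comparison
>     t2 = time.time()
> 
>     print("pickle: %.3f s, raw copy: %.3f s" % (t1 - t0, t2 - t1))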
> 
> So that is my other objection against multiprocessing.
> 
> That is:
> 
> 1. Object sharing by handle inheritance fails when kernel objects must be passed back to the parent process or to a process pool. We need IPC objects that have a name in the kernel, so they can be created and shared after the fact.
> 
> 2. IPC with multiprocessing is too slow due to pickle. We need something that does not use pickle. (E.g. shared memory, but not by means of mmap.) It might be that the pipe or socket in multiprocessing will do this (I have not looked at them carefully enough), but they still don't have names in the kernel.
> 
> Proof of concept:
> 
> http://dl.dropbox.com/u/12464039/sharedmem-feb12-2009.zip
> 
> The dependency on Cython and NumPy should probably be removed, but never mind that for now. The important parts are these:
> 
> sharedmemory_sysv.pyx (Linux)
> sharedmemory_win.pyx and ntqueryobject.c (Windows)
> 
> Finally, I'd like to say that I think Python's standard lib should support high-performance asynchronous I/O for concurrency. That is not poll/select (which does not even work properly on Windows). Rather, I want IOCP on Windows, epoll on Linux, and kqueue on Mac. (Yes, I know about Twisted.) There should also be a requirement that it works with multiprocessing: e.g. if we open a process pool, the processes should be able to use the same IOCP. In other words, highly scalable asynchronous I/O that works with multiprocessing.
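> 
> For the Linux leg of that, a bare epoll accept loop looks something like this (kqueue on BSD/Mac and IOCP on Windows need different APIs, which is exactly why a unified layer in the stdlib would help):
> 
>     import select
>     import socket
> 
>     srv = socket.socket()
>     srv.bind(("127.0.0.1", 0))
>     srv.listen(128)
>     srv.setblocking(False)
> 
>     ep = select.epoll()
>     ep.register(srv.fileno(), select.EPOLLIN)
> 
>     # One iteration of the event loop: wait up to 1 s for readiness events.
>     for fd, mask in ep.poll(1.0):
>         if fd == srv.fileno():
>             conn, addr = srv.accept()
>             conn.close()
> 
>     ep.close()
>     srv.close()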
> 
> So ... as far as I am concerned, the only things worth keeping in multiprocessing are multiprocessing.Process and multiprocessing.Pool. The rest doesn't do what we want.
> 
> 
> Sturla
> 

Sturla,

I think I've talked to you before - patches to improve multiprocessing from you are definitely welcome, and needed.

I disagree with tossing out as much as you are suggesting - managers are pretty useful, for example - but the entire team, and I especially, would welcome patches to improve things.

Jesse

