[Python-ideas] multiprocessing IPC

Mike Meyer mwm at mired.org
Sun Feb 12 02:52:00 CET 2012

On Sat, 11 Feb 2012 00:36:15 +0100
Sturla Molden <sturla at molden.no> wrote:
> On 10.02.2012 22:15, Mike Meyer wrote:
> > In what way does the mmap module fail to provide your binary file
> > interface?
> The short answer is that BSD mmap creates an anonymous kernel object. 

First, I didn't ask about "BSD mmap", I asked about the "mmap
module". They aren't the same thing.

> When working with multiprocessing for a while, one comes to the 
> conclusion that we really need named kernel objects.

And both the BSD mmap (at least in recent systems) and the mmap module
provide objects with names in the file system space. IIUC, while there
are systems that won't let you create anonymous objects (like early
versions of the mmap module), there aren't any - at least any longer -
that won't let you create named objects.
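A minimal sketch of what "named" means for the mmap module, assuming an ordinary file serves as the name: any process that knows the path can open and map it, and all mappings of the same file see the same bytes. The helper names create_named_mapping and open_named_mapping and the temporary path are illustrative only, not part of the mmap module.

```python
import mmap
import os
import tempfile

SIZE = 4096  # bytes; arbitrary size for the example


def create_named_mapping(path, size=SIZE):
    """Create (or truncate) a zero-filled file of `size` bytes; its path is its name."""
    with open(path, "wb") as f:
        f.truncate(size)


def open_named_mapping(path):
    """Open an existing file by name and map the whole file read/write."""
    with open(path, "r+b") as f:
        # mmap keeps its own handle to the file, so closing f here is safe.
        return mmap.mmap(f.fileno(), 0)


path = os.path.join(tempfile.mkdtemp(), "shared.bin")  # hypothetical name
create_named_mapping(path)
a = open_named_mapping(path)
b = open_named_mapping(path)  # a second, independent mapping found by name
a[:5] = b"hello"
print(b[:5])  # b'hello'
a.close()
b.close()
os.remove(path)
```

Both mappings live in one process here for brevity; another process calling open_named_mapping() with the same path would see the same data.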

> Here are two simple fail cases for anonymous kernel objects:

[elided, since the restriction doesn't exist]

> All of multiprocessing's IPC classes suffer from this!

Some of them may. The one I asked about doesn't.

> Solution:
> Use named kernel objects for IPC, pickle the name.

You don't need to pickle the name if you use mmap's native name system
- it's just a string.

> There is another drawback too:
> The speed of pickle. For example, sharing NumPy arrays with pickle is
> no faster over shared memory. The overhead from pickle completely
> dominates the time needed for IPC. That is why I want a type-specialized
> or binary channel. Making this from the named shared memory class I
> already have is a no-brainer.

> So that is my other objection against multiprocessing.
> 1. Object sharing by handle inheritance fails when kernel objects must 
> be passed back to the parent process or to a process pool. We need IPC 
> objects that have a name in the kernel, so they can be created and
> shared after the fact.

We've already got that one. You just need to learn how to use it.
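One way to cover both cases raised above (handing work to a process pool and passing a mapping back to the parent), sketched under the assumption that a file-backed mapping in a shared directory is acceptable: each worker creates and fills a named mapping, and only its path, a plain string, crosses the process boundary. worker_create, read_named and the result-N.bin naming scheme are hypothetical.

```python
import mmap
import os
import tempfile
from multiprocessing import Pool


def worker_create(args):
    """Runs in a pool worker: create a named (file-backed) mapping and fill it."""
    directory, index = args
    path = os.path.join(directory, f"result-{index}.bin")  # hypothetical naming scheme
    payload = f"result from task {index}".encode()
    with open(path, "wb") as f:
        f.truncate(len(payload))  # size the file so the mapping can hold the payload
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        m[:] = payload
    return path  # only the name travels back to the parent


def read_named(path):
    """Runs in the parent: open the mapping by name after the worker is done."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return bytes(m)


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    with Pool(2) as pool:
        names = pool.map(worker_create, [(directory, i) for i in range(3)])
    for name in names:
        print(read_named(name))
        os.remove(name)  # the parent owns cleanup, since the files outlive the workers
```

The files must persist until the parent has read them, so cleanup is left to the parent. On many Linux systems, pointing the directory at a tmpfs mount such as /dev/shm keeps the data in memory rather than on disk.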

> 2. IPC with multiprocessing is too slow due to pickle. We need something 
> that does not use pickle. (E.g. shared memory, but not by means of 
> mmap.) It might be that the pipe or socket in multiprocessing will do 
> this (I have not looked at it carefully enough), but they still don't have

Since you can use pickle, you're only dealing with small amounts of
data. There are better-performing serialization tools available (or
they can easily be created if you have to deal with large amounts of
data), and those work fine for a large variety of problems. If they
aren't fast enough, neither a socket nor a pipe will solve the basic
issue of needing to serialize the data in order to communicate it.
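As an illustration of a hand-rolled serializer for bulk numeric data, here is a sketch that writes raw doubles into a mapped buffer behind a small length header instead of pickling them. The header layout (one little-endian 64-bit element count) and the function names are assumptions for the example; array.array stands in for NumPy so the sketch needs only the standard library.

```python
import array
import mmap
import struct

# Hypothetical layout: an 8-byte little-endian element count, then raw doubles.
HEADER = struct.Struct("<Q")


def write_doubles(buf, values):
    """Write the count header plus the raw bytes of `values` (array of 'd') into buf."""
    raw = values.tobytes()
    if HEADER.size + len(raw) > len(buf):
        raise ValueError("buffer too small for payload")
    HEADER.pack_into(buf, 0, len(values))
    buf[HEADER.size:HEADER.size + len(raw)] = raw


def read_doubles(buf):
    """Read the count header, then rebuild the array from the raw bytes."""
    (count,) = HEADER.unpack_from(buf, 0)
    out = array.array("d")
    out.frombytes(buf[HEADER.size:HEADER.size + count * out.itemsize])
    return out


values = array.array("d", [1.5, 2.5, 3.5])
shm = mmap.mmap(-1, 4096)  # anonymous for brevity; a file-backed mapping works the same way
write_doubles(shm, values)
print(read_doubles(shm).tolist())  # [1.5, 2.5, 3.5]
shm.close()
```

Because array.tobytes() uses native byte order, this assumes both ends run on the same machine, which is the usual case for multiprocessing. It is still serialization, just a cheaper form than pickle.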

This isn't a problem with mmap per se, and it's not a problem that
anything that can be accurately described as a "file" - as in your
"binary file interface" - is going to solve.

Mike Meyer <mwm at mired.org>		http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
