2.6, 3.0, and truly independent interpreters

Andy O'Meara andy55 at gmail.com
Sat Oct 25 01:50:26 CEST 2008

> Are you familiar with the API at all? Multiprocessing was designed to
> mimic threading in about every way possible, the only restriction on
> shared data is that it must be serializable, but even then you can
> override or customize the behavior.
> Also, inter process communication is done via pipes. It can also be
> done with messages if you want to tweak the manager(s).

I apologize in advance if I don't understand something correctly, but
as I understand it, everything has to be serialized in order to go
through IPC.  So when you're talking about thousands of objects,
buffers, and/or large OS-opaque objects (e.g. memory-resident video
and images), that seems like a pretty steep hit on run-time resources.
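To put a rough number on that cost, here is a minimal sketch of what a
pickle round trip looks like for one big flat buffer (the 100 MB byte
string standing in for a video frame is my own assumption, not a real
workload):

```python
import pickle
import time

# Stand-in for a large memory-resident object (e.g. one video frame):
# ~100 MB of raw bytes.
frame = b"\x00" * (100 * 1024 * 1024)

start = time.time()
blob = pickle.dumps(frame, protocol=pickle.HIGHEST_PROTOCOL)
copy = pickle.loads(blob)
elapsed = time.time() - start

# The round trip copies every byte at least twice; in a shared address
# space the "transfer" would be a single pointer assignment.
print("serialized %d bytes in %.3f s" % (len(blob), elapsed))
```

And that measures only the serialization itself -- the bytes still have
to be pushed through a pipe and reassembled on the other side.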

Please don't misunderstand my comments to suggest that multiprocessing
isn't great stuff.  On the contrary, it's very impressive and it
singlehandedly catapults python *way* closer to efficient CPU bound
processing than it ever was before.  All I mean to say is that when
you're using a shared address space with a worker pthread per spare
core to do CPU-bound work, it's a really big win not to have to
serialize stuff.  And in the case of hundreds of megs of data and/or
thousands of data structure instances, it's a deal breaker to
serialize and unserialize everything just so that it can be sent
through IPC.  It's a deal breaker for most performance-centric apps
because of the unnecessary runtime resource hit and because now all
those data structures being passed around have to have accompanying
serialization code written (and maintained) for them.   That's
actually what I meant when I made the comment that a high-level sync
object in a shared address space is "better" than sending it all
through IPC (when the data sets are wild and crazy).  From a C/C++
point of view, I would venture to say that it's always a huge win to
just stick those "embarrassingly easy" parallelization cases into a
thread with a sync object rather than to fork, use IPC, and have to
write all the serialization code. And in the case of huge data types--
such as video or image rendering--it makes me nervous to think of
serializing it all just so it can go through IPC when it could just be
passed using a pointer change and a single sync object.
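For what it's worth, the pointer-change-plus-sync-object handoff can be
sketched with nothing but the stdlib threading module -- here a
queue.Queue plays the role of the "high level sync object", and the
bytearray is again a made-up stand-in for a video frame:

```python
import threading
import queue  # named Queue in Python 2

work = queue.Queue()
results = queue.Queue()

def worker():
    buf = work.get()       # receives a reference, not a copy
    results.put(len(buf))  # stand-in for real CPU-bound processing

t = threading.Thread(target=worker)
t.start()

frame = bytearray(100 * 1024 * 1024)  # stand-in for a video frame
work.put(frame)  # no serialization: only the reference crosses over
t.join()
n = results.get()
print(n)
```

The handoff itself is just a pointer move under the queue's internal
lock, no matter how big the buffer is.  (Of course, with CPython's GIL
the *processing* in those threads won't run in parallel -- which is
exactly the bind this whole thread is about.)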

So, if I'm missing something and there's a way to pass data structures
without serialization, then I'd definitely like to learn more (sorry
in advance if I missed something there).  When I took a look at
multiprocessing, my concerns were:
   - serialization (discussed above)
   - maturity (are we ready to bet the farm that mp is going to work
properly on the platforms we need it to?)

Again, I'm psyched that multiprocessing appeared in 2.6 and it's a
huge huge step in getting everyone to unlock the power of python!
But some of the tidbits described above are additional data points
for you and others to chew on.  I can tell you they're pretty
important points for any performance-centric software provider (us,
game developers--from EA to Ambrosia, and A/V production app
developers like Patrick).
