
On Sun, Mar 25, 2007, Talin wrote:
> Aahz wrote:
>> On Sun, Mar 25, 2007, Talin wrote:
>>> Thinking more about this, discussions of syntax for parallel
>>> operations and nifty synchronization classes seem a bit premature.
>>> The real question, it seems to me, is how to get Python to operate
>>> concurrently at all.
>> Maybe that's how it seems to you; to those of us who have been
>> looking at this problem for a while, the real question is how to get
>> a better multi-process control and IPC library into Python,
>> preferably a cross-platform one. You can investigate that right now,
>> and you don't even need to discuss it with other people.
> If you mean some sort of inter-process messaging system, a number
> already exist; I'd look at IPython and py-globus for starters.
>
> My feeling is that while such an approach is vastly easier for Python
> developers, and may be easier for the typical Python programmer, it
> doesn't actually solve the problem I'm trying to address: writing
> client-side software that dynamically scales to the number of
> processors on the system.
How not? Keep in mind that if this kind of library becomes part of the
standard library, the rest of the standard library can be written to
use it.
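The kind of library call being discussed can be sketched with today's
stdlib. Note that everything here postdates this thread:
concurrent.futures arrived in Python 3.2, the name parallel_map is
invented, and the "fork" start method is a POSIX-only shortcut that
keeps the sketch self-contained.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor

# "fork" is POSIX-only; on Windows the worker function would have to
# live in an importable module instead.
_CTX = mp.get_context("fork")

def parallel_map(func, items, workers=None):
    """Apply func to items across worker processes, preserving order.

    Defaults to one worker per CPU, so callers scale with the machine
    without counting processors themselves.
    """
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers, mp_context=_CTX) as pool:
        return list(pool.map(func, items))
```

A caller just writes `parallel_map(f, data)`; the process management,
marshalling, and result ordering stay inside the library, which is the
point being made here.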
> My view is that while the number of algorithms that can be
> efficiently parallelized in a fine-grained threading environment is
> small (compared to the total number of strictly sequential
> algorithms), the number that can be adapted to heavyweight,
> coarse-grained processes is smaller still.
Maybe. I'm not convinced, but see below.
> For example, it is easy to imagine a quicksort in which different
> threads sort different sub-partitions of the array. If this were done
> with processes instead, the overhead of marshalling and unmarshalling
> the array elements would completely swamp the benefit of making it
> concurrent.
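That marshalling cost is easy to put a number on: handing a partition
to another process and getting the result back means serializing it in
both directions, which is itself linear in the partition size. A
stdlib-only sketch of the measurement (the data size and repeat count
are arbitrary choices, not from the thread):

```python
import pickle
import random
import timeit

data = [random.random() for _ in range(100_000)]

def ship_out_and_back(payload):
    """Simulate handing a partition to a worker and receiving it back."""
    # pickle is what the stdlib process pools use to move objects
    # across the process boundary.
    return pickle.loads(pickle.dumps(payload))

copy_s = timeit.timeit(lambda: ship_out_and_back(data), number=5)
sort_s = timeit.timeit(lambda: sorted(data), number=5)
print(f"marshal round-trip: {copy_s:.3f}s   in-process sort: {sort_s:.3f}s")
```

Comparing the two timings on a given machine shows how much of the
sequential sort's budget a single hand-off already consumes, before any
actual sorting happens in the worker.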
The problem, though, is that Threading Doesn't Work for what you're
talking about: SMP threading doesn't really scale when you're talking
about hundreds of CPUs. This kind of problem really is better handled
at the library level: if the work is worth splitting, the sort
algorithm can figure out how to split it.

(Whether it uses threads or processes really doesn't matter; the sort
just calls an underlying library to manage the split. For example, it
could put a lock around the list while a C library releases the GIL to
do its work. As long as the overall sort() call is synchronous, it
should work.)

Generally speaking, it won't be worth splitting for less than a
million elements...
--
Aahz (aahz@pythoncraft.com)  <*>  http://www.pythoncraft.com/

"Typing is cheap. Thinking is expensive." --Roy Smith
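The library-level split described above, a synchronous sort() that only
fans out above a size threshold, can be sketched with the modern
stdlib. Again, all of this postdates the thread: parallel_sorted, the
default worker count, and the threshold value are illustrative, and
"fork" is a POSIX-only shortcut.

```python
import heapq
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

_CTX = mp.get_context("fork")  # POSIX-only; keeps the sketch self-contained

def parallel_sorted(data, workers=4, threshold=1_000_000):
    """A synchronous sort that farms sub-sorts out to worker processes.

    Below `threshold` elements, splitting costs more than it saves, so
    we fall back to a plain in-process sort.
    """
    if len(data) < threshold:
        return sorted(data)
    step = -(-len(data) // workers)  # ceiling division
    pieces = [data[i:i + step] for i in range(0, len(data), step)]
    with ProcessPoolExecutor(max_workers=workers, mp_context=_CTX) as pool:
        # Each piece is marshalled out and its sorted copy marshalled back.
        runs = list(pool.map(sorted, pieces))
    return list(heapq.merge(*runs))  # cheap k-way merge of the sorted runs
```

The caller sees an ordinary blocking call either way; whether the work
ran in-process or across a pool is an internal decision, which is the
point of pushing the split down into the library.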