
Talin <talin@acm.org> wrote:
Aahz wrote:
On Sun, Mar 25, 2007, Talin wrote:
Thinking more about this, it seems to me that discussions of syntax for doing parallel operations and nifty classes for synchronization are a bit premature. The real question is how to get Python to operate concurrently at all.
Maybe that's how it seems to you; to those of us who have been looking at this problem for a while, the real question is how to get a better multi-process control and IPC library into Python, preferably one that is cross-platform. You can investigate that right now, and you don't even need to discuss it with other people.
If you mean some sort of inter-process messaging system, there are a number that already exist; I'd look at IPython and py-globus for starters.
My feeling is that while such an approach is vastly easier for Python's developers, and may be easier for the typical Python programmer, it doesn't actually solve the problem I'm trying to address: how to write client-side software that dynamically scales to the number of processors on the system.
At some point either the user or the system (Python) needs to figure out that splitting up a sequential task into multiple parallel tasks is productive. On the system end of things, that isn't easy. Consider how much money and time have been poured into C/C++ compiler development, and yet about all those compilers can auto-parallelize (via vector operations) is things like:

    for (i = 0; i < n; i++) a[i] = b[i] OP c[i];

for a restricted set of OP and input types for a, b, and c. Could Python do better than that? Sure, given enough time and research money.
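Even that trivial loop shows what the split costs in Python today: every chunk of the inputs has to be pickled out to a worker process and the results pickled back. A minimal sketch, using the multiprocessing module (the function names here are illustrative only):

    import multiprocessing
    import operator

    def _apply_chunk(args):
        # Apply op elementwise to one chunk of b and c.
        op, b_chunk, c_chunk = args
        return [op(x, y) for x, y in zip(b_chunk, c_chunk)]

    def parallel_elementwise(op, b, c, nproc=4, chunksize=10000):
        # Split the inputs into chunks, farm the chunks out to worker
        # processes, then stitch the partial results back together.
        tasks = [(op, b[i:i + chunksize], c[i:i + chunksize])
                 for i in range(0, len(b), chunksize)]
        with multiprocessing.Pool(nproc) as pool:
            parts = pool.map(_apply_chunk, tasks)
        return [x for part in parts for x in part]

    if __name__ == '__main__':
        n = 100000
        a = parallel_elementwise(operator.add, range(n), range(n))
        assert a[21] == 42

Whether the pickling and process-startup overhead is ever repaid is exactly the decision the user or the system has to make.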
My view is that while the number of algorithms that can be efficiently parallelized in a fine-grained threading environment is small (compared to the total number of strictly sequential algorithms), the number that can be adapted to heavyweight, coarse-grained processes is much smaller still.
For example, it is easy to imagine a quicksort routine where different threads are responsible for sorting various sub-partitions of the array. If this were to be done via processes, the overhead of marshalling and unmarshalling the array elements would completely swamp the benefits of making it concurrent.
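To make that concrete, here is a minimal sketch of such a routine using threads (illustrative only; in CPython the global interpreter lock prevents these threads from actually running their comparisons simultaneously, which is the limitation under discussion):

    import threading

    def partition(a, lo, hi):
        # Lomuto partition around the last element.
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]
        return i

    def quicksort(a, lo, hi, depth=2):
        # In-place quicksort; for the first `depth` levels of recursion,
        # the left sub-partition is handed to a separate thread.
        if lo >= hi:
            return
        p = partition(a, lo, hi)
        if depth > 0:
            left = threading.Thread(target=quicksort,
                                    args=(a, lo, p - 1, depth - 1))
            left.start()
            quicksort(a, p + 1, hi, depth - 1)
            left.join()
        else:
            quicksort(a, lo, p - 1, 0)
            quicksort(a, p + 1, hi, 0)

Both threads mutate the same list in place, with no copying; a process-based version of the same routine would have to marshal each sub-partition out to a worker and unmarshal the sorted elements back.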
But that algorithm wouldn't be used for sorting data on multiple processors. A variant of mergesort would be used instead: distribute blocks equally to the processors, sort them individually, and merge the results (in parallel). A sketch of this appears below.

Again, though, all of this relies on two things:

1. a method for executing multiple streams of instructions simultaneously, and
2. a method of communication between those streams of instructions.

Without significant work, #1 isn't possible using threads in Python; it is trivial using processes. Without work, #2 isn't "fast" using processes in Python; it is trivial using threads. But here's the thing: with work, #2 can be made fast. Using Unix domain sockets (on Linux, with 3.4 GHz P4 Xeons and DDR2-PC4200 memory; you can get 50% faster memory nowadays), I've been able to push 400 MB/second between processes. Anonymous or named pipes, or perhaps a shared mmap with some sort of synchronization, might allow IPC that is cross-platform and just about as fast.

The reason that I (and perhaps others) have been pushing for IPC is that it is easier to solve than the removal of Python's threading limitations, with many of the same benefits, and even a few extras (such as being able to distribute processes across different machines).
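For concreteness, a minimal sketch of that mergesort variant using the multiprocessing module (helper names are illustrative; note that the final merge here is sequential, whereas the scheme described above would merge pairs of blocks in parallel as well):

    import heapq
    import multiprocessing

    def parallel_sort(data, nproc=4):
        # Split the data into roughly equal blocks, sort each block in
        # a separate worker process, then k-way merge the sorted blocks.
        size = (len(data) + nproc - 1) // nproc
        blocks = [data[i:i + size] for i in range(0, len(data), size)]
        with multiprocessing.Pool(nproc) as pool:
            sorted_blocks = pool.map(sorted, blocks)
        return list(heapq.merge(*sorted_blocks))

    if __name__ == '__main__':
        import random
        data = [random.random() for _ in range(100000)]
        assert parallel_sort(data) == sorted(data)

Each block still has to be pickled out to a worker and back, but for large blocks that per-element cost is amortized far better than in the quicksort example above.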
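And a rough sketch of how such a throughput number can be measured (this is not the original benchmark; it assumes a Unix system, since it uses os.fork and socket.socketpair):

    import os
    import socket
    import time

    def measure_throughput(total=400 * 1024 * 1024, bufsize=64 * 1024):
        # Fork a child that pushes `total` bytes through its end of a
        # Unix domain socket pair; the parent reads and times it.
        parent, child = socket.socketpair()
        payload = b'x' * bufsize
        if os.fork() == 0:
            parent.close()
            for _ in range(total // bufsize):
                child.sendall(payload)
            child.close()
            os._exit(0)
        child.close()
        received = 0
        start = time.time()
        while received < total:
            data = parent.recv(bufsize)
            if not data:
                break
            received += len(data)
        elapsed = time.time() - start
        print('%.1f MB/s' % (received / elapsed / (1024 * 1024)))

    if __name__ == '__main__':
        measure_throughput()

Numbers will vary with hardware and buffer size, but this is the figure of merit that matters when deciding whether coarse-grained IPC can substitute for shared memory.

 - Josiah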