[Python-ideas] Python and Concurrency
Josiah Carlson
jcarlson at uci.edu
Sun Mar 25 20:03:06 CEST 2007
Talin <talin at acm.org> wrote:
>
> Aahz wrote:
> > On Sun, Mar 25, 2007, Talin wrote:
> >> Thinking more about this, it seems to me that discussions of syntax for
> >> doing parallel operations and nifty classes for synchronization are a
> >> bit premature. The real question, it seems to me, is how to get Python
> >> to operate concurrently at all.
> >
> > Maybe that's what it seems to you; to others of us who have been looking
> > at this problem for a while, the real question is how to get a better
> > multi-process control and IPC library in Python, preferably one that is
> > cross-platform. You can investigate that right now, and you don't even
> > need to discuss it with other people.
>
> If you mean some sort of inter-process messaging system, there are a
> number that already exist; I'd look at IPython and py-globus for starters.
>
> My feeling is that while such an approach is vastly easier from the
> standpoint of Python developers, and may be easier from the standpoint
> of a typical Python programmer, it doesn't actually solve the problem
> that I'm attempting to address, which is figuring out how to write
> client-side software that dynamically scales to the number of processors
> on the system.
At some point either the user or the system (Python) needs to figure out
that splitting up a sequential task into multiple parallel tasks is
productive. On the system end of things, that isn't easy. Consider how
much money and time has been poured into C/C++ compiler development, and
about all compilers can auto-parallelize (via vector operations) are
loops like:
    for (i = 0; i < n; i++)
        a[i] = b[i] OP c[i];
for a restricted set of OP and input types for a, b, and c.
Could Python do better than that? Sure, given enough time and research
money.
> My view is that while the number of algorithms that we have that can be
> efficiently parallelized in a fine-grained threading environment is
> small (compared to the total number of strictly sequential algorithms),
> the number of algorithms that can be adapted to heavy-weight,
> coarse-grained processes is much smaller still.
>
> For example, it is easy to imagine a quicksort routine where different
> threads are responsible for sorting various sub-partitions of the array.
> If this were to be done via processes, the overhead of marshalling and
> unmarshalling the array elements would completely swamp the benefits of
> making it concurrent.
But that algorithm wouldn't be used for sorting data on multiple
processors. A variant of mergesort would be used instead: distribute
equal-sized blocks to the processors, sort each block individually, and
merge the results (also in parallel). But again, all of this relies on
two things:
1. a method for executing multiple streams of instructions
simultaneously
2. a method of communication between the streams of instructions
Without significant work, #1 isn't possible using threads in Python,
because the global interpreter lock lets only one thread execute Python
bytecode at a time. It is trivial using processes.
Without work, #2 isn't "fast" using processes in Python. It is trivial
using threads. But here's the thing: with work, #2 can be made fast.
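The block-sort-and-merge scheme described above can be sketched with
processes. This is a minimal sketch, assuming Python 3.7+ and the
standard-library concurrent.futures.ProcessPoolExecutor (which, like the
multiprocessing module it builds on, postdates this message); the names
split_into_blocks and parallel_block_sort are hypothetical. The worker
processes provide #1, and the pickled blocks travelling to and from them
are #2. Unlike the scheme described, the final merge here is done
sequentially in the parent with heapq.merge.

```python
import heapq
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def split_into_blocks(data, n_blocks):
    """Split data into n_blocks contiguous blocks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_blocks)
    blocks, start = [], 0
    for i in range(n_blocks):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def parallel_block_sort(data, workers=4):
    """Sort near-equal blocks in worker processes, then k-way merge them."""
    data = list(data)
    if len(data) < 2:
        return data
    blocks = split_into_blocks(data, min(workers, len(data)))
    # Assumption: prefer the "fork" start method where the platform offers it,
    # so workers do not re-import the calling script; else use the default.
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)
    # Each block is pickled to a worker and the sorted block is pickled back:
    # this marshalling is the IPC cost being debated in this thread.
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        sorted_blocks = list(pool.map(sorted, blocks))
    # Simplification: the merge step runs sequentially in the parent process.
    return list(heapq.merge(*sorted_blocks))


if __name__ == "__main__":
    import random

    values = [random.random() for _ in range(200_000)]
    assert parallel_block_sort(values) == sorted(values)
```

For small inputs the cost of shipping blocks back and forth is likely to
outweigh the parallel sort, which is the marshalling overhead raised
above; the approach only pays off when sorting a block costs more than
sending it.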
Using Unix domain sockets (on Linux, with 3.4 GHz P4 Xeons and
DDR2-PC4200 memory; 50% faster memory is available nowadays), I've been
able to push 400 MB/second between processes. Anonymous or named pipes,
or perhaps a shared mmap with some sort of synchronization, might allow
IPC to be cross-platform and just about as fast.
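A rough way to take this kind of measurement from Python itself is to
time a one-way transfer over multiprocessing.Pipe. This is a hedged
sketch assuming Python 3 and the standard multiprocessing module (added
in Python 2.6, after this message); measure_pipe_throughput and
_producer are hypothetical names and the chunk sizes are arbitrary. On
Unix, Pipe(duplex=False) is backed by an anonymous pipe (os.pipe), while
on Windows multiprocessing uses named pipes. The timing includes process
start-up, so the result is not directly comparable to the Unix domain
socket figure above.

```python
import multiprocessing
import time


def _producer(conn, chunk_size, n_chunks):
    """Child process: send n_chunks payloads of chunk_size bytes, then close."""
    payload = b"x" * chunk_size
    for _ in range(n_chunks):
        conn.send_bytes(payload)
    conn.close()


def measure_pipe_throughput(chunk_size=1 << 20, n_chunks=64):
    """Return (bytes_received, seconds) for a one-way transfer over a Pipe."""
    # Assumption: prefer "fork" where available so the child does not
    # re-import the calling script; otherwise use the platform default.
    methods = multiprocessing.get_all_start_methods()
    ctx = multiprocessing.get_context("fork" if "fork" in methods else None)
    recv_end, send_end = ctx.Pipe(duplex=False)  # receive-only, send-only
    child = ctx.Process(target=_producer, args=(send_end, chunk_size, n_chunks))
    start = time.perf_counter()
    child.start()
    send_end.close()  # the parent keeps only the receiving end
    received = 0
    while True:
        try:
            received += len(recv_end.recv_bytes())
        except EOFError:  # raised once the child has closed its end
            break
    elapsed = time.perf_counter() - start
    child.join()
    recv_end.close()
    return received, elapsed


if __name__ == "__main__":
    received, seconds = measure_pipe_throughput()
    print(f"{received / seconds / 1e6:.0f} MB/s over {received} bytes")
```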
The reason that I (and perhaps others) have been pushing for IPC is that
it is easier to solve than removing Python's threading limitations,
offers many of the same benefits, and even brings a few extra ones (such
as being able to distribute processes across different machines).
- Josiah