Adding a Par construct to Python?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sun May 17 17:53:56 CEST 2009


On Sun, 17 May 2009 09:26:35 -0500, Grant Edwards wrote:

> On 2009-05-17, Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au>
> wrote:
>> On Sun, 17 May 2009 05:05:03 -0700, jeremy wrote:
>>
>>> From a user point of view I think that adding a 'par' construct to
>>> Python for parallel loops would add a lot of power and simplicity,
>>> e.g.
>>> 
>>> par i in list:
>>>     updatePartition(i)
>>> 
>>> There would be no locking and it would be the programmer's
>>> responsibility to ensure that the loop was truly parallel and correct.
>>
>> What does 'par' actually do there?
> 
> My reading of the OP is that it tells the interpreter that it can
> execute any/all iterations of updatePartition(i) in parallel (or
> presumably serially in any order) rather than serially in a strict
> sequence.
> 
>> Given that it is the programmer's responsibility to ensure that
>> updatePartition was actually parallelized, couldn't that be written as:
>>
>> for i in list:
>>     updatePartition(i)
>>
>> and save a keyword?
> 
> No, because a "for" loop is defined to execute its iterations serially
> in a specific order.  OTOH, a "par" loop is required to execute once for
> each value, but those executions could happen in parallel or in any
> order.
> 
> At least that's how I understood the OP.

I can try guessing what the OP is thinking just as well as anyone else, 
but "in the face of ambiguity, refuse the temptation to guess" :)

It isn't clear to me what the OP expects the "par" construct to actually 
do. Does it create a thread for each iteration? A process? 
Something else? Given that the rest of Python will be sequential (apart 
from explicitly parallelized functions), and that the OP specifies that 
updatePartition still needs to handle its own parallelization, does it 
really matter if the calls to updatePartition happen sequentially?

If it's important to make the calls in arbitrary order, random.shuffle 
will do that. If there's some other non-sequential and non-random order 
to the calls, the OP should explain what it is. What else, if anything, 
does par do, that it needs to be a keyword and statement rather than a 
function? What does it do that (say) a parallel version of map() wouldn't 
do?
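
To make the question concrete, here is a rough sketch of the "function 
rather than keyword" alternative. It assumes a thread pool is an acceptable 
way to run the calls; par_for and update_partition are hypothetical names 
invented for illustration, not anything the OP proposed.

```python
from concurrent.futures import ThreadPoolExecutor


def par_for(func, items, max_workers=4):
    """Call func(item) once per item, possibly concurrently.

    Results come back in the order of the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if the
        # individual calls finish in a different order.
        return list(pool.map(func, items))


def update_partition(i):
    # Hypothetical stand-in for the OP's updatePartition.
    return i * i


results = par_for(update_partition, [1, 2, 3, 4])
```

If something like this covers the use case, it is not obvious what a new 
statement would add on top of it.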

The OP also suggested:

"There could also be parallel versions of map, filter and reduce
provided."

It makes sense to talk about parallelizing map(), because you can 
allocate a list of the right size to slot the results into as they become 
available. I'm not so sure about filter(), unless you give up the 
requirement that the filtered results occur in the same order as the 
originals.
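
A minimal sketch of both ideas, assuming threads and the standard 
concurrent.futures module; slot_map and unordered_filter are hypothetical 
names used only for illustration. The map version slots each result into a 
pre-allocated list by index, while the filter version simply collects 
matches as they complete, so their order is not guaranteed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def slot_map(func, items, max_workers=4):
    """Parallel map: fill a pre-sized result list as calls finish."""
    items = list(items)
    results = [None] * len(items)  # one slot per input item
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which slot each submitted call belongs to.
        futures = {pool.submit(func, item): index
                   for index, item in enumerate(items)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


def unordered_filter(pred, items, max_workers=4):
    """Parallel filter: keep matches in whatever order they complete."""
    kept = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(pred, item): item for item in items}
        for future in as_completed(futures):
            if future.result():
                kept.append(futures[future])
    return kept


doubled = slot_map(lambda x: x * 2, [3, 1, 2])
evens = unordered_filter(lambda x: x % 2 == 0, range(10))
```

Here doubled lines up with the inputs, but evens may come back in any 
order.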

But reduce()? I can't see how you can parallelize reduce(). By its 
nature, it has to run sequentially: it can't operate on the nth item 
until it has operated on the (n-1)th item.
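
The dependency is easy to see if reduce() is written out as a loop. This is 
only a sketch of the general case, where nothing is assumed about the 
function being applied; sequential_reduce is a hypothetical name, written to 
mirror functools.reduce with an explicit initial value.

```python
from functools import reduce


def sequential_reduce(func, items, initial):
    """Hand-written reduce: each step needs the previous step's result."""
    acc = initial
    for item in items:
        # func cannot be called for this item until acc from the
        # previous iteration is available.
        acc = func(acc, item)
    return acc


def subtract(a, b):
    return a - b


by_hand = sequential_reduce(subtract, [1, 2, 3], 10)
built_in = reduce(subtract, [1, 2, 3], 10)
```

Both calls compute ((10 - 1) - 2) - 3, one step after another.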



-- 
Steven


