Adding a Par construct to Python?

Gary Herron gherron at islandtraining.com
Sun May 17 19:56:18 CEST 2009


MRAB wrote:
> Steven D'Aprano wrote:
>> On Sun, 17 May 2009 09:26:35 -0500, Grant Edwards wrote:
>>
>>> On 2009-05-17, Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au>
>>> wrote:
>>>> On Sun, 17 May 2009 05:05:03 -0700, jeremy wrote:
>>>>
>>>>> From a user point of view I think that adding a 'par' construct to
>>>>> Python for parallel loops would add a lot of power and simplicity,
>>>>> e.g.
>>>>>
>>>>> par i in list:
>>>>>     updatePartition(i)
>>>>>
>>>>> There would be no locking and it would be the programmer's
>>>>> responsibility to ensure that the loop was truly parallel and 
>>>>> correct.
>>>> What does 'par' actually do there?
>>> My reading of the OP is that it tells the interpreter that it can
>>> execute any/all iterations of updatePartition(i) in parallel (or
>>> presumably serially in any order) rather than serially in a strict
>>> sequence.
>>>
>>>> Given that it is the programmer's responsibility to ensure that
>>>> updatePartition was actually parallelized, couldn't that be written 
>>>> as:
>>>>
>>>> for i in list:
>>>>     updatePartition(i)
>>>>
>>>> and save a keyword?
>>> No, because a "for" loop is defined to execute its iterations serially
>>> in a specific order.  OTOH, a "par" loop is required to execute once
>>> for each value, but those executions could happen in parallel or in any
>>> order.
>>>
>>> At least that's how I understood the OP.
>>
>> I can try guessing what the OP is thinking just as well as anyone 
>> else, but "in the face of ambiguity, refuse the temptation to guess" :)
>>
>> It isn't clear to me what the OP expects the "par" construct to
>> actually do. Does it create a thread for each iteration?
>> A process? Something else? Given that the rest of Python will be 
>> sequential (apart from explicitly parallelized functions), and that 
>> the OP specifies that updatePartition still needs to handle its own 
>> parallelization, does it really matter if the calls to 
>> updatePartition happen sequentially?
>>
>> If it's important to make the calls in arbitrary order, 
>> random.shuffle will do that. If there's some other non-sequential and 
>> non-random order to the calls, the OP should explain what it is. What 
>> else, if anything, does par do, that it needs to be a keyword and 
>> statement rather than a function? What does it do that (say) a 
>> parallel version of map() wouldn't do?
>>
>> The OP also suggested:
>>
>> "There could also be parallel versions of map, filter and reduce
>> provided."
>>
>> It makes sense to talk about parallelizing map(), because you can 
>> allocate a list of the right size to slot the results into as they 
>> become available. I'm not so sure about filter(), unless you give up 
>> the requirement that the filtered results occur in the same order as 
>> the originals.
>>
>> But reduce()? I can't see how you can parallelize reduce(). By its 
>> nature, it has to run sequentially: it can't operate on the nth item 
>> until it has operated on the (n-1)th item.
>>
> It can calculate the items in parallel, but the final result must be
> calculated sequentially, although if the final operation is commutative then
> some of them could be done in parallel.

That should read "associative" not "commutative".

For instance A+B+C+D could be calculated sequentially as implied by
  ((A+B)+C)+D
or with some parallelism as implied by
  (A+B)+(C+D)
That's an application of the associativity of addition.
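
As a rough sketch of the (A+B)+(C+D) grouping, the reduction can be done in rounds of independent pairwise combinations. The name tree_reduce and the use of concurrent.futures are assumptions made for illustration only, not an existing or proposed Python feature, and in CPython threads will not speed up pure-Python arithmetic; the point is only the regrouping that associativity permits.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def tree_reduce(op, items, executor):
    """Reduce items pairwise (hypothetical helper); op must be associative."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce() of empty sequence")
    while len(level) > 1:
        # Pair up neighbours: (A, B), (C, D), ...
        # The pairs within one round are independent of each other,
        # so they may be combined concurrently.
        pairs = [(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        combined = list(executor.map(lambda p: op(*p), pairs))
        if len(level) % 2:
            # An odd element out is carried unchanged into the next round.
            combined.append(level[-1])
        level = combined
    return level[0]


words = ["A", "B", "C", "D", "E"]
with ThreadPoolExecutor() as pool:
    # String concatenation is associative but NOT commutative.
    result = tree_reduce(operator.add, words, pool)

# Same answer as the strictly sequential ((((A+B)+C)+D)+E).
print(result == reduce(operator.add, words))
```

Because neighbours are only ever combined left-to-right, the operands never change order, which is why associativity alone is enough and commutativity is not needed.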

Gary Herron




