Adding a Par construct to Python?

MRAB google at mrabarnett.plus.com
Sun May 17 12:19:15 EDT 2009


Steven D'Aprano wrote:
> On Sun, 17 May 2009 09:26:35 -0500, Grant Edwards wrote:
> 
>> On 2009-05-17, Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au>
>> wrote:
>>> On Sun, 17 May 2009 05:05:03 -0700, jeremy wrote:
>>>
>>>> From a user point of view I think that adding a 'par' construct to
>>>> Python for parallel loops would add a lot of power and simplicity,
>>>> e.g.
>>>>
>>>> par i in list:
>>>>     updatePartition(i)
>>>>
>>>> There would be no locking and it would be the programmer's
>>>> responsibility to ensure that the loop was truly parallel and correct.
>>> What does 'par' actually do there?
>> My reading of the OP is that it tells the interpreter that it can
>> execute any/all iterations of updatePartition(i) in parallel (or
>> presumably serially in any order) rather than serially in a strict
>> sequence.
>>
>>> Given that it is the programmer's responsibility to ensure that
>>> updatePartition was actually parallelized, couldn't that be written as:
>>>
>>> for i in list:
>>>     updatePartition(i)
>>>
>>> and save a keyword?
>> No, because a "for" loop is defined to execute its iterations serially
>> in a specific order.  OTOH, a "par" loop is required to execute once for
>> each value, but those executions could happen in parallel or in any
>> order.
>>
>> At least that's how I understood the OP.
> 
> I can try guessing what the OP is thinking just as well as anyone else, 
> but "in the face of ambiguity, refuse the temptation to guess" :)
> 
> It isn't clear to me what the OP expects the "par" construct to
> actually do. Does it create a thread for each iteration? A process?
> Something else? Given that the rest of Python will be sequential (apart 
> from explicitly parallelized functions), and that the OP specifies that 
> updatePartition still needs to handle its own parallelization, does it 
> really matter if the calls to updatePartition happen sequentially?
> 
> If it's important to make the calls in arbitrary order, random.shuffle 
> will do that. If there's some other non-sequential and non-random order 
> to the calls, the OP should explain what it is. What else, if anything, 
> does par do, that it needs to be a keyword and statement rather than a 
> function? What does it do that (say) a parallel version of map() wouldn't 
> do?
> 
> The OP also suggested:
> 
> "There could also be parallel versions of map, filter and reduce
> provided."
> 
> It makes sense to talk about parallelizing map(), because you can 
> allocate a list of the right size to slot the results into as they become 
> available. I'm not so sure about filter(), unless you give up the 
> requirement that the filtered results occur in the same order as the 
> originals.
> 
> But reduce()? I can't see how you can parallelize reduce(). By its 
> nature, it has to run sequentially: it can't operate on the nth item 
> until it has operated on the (n-1)th item.
> 
It can calculate the items in parallel, but the final result must be
calculated sequentially, although if the final operation is associative
(and commutative, if the items may be combined in any order) then some
of the steps could be done in parallel.
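
To make that concrete, here is a rough sketch of one way it might be
done, not a proposal for the language. The names parallel_reduce,
chunked, chunk_size and max_workers are made up for illustration. It
reduces small chunks concurrently and then reduces the partial results,
repeating until one value is left. Because the chunks are kept in their
original order, this sketch only relies on func being associative. It
uses threads for simplicity, so under CPython's GIL it shows the
structure rather than a real speed-up for pure-Python functions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def chunked(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_reduce(func, iterable, chunk_size=4, max_workers=4):
    """Reduce `iterable` with `func`, combining chunks concurrently.

    Hypothetical helper: assumes `func` is associative. Chunk order is
    preserved, so `func` does not have to be commutative.
    """
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    items = list(iterable)
    if not items:
        raise TypeError("parallel_reduce() of empty sequence")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each pass shrinks the list by roughly a factor of chunk_size.
        while len(items) > 1:
            chunks = chunked(items, chunk_size)
            # Executor.map returns results in the order of its inputs.
            items = list(pool.map(lambda chunk: reduce(func, chunk), chunks))
    return items[0]


total = parallel_reduce(operator.add, range(1, 101))
joined = parallel_reduce(operator.add, list("abcdefghij"))
```

String concatenation is associative but not commutative, so the second
call only gives the same answer as the built-in reduce() because the
chunks are combined in their original order. A non-associative function
such as subtraction would generally give a different answer from a
plain left-to-right reduce().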


