Adding a Par construct to Python?

Sun May 17 12:19:15 EDT 2009

Steven D'Aprano wrote:
> On Sun, 17 May 2009 09:26:35 -0500, Grant Edwards wrote:
> 
>> On 2009-05-17, Steven D'Aprano <steve at REMOVE-THIS-cybersource.com.au>
>> wrote:
>>> On Sun, 17 May 2009 05:05:03 -0700, jeremy wrote:
>>>
>>>> From a user point of view I think that adding a 'par' construct to
>>>> Python for parallel loops would add a lot of power and simplicity,
>>>> e.g.
>>>>
>>>> par i in list:
>>>>     updatePartition(i)
>>>>
>>>> There would be no locking and it would be the programmer's
>>>> responsibility to ensure that the loop was truly parallel and correct.
>>> What does 'par' actually do there?
>> My reading of the OP is that it tells the interpreter that it can
>> execute any/all iterations of updatePartition(i) in parallel (or
>> presumably serially in any order) rather than serially in a strict
>> sequence.
>>
>>> Given that it is the programmer's responsibility to ensure that
>>> updatePartition was actually parallelized, couldn't that be written as:
>>>
>>> for i in list:
>>>     updatePartition(i)
>>>
>>> and save a keyword?
>> No, because a "for" loop is defined to execute its iterations serially
>> in a specific order.  OTOH, a "par" loop is required to execute once for
>> each value, but those executions could happen in parallel or in any
>> order.
>>
>> At least that's how I understood the OP.
> 
> I can try guessing what the OP is thinking just as well as anyone else, 
> but "in the face of ambiguity, refuse the temptation to guess" :)
> 
> It isn't clear to me what the OP expects the "par" construct to actually 
> do. Does it create a thread for each iteration? A process? 
> Something else? Given that the rest of Python will be sequential (apart 
> from explicitly parallelized functions), and that the OP specifies that 
> updatePartition still needs to handle its own parallelization, does it 
> really matter if the calls to updatePartition happen sequentially?
> 
> If it's important to make the calls in arbitrary order, random.shuffle 
> will do that. If there's some other non-sequential and non-random order 
> to the calls, the OP should explain what it is. What else, if anything, 
> does par do, that it needs to be a keyword and statement rather than a 
> function? What does it do that (say) a parallel version of map() wouldn't 
> do?
> 
> The OP also suggested:
> 
> "There could also be parallel versions of map, filter and reduce
> provided."
> 
> It makes sense to talk about parallelizing map(), because you can 
> allocate a list of the right size to slot the results into as they become 
> available. I'm not so sure about filter(), unless you give up the 
> requirement that the filtered results occur in the same order as the 
> originals.
> 
> But reduce()? I can't see how you can parallelize reduce(). By its 
> nature, it has to run sequentially: it can't operate on the nth item 
> until it has operated on the (n-1)th item.
> 
It can calculate the items in parallel, but the final result must be
calculated sequentially, although if the final operation is associative
then some of the combining steps could be done in parallel.
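
As a rough sketch of that last point (not anything the OP proposed):
split the list into chunks, reduce each chunk independently, then
combine the partial results in order. The helper name chunked_reduce
and its chunks parameter are made up for illustration, and the result
only matches plain reduce() if the function is associative. Threads are
used here just to show the structure; in CPython they won't speed up
CPU-bound work.

```python
from functools import reduce
from multiprocessing.pool import ThreadPool
import operator


def chunked_reduce(func, items, chunks=4):
    """Reduce items with func, assuming func is associative.

    Hypothetical helper: each chunk is reduced in its own worker,
    then the partial results are combined sequentially, in order.
    """
    items = list(items)
    if not items:
        raise TypeError("chunked_reduce() of empty sequence")
    size = max(1, -(-len(items) // chunks))  # ceiling division
    parts = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPool(len(parts)) as pool:
        # pool.map() returns results in the same order as parts,
        # so func does not need to be commutative.
        partials = pool.map(lambda part: reduce(func, part), parts)
    # This last step is still sequential, but over far fewer items.
    return reduce(func, partials)


total = chunked_reduce(operator.add, range(1, 101))
joined = chunked_reduce(operator.add, ["a", "b", "c", "d", "e"], chunks=2)
```

String concatenation is associative but not commutative, so the second
call is a quick check that chunk order is preserved. Something like
subtraction would give a different answer from reduce(), because it is
not associative.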