Adding a Par construct to Python?

Wed May 20 04:19:50 EDT 2009

On 20 May, 03:43, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:
> On Tue, 19 May 2009 03:57:43 -0700, jeremy wrote:
> >> you want it so simple to use that amateurs can mechanically replace
> >> 'for' with 'par' in their code and everything will Just Work, no effort
> >> or thought required.
>
> > Yes I do want the par construction to be simple, but of course you can't
> > just replace a for loop with a par loop in the general case.
>
> But that's exactly what you said you wanted people to be able to do:
>
> "with my suggestion they could potentially get a massive speed up just by
> changing 'for' to 'par' or 'map' to 'pmap'."
>
> I am finding this conversation difficult because it seems to me you don't
> have a consistent set of requirements.
>
> > This issue
> > arises when people use OpenMP: you can take a correct piece of code, add
> > a comment to indicate that a loop is 'parallel', and if you get it wrong
> > the code with no longer work correctly.
>
> How will 'par' be any different? It won't magically turn code with
> deadlocks into bug-free code.
>
> > With my 'par' construct the
> > programmer's intention is made explicit in the code, rather than by a
> > compiler directive and so I think that is clearer than OpenMP.
>
> A compiler directive is just as clear about the programmer's intention as
> a keyword. Possibly even more so.
>
> #$ PARALLEL-LOOP
> for x in seq:
>     do(x)
>
> Seems pretty obvious to me. (Not that I'm suggesting compiler directives
> is a good solution to this problem.)
>
> > As I wrote before, concurrency is one of the hardest things for
> > professional programmers to grasp. For 'amateur' programmers we need to
> > make it as simple as possible,
>
> The problem is that "as simple as possible" is Not Very Simple. There's
> no getting around the fact that concurrency is inherently complex. In
> some special cases, you can keep it simple, e.g. parallel-map with a
> function that has no side-effects. But in the general case, no, you can't
> avoid dealing with the complexity, at least a little bit.
>
> > and I think that a parallel loop
> > construction and the dangers that lurk within would be reasonably
> > straightforward to explain: there are no locks to worry about, no
> > message passing.
>
> It's *already* easy to explain. And having explained it, you still need
> to do something about it. You can't just say "Oh well, I've had all the
> pitfalls explained to me, so now I don't have to actually do anything
> about avoiding those pitfalls". You still need to actually avoid them.
> For example, you can choose one of four tactics:
>
> (1) the loop construct deals with locking;
>
> (2) the caller deals with locking;
>
> (3) nobody deals with locking, therefore the code is buggy and risks
> deadlocks; or
>
> (4) the caller is responsible for making sure he never shares data while
> looping over it.
>
> I don't think I've missed any possibilities. You have to pick one of
> those four.
>
> > The only advanced concept is the 'sync' keyword, which
> > would be used to rendezvous all the threads. That would only be used to
> > speed up certain codes in order to avoid having to repeatedly shut down
> > and start up gangs of threads.
>
> So now you want a second keyword as well.
>
> --
> Steven

Hi Steven,

You wrote:

> I am finding this conversation difficult because it seems to me you don't have a consistent set of requirements".

I think that my position has actually been consistent throughout this
discussion about what I would like to achieve. However I have learned
more about the inner workings of python than I knew before which have
made it clear that it would be difficult to implement (in CPython at
least). And also I never intended to present this as a fait accompli -
the intention was to start a debate as we have been doing. You also
wrote

> So now you want a second keyword as well

I actually described the 'sync' keyword in my second email before
anybody else contributed.

I *do* actually know a bit about concurrency and would never imply
that *any* for loop could be converted to a parallel one. The
intention of my remark "with my suggestion they could potentially get
a massive speed up just by changing 'for' to 'par' or 'map' to
'pmap'." is that it could be applied in the particular circumstances
where there are no dependencies between different iterations of the
loop.

Regarding your implementation strategies, mine would be related to
this one:

> (4) the caller is responsible for making sure he never shares data while
> looping over it.

However in my world there is no problem with threads sharing data as
long as they do not change the values. So the actual rule would be
something like:

5. The caller is responsible for making sure that one iteration of the
parallel loop never tries to write to a variable that another
iteration may read, unless separated by a 'sync' event.

This shows why the sync event is needed - to avoid  race conditions on
shared variables. It is borrowed from the BSP paradigm - although that
is a distibuted memory approach. Without the sync clause, rule 5 would
just be the standard way of defining a parallelisable loop.

Jeremy

P.S. I have a couple of additional embellishments to share at this
stage:

1. Parallel slackness - this is a concept where one creates many more
threads than there are available cores to cover up for a load
imbalance. So you might use this in a case where the iterations you
wish to parallelise take a variable amount of time. I think my par
concept needs a mechanism for the user to specify how many threads to
create - overriding any decision made by Python, e.g.
par 100 i in list: # Create 100 threads no matter how many cores are
available.

2. Scope of the 'sync' command. It was pointed out to me by a
colleague that I need to define what happens with sync when there are
nested par loops. I think that it should be defined to apply to the
innermost par construct which encloses the statement.