[Python-ideas] Loop manager syntax

Tue Jul 28 22:39:37 CEST 2015

On Jul 28, 2015 7:02 PM, "Andrew Barnert" <abarnert at yahoo.com> wrote:
>
> On Jul 28, 2015, at 15:28, Todd <toddrjen at gmail.com> wrote:
> >
> > Following the discussion of the new "async" keyword, I think it would
be useful to provide a generic way to alter the behavior of loops.  My idea
is to allow a user to take control over the operation of a "for" or "while"
loop.
> >
> > The basic idea is similar to context managers, where an object
implementing certain magic methods, probably "__for__" and "__while__",
could be placed in front of a "for" or "while" statement, respectively.
This class would then be put in charge of carrying out the loop.  Due to
the similarity to context managers, I am tentatively calling this a "loop
manager".
> >
> > What originally prompted this idea was parallelization.  For example,
the "multiprocessing.Pool" class could act as a "for" loop manager,
allowing you to do something like this:
> >
> >  >>> from multiprocessing import Pool
> >  >>>
> >  >>> Pool() for x in range(20):
> >  ...    do_something
> >  ...
> >  >>>
> >
> > The body of the "for" loop would then be run in parallel.
>
> First, this code would create a Pool, use it, and leak it. And yes, sure,
you could wrap this all in a with statement, but then the apparent niceness
that seems to motivate the idea disappears.
>
> Second, what does the Pool.__for__ method get called with here? There's
an iterable, a variable name that it has to somehow assign to in the
calling function's scope, and some code (in what form exactly?) that it has
to execute in that calling function's scope.
>

I wanted to avoid too much bikeshedding, but my thinking for the "__for__"
method is that it would be passed an iterator (not an iterable), a variable
name, four dicts containing the local, nonlocal, higher enclosing, and
global namespaces, and a function-like object. Mutating the dicts would NOT
alter the corresponding namespaces.  In cases where one or more of the
namespaces doesn't make sense the corresponding dict would be empty.  The
method would return three dicts containing the local, nonlocal, and global
namespaces, any or all of which could be empty.  Returning a non-empty dict
in a case where the corresponding namespace doesn't make sense would raise
an exception.  The interpreter would merge these dicts back into the
corresponding namespaces.

The function-like object would be passed four dicts corresponding to the
same namespaces, and would return a tuple of three dicts corresponding to
the same namespaces.  The interpreter would again be responsible for
initializing the function-like object's namespaces with the contents of the
dicts and pulling out those namespaces at the end.
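
As a rough illustration of the protocol described above, here is a minimal
sketch that emulates it in today's Python.  Everything in it is an
assumption made for illustration: "SerialLoopManager", "body", and
"run_example" are hypothetical names, a plain function stands in for the
function-like object, and the final merge is done by hand because no
interpreter support exists.

```python
# Hypothetical emulation of the proposed "__for__" protocol.  Nothing here is
# an existing interpreter feature; a plain function plays the loop body.

class SerialLoopManager:
    """Runs the body once per item, in order, like an ordinary for loop."""

    def __for__(self, iterator, varname, local_ns, nonlocal_ns,
                enclosing_ns, global_ns, body):
        # Work on copies so the caller's dicts are never mutated.
        local_ns = dict(local_ns)
        nonlocal_ns = dict(nonlocal_ns)
        global_ns = dict(global_ns)
        for value in iterator:
            # The manager decides how the loop variable reaches the body.
            local_ns[varname] = value
            new_local, new_nonlocal, new_global = body(
                local_ns, nonlocal_ns, enclosing_ns, global_ns)
            local_ns.update(new_local)
            nonlocal_ns.update(new_nonlocal)
            global_ns.update(new_global)
        # Three dicts for the interpreter to merge back.
        return local_ns, nonlocal_ns, global_ns


def body(local_ns, nonlocal_ns, enclosing_ns, global_ns):
    # Stand-in for the compiled loop body "total = total + x".
    updated = dict(local_ns)
    updated["total"] = updated["total"] + updated["x"]
    return updated, {}, {}


def run_example():
    caller_locals = {"total": 0}
    new_local, new_nonlocal, new_global = SerialLoopManager().__for__(
        iter(range(5)), "x", caller_locals, {}, {}, {}, body)
    # The interpreter would perform this merge; here it is done by hand.
    merged = dict(caller_locals)
    merged.update(new_local)
    return caller_locals, merged
```

Note that the caller's dict is left untouched until the explicit merge step,
which is the isolation property the description relies on.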

In the case of yield, the returned tuple will have one additional element
for the yielded value. The interpreter would be in charge of remembering
which yield it is at, but the function-like object would still be
initialized with the namespaces provided by the method.  So any loop
handler that allows yielding will need to be able to get the correct values
in the namespace; failing to do so will raise an exception.  The
function-like object always has an optional argument for injecting values
into the yield, but passing anything to it when the function-like object is
not at a yield that accepts a value would raise an exception.

Returns and breaks will be exceptions, which contain the namespaces as
extra data.  Continues will work similarly to returns in normal functions,
causing the function to terminate normally and return the namespaces at the
point the continue was encountered.
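
A minimal sketch of how break and continue could be modelled this way,
under the same assumptions as before: "LoopBreak", "body", and "run_for"
are hypothetical names, and a plain function stands in for the
function-like object.  A return exception would look analogous, with an
extra attribute for the returned value.

```python
# Hypothetical sketch: "break" as an exception carrying the namespaces,
# "continue" as an ordinary early return from the body.

class LoopBreak(Exception):
    def __init__(self, local_ns, nonlocal_ns, global_ns):
        super().__init__("break")
        self.namespaces = (local_ns, nonlocal_ns, global_ns)


def body(local_ns, nonlocal_ns, enclosing_ns, global_ns):
    # Stand-in for the loop body:
    #     if x % 2: continue
    #     if x > 6: break
    #     total = total + x
    updated = dict(local_ns)
    x = updated["x"]
    if x % 2:
        return updated, {}, {}            # "continue": terminate normally
    if x > 6:
        raise LoopBreak(updated, {}, {})  # "break": namespaces ride along
    updated["total"] = updated["total"] + x
    return updated, {}, {}


def run_for(iterator, varname, local_ns, body):
    """A serial manager that honours LoopBreak; returns the final locals."""
    local_ns = dict(local_ns)
    for value in iterator:
        local_ns[varname] = value
        try:
            new_local, _, _ = body(local_ns, {}, {}, {})
        except LoopBreak as exc:
            # Keep the namespace as it was when the break was hit, then stop.
            local_ns.update(exc.namespaces[0])
            break
        local_ns.update(new_local)
    return local_ns
```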

The class implementing "__for__" is in charge of putting the iterator
values into the local namespace dict passed to the function-like object (or
not), determining what should be in the namespace dicts passed to the
function-like object, and figuring out what, if anything, should be in
the namespace dicts returned at the end.

How to deal with yields, breaks, and returns is up to the class designer.
There is no reason all loop handlers would need to handle all possible loop
behaviour.  It would be possible to catch and re-raise break or return
exceptions, or simply not handle them at all, in cases where they shouldn't
be used.  Similarly, a class could simply raise an exception if the
function-like object tries to yield anything when yielding doesn't make
sense.

"while" loop managers would be similar, except that instead of a variable
name and iterator they would be passed a second function-like object for the
conditional and a tuple of variable names used in the conditional.  This
function-like object would return a namespace dict for the local namespace
and a boolean for the result of the conditional.  Ideally this namespace
dict would be empty or None if it is identical to the input namespace.
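
As a sketch of what such a "while" manager might look like, here is one
that runs the body once before testing the condition.  The names "DoWhile",
"condition", and "body" are hypothetical, plain functions stand in for both
function-like objects, and only the local namespace is passed to the
manager to keep the example short.

```python
# Hypothetical "__while__" manager: body first, condition afterwards.

class DoWhile:
    def __while__(self, condition, cond_varnames, local_ns, body):
        # cond_varnames is accepted but unused in this simplified sketch.
        local_ns = dict(local_ns)  # never mutate the caller's dict
        while True:
            new_local, _, _ = body(local_ns, {}, {}, {})
            local_ns.update(new_local)
            cond_ns, keep_going = condition(local_ns)
            if cond_ns:
                # The conditional may itself have changed the namespace.
                local_ns.update(cond_ns)
            if not keep_going:
                return local_ns, {}, {}


def condition(local_ns):
    # Stand-in for the conditional "n < 3".  None signals that the
    # namespace is unchanged.
    return None, local_ns["n"] < 3


def body(local_ns, nonlocal_ns, enclosing_ns, global_ns):
    # Stand-in for the loop body "n = n + 1".
    return {"n": local_ns["n"] + 1}, {}, {}
```

Starting from n = 10, an ordinary while loop would never run the body; this
manager runs it exactly once.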

It would also be possible to have an alternative context manager
implementation that works in the same way. It would just be passed
namespace dicts and a function-like object and return namespace dicts.

> It would take a nontrivial change to the compiler to compile the body of
the loop into a separate code object, but with assignments still counted in
the outer scope's locals list, yield expressions still making the outer
function into a generator function, etc. You'd need to invent this new
scope object type (just passing local, nonlocal, global dicts won't work
because you can have assignments inside a loop body).

Right, this is why the loop handler is passed namespace dicts and returns
namespace dicts.  Changes to any namespace will remain isolated until
everything is done and the handler can determine what to do with them.

> Making yield, yield from, and return act on the calling function is bad
enough, but for the first two, you need some way to also resume into the
loop code later.

I think I addressed this.

> But that still doesn't get you anywhere near what you need for this
proposal, because your motivating example is trying to run the code in
parallel. What exactly happens when one iteration does a break and 7 others
are running at the same time? Do you change the semantics of break so it
breaks "within a few iterations", or add a way to cancel existing
iterations and roll back any changes they'd made to the scope, or...? And
return and yield seem even more problematic here.

In these cases it would probably just raise an exception telling you you
can't use breaks or yields.
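
A minimal sketch of a parallel manager that rejects break in this way.  It
is illustrative only: it uses threads from concurrent.futures rather than
multiprocessing.Pool, the names "ThreadPoolLoopManager", "LoopBreak",
"square_body", and "breaking_body" are hypothetical, and for simplicity it
returns the list of per-iteration namespaces instead of merged dicts.

```python
from concurrent.futures import ThreadPoolExecutor


class LoopBreak(Exception):
    """Hypothetical exception a loop body raises to signal "break"."""


class ThreadPoolLoopManager:
    def __init__(self, max_workers=4):
        self.max_workers = max_workers

    def __for__(self, iterator, varname, local_ns, body):
        def run_one(value):
            # Every iteration gets its own isolated copy of the namespace.
            ns = dict(local_ns)
            ns[varname] = value
            try:
                new_local, _, _ = body(ns, {}, {}, {})
            except LoopBreak:
                raise RuntimeError(
                    "break is not supported in a parallel loop") from None
            return new_local

        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # map() yields results in input order and re-raises any
            # exception from a worker when its result is consumed.
            return list(pool.map(run_one, iterator))


def square_body(local_ns, nonlocal_ns, enclosing_ns, global_ns):
    # Stand-in for the loop body "y = x ** 2".
    return {"y": local_ns["x"] ** 2}, {}, {}


def breaking_body(local_ns, nonlocal_ns, enclosing_ns, global_ns):
    # Stand-in for a loop body that executes "break".
    raise LoopBreak()
```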

> And, beyond the problems with concurrency, you have cross-process
problems. For example, how do you pickle a live scope from one interpreter,
pass it to another interpreter, and make it work on the first interpreter's
scope?

That is the whole point of passing namespace dicts around.

> > However, there are other uses as well.  For example, Python has no
"do...while" structure, because nobody has come up with a clean way to do
it (and probably nobody ever will).  However, under this proposal it would
be possible for a third-party package to implement a "while" loop manager
that can provide this functionality:
>
> If someone can come up with a clean way to write this do object (even
ignoring the fact that it appears to be a weird singleton global
object--unless, contrary to other protocols, this one allows you to define
the magic methods as @classmethods and then knows how to call them
appropriately), why hasn't anyone come up with a clean way of writing a
do...while structure? How would it be easier this way?

It would be easier because it can be uglier.  The bar for new statements is
necessarily much, much, much higher than for third-party packages.  I
certainly wouldn't propose loop handlers solely or even primarily to allow
do...while loops; this is more of a side benefit and an example of the
sorts of variations on existing loop behaviour that would be possible.

> > I think, like context managers, this would provide a great deal of
flexibility to the language and allow a lot of useful behaviors.  Of course
the syntax and details are just strawman examples at this point; there may
be much better syntaxes.  But I think the basic idea of being able to
control a loop in a manner like this is important.
>
> The major difference between this proposal and context managers is that
you want to be able to have the loop manager drive the execution of its
suite, while a context manager can't do that; it just has __enter__ and
__exit__ methods that get called before and after the suite is executed
normally. That's how it avoids all of the problems here. Of course it still
uses a change to the interpreter to allow the __exit__ method to get called
as part of exception handling, but it's easy to see how you could have
implemented it as a source transformation into a try/finally, in which case
it wouldn't have needed any new interpreter functionality at all.
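
The source transformation mentioned above can be sketched roughly as
follows, modelled on the expansion given in PEP 343.  The names "Resource"
and "with_statement" are hypothetical, and the suite is passed as a
function purely for illustration; in the real statement the suite runs
inline in the enclosing scope, which is exactly the point being made.

```python
import sys


class Resource:
    """Hypothetical context manager that records what happens to it."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not suppress exceptions


def with_statement(mgr, suite):
    """Roughly what "with mgr as value: suite(value)" expands to."""
    value = mgr.__enter__()
    ok = True
    try:
        suite(value)
    except BaseException:
        # Exceptional exit: __exit__ sees the exception and may suppress it.
        ok = False
        if not mgr.__exit__(*sys.exc_info()):
            raise
    finally:
        # Normal exit: __exit__ is called with no exception details.
        if ok:
            mgr.__exit__(None, None, None)
```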

Yes, that is why I said it was similar "in principle".  The implementation
is different, but I think the concepts have a lot in common.