[Python-ideas] Loop manager syntax

Todd toddrjen at gmail.com
Wed Jul 29 13:16:22 CEST 2015


On Wed, Jul 29, 2015 at 11:47 AM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On Jul 28, 2015, at 22:39, Todd <toddrjen at gmail.com> wrote:
>
> On Jul 28, 2015 7:02 PM, "Andrew Barnert" <abarnert at yahoo.com> wrote:
> >
> > On Jul 28, 2015, at 15:28, Todd <toddrjen at gmail.com> wrote:
> > >
> > > Following the discussion of the new "async" keyword, I think it would
> be useful to provide a generic way to alter the behavior of loops.  My idea
> is to allow a user to take control over the operation of a "for" or "while"
> loop.
> > >
> > > The basic idea is similar to context managers, where an object
> implementing certain magic methods, probably "__for__" and "__while__",
> could be placed in front of a "for" or "while" statement, respectively.
> This class would then be put in charge of carrying out the loop.  Due to
> the similarity to context managers, I am tentatively calling this a "loop
> manager".
> > >
> > > What originally prompted this idea was parallelization.  For example,
> the "multiprocessing.Pool" class could act as a "for" loop manager,
> allowing you to do something like this:
> > >
> > >  >>> from multiprocessing import Pool
> > >  >>>
> > >  >>> Pool() for x in range(20):
> > >  ...    do_something(x)
> > >  ...
> > >  >>>
> > >
> > > The body of the "for" loop would then be run in parallel.
> >
> > First, this code would create a Pool, use it, and leak it. And yes,
> sure, you could wrap this all in a with statement, but then the apparent
> niceness that seems to motivate the idea disappears.
> >
> > Second, what does the Pool.__for__ method get called with here? There's
> an iterable, a variable name that it has to somehow assign to in the
> calling function's scope, and some code (in what form exactly?) that it has
> to execute in that calling function's scope.
> >
>
> I wanted to avoid too much bikeshedding,
>
> But when you propose something this complicated, at least a sketch of the
> implementation is pretty much necessary, or nobody has any idea of the
> scope of what you're proposing. (Also, that sketch highlights all the
> pieces that Python is currently missing that would make this suggestion
> trivial, and I think some of those pieces are themselves interesting. And
> it would especially be a shame to do 90% of the work of building those new
> pieces just to implement this, but then not expose any of it.)
>

I thought it was better to first determine whether the idea had any merit
at all before getting into details.  If the idea turned out to be a
non-starter, there wouldn't be much point in working out those details.

> but my thinking for the "__for__" method is that it would be passed an
> iterator (not an iterable), a variable name, four dicts containing the
> local, nonlocal, higher enclosing, and global namespaces, and a
> function-like object. Mutating the dicts would NOT alter the corresponding
> namespaces.
>
> That's a lot more complicated than it at first sounds, partly because
> Python currently doesn't have any notion of the "higher enclosing
> namespace", partly because the implicit thing that represents that
> namespace is made of cells, which are only constructed when needed. In
> particular, Python has to be able to see, at compile time, that your
> function is accessing a variable from an outer function so that it can mark
> the outer function as a cell variable and the inner function as a free
> variable so they can be connected when the inner function is created at
> runtime. Comprehensions and lambdas can ignore most of this because they
> can't contain statements, but your implicit loop functions can, so the
> compiler has to look inside them. And then, at runtime, you have to do
> something different--because the cells aren't actually being passed in to
> the implicit function, only their values, you have to add free variables to
> the outer function to make the post-call "namespace merging" work. It's not
> like you couldn't come up with the right algorithm to do all of this the
> way you want, it's just that it's very different from anything Python
> currently does, so you have to design it in detail before you can be sure
> it makes sense, not just hand wave it.
>

Fair enough.


> Meanwhile, if mutating the dicts (that is, the inner function assigning to
> variables) doesn't affect the outer namespace, what about mutating the
> values in the dict? Do you deep-copy everything into the dicts, then
> "deep-update" back out later? If not, how does a line like "d[k] += 1"
> inside the loop end up working at all (where d is a global or nonlocal)?
>
>
The contents of the namespaces are deep-copied into the dicts passed to the
loop manager.  Whether the loop manager makes further deep copies before
handing them to the function-like object is up to it.

It may be possible to make the dicts directly represent the namespaces, and
thus have changes immediately reflected in the namespace.  I thought
keeping things well-isolated was more important, but it may not be worth
the trouble.
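
To make the shape of the protocol concrete, here is a minimal, serial
sketch of the kind of "__for__" method I have in mind.  The signature
and the calling convention for the function-like object are strawmen,
not a final design:

class SerialFor:
    # Sketch of a loop manager that just reproduces a plain for loop.
    # "body" is the function-like object; it takes the four namespace
    # dicts and returns updated (local, nonlocal, global) dicts.
    def __for__(self, iterator, name, local, nonlocal_, enclosing,
                global_, body):
        for value in iterator:
            local[name] = value  # bind the loop variable
            local, nonlocal_, global_ = body(local, nonlocal_,
                                             enclosing, global_)
        # The interpreter would merge these back into the real
        # namespaces.
        return local, nonlocal_, global_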


> Meanwhile, this scope semantic is very weird if it interacts with the rest
> of Python. For example, if you call a closure that was built outside, but
> you've modified one of the variables that closure uses, it'll be called
> with the old value, right? That would be more than a little confusing.
>
>
Where and when did you modify it?



> And finally, how does this copy-and-merge work in parallel? Let's say you
> do x=0 outside the loop, then inside the loop you do x+=i. So, the first
> iteration starts with x=0, changes it to x=1, and you merge back x=1. The
> second iteration starts with x=0, changes it to x=2, and you merge back
> x=2. This seems like a guaranteed race, with no way either the interpreter
> or the Pool.__for__ mechanism could even have a possibility of ending up
> with 3, much less guarantee it. I think what you actually need here is not
> copied and merged scopes, but something more like STM, which is a lot more
> complicated.
>

Again, this would be up to the loop handler.  In the case of
multiprocessing, each iteration would likely start with x=0, and whichever
value is merged back last (whatever that is) would win.  The documentation
would need to say that the behavior in such cases is non-deterministic and
shouldn't be counted on.
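
You can already see roughly this behavior today.  In this sketch,
body() and the final assignment stand in for what the interpreter and
Pool.__for__ would do:

from multiprocessing import Pool

x = 0

def body(i):
    # Each worker starts from its own copy of the namespace, so x is
    # 0 here no matter what the other iterations have done.
    return x + i

if __name__ == "__main__":
    with Pool() as pool:
        results = pool.map(body, range(8))
    # Whichever iteration is merged back last wins; the increments do
    # not accumulate.
    x = results[-1]
    print(x)  # 7, not 28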


> In cases where one or more of the namespaces doesn't make sense the
> corresponding dict would be empty.  The method would return three dicts
> containing the local, nonlocal, and global namespaces, any or all of which
> could be empty.  Returning a non-empty dict in a case where the
> corresponding namespace doesn't make sense would raise an exception.  The
> interpreter would merge these dicts back into the corresponding namespaces.
>
> The function-like object would be passed four dicts corresponding to the
> same namespaces,  and would return a tuple of three dicts corresponding to
> the same namespaces.  The interpreter would again be responsible for
> initializing the function-like object's namespaces with the contents of the
> dicts and pulling out those namespaces at the end.
>
> What does this "function-like object" look like? How do you call it with
> these dicts in such a way that their locals, nonlocals, and globals get set
> up as the variables the function was compiled to expect? (For just locals
> and globals, you can get the code object out, and exec that instead of
> calling the function, as I suggested--but you clearly don't want that, so
> what do you want instead?) Again, this is a new feature that I think would
> be more complicated, and more broadly useful, than your suggested feature,
> so it ought to be specified.
>

I am not sure I am understanding the question.  Can you please explain in
more detail?


> In the case of yield, the returned tuple will have one additional element
> for the yielded value.
>
> How you do distinguish between "exited normally" and "yielded None"?
>

The first would return a length-3 tuple and the second would return a
length-4 tuple whose last element is None.
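
On the loop handler's side, the check would look something like this
(just a sketch; the _NO_YIELD sentinel is for illustration only):

_NO_YIELD = object()

def step(body, local, nonlocal_, enclosing, global_):
    # Run the body once and report whether it stopped at a yield,
    # based on the length of the returned tuple.
    result = body(local, nonlocal_, enclosing, global_)
    if len(result) == 4:
        *namespaces, yielded = result   # length 4: stopped at a yield
        return tuple(namespaces), yielded
    return result, _NO_YIELD            # length 3: ran to completion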


> The interpreter would be in charge of remembering which yield it is at,
> but the function-like object would still be initialized with the namespaces
> provided by the method.  So any loop handler that allows yielding will need
> to be able to get the correct values into the namespace; failing to do so
> will raise an exception.  The function-like object always has an optional
> argument for injecting values into the yield, but passing anything to it
> when the function-like object is not at a yield that accepts a value would
> raise an exception.
>
> Well, you also need to handle throwing exceptions into the function-like
> object.
>
>
I am not sure what you mean by this.


> More importantly, if you think about this, you're proposing to have
> something like explicit generator states and/or continuations that can be
> manually passed around and called (and even constructed?). Together with
> your manual scopes (which can at least be merged into real scopes, if not
> allowing real scopes to be constructed), this gives you all kinds of cool
> things--I think you could write Stackless or greenlets in pure Python if
> you had this. But again, this is something we definitely don't have today,
> that has to be designed before you can just assume it for a new feature.
>

I considered that possibility, but it seemed overly complicated.  So, at
least as the proposal stands, there is no way to get at the generator
state.

> Returns and breaks will be exceptions, which contain the namespaces as
> extra data.
>
> Which, as I said, obviously means that you've created a new compiler mode
> that, among other things, compiled returns and breaks into raises.
>

Correct, which is why I call this a "function-like object" rather than a
"function".

> Continues will work similar to returns in normal functions, causing the
> function to terminate normally and return the namespaces at the point the
> continue was encountered.
>
> The "__for__" class is in charge of putting the iterator values into the
> local namespace dict passed to the function-like object (or not), for
> determining what should be in the namespace dicts passed to the
> function-like object, and for figuring out what, if anything, should be in
> the namespace dicts returned at the end.
>
> Fine, but it's just filtering the four dicts the interpreter is magicking
> up, right? (And the dicts the function-like object is returning.) So the
> part the interpreter does is the interesting bit; the __for__ method will
> usually just be passing them along unchanged.
>

Maybe, maybe not.  In the case of parallel code, no.  In the case of
serial code, it depends on exactly what you want to do.  I think that in
many cases, manipulating the namespaces would be one of the big
advantages.


> How to deal with yields, breaks, and returns is up to the class designer.
> There is no reason all loop handlers would need to handle all possible loop
> behaviour.  It would be possible to catch and re-raise break or return
> exceptions, or simply not handle them at all, in cases where they shouldn't
> be used.  Similarly, a class could simply raise an exception if the
> function-like object tries to yield anything if yielding didn't make
> sense.
>
> But how does the class know whether, e.g., yielding from the calling
> function makes sense? It doesn't know what function(s) it's going to be
> called from.
>
>
I am not understanding the question.  Either there is a sane way to deal
with yields, in which case it would do so, or there isn't, in which case it
wouldn't.  Can you name a situation where allowing a yield would be sane in
some cases but not in others?


> And, again, _how_ can it handle yielding if it decides to do so? You can't
> write a method that's sometimes a generator function and sometimes not
> (depending on whether some function-like object returns a special value or
> not).
>

There are a couple of ways I thought of to deal with this.  One is to have
two versions of each loop handler, one for generators and one for regular
loops.  The other is for the loop handler to be passed an object to which
it would push the values to be yielded.
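
A rough sketch of the second option, with a made-up channel object
standing in for whatever the interpreter would actually provide:

class GeneratorFor:
    # Sketch: the interpreter hands the handler a channel, and the
    # handler pushes each value the body yields; the enclosing
    # generator re-yields whatever was pushed.
    def __for__(self, iterator, name, local, nonlocal_, enclosing,
                global_, body, channel):
        for value in iterator:
            local[name] = value
            result = body(local, nonlocal_, enclosing, global_)
            if len(result) == 4:  # the body stopped at a yield
                local, nonlocal_, global_, yielded = result
                channel.push(yielded)
            else:
                local, nonlocal_, global_ = result
        return local, nonlocal_, global_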


> While loop managers would be similar, except instead of a variable name
> and iterator it would be passed a second function-like object for the
> conditional and a tuple of variable names used in the conditional.
>
> And it has to manually do the LEGB rule to dynamically figure out where to
> look up those names?
>

If it doesn't want to alter them, then no, it just passes along the
namespaces to the function-like object.  If it wants to alter them, then
yes.  That is, what, two or three lines of code?
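
Something like this, assuming the four namespace dicts described
earlier (builtins left out for brevity):

def lookup(name, local, nonlocal_, enclosing, global_):
    # Dynamic LEGB-style lookup across the four namespace dicts.
    for namespace in (local, nonlocal_, enclosing, global_):
        if name in namespace:
            return namespace[name]
    raise NameError(name)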


>
> Also, you realize that, even if it uses exactly the same rules as the
> normal interpreter, the effect will be different, because the interpreter
> applies the rule partly at compile time, not completely dynamically, right?
>

Yes, if someone is using this feature they would need to be aware of the
effects it has.


> This function-like object would return a namespace dict for the local
> namespace and a boolean for the result of the conditional.  Ideally this
> namespace dict would be empty or None if it is identical to the input
> namespace.
>
> Why is it returning a namespace dict? An expression can't assign to a
> variable, so the namespace will always be identical.
>
>
> An expression can, of course, call a mutating method, but unless you're
> suggesting another deep-copy/deep-update (or STM transaction) here, the
> value is already going to be mutated, and the dict won't help anyway.
>

It is up to the loop manager whether to deep copy or not.  It would, of
course, be possible for the loop manager to determine whether anything has
mutated, but I felt this approach would be more consistent.
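
For instance, a loop manager that wanted to detect in-place mutation
itself could snapshot the namespace before running the conditional.
A sketch of one way to do that:

import copy

def run_condition(condition, local, nonlocal_, enclosing, global_):
    # Deep-copy the local namespace before evaluating the
    # conditional, then compare to find names whose values changed.
    before = copy.deepcopy(local)
    updated, result = condition(local, nonlocal_, enclosing, global_)
    mutated = [n for n in before if local[n] != before[n]]
    return (updated or local), result, mutated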


> It would also be possible to have an alternative context manager
> implementation that works in the same way. It would just be passed
> namespace dicts and a function-like object and return namespace dicts.
>
> Sure, but it would be a much, much larger change than what we currently
> have, or even than what was initially proposed, and it would have to answer
> most of the same questions raised here. Which may be why nobody suggested
> that as an implementation for context managers in the first place.
>

Yes, I know.  Again, this would be a side benefit, rather than the primary
purpose of the proposal.
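
As a sketch of what such an alternative context manager might look
like under the same strawman conventions (the "__with__" name is made
up):

class LoggingContext:
    # Sketch of a context manager that drives its suite itself, the
    # way a loop manager would.
    def __with__(self, local, nonlocal_, enclosing, global_, body):
        print("entering block")
        try:
            return body(local, nonlocal_, enclosing, global_)
        finally:
            print("leaving block")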

> > It would take a nontrivial change to the compiler to compile the body of
> the loop into a separate code object, but with assignments still counted in
> the outer scope's locals list, yield expressions still making the outer
> function into a generator function, etc. You'd need to invent this new
> scope object type (just passing local, nonlocal, global dicts won't work
> because you can have assignments inside a loop body).
>
> Right, this is why the loop handler is passed namespace dicts and returns
> namespace dicts.  Changes to any namespace will remain isolated until
> everything is done and the handler can determine what to do with them.
>
> > Making yield, yield from, and return act on the calling function is bad
> enough, but for the first two, you need some way to also resume into the
> loop code later.
>
> I think I addressed this.
>
> > But that still doesn't get you anywhere near what you need for this
> proposal, because your motivating example is trying to run the code in
> parallel. What exactly happens when one iteration does a break and 7 others
> are running at the same time? Do you change the semantics of break so it
> breaks "within a few iterations", or add a way to cancel existing
> iterations and roll back any changes they'd made to the scope, or...? And
> return and yield seem even more problematic here.
>
> In these cases it would probably just raise an exception telling you you
> can't use breaks or yields.
>
> > And, beyond the problems with concurrency, you have cross-process
> problems. For example, how do you pickle a live scope from one interpreter,
> pass it to another interpreter, and make it work on the first interpreter's
> scope?
>
> That is the whole point of passing namespace dicts around.
>
> OK, think about this:
>
> I have a list a=[0]*8.
>
> Now, for i in range(8), I set a[i] = i**2.
>
> If I do that with a pool for, each instance gets its own copy of the scope
> with its own copy of a, which it modifies. How does the interpreter or the
> Pool.__for__ merge those 8 separate copies of a, in those 8 separate
> namespaces, back into the original namespace?
>

It would need to check whether any of the elements are no longer the same
objects as before (using "is").


> > > However, there are other uses as well.  For example, Python has no
> "do...while" structure, because nobody has come up with a clean way to do
> it (and probably nobody ever will).  However, under this proposal it would
> be possible for a third-party package to implement a "while" loop manager
> that can provide this functionality:
> >
> > If someone can come up with a clean way to write this do object (even
> ignoring the fact that it appears to be a weird singleton global
> object--unless, contrary to other protocols, this one allows you to define
> the magic methods as @classmethods and then knows how to call them
> appropriately), why hasn't anyone come up with a clean way of writing a
> do...while structure? How would it be easier this way?
>
> It would be easier because it can be uglier.  The bar for new statements
> is necessarily much, much, much higher than for third-party packages.  I
> certainly wouldn't propose loop handlers solely or even primarily to allow
> do...while loops; this is more of a side benefit and an example of the
> sorts of variations on existing loop behaviour that would be possible.
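
To make the do...while example concrete, a third-party manager might
look something like this under the strawman protocol (body first,
then the test):

class DoWhile:
    # Sketch of a "__while__" loop manager giving do...while
    # semantics: the body runs once before the conditional is first
    # evaluated.
    def __while__(self, condition, names, local, nonlocal_, enclosing,
                  global_, body):
        while True:
            local, nonlocal_, global_ = body(local, nonlocal_,
                                             enclosing, global_)
            updated, keep_going = condition(local, nonlocal_,
                                            enclosing, global_)
            local = updated or local
            if not keep_going:
                return local, nonlocal_, global_
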
>
> > > I think, like context managers, this would provide a great deal of
> flexibility to the language and allow a lot of useful behaviors.  Of course
> the syntax and details are just strawmen examples at this point, there may
> be much better syntaxes.  But I think the basic idea of being able to
> control a loop in a manner like this is important.
> >
> > The major difference between this proposal and context managers is that
> you want to be able to have the loop manager drive the execution of its
> suite, while a context manager can't do that; it just has __enter__ and
> __exit__ methods that get called before and after the suite is executed
> normally. That's how it avoids all of the problems here. Of course it still
> uses a change to the interpreter to allow the __exit__ method to get called
> as part of exception handling, but it's easy to see how you could have
> implemented it as a source transformation into a try/finally, in which case
> it wouldn't have needed any new interpreter functionality at all.
>
> Yes, that is why I said it was similar "in principle".  The implementation
> is different, but I think the concepts have a lot in common.
>
> But my point is that they're very different even in principle. One just
> supplies functions to get called around the suite, the other tries to
> control the way the suite is executed, using some kind of novel mechanism
> that still hasn't been designed.
>

Perhaps it is better to say this is more similar to decorators, then.