Loop manager syntax

Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop. The basic idea is similar to context managers, where an object implementing certain magic methods, probably "__for__" and "__while__", could be placed in front of a "for" or "while" statement, respectively. This class would then be put in charge of carrying out the loop. Due to the similarity to context managers, I am tentatively calling this a "loop manager".
What originally prompted this idea was parallelization. For example, the "multiprocessing.Pool" class could act as a "for" loop manager, allowing you to do something like this:
The body of the "for" loop would then be run in parallel.
However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality:
The "do" class would just defer running the conditional until after executing the body of the "while" loop once.
Another possible use-case would be to alter how the loop interacts with the surrounding namespace. It would be possible to limit the loop so only particular variables become part of the local namespace after the loop is finished, or just prevent the index from being preserved after a "for" loop is finished.
I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point; there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important.

On Jul 28, 2015, at 15:28, Todd <toddrjen@gmail.com> wrote:
First, this code would create a Pool, use it, and leak it. And yes, sure, you could wrap this all in a with statement, but then the apparent niceness that seems to motivate the idea disappears.
Second, what does the Pool.__for__ method get called with here? There's an iterable, a variable name that it has to somehow assign to in the calling function's scope, and some code (in what form exactly?) that it has to execute in that calling function's scope. You could do something like this for the most trivial __for__ method:
    def __for__(self, scope: ScopeType, iterable: Iterable, name: str, code: CodeType):
        for x in iterable:
            scope.assign(name, x)
            try:
                exec(code, scope)
            except LoopBreak:
                break
            except LoopContinue:
                continue
            except LoopYield as y:
                make calling function yield?!
            except LoopReturn as r:
                make calling function return?!
It would take a nontrivial change to the compiler to compile the body of the loop into a separate code object, but with assignments still counted in the outer scope's locals list, yield expressions still making the outer function into a generator function, etc. You'd need to invent this new scope object type (just passing local, nonlocal, global dicts won't work because you can have assignments inside a loop body).
Making yield, yield from, and return act on the calling function is bad enough, but for the first two, you need some way to also resume into the loop code later.
If you designed a full "degenerate function" that solved all of these problems, I think that would be more useful than this proposal; different people have tried to come up with ways of doing that for making continuations for various custom-control-flow-without-macros purposes, and it doesn't seem like an easy problem.
But that still doesn't get you anywhere near what you need for this proposal, because your motivating example is trying to run the code in parallel. What exactly happens when one iteration does a break and 7 others are running at the same time? Do you change the semantics of break so it breaks "within a few iterations", or add a way to cancel existing iterations and roll back any changes they'd made to the scope, or...? And return and yield seem even more problematic here.
And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope?
And that may not be all the problems you'd need to solve to turn this into a real proposal.
However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality:
If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you to define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way?
Just designing the scope object that would give you a way to do this sounds like a big enough proposal on its own. Maybe you could do this in CPython by exposing the LocalsToFast and FastToLocals methods on frame objects, adding a frame constructor, and then wrapping that up in something (in pure Python) that has a nicer API for the purpose and disguises the fact that you're actually passing around interpreter frames. You might even be able to pull off a test implementation without hacking the interpreter by using ctypes.pythonapi?
I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important.
The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all. Maybe there's some way to rework your proposal into something that gets called to set up the loop, before and after the __next__ or expression test (with the after being passed the value and returning an optionally different value), and before and after each execution of the suite (the last two being very similar to what a context manager does). I don't see how any such thing could cause the suite to get executed in a process pool or in an isolated scope or any of your other motivating examples except the do...while simulator, but just because I'm not clever enough to see it doesn't mean you might not be.
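To make the "source transformation into a try/finally" remark concrete, here is a minimal sketch that runs the same body once under a real with statement and once under a hand-written expansion modelled on the one given in PEP 343. The names Tracer, run_with and run_expanded are made up for this illustration.

```python
import sys


class Tracer:
    """Tiny context manager that records the calls made to it."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append(("exit", exc_type))
        return False  # do not swallow exceptions


def run_with(mgr, body):
    # The real statement: the suite runs normally, in the caller's frame.
    with mgr as value:
        body(value)


def run_expanded(mgr, body):
    # Hand-written equivalent: plain try/except/finally, no new machinery.
    exit_ = type(mgr).__exit__
    value = type(mgr).__enter__(mgr)
    ok = True
    try:
        try:
            body(value)
        except BaseException:
            ok = False
            if not exit_(mgr, *sys.exc_info()):
                raise
    finally:
        if ok:
            exit_(mgr, None, None, None)
```

In both versions the manager only gets called before and after the suite; it never receives the suite itself, which is the difference from the loop manager idea.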

I feel like you're overcomplicating the internals of this. My personal implementation idea: functions. Basically, the pool example would be equivalent to:
    temp = Pool()
    def func(x):
        do_something
    temp.__for__(range(20), func)
so that __for__ could be implemented like:
    def __for__(self, iter, func):
        # Do something with the pool
Yields and returns could be implicitly propagated. However, this *does* start treading into the "everything is magically implicit" territory of Ruby and Perl, which completely contradicts Python's zen.
On July 28, 2015 12:02:09 PM CDT, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:

On Jul 28, 2015, at 19:17, Ryan Gonzalez <rymg19@gmail.com> wrote:
I feel like you're overcomplicating the internals of this.
Only because the internals _have_ to be overcomplicated if this is going to work. Unless you can come up with something simpler that actually works. Your solution doesn't solve any of the problems, so being simpler doesn't really matter.
But now func has a different scope from the caller, so all of its assignments don't work. Also, it's illegal to put break or continue statements directly inside a function. So the inner function still has to be compiled in some special way. But what should outer break and continue get compiled to? It can't be the jump opcodes they normally become. And whatever you do, surely the break and continue have to be communicated to the controlling function, or it's not controlling the loop. Hence the LoopBreak and LoopContinue exceptions from my version. How can you simplify those away? (And this still doesn't answer the question of what break is supposed to do to a parallel loop.) And, while you say "yields and returns could be implicitly propagated", I'm not sure what that actually means semantically. For returns, how does the caller or the interpreter loop or anyone else even know whether the inner function did an explicit return (which has to get implicitly propagated) vs. just falling off the end (which can't be)? That's why I included the LoopReturn exception; I don't see how you can do without that either. If you want to make it more implicit, you can just make it so that if the controlling function doesn't handle LoopReturn, it's swallowed and the value of the LoopReturn is used as the value of the controlling function (not too different from StopIteration today), but that's adding more complexity onto my solution, not removing it. And yield and yield from have the same problems as return, plus the much more serious question of what it means to implicitly propagate them, and to implicitly propagate the next or send that continues the generator. Also the question of how the __for__ method gets the generator flag set at runtime depending on whether the function it's going to call has that flag set--or, if not that, how it can yield (implicitly or otherwise) without being a generator. And so on. 
Go through each of the problems I raised; your solution doesn't solve any of them, doesn't make any of them easier to solve, and makes some of them harder to solve. As I mentioned, this might be easier if you were trying to control comprehensions instead of for statements, but it also seems less useful there because it really adds nothing you can't do with an explicit function call like pool.map.
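A rough sketch of the function-based desugaring, written by hand because no such protocol exists. LoopBreak, LoopContinue, SerialManager and the __for__ method are all hypothetical names. It shows two of the points above: break and continue have to be turned into something like exceptions, and a plain assignment in the body no longer reaches the caller's scope.

```python
class LoopBreak(Exception):
    """Hypothetical signal: the body asked the manager to stop looping."""


class LoopContinue(Exception):
    """Hypothetical signal: the body asked to skip to the next item."""


class SerialManager:
    """Toy manager; __for__ is not a real protocol and is called by hand."""

    def __for__(self, iterable, body):
        for item in iterable:
            try:
                body(item)
            except LoopContinue:
                continue
            except LoopBreak:
                break


def caller():
    total = 0
    seen = []

    def body(x):
        # Stand-in for the suite of "for x in range(10):".
        if x % 2:
            raise LoopContinue  # what "continue" would have to become
        if x > 6:
            raise LoopBreak     # what "break" would have to become
        seen.append(x)          # mutating a shared object is visible outside
        total = x               # rebinding only creates a local inside body()

    SerialManager().__for__(range(10), body)
    return total, seen
```

Calling caller() returns (0, [0, 2, 4, 6]): the list was mutated, but total still has its original value, so the body would need nonlocal declarations or special compilation to behave like a real loop suite.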

On Jul 28, 2015 7:02 PM, "Andrew Barnert" <abarnert@yahoo.com> wrote:
Second, what does the Pool.__for__ method get called with here? There's an iterable, a variable name that it has to somehow assign to in the calling function's scope, and some code (in what form exactly?) that it has to execute in that calling function's scope.
I wanted to avoid too much bikeshedding, but my thinking for the "__for__" method is that it would be passed an iterator (not an iterable), a variable name, four dicts containing the local, nonlocal, higher enclosing, and global namespaces, and a function-like object. Mutating the dicts would NOT alter the corresponding namespaces. In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces.
The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end.
In the case of yield, the returned tuple will have one additional element for the yielded value. The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace, failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception.
Returns and breaks will be exceptions, which contain the namespaces as extra data. Continues will work similar to returns in normal functions, causing the function to terminate normally and return the namespaces at the point the continue was encountered.
The "__for__" class is in charge of putting the iterator values into the local namespace dict passed to the function-like object (or not), for determining what should be in the namespace dicts passed to the function-like object, and for figuring out what, if anything, should be in the namespace dicts returned at the end.
How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense.
While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional. This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace.
It would also be possible to have an alternative context manager implementation that works in the same way. It would just be passed namespace dicts and a function-like object and return namespace dicts.
It would take a nontrivial change to the compiler to compile the body of the loop into a separate code object, but with assignments still counted in the outer scope's locals list, yield expressions still making the outer function into a generator function, etc. You'd need to invent this new scope object type (just passing local, nonlocal, global dicts won't work because you can have assignments inside a loop body).
Right, this is why the loop handler is passed namespace dicts and returns namespace dicts. Changes to any namespace will remain isolated until everything is done and the handler can determine what to do with them.
I think I addressed this.
But that still doesn't get you anywhere near what you need for this proposal, because your motivating example is trying to run the code in parallel. What exactly happens when one iteration does a break and 7 others are running at the same time? Do you change the semantics of break so it breaks "within a few iterations", or add a way to cancel existing iterations and roll back any changes they'd made to the scope, or...? And return and yield seem even more problematic here.
In these cases it would probably just raise an exception telling you you can't use breaks or yields.
And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope?
That is the whole point of passing namespace dicts around.
If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you to define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way?
It would be easier because it can be uglier. The bar for new statements is necessarily much, much, much higher than for third-party packages. I certainly wouldn't propose loop handlers solely or even primarily to allow do...while loops; this is more of a side benefit and an example of the sorts of variations on existing loop behaviour that would be possible.
I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important.
The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all.
Yes, that is why I said it was similar "in principle". The implementation is different, but I think the concepts have a lot in common.

On Jul 28, 2015, at 22:39, Todd <toddrjen@gmail.com> wrote:
But when you propose something this complicated, at least a sketch of the implementation is pretty much necessary, or nobody has any idea of the scope of what you're proposing. (Also, that sketch highlights all the pieces that Python is currently missing that would make this suggestion trivial, and I think some of those pieces are themselves interesting. And it would especially be a shame to do 90% of the work of building those new pieces just to implement this, but then not expose any of it.)
but my thinking for the "__for__" method is that it would be passed an iterator (not an iterable), a variable name, four dicts containing the local, nonlocal, higher enclosing, and global namespaces, and a function-like object. Mutating the dicts would NOT alter the corresponding namespaces.
That's a lot more complicated than it at first sounds, partly because Python currently doesn't have any notion of the "higher enclosing namespace", partly because the implicit thing that represents that namespace is made of cells, which are only constructed when needed. In particular, Python has to be able to see, at compile time, that your function is accessing a variable from an outer function, so that it can mark that variable as a cell variable in the outer function and as a free variable in the inner function, and the two can be connected when the inner function is created at runtime. Comprehensions and lambdas can ignore most of this because they can't contain statements, but your implicit loop functions can, so the compiler has to look inside them. And then, at runtime, you have to do something different--because the cells aren't actually being passed in to the implicit function, only their values, you have to add free variables to the outer function to make the post-call "namespace merging" work. It's not like you couldn't come up with the right algorithm to do all of this the way you want, it's just that it's very different from anything Python currently does, so you have to design it in detail before you can be sure it makes sense, not just hand wave it.
Meanwhile, if mutating the dicts (that is, the inner function assigning to variables) doesn't affect the outer namespace, what about mutating the values in the dict? Do you deep-copy everything into the dicts, then "deep-update" back out later? If not, how does a line like "d[k] += 1" inside the loop end up working at all (where d is a global or nonlocal)?
Meanwhile, this scope semantic is very weird if it interacts with the rest of Python. For example, if you call a closure that was built outside, but you've modified one of the variables that closure uses, it'll be called with the old value, right? That would be more than a little confusing.
And finally, how does this copy-and-merge work in parallel? Let's say you do x=0 outside the loop, then inside the loop you do x+=i. So, the first iteration starts with x=0, changes it to x=1, and you merge back x=1. The second iteration starts with x=0, changes it to x=2, and you merge back x=2. This seems like a guaranteed race, with no way either the interpreter or the Pool.__for__ mechanism could even have a possibility of ending up with 3, much less guarantee it. I think what you actually need here is not copied and merged scopes, but something more like STM, which is a lot more complicated.
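The lost update in the x += i example can be reproduced without any real parallelism. The sketch below is a single-process simulation; run_iteration and copy_and_merge are invented names, and "merge" is assumed to mean a plain dict update, which the proposal does not actually specify.

```python
import copy


def run_iteration(namespace, i):
    """Run the body 'x += i' against a private copy of the namespace."""
    private = copy.deepcopy(namespace)
    private["x"] += i
    return private


def copy_and_merge(namespace, items, merge_order):
    # Every iteration starts from the same snapshot, as it would in a pool.
    results = {i: run_iteration(namespace, i) for i in items}
    merged = dict(namespace)
    for i in merge_order:
        merged.update(results[i])  # last writer wins
    return merged


# An ordinary serial loop, for comparison.
serial = {"x": 0}
for i in (1, 2):
    serial["x"] += i

first_order = copy_and_merge({"x": 0}, [1, 2], merge_order=[1, 2])
second_order = copy_and_merge({"x": 0}, [1, 2], merge_order=[2, 1])
```

Here serial["x"] is 3, while the two merge orders give 2 and 1. No merge order yields 3, because neither private copy ever saw the other's update.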
In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces.
The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end.
What does this "function-like object" look like? How do you call it with these dicts in such a way that their locals, nonlocals, and globals get set up as the variables the function was compiled to expect? (For just locals and globals, you can get the code object out, and exec that instead of calling the function, as I suggested--but you clearly don't want that, so what do you want instead?) Again, this is a new feature that I think would be more complicated, and more broadly useful, than your suggested feature, so it ought to be specified.
In the case of yield, the returned tuple will have one additional element for the yielded value.
How do you distinguish between "exited normally" and "yielded None"?
The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace, failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception.
Well, you also need to handle throwing exceptions into the function-like object. More importantly, if you think about this, you're proposing to have something like explicit generator states and/or continuations that can be manually passed around and called (and even constructed?). Together with your manual scopes (which can at least be merged into real scopes, if not allowing real scopes to be constructed), this gives you all kinds of cool things--I think you could write Stackless or greenlets in pure Python if you had this. But again, this is something we definitely don't have today, that has to be designed before you can just assume it for a new feature.
Returns and breaks will be exceptions, which contain the namespaces as extra data.
Which, as I said, obviously means that you've created a new compiler mode that, among other things, compiles returns and breaks into raises.
Continues will work similar to returns in normal functions, causing the function to terminate normally and return the namespaces at the point the continue was encountered.
The "__for__" class is in charge of putting the iterator values into the local namespace dict passed to the function-like object (or not), for determining what should be in the namespace dicts passed to the function-like object, and for figuring out what, if anything, should be in the namespace dicts returned at the end.
Fine, but it's just filtering the four dicts the interpreter is magicking up, right? (And the dicts the function-like object is returning.) So the part the interpreter does is the interesting bit; the __for__ method will usually just be passing them along unchanged.
How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense.
But how does the class know whether, e.g., yielding from the calling function makes sense? It doesn't know what function(s) it's going to be called from. And, again, _how_ can it handle yielding if it decides to do so? You can't write a method that's sometimes a generator function and sometimes not (depending on whether some function-like object returns a special value or not).
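The point about a method not being "sometimes a generator function" can be checked directly. In this small sketch, maybe_yield and plain are made-up names; the inspect functions are the standard library ones.

```python
import inspect


def maybe_yield(flag):
    # The mere presence of "yield" makes this a generator function,
    # whether or not the yield is ever reached at runtime.
    if flag:
        yield "value"
    return "done"


def plain(flag):
    return "done"


gen = maybe_yield(False)          # no body code has run yet
is_generator_object = inspect.isgenerator(gen)

try:
    next(gen)
except StopIteration as stop:
    returned = stop.value         # the "return" value travels in StopIteration
```

So a __for__ method that contains a yield anywhere always returns a generator object to its caller, even on code paths that never yield, and one without a yield cannot yield at all.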
While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional.
And it has to manually do the LEGB rule to dynamically figure out where to look up those names? Also, you realize that, even if it uses exactly the same rules as the normal interpreter, the effect will be different, because the interpreter applies the rule partly at compile time, not completely dynamically, right?
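As an illustration of how much of name resolution is fixed at compile time in CPython, the sketch below inspects the code objects of a closure and contrasts them with a fully dynamic lookup through exec. The names outer, inner and dynamic_lookup are invented for the example.

```python
def outer():
    counter = 0   # captured by inner(), so compiled as a cell variable
    scratch = 1   # never captured, so an ordinary fast local

    def inner():
        nonlocal counter
        counter += 1
        return counter

    return inner


inner = outer()

# Decided by the compiler, before any of this code runs:
cell_names = outer.__code__.co_cellvars    # ('counter',)
free_names = inner.__code__.co_freevars    # ('counter',)
local_names = outer.__code__.co_varnames   # includes 'scratch'


def dynamic_lookup():
    # Code run through exec() with explicit dicts resolves names at
    # runtime by looking in the locals dict, then globals, then builtins.
    namespace = {"counter": 10}
    exec("counter += 1", {}, namespace)
    return namespace["counter"]
```

A loop manager working from namespace dicts would be on the dynamic side of this split, while the surrounding function was compiled on the static side.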
This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace.
Why is it returning a namespace dict? An expression can't assign to a variable, so the namespace will always be identical. An expression can, of course, call a mutating method, but unless you're suggesting another deep-copy/deep-update (or STM transaction) here, the value is already going to be mutated, and the dict won't help anyway.
It would also be possible to have an alternative context manager implementation that works in the same way. It would just be passed namespace dicts and a function-like object and return namespace dicts.
Sure, but it would be a much, much larger change than what we currently have, or even than what was initially proposed, and it would have to answer most of the same questions raised here. Which may be why nobody suggested that as an implementation for context managers in the first place.
OK, think about this: I have a list a=[0]*8. Now, for i in range(8), I set a[i] = i**2. If I do that with a pool for, each instance gets its own copy of the scope with its own copy of a, which it modifies. How does the interpreter or the Pool.__for__ merge those 8 separate copies of a, in those 8 separate namespaces, back into the original namespace?
But my point is that they're very different even in principle. One just supplies functions to get called around the suite, the other tries to control the way the suite is executed, using some kind of novel mechanism that still hasn't been designed.

On Wed, Jul 29, 2015 at 11:47 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
I thought it was better to first determine whether the idea had any merit at all before getting into details. If the idea turned out to be stupid then there isn't much point getting into details.
Fair enough.
The contents of the dict are deep-copied when passed to the loop manager. Whether the loop manager deep copies those or not is up to it. It may be possible to make the dicts directly represent the namespaces, and thus have changes immediately reflected in the namespace. I thought keeping things well-isolated was more important, but it may not be worth the trouble.
Where and when did you modify it?
Again, this would be up to the loop handler. In the case of multiprocessing, each repetition will likely get x=0, and the last value (whatever that is) will be returned at the end. The documentation would need to say that the behavior in such cases is non-deterministic and shouldn't be counted on.
I am not sure I am understanding the question. Can you please explain in more detail?
The first would return a length-3 tuple and the second would return a length-4 tuple with the last element being "None".
I am not sure what you mean by this.
I considered that possibility, but it seemed overly complicated. So at least in my proposal right now there is no way to get at the generator state.
Correct, which is why I call this a "function-like object" rather than a "function".
Maybe, maybe not. In the case of parallel code, then no. In the case of serial code, well that depends on exactly what you want to do. I think that in many cases, messing with the namespace would be one of the big advantages.
I am not understanding the question. Either there is a sane way to deal with yields, in which case it would do so, or there isn't, in which case it wouldn't. Can you name a situation where allowing a yield would be sane in some cases but not in others?
There are a couple of ways I thought of to deal with this. One is that there are two versions of each loop handler, one for generators and one for regular loops. The other is that the loop handler can be passed an object that it would push values to yield.
If it doesn't want to alter them, then no it just passes along the namespaces to the function-like object. If it wants to alter them, then yes. That is, what, two or three lines of code?
Yes, if someone is using this feature they would need to be aware of the effects it has.
An expression can, of course, call a mutating method, but unless you're suggesting another deep-copy/deep-update (or STM transaction) here, the value is already going to be mutated, and the dict won't help anyway.
It is up to the loop manager whether to deep copy or not. It would, of course, be possible for the loop manager to determine whether anything has mutated, but I felt this approach would be more consistent.
Yes, I know. Again, this would be a side benefit, rather than the primary purpose of the proposal.
It would need to check if any of the elements are not the same as before (using "is").
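One way to read the identity-check idea for the a=[0]*8 example is sketched below. It is only a single-process simulation under assumed semantics: each iteration gets a deep copy of the list, and the merge keeps any element that is no longer the same object as in the original. The names run_body and merge_by_identity are invented.

```python
import copy


def run_body(a, i):
    """One iteration of 'a[i] = i**2' on a private deep copy of a."""
    private = copy.deepcopy(a)
    private[i] = i ** 2
    return private


def merge_by_identity(original, copies):
    """Keep every element that is not the same object as before."""
    merged = list(original)
    for private in copies:
        for index, (old, new) in enumerate(zip(original, private)):
            if new is not old:
                merged[index] = new
    return merged


a = [0] * 8
copies = [run_body(a, i) for i in range(8)]
merged = merge_by_identity(a, copies)
```

This works here because copy.deepcopy returns ints unchanged, so untouched elements keep their identity. If the list held mutable objects such as nested lists, deepcopy would create new objects for every element, every slot would look modified in every copy, and the last copy merged would overwrite the rest.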

On Jul 28, 2015, at 15:28, Todd <toddrjen@gmail.com> wrote:
Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop.
It strikes me that allowing control of comprehensions rather than loop statements might get you some of the desired benefits, while avoiding most of the problems I described in my previous email.
The major difference is that comprehensions can only contain expressions, not statements, so all the issues of scopes, break and friends, etc. go away (or are already handled by the way comprehension functions are compiled).
While yield is allowed inside a comprehension, it's very weird to do so. (You're essentially turning the list-building function into a generator function that returns the built list as the argument to its StopIteration; I suspect this is only legal because nobody felt the need to write the code to make it illegal, not because anyone found it useful?)
You could come up with a syntax like this:
    values = [with pool spam(x) for x in iterable]
The semantics are still a bit complicated (and still need to be defined, because what method(s) this should call with what arguments and what they should do is still not obvious), but they might not require any new kinds of objects or any new compilation modes or anything like that.
I'm not sure how useful this would be, because you can always just wrap the expression in a lambda (or, in this case, use spam as-is) and call pool.map instead.
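For comparison, this is what the existing pool.map spelling looks like with a thread pool from concurrent.futures. The spam function and its body are placeholders, and the comment about the hypothetical syntax is only a guess at its meaning.

```python
from concurrent.futures import ThreadPoolExecutor


def spam(x):
    """Placeholder for whatever the comprehension's expression computes."""
    return x * x


iterable = range(5)

with ThreadPoolExecutor(max_workers=4) as pool:
    # Roughly what "[with pool spam(x) for x in iterable]" might mean.
    values = list(pool.map(spam, iterable))

    # An arbitrary expression can be wrapped in a lambda instead.
    shifted = list(pool.map(lambda x: spam(x) + 1, iterable))
```

Executor.map returns results in input order, so values matches what the serial comprehension [spam(x) for x in iterable] would produce. A lambda works with threads; a process pool would need a picklable, module-level function instead.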

On 28 July 2015 at 23:28, Todd <toddrjen@gmail.com> wrote:
Guido's original PEP 340 (what eventually became PEP 343's context managers) is worth reading for background here: https://www.python.org/dev/peps/pep-0340/
And then the motivation section in PEP 343 covers why he changed his mind away from introducing a new general purpose looping construct and proposed the simpler context management protocol instead: https://www.python.org/dev/peps/pep-0343/
As such, rather than starting from the notion of a general purpose loop manager, we're likely better off focusing specifically on the parallelisation problem as Andrew suggests, and figuring out how we might go about enabling parallel execution of the components of a generator expression or container comprehension for at least the following cases:
* native coroutine (async/await)
* concurrent.futures.ThreadPoolExecutor.map
* concurrent.futures.ProcessPoolExecutor.map
Consider the following serial operation:
    result = sum(process(x, y, z) for x, y, z in seq)
If "process" is a time consuming function, we may want to dispatch it to different processes in order to exploit all cores. Currently that looks like:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = sum(pool.map(process, *zip(*seq)))
If "process" is a blocking IO operation rather than a CPU bound one, we may decide to save some IPC overhead, and use local threads instead (there's no default pool size for a thread executor):
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        result = sum(pool.map(process, *zip(*seq)))
And if we're working with natively asynchronous algorithms:
    result = sum(await asyncio.gather(*(process_async(x, y, z) for x, y, z in seq)))
That's what parallel dispatch of a loop with independent iterations already looks like today, with the key requirement being that you name the operation performed on each iteration (or use a lambda expression in the case of concurrent.futures).
PEP 492 deliberately postponed the question of "What does an asynchronous comprehension look like?", because it wasn't clear what either the syntax *or* semantics should be, and as the above example shows, it's already fairly tidy if you're working with an already defined coroutine.
Given the current suite level spelling for the concurrent.futures case, one could easily imagine a syntax like:
    result = sum(process(x, y, z) with pool for x, y, z in seq)
That translated to:
    def _parallel_genexp(pool, seq):
        futures = []
        with pool:
            for x, y, z in seq:
                futures.append(pool.__submit__(lambda x=x, y=y, z=z: process(x, y, z)))
            for future in futures:
                yield future.result()
    result = sum(_parallel_genexp(pool, seq))
Container comprehensions would replace the "yield future.result()" with "expr_result.append(item)", "expr_result.add(item)" or "expr_result[key] = value" as usual. To avoid destroying the executor with each use, a "persistent pool" wrapper could be added that delegated __submit__, but changed __enter__ and __exit__ into no-ops.
Native coroutine syntax could then potentially be added using the async keyword already introduced in PEP 492, where:
    result = sum(process(x, y, z) with async for x, y, z in seq)
May mean something like:
    async def _async_genexp(seq):
        futures = []
        async for x, y, z in seq:
            async def _iteration(x=x, y=y, z=z):
                return process(x, y, z)
            futures.append(asyncio.ensure_future(_iteration()))
        return await asyncio.gather(*futures)
    result = sum(await _async_genexp(seq))
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
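The translation for the executor case can be approximated with today's API. The sketch below is only an illustration: it swaps the hypothetical __submit__ hook for the existing Executor.submit method, uses a thread pool, and assumes a toy process function and sample data that are not in the original message.

```python
from concurrent.futures import ThreadPoolExecutor


def process(x, y, z):
    # Hypothetical stand-in for the per-iteration work.
    return x + y + z


def parallel_genexp(pool, seq):
    # Submit every iteration first, then yield the results in input order.
    futures = []
    with pool:
        for x, y, z in seq:
            futures.append(pool.submit(process, x, y, z))
        for future in futures:
            yield future.result()


seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
result = sum(parallel_genexp(ThreadPoolExecutor(max_workers=2), seq))
```

Because the generator enters the pool as a context manager, the executor is shut down once the results are consumed, which is the behaviour the "persistent pool" wrapper would be meant to avoid.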

Guys, that is awesome. Nice spirit. I actually had another idea in mind regarding the 'concurrency syntax issue'. Not sure if you want to discuss the Pool manager syntax first. Or if I should start a new thread on this list. Best, Sven

On Jul 28, 2015, at 15:28, Todd <toddrjen@gmail.com> wrote:
First, this code would create a Pool, use it, and leak it. And yes, sure, you could wrap this all in a with statement, but then the apparent niceness that seems to motivate the idea disappears. Second, what does the Pool.__for__ method get called with here? There's an iterable, a variable name that it has to somehow assign to in the calling function's scope, and some code (in what form exactly?) that it has to execute in that calling function's scope. You could do something like this for the most trivial __for__ method:
    def __for__(self, scope: ScopeType, iterable: Iterable, name: str, code: CodeType):
        for x in iterable:
            scope.assign(name, x)
            try:
                exec(code, scope)
            except LoopBreak:
                break
            except LoopContinue:
                continue
            except LoopYield as y:
                make calling function yield?!
            except LoopReturn as r:
                make calling function return?!
It would take a nontrivial change to the compiler to compile the body of the loop into a separate code object, but with assignments still counted in the outer scope's locals list, yield expressions still making the outer function into a generator function, etc. You'd need to invent this new scope object type (just passing local, nonlocal, global dicts won't work because you can have assignments inside a loop body). Making yield, yield from, and return act on the calling function is bad enough, but for the first two, you need some way to also resume into the loop code later. If you designed a full "degenerate function" that solved all of these problems, I think that would be more useful than this proposal; different people have tried to come up with ways of doing that for making continuations for various custom-control-flow-without-macros purposes, and it doesn't seem like an easy problem. But that still doesn't get you anywhere near what you need for this proposal, because your motivating example is trying to run the code in parallel. What exactly happens when one iteration does a break and 7 others are running at the same time?
Do you change the semantics of break so it breaks "within a few iterations", or add a way to cancel existing iterations and roll back any changes they'd made to the scope, or...? And return and yield seem even more problematic here. And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope? And that may not be all the problems you'd need to solve to turn this into a real proposal.
However, there are other uses as well. For example, Python has no "do...while" structure, because nobody has come up with a clean way to do it (and probably nobody ever will). However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality:
If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you to define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way?
Just designing the scope object that would give you a way to do this sounds like a big enough proposal on its own. Maybe you could do this in CPython by exposing the LocalsToFast and FastToLocals methods on frame objects, adding a frame constructor, and then wrapping that up in something (in pure Python) that has a nicer API for the purpose and disguises the fact that you're actually passing around interpreter frames. You might even be able to pull off a test implementation without hacking the interpreter by using ctypes.pythonapi?
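A small illustration of why a plain dict is not enough as a scope object, at least in CPython: a function's local variables live in fast slots, so writing into the mapping returned by locals() does not change them. The function name is made up for the example.

```python
def assign_through_locals():
    x = 1
    namespace = locals()   # a dict view or snapshot of the local variables
    namespace["x"] = 2     # changes the dict, not the fast local slot
    return x


value_after_write = assign_through_locals()
```

This is the gap that exposing something like LocalsToFast, or a dedicated scope type, would have to close.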
I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important.
The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all. Maybe there's some way to rework your proposal into something that gets called to set up the loop, before and after the __next__ or expression test (with the after being passed the value and returning an optionally different value), and before and after each execution of the suite (the last two being very similar to what a context manager does). I don't see how any such thing could cause the suite to get executed in a process pool or in an isolated scope or any of your other motivating examples except the do...while simulator, but just because I'm not clever enough to see it doesn't mean you might not be.
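The source-transformation point can be made concrete. The sketch below is a simplified expansion modelled loosely on the one given in PEP 343, with a hypothetical Tracker manager; the real with statement looks the methods up on the type and handles a few more details.

```python
import sys


class Tracker:
    """Hypothetical context manager that records the order of events."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc, tb):
        self.events.append("exit")
        return False  # do not swallow exceptions


def with_statement():
    with Tracker() as t:
        t.events.append("body")
    return t.events


def expanded():
    # Roughly what the with statement above does, written out by hand.
    manager = Tracker()
    t = manager.__enter__()
    try:
        t.events.append("body")
    except BaseException:
        # Give __exit__ the exception; re-raise unless it returns true.
        if not manager.__exit__(*sys.exc_info()):
            raise
    else:
        manager.__exit__(None, None, None)
    return t.events
```

In both versions the suite runs inline in the enclosing scope; the manager never receives the suite as an object, which is exactly what a loop manager would need.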

I feel like you're overcomplicating the internals of this. My personal implementation idea: functions. Basically, the pool example would be equivalent to:
    temp = Pool()
    def func(x):
        do_something
    temp.__for__(range(20), func)
so that __for__ could be implemented like:
    def __for__(self, iter, func):
        # Do something with the pool
Yields and returns could be implicitly propagated. However, this *does* start treading into the "everything is magically implicit" territory of Ruby and Perl, which completely contradicts Python's zen.
On July 28, 2015 12:02:09 PM CDT, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:

On Jul 28, 2015, at 19:17, Ryan Gonzalez <rymg19@gmail.com> wrote:
I feel like you're overcomplicating the internals of this.
Only because the internals _have_ to be overcomplicated if this is going to work. Unless you can come up with something simpler that actually works. Your solution doesn't solve any of the problems, so being simpler doesn't really matter.
But now func has a different scope from the caller, so all of its assignments don't work. Also, it's illegal to put break or continue statements directly inside a function. So the inner function still has to be compiled in some special way. But what should outer break and continue get compiled to? It can't be the jump opcodes they normally become. And whatever you do, surely the break and continue have to be communicated to the controlling function, or it's not controlling the loop. Hence the LoopBreak and LoopContinue exceptions from my version. How can you simplify those away? (And this still doesn't answer the question of what break is supposed to do to a parallel loop.) And, while you say "yields and returns could be implicitly propagated", I'm not sure what that actually means semantically. For returns, how does the caller or the interpreter loop or anyone else even know whether the inner function did an explicit return (which has to get implicitly propagated) vs. just falling off the end (which can't be)? That's why I included the LoopReturn exception; I don't see how you can do without that either. If you want to make it more implicit, you can just make it so that if the controlling function doesn't handle LoopReturn, it's swallowed and the value of the LoopReturn is used as the value of the controlling function (not too different from StopIteration today), but that's adding more complexity onto my solution, not removing it. And yield and yield from have the same problems as return, plus the much more serious question of what it means to implicitly propagate them, and to implicitly propagate the next or send that continues the generator. Also the question of how the __for__ method gets the generator flag set at runtime depending on whether the function it's going to call has that flag set--or, if not that, how it can yield (implicitly or otherwise) without being a generator. And so on. 
Go through each of the problems I raised; your solution doesn't solve any of them, doesn't make any of them easier to solve, and makes some of them harder to solve. As I mentioned, this might be easier if you were trying to control comprehensions instead of for statements, but it also seems less useful there because it really adds nothing you can't do with an explicit function call like pool.map.
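Two of the objections to the plain-function approach are easy to demonstrate with current Python. This is a minimal sketch with made-up function names: an inner function can only rebind an outer variable with an explicit nonlocal declaration, and a bare break inside a function body is rejected at compile time.

```python
def loop_with_statement():
    total = 0
    for x in range(4):
        total += x          # rebinds the enclosing function's local
    return total


def loop_with_function():
    total = 0

    def body(x):
        # Without this declaration, "total += x" raises UnboundLocalError.
        nonlocal total
        total += x

    for x in range(4):
        body(x)
    return total


def break_is_rejected():
    # A loop body moved into a function can no longer contain a bare break.
    try:
        compile("def body(x):\n    break\n", "<loop body>", "exec")
    except SyntaxError:
        return True
    return False
```

So a loop body turned into a function needs either special compilation or a rewritten body, which is the complexity the LoopBreak and LoopContinue exceptions were meant to make explicit.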

On Jul 28, 2015 7:02 PM, "Andrew Barnert" <abarnert@yahoo.com> wrote:
On Jul 28, 2015, at 15:28, Todd <toddrjen@gmail.com> wrote:
Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop. Due to the similarity to context managers, I am tentatively calling this a "loop manager". What originally prompted this idea was parallelization. For example the "multiprocessing.Pool" class could act as a "for" loop manager, allowing you to do something like this:
First, this code would create a Pool, use it, and leak it. And yes, sure, you could wrap this all in a with statement, but then the apparent niceness that seems to motivate the idea disappears. Second, what does the Pool.__for__ method get called with here? There's an iterable, a variable name that it has to somehow assign to in the calling function's scope, and some code (in what form exactly?) that it has to execute in that calling function's scope.
I wanted to avoid too much bikeshedding, but my thinking for the "__for__" method is that it would be passed an iterator (not an iterable), a variable name, four dicts containing the local, nonlocal, higher enclosing, and global namespaces, and a function-like object. Mutating the dicts would NOT alter the corresponding namespaces.
In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces.
The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end.
In the case of yield, the returned tuple will have one additional element for the yielded value. The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace; failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception.
Returns and breaks will be exceptions, which contain the namespaces as extra data.
Continues will work similarly to returns in normal functions, causing the function to terminate normally and return the namespaces at the point the continue was encountered.
The "__for__" class is in charge of putting the iterator values into the local namespace dict passed to the function-like object (or not), for determining what should be in the namespace dicts passed to the function-like object, and for figuring out what, if anything, should be in the namespace dicts returned at the end.
How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense.
While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional.
This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace.
It would also be possible to have an alternative context manager implementation that works in the same way. It would just be passed namespace dicts and a function-like object and return namespace dicts.
It would take a nontrivial change to the compiler to compile the body of the loop into a separate code object, but with assignments still counted in the outer scope's locals list, yield expressions still making the outer function into a generator function, etc. You'd need to invent this new scope object type (just passing local, nonlocal, global dicts won't work because you can have assignments inside a loop body).
Right, this is why the loop handler is passed namespace dicts and returns namespace dicts. Changes to any namespace will remain isolated until everything is done and the handler can determine what to do with them.
I think I addressed this.
But that still doesn't get you anywhere near what you need for this proposal, because your motivating example is trying to run the code in parallel. What exactly happens when one iteration does a break and 7 others are running at the same time? Do you change the semantics of break so it breaks "within a few iterations", or add a way to cancel existing iterations and roll back any changes they'd made to the scope, or...? And return and yield seem even more problematic here.
In these cases it would probably just raise an exception telling you you can't use breaks or yields.
And, beyond the problems with concurrency, you have cross-process problems. For example, how do you pickle a live scope from one interpreter, pass it to another interpreter, and make it work on the first interpreter's scope?
That is the whole point of passing namespace dicts around.
However, under this proposal it would be possible for a third-party package to implement a "while" loop manager that can provide this functionality:
If someone can come up with a clean way to write this do object (even ignoring the fact that it appears to be a weird singleton global object--unless, contrary to other protocols, this one allows you to define the magic methods as @classmethods and then knows how to call them appropriately), why hasn't anyone come up with a clean way of writing a do...while structure? How would it be easier this way?
It would be easier because it can be uglier. The bar for new statements is necessarily much, much, much higher than for third-party packages. I certainly wouldn't propose loop handlers solely or even primarily to allow do...while loops; this is more of a side benefit and an example of the sorts of variations on existing loop behaviour that would be possible.
I think, like context managers, this would provide a great deal of flexibility to the language and allow a lot of useful behaviors. Of course the syntax and details are just strawmen examples at this point, there may be much better syntaxes. But I think the basic idea of being able to control a loop in a manner like this is important.
The major difference between this proposal and context managers is that you want to be able to have the loop manager drive the execution of its suite, while a context manager can't do that; it just has __enter__ and __exit__ methods that get called before and after the suite is executed normally. That's how it avoids all of the problems here. Of course it still uses a change to the interpreter to allow the __exit__ method to get called as part of exception handling, but it's easy to see how you could have implemented it as a source transformation into a try/finally, in which case it wouldn't have needed any new interpreter functionality at all.
Yes, that is why I said it was similar "in principle". The implementation is different, but I think the concepts have a lot in common.

On Jul 28, 2015, at 22:39, Todd <toddrjen@gmail.com> wrote:
But when you propose something this complicated, at least a sketch of the implementation is pretty much necessary, or nobody has any idea of the scope of what you're proposing. (Also, that sketch highlights all the pieces that Python is currently missing that would make this suggestion trivial, and I think some of those pieces are themselves interesting. And it would especially be a shame to do 90% of the work of building those new pieces just to implement this, but then not expose any of it.)
but my thinking for the "__for__" method is that it would be passed an iterator (not an iterable), a variable name, four dicts containing the local, nonlocal, higher enclosing, and global namespaces, and a function-like object. Mutating the dicts would NOT alter the corresponding namespaces.
That's a lot more complicated than it at first sounds, partly because Python currently doesn't have any notion of the "higher enclosing namespace", partly because the implicit thing that represents that namespace is made of cells, which are only constructed when needed. In particular, Python has to be able to see, at compile time, that your function is accessing a variable from an outer function so that it can mark that variable as a cell variable in the outer function and as a free variable in the inner function, so they can be connected when the inner function is created at runtime. Comprehensions and lambdas can ignore most of this because they can't contain statements, but your implicit loop functions can, so the compiler has to look inside them. And then, at runtime, you have to do something different--because the cells aren't actually being passed in to the implicit function, only their values, you have to add free variables to the outer function to make the post-call "namespace merging" work. It's not like you couldn't come up with the right algorithm to do all of this the way you want, it's just that it's very different from anything Python currently does, so you have to design it in detail before you can be sure it makes sense, not just hand wave it. Meanwhile, if mutating the dicts (that is, the inner function assigning to variables) doesn't affect the outer namespace, what about mutating the values in the dict? Do you deep-copy everything into the dicts, then "deep-update" back out later? If not, how does a line like "d[k] += 1" inside the loop end up working at all (where d is a global or nonlocal)? Meanwhile, this scope semantic is very weird if it interacts with the rest of Python. For example, if you call a closure that was built outside, but you've modified one of the variables that closure uses, it'll be called with the old value, right? That would be more than a little confusing. And finally, how does this copy-and-merge work in parallel?
Let's say you do x=0 outside the loop, then inside the loop you do x+=i. So, the first iteration starts with x=0, changes it to x=1, and you merge back x=1. The second iteration starts with x=0, changes it to x=2, and you merge back x=2. This seems like a guaranteed race, with no way either the interpreter or the Pool.__for__ mechanism could even have a possibility of ending up with 3, much less guarantee it. I think what you actually need here is not copied and merged scopes, but something more like STM, which is a lot more complicated.
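The lost update can be simulated without any real parallelism. In this sketch the dict copy stands in for the copied namespace handed to each iteration, and run_iteration is a hypothetical helper, not part of any proposed API.

```python
def run_iteration(namespace, i):
    # Each iteration works on its own copy of the outer namespace.
    local = dict(namespace)
    local["x"] += i
    return local


outer = {"x": 0}

# Both iterations start from the same snapshot, where x == 0.
results = [run_iteration(outer, i) for i in (1, 2)]

# Merging the copies back one after another: the last writer wins.
for returned in results:
    outer.update(returned)

merged_x = outer["x"]

# What the ordinary serial loop would have produced.
serial_x = 0
for i in (1, 2):
    serial_x += i
```

The merged value is 2 while the serial loop gives 3, and no merge order can recover the sum, because each copy only ever saw x == 0.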
In cases where one or more of the namespaces doesn't make sense the corresponding dict would be empty. The method would return three dicts containing the local, nonlocal, and global namespaces, any or all of which could be empty. Returning a non-empty dict in a case where the corresponding namespace doesn't make sense would raise an exception. The interpreter would merge these dicts back into the corresponding namespaces.
The function-like object would be passed four dicts corresponding to the same namespaces, and would return a tuple of three dicts corresponding to the same namespaces. The interpreter would again be responsible for initializing the function-like object's namespaces with the contents of the dicts and pulling out those namespaces at the end.
What does this "function-like object" look like? How do you call it with these dicts in such a way that their locals, nonlocals, and globals get set up as the variables the function was compiled to expect? (For just locals and globals, you can get the code object out, and exec that instead of calling the function, as I suggested--but you clearly don't want that, so what do you want instead?) Again, this is a new feature that I think would be more complicated, and more broadly useful, than your suggested feature, so it ought to be specified.
In the case of yield, the returned tuple will have one additional element for the yielded value.
How do you distinguish between "exited normally" and "yielded None"?
The interpreter would be in charge of remembering which yield it is at, but the function-like object would still be initialized with the namespaces provided by the method. So any loop handler that allows yielding will need to be able to get the correct values in the namespace, failing to do so will raise an exception. The function-like object always has an optional argument for injecting values into the yield, but passing anything to it when the function-like object is not at a yield that accepts a value would raise an exception.
Well, you also need to handle throwing exceptions into the function-like object. More importantly, if you think about this, you're proposing to have something like explicit generator states and/or continuations that can be manually passed around and called (and even constructed?). Together with your manual scopes (which can at least be merged into real scopes, if not allowing real scopes to be constructed), this gives you all kinds of cool things--I think you could write Stackless or greenlets in pure Python if you had this. But again, this is something we definitely don't have today, that has to be designed before you can just assume it for a new feature.
Returns and breaks will be exceptions, which contain the namespaces as extra data.
Which, as I said, obviously means that you've created a new compiler mode that, among other things, compiles returns and breaks into raises.
Continues will work similar to returns in normal functions, causing the function to terminate normally and return the namespaces at the point the continue was encountered.
The "__for__" class is in charge of putting the iterator values into the local namespace dict passed to the function-like object (or not), for determining what should be in the namespace dicts passed to the function-like object, and for figuring out what, if anything, should be in the namespace dicts returned at the end.
Fine, but it's just filtering the four dicts the interpreter is magicking up, right? (And the dicts the function-like object is returning.) So the part the interpreter does is the interesting bit; the __for__ method will usually just be passing them along unchanged.
How to deal with yields, breaks, and returns is up to the class designer. There is no reason all loop handlers would need to handle all possible loop behaviour. It would be possible to catch and re-raise break or return exceptions, or simply not handle them at all, in cases where they shouldn't be used. Similarly, a class could simply raise an exception if the function-like object tries to yield anything if yielding didn't make sense.
But how does the class know whether, e.g., yielding from the calling function makes sense? It doesn't know what function(s) it's going to be called from. And, again, _how_ can it handle yielding if it decides to do so? You can't write a method that's sometimes a generator function and sometimes not (depending on whether some function-like object returns a special value or not).
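The point about generator functions can be checked directly: whether a function is a generator function is fixed when it is compiled, by the mere presence of a yield, not by what happens at run time. A small sketch with a made-up function:

```python
import inspect


def sometimes_yields(flag):
    if flag:
        yield 1
    return "plain value"


# Even when the yield is never reached, calling the function returns a
# generator object rather than running the body and returning the value.
call = sometimes_yields(False)
is_generator_function = inspect.isgeneratorfunction(sometimes_yields)
is_generator_object = inspect.isgenerator(call)

# The "return" value only surfaces as the payload of StopIteration.
try:
    next(call)
    returned = None
except StopIteration as stop:
    returned = stop.value
```

A __for__ method written in ordinary Python would face the same all-or-nothing choice.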
While loop managers would be similar, except instead of a variable name and iterator it would be passed a second function-like object for the conditional and a tuple of variable names used in the conditional.
And it has to manually do the LEGB rule to dynamically figure out where to look up those names? Also, you realize that, even if it uses exactly the same rules as the normal interpreter, the effect will be different, because the interpreter applies the rule partly at compile time, not completely dynamically, right?
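The compile-time part of name resolution shows up in a familiar way: an assignment anywhere in a function makes the name local for the whole function, even if that assignment never runs. A minimal sketch (the function names are illustrative):

```python
x = "global"


def reads_global():
    # No assignment to x in this function, so x is looked up as a global.
    return x


def assigns_later():
    try:
        # Compiled as a local lookup because of the assignment below.
        return x
    except UnboundLocalError:
        return "unbound"
    x = "local"  # never executed, but it still makes x a local name


local_names = assigns_later.__code__.co_varnames
```

A purely dynamic lookup over namespace dicts would find the global x in both functions, which is the kind of difference in effect being pointed out here.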
This function-like object would return a namespace dict for the local namespace and a boolean for the result of the conditional. Ideally this namespace dict would be empty or None if it is identical to the input namespace.
Why is it returning a namespace dict? An expression can't assign to a variable, so the namespace will always be identical. An expression can, of course, call a mutating method, but unless you're suggesting another deep-copy/deep-update (or STM transaction) here, the value is already going to be mutated, and the dict won't help anyway.
It would also be possible to have an alternative context manager implementation that works in the same way. It would just be passed namespace dicts and a function-like object and return namespace dicts.
Sure, but it would be a much, much larger change than what we currently have, or even than what was initially proposed, and it would have to answer most of the same questions raised here. Which may be why nobody suggested that as an implementation for context managers in the first place.
OK, think about this: I have a list a=[0]*8. Now, for i in range(8), I set a[i] = i**2. If I do that with a pool for, each instance gets its own copy of the scope with its own copy of a, which it modifies. How does the interpreter or the Pool.__for__ merge those 8 separate copies of a in those 8 separate namespaces back into the original namespace?
But my point is that they're very different even in principle. One just supplies functions to get called around the suite, the other tries to control the way the suite is executed, using some kind of novel mechanism that still hasn't been designed.

On Wed, Jul 29, 2015 at 11:47 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
I thought it was better to first determine whether the idea had any merit at all before getting into details. If the idea turned out to be stupid then there isn't much point getting into details.
Fair enough.
The contents of the dict are deep-copied when passed to the loop manager. Whether the loop manager deep copies those or not is up to it. It may be possible to make the dicts directly represent the namespaces, and thus have changes immediately reflected in the namespace. I thought keeping things well-isolated was more important, but it may not be worth the trouble.
Where and when did you modify it?
Again, this would be up to the loop handler. In the case of multiprocessing, each repetition will likely get x=0, and the last value (whatever that is) will be returned at the end. The documentation would need to say that the behavior in such cases is non-deterministic and shouldn't be counted on.
I am not sure I am understanding the question. Can you please explain in more detail?
The first would return a length-3 tuple and the second would return a length-4 tuple with the last element being "None".
I am not sure what you mean by this.
I considered that possibility, but it seemed overly complicated. So at least in my proposal right now there is no way to get at the generator state.
Correct, which is why I call this a "function-like object" rather than a "function".
Maybe, maybe not. In the case of parallel code, then no. In the case of serial code, well that depends on exactly what you want to do. I think that in many cases, messing with the namespace would be one of the big advantages.
I am not understanding the question. Either there is a sane way to deal with yields, in which case it would do so, or there isn't, in which case it wouldn't. Can you name a situation where allowing a yield would be sane in some cases but not in others?
There are a couple of ways I thought of to deal with this. One is that there are two versions of each loop handler, one for generators and one for regular loops. The other is that the loop handler can be passed an object to which it pushes the values to be yielded.
If it doesn't want to alter them, then no, it just passes the namespaces along to the function-like object. If it wants to alter them, then yes. That is, what, two or three lines of code?
Yes, if someone is using this feature they would need to be aware of the effects it has.
An expression can, of course, call a mutating method, but unless you're suggesting another deep-copy/deep-update (or STM transaction) here, the value is already going to be mutated, and the dict won't help anyway.
It is up to the loop manager whether to deep copy or not. It would, of course, be possible for the loop manager to determine whether anything has mutated, but I felt this approach would be more consistent.
Yes, I know. Again, this would be a side benefit, rather than the primary purpose of the proposal.
It would need to check if any of the elements are not the same as before (using "is").
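As a rough illustration of the identity check described above, the following sketch compares two namespace dicts with "is". The function name rebound_names and the sample namespaces are hypothetical, and the check is only an assumption about how a loop manager might do this.

```python
def rebound_names(before, after):
    """Return names whose binding in `after` is not the same object as in `before`."""
    missing = object()  # sentinel for names that did not exist before
    return sorted(
        name
        for name, value in after.items()
        if before.get(name, missing) is not value
    )

data = [1, 2, 3]
before = {"data": data, "count": 0}

after = dict(before)      # shallow copy: same objects, new dict
after["data"].append(4)   # in-place mutation, binding unchanged
after["count"] = 1        # rebinding to a different object
after["extra"] = "new"    # a name that did not exist before

print(rebound_names(before, after))
```

Note that the in-place append to data is not reported, because the name still refers to the same object. An identity check detects rebinding only, which is the limitation the quoted objection about mutating methods points at.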

On Jul 28, 2015, at 15:28, Todd <toddrjen@gmail.com> wrote:
Following the discussion of the new "async" keyword, I think it would be useful to provide a generic way to alter the behavior of loops. My idea is to allow a user to take control over the operation of a "for" or "while" loop.
It strikes me that allowing control of comprehensions rather than loop statements might get you some of the desired benefits, while avoiding most of the problems I described in my previous email.
The major difference is that comprehensions can only contain expressions, not statements, so all the issues of scopes, break and friends, etc. go away (or are already handled by the way comprehension functions are compiled).
While yield is allowed inside a comprehension, it's very weird to do so. (You're essentially turning the list-building function into a generator function that returns the built list as the argument to its StopIteration; I suspect this is only legal because nobody felt the need to write the code to make it illegal, not because anyone found it useful?)
You could come up with a syntax like this:
    values = [with pool spam(x) for x in iterable]
The semantics are still a bit complicated (and still need to be defined, because what method(s) this should call with what arguments and what they should do is still not obvious), but they might not require any new kinds of objects or any new compilation modes or anything like that.
I'm not sure how useful this would be, because you can always just wrap the expression in a lambda (or, in this case, use spam as-is) and call pool.map instead.
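For comparison, the existing pool.map alternative mentioned above looks roughly like this. The sketch uses concurrent.futures.ThreadPoolExecutor as the pool, and the body of spam is a placeholder; both are assumptions chosen only to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

def spam(x):
    # Placeholder for whatever the comprehension's expression computes.
    return x * 2

iterable = range(5)

with ThreadPoolExecutor(max_workers=4) as pool:
    # Equivalent in effect to [spam(x) for x in iterable], but each call
    # is dispatched to the pool; map returns results in input order.
    values = list(pool.map(spam, iterable))

print(values)
```

An arbitrary expression in place of spam(x) would have to be wrapped in a lambda or a named function first, which is the naming requirement the proposed syntax would remove.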

On 28 July 2015 at 23:28, Todd <toddrjen@gmail.com> wrote:
Guido's original PEP 340 (what eventually became PEP 343's context managers) is worth reading for background here: https://www.python.org/dev/peps/pep-0340/
And then the motivation section in PEP 343 covers why he changed his mind away from introducing a new general purpose looping construct and proposed the simpler context management protocol instead: https://www.python.org/dev/peps/pep-0343/
As such, rather than starting from the notion of a general purpose loop manager, we're likely better off focusing specifically on the parallelisation problem as Andrew suggests, and figuring out how we might go about enabling parallel execution of the components of a generator expression or container comprehension for at least the following cases:
* native coroutine (async/await)
* concurrent.futures.ThreadPoolExecutor.map
* concurrent.futures.ProcessPoolExecutor.map
Consider the following serial operation:
    result = sum(process(x, y, z) for x, y, z in seq)
If "process" is a time consuming function, we may want to dispatch it to different processes in order to exploit all cores. Currently that looks like:
    with concurrent.futures.ProcessPoolExecutor() as pool:
        result = sum(pool.map(process, seq))
If "process" is a blocking IO operation rather than a CPU bound one, we may decide to save some IPC overhead, and use local threads instead (there's no default pool size for a thread executor):
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
        result = sum(pool.map(process, seq))
And if we're working with natively asynchronous algorithms:
    result = sum(await asyncio.gather(*(process_async(x, y, z) for x, y, z in seq)))
That's what parallel dispatch of a loop with independent iterations already looks like today, with the key requirement being that you name the operation performed on each iteration (or use a lambda expression in the case of concurrent.futures).
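The three spellings above can be compared side by side in a small runnable sketch. The bodies of process and process_async and the contents of seq are placeholders, a thread pool stands in for both executor cases, and asyncio.run (from later Python releases than the ones under discussion) is used as the entry point. One detail is an assumption on my part: because process takes three arguments, the sketch unzips seq so that Executor.map can pass one argument from each column.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def process(x, y, z):
    # Placeholder for the real per-item work.
    return x * y + z

async def process_async(x, y, z):
    await asyncio.sleep(0)  # stand-in for real asynchronous work
    return process(x, y, z)

seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Serial baseline.
serial_result = sum(process(x, y, z) for x, y, z in seq)

# Executor.map takes one iterable per positional argument of the callable,
# so the tuples in seq are unzipped into three columns.
with ThreadPoolExecutor(max_workers=10) as pool:
    threaded_result = sum(pool.map(process, *zip(*seq)))

async def main():
    # gather takes awaitables as separate positional arguments,
    # hence the unpacking star in front of the generator expression.
    results = await asyncio.gather(
        *(process_async(x, y, z) for x, y, z in seq)
    )
    return sum(results)

async_result = asyncio.run(main())

print(serial_result, threaded_result, async_result)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor should work the same way for a module-level function such as process, provided the script is guarded by an `if __name__ == "__main__":` block on platforms that spawn worker processes.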
PEP 492 deliberately postponed the question of "What does an asynchronous comprehension look like?", because it wasn't clear what either the syntax *or* semantics should be, and as the above example shows, it's already fairly tidy if you're working with an already defined coroutine.
Given the current suite level spelling for the concurrent.futures case, one could easily imagine a syntax like:
    result = sum(process(x, y, z) with pool for x, y, z in seq)
that translated to:
    def _parallel_genexp(pool, seq):
        futures = []
        with pool:
            for x, y, z in seq:
                futures.append(pool.__submit__(lambda x=x, y=y, z=z: process(x, y, z)))
            for future in futures:
                yield future.result()
    result = sum(_parallel_genexp(pool, seq))
Container comprehensions would replace the "yield future.result()" with "expr_result.append(item)", "expr_result.add(item)" or "expr_result[key] = value" as usual.
To avoid destroying the executor with each use, a "persistent pool" wrapper could be added that delegated __submit__, but changed __enter__ and __exit__ into no-ops.
Native coroutine syntax could then potentially be added using the async keyword already introduced in PEP 492, where:
    result = sum(process(x, y, z) with async for x, y, z in seq)
may mean something like:
    async def _async_genexp(seq):
        futures = []
        async for x, y, z in seq:
            async def _iteration(x=x, y=y, z=z):
                return process(x, y, z)
            futures.append(asyncio.ensure_future(_iteration()))
        return await asyncio.gather(*futures)
    result = sum(await _async_genexp(seq))
Regards, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
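The "persistent pool" wrapper and the generator translation described in Nick's message can be tried out today with ordinary code. In the sketch below, __submit__ is the hypothetical protocol method from that message (no such method exists on real executors), PersistentPool is an assumed name for the wrapper, and process is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

class PersistentPool:
    """Hypothetical wrapper: delegates __submit__, makes `with` a no-op."""

    def __init__(self, executor):
        self._executor = executor

    def __submit__(self, fn):
        # Delegate to the real executor's submit().
        return self._executor.submit(fn)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Do not shut the executor down, so it survives repeated use.
        return False

def process(x, y, z):
    # Placeholder for the real per-item work.
    return x * y + z

def _parallel_genexp(pool, seq):
    futures = []
    with pool:
        for x, y, z in seq:
            # Default arguments freeze the current x, y, z for each task.
            futures.append(
                pool.__submit__(lambda x=x, y=y, z=z: process(x, y, z))
            )
        for future in futures:
            yield future.result()

executor = ThreadPoolExecutor(max_workers=4)
pool = PersistentPool(executor)
seq = [(1, 2, 3), (4, 5, 6)]

first = sum(_parallel_genexp(pool, seq))
second = sum(_parallel_genexp(pool, seq))  # the executor is still usable
executor.shutdown()

print(first, second)
```

Passing a bare executor instead of the wrapper would shut it down at the end of the first `with pool:` block, so the second call would fail; that is the problem the no-op __enter__ and __exit__ are meant to avoid.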

Guys, that is awesome. Nice spirit. I actually had another idea in mind regarding the 'concurrency syntax issue'. Not sure if you want to discuss the Pool manager syntax first. Or if I should start a new thread on this list. Best, Sven
participants (5)
Andrew Barnert
Nick Coghlan
Ryan Gonzalez
Sven R. Kunze