Looking at PEP 492, it seems to me the handling of "for" loops has use outside of just asyncio. The primary use case I can think of is multiprocessing and multithreading.

For example, you could create a multiprocessing pool, and let the pool handle the items in a "for" loop, like so:

```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```

Or something similar with third-party modules:

```
from greenlet import greenlet

greenlet for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```

Of course this sort of thing is possible with iterators and maps today, but I think a lot of the same advantages that apply to asyncio also apply to these sorts of cases. So rather than having a special keyword just for asyncio, I think it would be better to have a more flexible approach. Perhaps something like a "__for__" magic method that lets a class implement "for" loop handling, along with the corresponding changes in how the language processes the "for" loop.
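A rough sketch of what such a protocol might look like (the `__for__` name, its signature, and the desugaring are all hypothetical -- nothing like this is specified in PEP 492):

```python
# Hypothetical sketch: "driver for item in items: <body>" could compile to
# driver.__for__(items, <body as a one-argument function>), letting the
# driver decide how (and in what order) to run the body.
from multiprocessing import Pool

class PoolDriver:
    def __init__(self, *args, **kwargs):
        self.pool = Pool(*args, **kwargs)

    def __for__(self, iterable, body):
        # This driver fans the body out across the pool; Pool.map blocks
        # until every item has been processed, so the "loop" then ends.
        self.pool.map(body, iterable)
```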
---
On 30 April 2015 at 10:48, Todd <toddrjen@gmail.com> wrote:
Of course this sort of thing is possible with iterators and maps today, but I think a lot of the same advantages that apply to asyncio also apply to these sorts of cases. So rather than having a special keyword just for asyncio, I think it would be better to have a more flexible approach. Perhaps something like a "__for__" magic method that lets a class implement "for" loop handling, along with the corresponding changes in how the language processes the "for" loop.
+1 on making a more general construct than "async for", which can then be used to implement an equivalent to "async for" as well as similar constructs for threads, processes, and whatever else 3rd-party code might find a use for.

Paul
---
On Thu, Apr 30, 2015 at 11:48:21AM +0200, Todd wrote:
Looking at pep 492, it seems to me the handling of "for" loops has use outside of just asyncio. The primary use-case I can think of is multiprocessing and multithreading.
For example, you could create a multiprocessing pool, and let the pool handle the items in a "for" loop, like so:
```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```
That's a very pretty piece of pseudo-code (actually, I lie, I don't think it is pretty at all, but for the sake of the argument let's pretend it is) but what does it do? How does it do it? Let's be concrete:

```
mypool = Pool(10, maxtasksperchild=2)
items = range(1000)

mypool for item in items:
    print(item)
    if item == 30:
        break
    x = item + 1
    print(x)
```

What gets printed?

A parallel version of map makes sense, because the semantics of map are well defined: given a function f and a sequence [a, b, c, ...] it creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f is a pure function which is side-effect free (if it isn't, you're going to have a bad time). The specific order in which a, b, c etc. are processed doesn't matter. If it does matter, then map is the wrong way to process it.

But a parallel version of for does not make sense to me. (I must admit, I'm having trouble understanding what "async for" will do too.) By definition, a for-loop is supposed to be sequential. Loop the first time, *then* the second time, *then* the third time. There's no presumption of the body of the for-block being side-effect free, and you're certainly not free to perform the loops in some other order.
Of course this sort of thing is possible with iterators and maps today, but I think a lot of the same advantages that apply to asyncio also apply to these sorts of cases. So rather than having a special keyword just for asyncio, I think it would be better to have a more flexible approach. Perhaps something like a "__for__" magic method that lets a class implement "for" loop handling, along with the corresponding changes in how the language processes the "for" loop.
"async for" hasn't proven itself yet, and you are already looking to generalise it? Shouldn't it prove itself as not a mistake first? -- Steve
---
Steven D'Aprano schrieb am 30.04.2015 um 13:36:
"async for" hasn't proven itself yet, and you are already looking to generalise it? Shouldn't it prove itself as not a mistake first?
Also, it should be quite possible to achieve what the OP proposed with "async for" since it's in no way limited to the way asyncio handles things. "async for" is a bit of a badly named feature, but that's intended in order to match what people would know from other programming languages. Stefan
---
On 30 April 2015 at 17:03, Stefan Behnel <stefan_ml@behnel.de> wrote:
Steven D'Aprano schrieb am 30.04.2015 um 13:36:
"async for" hasn't proven itself yet, and you are already looking to generalise it? Shouldn't it prove itself as not a mistake first?
Also, it should be quite possible to achieve what the OP proposed with "async for" since it's in no way limited to the way asyncio handles things. "async for" is a bit of a badly named feature, but that's intended in order to match what people would know from other programming languages.
Could you explain how? Specifically, what's the translation of

```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```

I'm assuming the OP's intention (it's certainly mine) is that the "mypool for" loop works something like

```
def _work(item):
    do_something_here
    do_something_else
    do_yet_another_thing

for _ in mypool.map(_work, items):
    # Wait for the subprocesses
    pass
```

How would I use "async for" to get the same result? (And the same for a concurrent.futures Executor in place of a multiprocessing pool.)

Paul.
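For what it's worth, a minimal runnable version of that translation using a concurrent.futures executor might look like this (`_work` is just a stand-in for the loop body):

```python
from concurrent.futures import ProcessPoolExecutor

def _work(item):
    # stand-in for do_something_here / do_something_else / ...
    return item * 2

if __name__ == "__main__":
    items = range(10)
    with ProcessPoolExecutor(max_workers=4) as pool:
        # Exhausting the iterator waits for all the subprocesses.
        for _ in pool.map(_work, items):
            pass
```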
---
On Thu, Apr 30, 2015 at 6:45 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 30 April 2015 at 17:03, Stefan Behnel <stefan_ml@behnel.de> wrote:

Steven D'Aprano schrieb am 30.04.2015 um 13:36:

"async for" hasn't proven itself yet, and you are already looking to generalise it? Shouldn't it prove itself as not a mistake first?

Also, it should be quite possible to achieve what the OP proposed with "async for" since it's in no way limited to the way asyncio handles things. "async for" is a bit of a badly named feature, but that's intended in order to match what people would know from other programming languages.
Could you explain how?
Specifically, what's the translation of
```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```

I'm assuming the OP's intention (it's certainly mine) is that the "mypool for" loop works something like

```
def _work(item):
    do_something_here
    do_something_else
    do_yet_another_thing

for _ in mypool.map(_work, items):
    # Wait for the subprocesses
    pass
```
Yes, thank you, that is exactly what I intended.
---
Ah. But 'async for' is not meant to introduce parallelism or concurrency. It is only meant to be able to insert specific places during a sequential iteration where a coroutine's stack can be suspended. The primitives proposed by PEP 492 don't introduce new ways to spell concurrency -- for that you would need things like asyncio.gather(). If you want to introduce ways to spell concurrency directly in the language you'll have to write and defend your own PEP.

PEP 492 is only meant to make code easier to read and write that's already written to use coroutines (e.g. using the asyncio library, but not limited to that).
-- --Guido van Rossum (python.org/~guido)
---
On 30 April 2015 at 18:31, Guido van Rossum <guido@python.org> wrote:
PEP 492 is only meant to make code easier to read and write that's already written to use coroutines (e.g. using the asyncio library, but not limited to that).
OK, that's fair. To an outsider like me it feels like a lot of new syntax to support a very specific use case. But that's because I don't really have a feel for what you mean when you note "but not limited to that". Are there any good examples or use cases for coroutines that are *not* asyncio-based? And assuming you are saying that PEP 482 should help for those as well, could it include a non-asyncio example? My immediate reaction is that the keywords "async" and "await" will seem a little odd in a non-asyncio context.

Paul
---
On Thu, Apr 30, 2015 at 10:54 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 30 April 2015 at 18:31, Guido van Rossum <guido@python.org> wrote:

PEP 492 is only meant to make code easier to read and write that's already written to use coroutines (e.g. using the asyncio library, but not limited to that).
OK, that's fair. To an outsider like me it feels like a lot of new syntax to support a very specific use case. But that's because I don't really have a feel for what you mean when you note "but not limited to that". Are there any good examples or use cases for coroutines that are *not* asyncio-based? And assuming you are saying that PEP 482 should help for those as well, could it include a non-asyncio example? My immediate reaction is that the keywords "async" and "await" will seem a little odd in a non-asyncio context.
The async/await pair of keywords for coroutines actually stems from C#, where the compiler generates special coroutine suspension code when it sees them, and the type checker verifies that they are used correctly -- await's argument must be something of type async, and await must occur inside a function declared as async. These semantics are very similar to coroutines using yield-from in Python, and I had observed the similarity long before this PEP was written.

Most examples of coroutines will be doing some kind of I/O multiplexing, because that's what they're good for. But asyncio is not the only explicit I/O multiplexing system in the Python world. Twisted had yield-based coroutines long before anyone else (they called them InlineCallbacks though) and is likely to support Python 3 and some level of interoperability with asyncio.

Note: it's PEP 492, not 482.

-- --Guido van Rossum (python.org/~guido)
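As a side-by-side illustration of the similarity Guido describes, here are the two spellings of the same asyncio coroutine (the generator-based form was current when this thread was written; it has since been deprecated and removed from modern Python):

```python
import asyncio

# Pre-PEP 492 spelling: a generator-based coroutine using yield from.
@asyncio.coroutine
def fetch_old():
    yield from asyncio.sleep(1)
    return 42

# PEP 492 spelling: async/await at the same suspension points.
async def fetch_new():
    await asyncio.sleep(1)
    return 42
```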
---
On Apr 30, 2015, at 10:54, Paul Moore <p.f.moore@gmail.com> wrote:
On 30 April 2015 at 18:31, Guido van Rossum <guido@python.org> wrote: PEP 492 is only meant to make code easier to read and write that's already written to use coroutines (e.g. using the asyncio library, but not limited to that).
OK, that's fair. To an outsider like me it feels like a lot of new syntax to support a very specific use case. But that's because I don't really have a feel for what you mean when you note "but not limited to that". Are there any good examples or use cases for coroutines that are *not* asyncio-based?
IIRC, the original asyncio PEP has links to Greg Ewing's posts that demonstrated how you could use yield from coroutines for various purposes, including asynchronous I/O, but also things like many-actor simulations, with pretty detailed examples.
And assuming you are saying that PEP 482 should help for those as well, could it include a non-asyncio example? My immediate reaction is that the keywords "async" and "await" will seem a little odd in a non-asyncio context.
Paul
---
On Fri, May 1, 2015 at 4:13 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
IIRC, the original asyncio PEP has links to Greg Ewing's posts that demonstrated how you could use yield from coroutines for various purposes, including asynchronous I/O, but also things like many-actor simulations, with pretty detailed examples.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/yield-from/yield_from.htm... It has two small examples of *generator iterators* that can be nicely refactored using yield-from (no need to switch to async there), but the only meaty example using a trampoline is a scheduler for multiplexed I/O. -- --Guido van Rossum (python.org/~guido)
---
On 1/05/2015 5:31 a.m., Guido van Rossum wrote:
Ah. But 'async for' is not meant to introduce parallelism or concurrency.
This kind of confusion is why I'm not all that enamoured of using the word "async" the way PEP 492 does. But since there seems to be prior art for it in other languages now, I suppose there are at least some people out there who won't be confused by it. -- Greg
---
Guido van Rossum schrieb am 30.04.2015 um 19:31:
But 'async for' is not meant to introduce parallelism or concurrency.
Well, the fact that it's not *meant* for that doesn't mean you can't use it for that. It allows an iterator (name it coroutine if you want) to suspend and return control to the outer caller to wait for the next item. What the caller does in order to get that item is completely up to itself. It could be called "asyncio" and do some I/O in order to get data, but it can equally well be a multi-threading setup that grabs data from a queue connected to a pool of threads. Granted, this implies an inversion of control in that it's the caller that provides the thread-pool and not the user, but it's not like it's unprecedented to work with a 'global' pool of pre-instantiated threads (or processes, for that matter) in order to avoid startup overhead. Stefan
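A sketch of the inversion of control Stefan describes -- an async iterator whose items are computed by a pool of threads (all names here are illustrative, not an established API):

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

class PoolIterable:
    """Async-iterable view of func mapped over items via worker threads."""
    def __init__(self, func, items, pool):
        # Submit everything up front; each item becomes a Future.
        self._futures = iter([pool.submit(func, i) for i in items])

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            fut = next(self._futures)
        except StopIteration:
            raise StopAsyncIteration from None
        # Suspend this coroutine until a worker thread produces the item.
        return await asyncio.wrap_future(fut)

async def main():
    with ThreadPoolExecutor(max_workers=4) as pool:
        async for result in PoolIterable(lambda x: x * x, range(5), pool):
            print(result)

asyncio.get_event_loop().run_until_complete(main())
```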
---
Paul Moore writes:
```
mypool for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```
I'm assuming that's the OP's intention (it's certainly mine) is that the "mypool for" loop works something like
```
def _work(item):
    do_something_here
    do_something_else
    do_yet_another_thing

for _ in mypool.map(_work, items):
    # Wait for the subprocesses
    pass
```
I would think that given a pool of processors, the pool's .map method itself would implement the distribution. In fact the Pool ABC would probably provide several variations on the map method (e.g. a mapreduce implementation, a map-to-list implementation, and a map-is-generator implementation) depending on the treatment of the results of the _work computation (if any). I don't see a need for syntax here.

Aside: Doesn't the "Wait for the subprocesses" belong outside the for suite?
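(Re the map variations mentioned above: multiprocessing's Pool already ships several, differing in laziness and result ordering -- shown here for reference:)

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        print(pool.map(square, range(5)))                     # eager list, input order
        print(list(pool.imap(square, range(5))))              # lazy iterator, input order
        print(sorted(pool.imap_unordered(square, range(5))))  # lazy, completion order
```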
---
On Thu, Apr 30, 2015 at 1:36 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Apr 30, 2015 at 11:48:21AM +0200, Todd wrote:
Looking at pep 492, it seems to me the handling of "for" loops has use outside of just asyncio. The primary use-case I can think of is multiprocessing and multithreading.
For example, you could create a multiprocessing pool, and let the pool handle the items in a "for" loop, like so:
```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```
A parallel version of map makes sense, because the semantics of map are well defined: given a function f and a sequence [a, b, c, ...] it creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f is a pure-function which is side-effect free (if it isn't, you're going to have a bad time). The specific order in which a, b, c etc. are processed doesn't matter. If it does matter, then map is the wrong way to process it.
multiprocessing.Pool.map guarantees ordering. It is multiprocessing.Pool.imap_unordered that doesn't.
Of course this sort of thing is possible with iterators and maps today, but I think a lot of the same advantages that apply to asyncio also apply to these sorts of cases. So rather than having a special keyword just for asyncio, I think it would be better to have a more flexible approach. Perhaps something like a "__for__" magic method that lets a class implement "for" loop handling, along with the corresponding changes in how the language processes the "for" loop.
"async for" hasn't proven itself yet, and you are already looking to generalise it? Shouldn't it prove itself as not a mistake first?
Two reasons:

1. It may be hard to generalize it later without breaking backwards compatibility.
2. Whether it can be generalized or not may have some bearing on whether it gets accepted.
---
On Thu, Apr 30, 2015 at 07:12:11PM +0200, Todd wrote:
On Thu, Apr 30, 2015 at 1:36 PM, Steven D'Aprano <steve@pearwood.info> wrote:
A parallel version of map makes sense, because the semantics of map are well defined: given a function f and a sequence [a, b, c, ...] it creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f is a pure-function which is side-effect free (if it isn't, you're going to have a bad time). The specific order in which a, b, c etc. are processed doesn't matter. If it does matter, then map is the wrong way to process it.
multiprocessing.Pool.map guarantees ordering. It is multiprocessing.Pool.imap_unordered that doesn't.
I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime.

The point I am making is that map() doesn't have any connotations of the order of execution, whereas for loops have a very strong connotation of executing the block in a specific sequence. People don't tend to use map with a function with side-effects:

```
map(lambda i: print(i) or i, range(100))
```

will return [0, 1, 2, ..., 99] but it may not print 0 1 2 3 ... in that order. But with a for-loop, it would be quite surprising if

```
for i in range(100):
    print(i)
```

printed the values out of order. In my opinion, sticking "mypool" in front of the "for i" doesn't change the fact that adding parallelism to a for loop would be surprising and hard to reason about.

If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.

-- Steve
---
On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
multiprocessing.Pool.map guarantees ordering. It is multiprocessing.Pool.imap_unordered that doesn't.

I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime.
This is an *excellent* point. Yury
---
On 04/30, Yury Selivanov wrote:
On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime.
This is an *excellent* point.
So, PEP 492 async for also guarantees that the loop runs in order, one at a time, with one loop finishing before the next one starts?

*sigh* How disappointing.

-- ~Ethan~
---
On 2015-04-30 9:02 PM, Ethan Furman wrote:
On 04/30, Yury Selivanov wrote:
On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime.

This is an *excellent* point.

So, PEP 492 async for also guarantees that the loop runs in order, one at a time, with one loop finishing before the next one starts?
*sigh*
How disappointing.
No. Nothing prevents you from scheduling asynchronous parallel computation, or prefetching more data. Since __anext__ is an awaitable you can do that. Steven's point is that Todd's proposal isn't that straightforward to apply. Yury
---
On Thu, Apr 30, 2015 at 6:07 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2015-04-30 9:02 PM, Ethan Furman wrote:
On 04/30, Yury Selivanov wrote:
On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime.
This is an *excellent* point.
So, PEP 492 async for also guarantees that the loop runs in order, one at a time, with one loop finishing before the next one starts?
*sigh*
How disappointing.
No. Nothing prevents you from scheduling asynchronous parallel computation, or prefetching more data. Since __anext__ is an awaitable you can do that.
That's not Ethan's point. The 'async for' statement indeed is a sequential loop: e.g. if you write

```
async for rec in db_cursor:
    print(rec)
```

you are guaranteed that the records are printed in the order in which they are produced by the database cursor. There is no implicit parallelism of the execution of the loop bodies.

Of course you can introduce parallelism, but you have to be explicit about it, e.g. by calling some async function for each record *without* awaiting for the result, e.g. collecting the awaitables in a separate list and then using e.g. the gather() operation from the asyncio package:

```
async def process_record(rec):
    print(rec)

fs = []
for rec in db_cursor:
    fs.append(process_record(rec))
await asyncio.gather(*fs)
```

This may print the records in arbitrary order.

Note that unlike threads, you don't need locks, since there is no worry about parallel access to sys.stdout by print(). The print() function does not guarantee atomicity when it writes to sys.stdout, and in a threaded version of the above code you might occasionally see two records followed by two \n characters, because threads can be arbitrarily interleaved. Task switching between coroutines only happens at await (or yield [from] :-) and at the await points specified by PEP 492 in the 'async for' and 'async with' statements.

-- --Guido van Rossum (python.org/~guido)
---
On 04/30/2015 09:07 PM, Yury Selivanov wrote:
On 2015-04-30 9:02 PM, Ethan Furman wrote:
On 04/30, Yury Selivanov wrote:
On 2015-04-30 8:35 PM, Steven D'Aprano wrote:
I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime.

This is an *excellent* point.

So, PEP 492 async for also guarantees that the loop runs in order, one at a time, with one loop finishing before the next one starts?
*sigh*
How disappointing.
No. Nothing prevents you from scheduling asynchronous parallel computation, or prefetching more data. Since __anext__ is an awaitable you can do that.
Steven's point is that Todd's proposal isn't that straightforward to apply.
Initialising several coroutines at once still doesn't seem clear/clean to me. Or maybe I'm just not getting that part yet. Here is what I would like. :-)

```
values = awaiting [awaitable, awaitable, ...]

a, b, ... = awaiting (awaitable, awaitable, ...)
```

This doesn't have the issues of order because a list of values is returned with the same order of the awaitables, but the awaitables are scheduled in parallel. A regular for loop could still do these in order, but would pause when it gets to a value that hasn't returned/resolved yet. That would probably be expected.

Awaiting sets would be different... they are unordered. So we can use a set and get the items that become available as they become available:

```
for x in awaiting {awaitable, awaitable, ...}:
    print(x)
```

x would print in an arbitrary order, but that would be what I would expect here. :-) The body could have await calls in it, and so it could cooperate along with the awaiting set. Of course if it's only a few statements, that probably wouldn't make much difference.

This seems like it's both explicit and simple to think about. It also seems like it might not be that hard to do; I think most of the parts are already worked out.

One option is to allow await to work with iterables in this way. But the awaiting keyword would make the code clearer and error messages nicer.

Cheers, Ron
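(For comparison, asyncio as it already stands spells the unordered "get items as they become available" variant with a plain for loop over asyncio.as_completed() -- a sketch, not part of Ron's proposal:)

```python
import asyncio

async def work(delay):
    await asyncio.sleep(delay)
    return delay

async def main():
    coros = [work(0.3), work(0.1), work(0.2)]
    # Futures are yielded in completion order, not submission order.
    for fut in asyncio.as_completed(coros):
        print(await fut)  # prints 0.1, then 0.2, then 0.3

asyncio.get_event_loop().run_until_complete(main())
```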
---
On Apr 30, 2015, at 17:35, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Apr 30, 2015 at 07:12:11PM +0200, Todd wrote:

On Thu, Apr 30, 2015 at 1:36 PM, Steven D'Aprano <steve@pearwood.info> wrote:

A parallel version of map makes sense, because the semantics of map are well defined: given a function f and a sequence [a, b, c, ...] it creates a new sequence [f(a), f(b), f(c), ...]. The assumption is that f is a pure-function which is side-effect free (if it isn't, you're going to have a bad time). The specific order in which a, b, c etc. are processed doesn't matter. If it does matter, then map is the wrong way to process it.

multiprocessing.Pool.map guarantees ordering. It is multiprocessing.Pool.imap_unordered that doesn't.
I don't think it guarantees ordering in the sense I'm referring to. It guarantees that the returned result will be [f(a), f(b), f(c), ...] in that order, but not that f(a) will be calculated before f(b), which is calculated before f(c), ... and so on. That's the point of parallelism: if f(a) takes a long time to complete, another worker may have completed f(b) in the meantime.
The point I am making is that map() doesn't have any connotations of the order of execution, whereas for loops have a very strong connotation of executing the block in a specific sequence. People don't tend to use map with a function with side-effects:
```
map(lambda i: print(i) or i, range(100))
```
will return [0, 1, 2, ..., 99] but it may not print 0 1 2 3 ... in that order. But with a for-loop, it would be quite surprising if
```
for i in range(100):
    print(i)
```
printed the values out of order. In my opinion, sticking "mypool" in front of the "for i" doesn't change the fact that adding parallelism to a for loop would be surprising and hard to reason about.
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.
The obvious thing to look at here seems to be OpenMP's parallel for. I haven't used it in a long time, but IIRC, in the C bindings, you use it something like:

```
#pragma omp parallel for
for (int i = 0; i != 100; ++i) {
    lots_of_work(i);
}
```

... and it turns it into something like:

```
for (int i = 0; i != 100; ++i) {
    queue_put(current_team_queue, /* processed loop body thingy */);
}
queue_wait(current_team_queue, 100);
```
-- Steve
---
On 01/05/2015 02:35, Steven D'Aprano wrote:
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.
Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:

```
from cython.parallel import prange

cdef int func(Py_ssize_t n):
    cdef Py_ssize_t i

    for i in prange(n, nogil=True):
        if i == 8:
            with gil:
                raise Exception()
        elif i == 4:
            break
        elif i == 2:
            return i
```

This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html

Joseph
---
On Fri, May 1, 2015 at 8:52 AM, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:
On 01/05/2015 02:35, Steven D'Aprano wrote:
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.
Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:

```
from cython.parallel import prange

cdef int func(Py_ssize_t n):
    cdef Py_ssize_t i

    for i in prange(n, nogil=True):
        if i == 8:
            with gil:
                raise Exception()
        elif i == 4:
            break
        elif i == 2:
            return i
```

This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html
Interesting. I'm trying to imagine how this could be implemented in CPython by turning the for-loop body into a coroutine. It would be a complicated transformation because of the interaction with local variables in the code surrounding the for-loop. Perhaps the compiler could mark all such variables as implicitly nonlocal. The Cython example also shows other interesting issues -- what should return or break do? In any case, I don't want this idea to distract the PEP 492 discussion -- it's a much thornier problem, and maybe coroutine concurrency isn't what we should be after here -- the use cases here seem to be true (GIL-free) parallelism. I'm imagining that pyparallel has already solved this (if it has solved anything :-). -- --Guido van Rossum (python.org/~guido)
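A hand-written sketch of the transformation Guido describes -- the loop body hoisted into a closure, with surrounding locals marked nonlocal (purely illustrative; no compiler does this today):

```python
def outer():
    total = 0

    # What a compiler might generate from the loop body: variables from
    # the enclosing scope that the body assigns to become nonlocal.
    def _loop_body(i):
        nonlocal total
        total += i

    # A real parallel implementation would dispatch these calls to
    # workers; they run sequentially here to show only the scoping.
    for i in range(10):
        _loop_body(i)
    return total

print(outer())  # 45
```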
---
On 01/05/2015 18:56, Guido van Rossum wrote:
On Fri, May 1, 2015 at 8:52 AM, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:
On 01/05/2015 02:35, Steven D'Aprano wrote:
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.
Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:
```
from cython.parallel import prange

cdef int func(Py_ssize_t n):
    cdef Py_ssize_t i

    for i in prange(n, nogil=True):
        if i == 8:
            with gil:
                raise Exception()
        elif i == 4:
            break
        elif i == 2:
            return i
```
This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html
Interesting. I'm trying to imagine how this could be implemented in CPython by turning the for-loop body into a coroutine. It would be a complicated transformation because of the interaction with local variables in the code surrounding the for-loop. Perhaps the compiler could mark all such variables as implicitly nonlocal. The Cython example also shows other interesting issues -- what should return or break do?
About return and break in cython, there is a section in the documentation: "For prange() this means that the loop body is skipped after the first break, return or exception for any subsequent iteration in any thread. It is undefined which value shall be returned if multiple different values may be returned, as the iterations are in no particular order."
---
On May 1, 2015, at 08:52, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:
On 01/05/2015 02:35, Steven D'Aprano wrote:
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.
Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:
I think that's pretty good evidence that this proposal (I meant the syntax for loop modifiers, not "some way to do loops in parallel would be nice") isn't needed. What OpenMP has to do with loop modifier syntax, Cython can do with just a special iterator in normal Python syntax. Of course that doesn't guarantee that something similar to prange could be built for Python 3.5's Pool, Executor, etc. types without changes, but even if it can't, a change to the iterator protocol to make prange buildable doesn't seem as disruptive as a change to the basic syntax of the for loop. (Unless there just is no reasonable change to the protocol that could work.)
```
from cython.parallel import prange

cdef int func(Py_ssize_t n):
    cdef Py_ssize_t i

    for i in prange(n, nogil=True):
        if i == 8:
            with gil:
                raise Exception()
        elif i == 4:
            break
        elif i == 2:
            return i
```
This is an example from the cython documentation: http://docs.cython.org/src/userguide/parallelism.html
Joseph
---
On 02/05/2015 18:16, Andrew Barnert via Python-ideas wrote:
On May 1, 2015, at 08:52, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:
On 01/05/2015 02:35, Steven D'Aprano wrote:
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.
Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:
I think that's pretty good evidence that this proposal (I meant the syntax for loop modifiers, not "some way to do loops in parallel would be nice") isn't needed. What OpenMP has to do with loop modifier syntax, Cython can do with just a special iterator in normal Python syntax.
Cython uses python syntax but the behavior is different. This is especially obvious seeing how break and return are managed, where the difference is not only in the iterator.
---
On 03/05/2015 23:52, Joseph Martinot-Lagarde wrote:
On 02/05/2015 18:16, Andrew Barnert via Python-ideas wrote:
On May 1, 2015, at 08:52, Joseph Martinot-Lagarde <joseph.martinot-lagarde@m4x.org> wrote:
On 01/05/2015 02:35, Steven D'Aprano wrote:
If you still wish to argue for this, one thing which may help your case is if you can identify other programming languages that have already done something similar.
Cython has prange. It replaces range() in the for loop but runs the loop body in parallel using openmp:
I think that's pretty good evidence that this proposal (I meant the syntax for loop modifiers, not "some way to do loops in parallel would be nice") isn't needed. What OpenMP has to do with loop modifier syntax, Cython can do with just a special iterator in normal Python syntax.
Cython uses python syntax but the behavior is different. This is especially obvious seeing how break and return are managed, where the difference is not only in the iterator.
Sorry, ignore my last email. I agree that no new *syntax* is needed.
---
On 2015-04-30 5:48 AM, Todd wrote:
For example, you could create a multiprocessing pool, and let the pool handle the items in a "for" loop, like so:
```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
```
This looks "OK" for a simple snippet, but how will you define this new syntax in Python grammar? Unless you restrict such syntax to use only NAME tokens before 'for', you can easily expect users to write code like this: some_namespace.module.function(arg=123) for item in items(): ... Yury
---
On Thu, Apr 30, 2015 at 6:58 PM, Yury Selivanov <yselivanov.ml@gmail.com> wrote:
On 2015-04-30 5:48 AM, Todd wrote:
For example, you could create a multiprocessing pool, and let the pool handle the items in a "for" loop, like so:
```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
```
This looks "OK" for a simple snippet, but how will you define this new syntax in Python grammar?
Unless you restrict such syntax to use only NAME tokens before 'for', you can easily expect users to write code like this:
```
some_namespace.module.function(arg=123) for item in items():
    ...
```
pep8, probably. You can write ugly code now. This isn't much better:

```
some_namespace.module.function(arg=123).map(items())
```
---
On 30 April 2015 at 06:48, Todd <toddrjen@gmail.com> wrote:
Looking at pep 492, it seems to me the handling of "for" loops has use outside of just asyncio. The primary use-case I can think of is multiprocessing and multithreading.
For example, you could create a multiprocessing pool, and let the pool handle the items in a "for" loop, like so:
```
from multiprocessing import Pool

mypool = Pool(10, maxtasksperchild=2)

mypool for item in items:
    do_something_here
    do_something_else
    do_yet_another_thing
```
While the idea is cool, maybe the original "for" is quite enough for doing that - as can be seen in this real-world (if simple) package: https://github.com/npryce/python-parallelize

```
import os
from parallelize import parallelize

for i in parallelize(range(100)):
    print(os.getpid(), i)
```
participants (13)

- Andrew Barnert
- Ethan Furman
- Greg
- Guido van Rossum
- Joao S. O. Bueno
- Joseph Martinot-Lagarde
- Paul Moore
- Ron Adam
- Stefan Behnel
- Stephen J. Turnbull
- Steven D'Aprano
- Todd
- Yury Selivanov