yield from multiple iterables (was Re: The async API of the future: yield-from)
On Oct 14, 2012 8:42 AM, "Guido van Rossum" <guido@python.org> wrote:
Sadly it looks like
r = yield from (f1(), f2())
ends up interpreting the tuple as the iterator, and you end up with
r = (f1(), f2())
(i.e., a tuple of generators) rather than the desired
r = ((yield from f1()), (yield from f2()))
Didn't want this tangent to get lost to the async discussion. Would it be too late to make a change along these lines? Would it be enough of an improvement to be warranted? -eric
On Sun, Oct 14, 2012 at 9:15 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Oct 14, 2012 8:42 AM, "Guido van Rossum" <guido@python.org> wrote:
Sadly it looks like
r = yield from (f1(), f2())
ends up interpreting the tuple as the iterator, and you end up with
r = (f1(), f2())
(i.e., a tuple of generators) rather than the desired
r = ((yield from f1()), (yield from f2()))
Didn't want this tangent to get lost to the async discussion. Would it be too late to make a change along these lines? Would it be enough of an improvement to be warranted?
3.3 has been released. It's too late. Also I'm not sure what change *could* be made. Surely yield from <a function returning a tuple> should just iterate over that tuple -- that's fundamental to yield from. The only thing that could be done might be to change "yield from x, y" to mean something different than "yield from (x, y)" -- but that's questionable at best, and violates many other contexts (e.g. "return x, y", "yield x, y", "for i in x, y:"). -- --Guido van Rossum (python.org/~guido)
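For concreteness, a small sketch of the behaviour being described (f1 and f2 here are stand-in generators):

    def f1():
        yield 'a'
        return 1

    def f2():
        yield 'b'
        return 2

    def demo():
        # Delegates to the *tuple*, so the values yielded are the two
        # generator objects themselves, and r ends up None (a plain
        # iterator carries no return value).
        r = yield from (f1(), f2())
        print(r)

    print(list(demo()))
    # Output:
    #   None
    #   [<generator object f1 at 0x...>, <generator object f2 at 0x...>]
    #
    # The desired behaviour in the messages above would instead be spelled
    #     r = ((yield from f1()), (yield from f2()))
    # which yields 'a' then 'b' and sets r to (1, 2).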
On 15/10/12 03:15, Eric Snow wrote:
On Oct 14, 2012 8:42 AM, "Guido van Rossum"<guido@python.org> wrote:
Sadly it looks like
r = yield from (f1(), f2())
ends up interpreting the tuple as the iterator, and you end up with
r = (f1(), f2())
(i.e., a tuple of generators) rather than the desired
r = ((yield from f1()), (yield from f2()))
How about this?

    r = yield from *(f1(), f2())

which currently is a SyntaxError in 3.3.

-- Steven
On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On 15/10/12 03:15, Eric Snow wrote:
On Oct 14, 2012 8:42 AM, "Guido van Rossum"<guido@python.org> wrote:
Sadly it looks like
r = yield from (f1(), f2())
ends up interpreting the tuple as the iterator, and you end up with
r = (f1(), f2())
(i.e., a tuple of generators) rather than the desired
r = ((yield from f1()), (yield from f2()))
How about this?
r = yield from *(f1(), f2())
which currently is a SyntaxError in 3.3.
I think it's too early to start proposing new syntax for a problem we don't even know is common at all. Greg Ewing's proposal works for me: r = yield from par(f1(), f2()) -- --Guido van Rossum (python.org/~guido)
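Greg's par() helper isn't spelled out in this thread; a deliberately naive sketch that simply runs the sub-generators one after another (a real, scheduler-aware version would interleave them) might look like:

    def par(*gens):
        # Run each sub-generator to completion and collect its return value.
        # This version is sequential; true concurrency needs the scheduler's
        # cooperation.
        results = []
        for gen in gens:
            results.append((yield from gen))
        return results

    # usage: r = yield from par(f1(), f2())   # r is the list of return values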
Hmmm... On 14.10.12 21:19, Guido van Rossum wrote:
On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On 15/10/12 03:15, Eric Snow wrote:
On Oct 14, 2012 8:42 AM, "Guido van Rossum"<guido@python.org> wrote:
Sadly it looks like
r = yield from (f1(), f2())
ends up interpreting the tuple as the iterator, and you end up with
r = (f1(), f2())
(i.e., a tuple of generators) rather than the desired
r = ((yield from f1()), (yield from f2()))
How about this?
r = yield from *(f1(), f2())
which currently is a SyntaxError in 3.3.
I think it's too early to start proposing new syntax for a problem we don't even know is common at all.
Greg Ewing's proposal works for me:
r = yield from par(f1(), f2())
I'm not very positive about all I've read in the last 50 hours. The concept of generators IMHO gets overly bent towards modelling a sensible syntax for a problem that didn't even have a convincing solution in a dialect that already has full coroutines. 'par' and 'join' and friends should be considered without thinking of generators in the first place. This is attacking too many problems in one shot.

My approach would be to first find out how async operations should be modelled the best under the assumption that we have a coroutine concept that works without headaches about yielding in and out from something to whatnot. After that is settled and gets consensus, then I would think about bearable patterns to implement that using generators. And when we know what we really need, maybe considering more suitable Syntax.

my 0.2 thousand yen - chris
Christian Tismer wrote:
My approach would be to first find out how async operations should be modelled the best under the assumption that we have a coroutine concept that works without headaches about yielding in and out from something to whatnot.
I think we already know that. People like Dijkstra and Hoare figured it all out decades ago. That's what my generator-oriented approach is based on -- using standard techniques for managing concurrency.
After that is settled and gets consensus, then I would think about bearable patterns to implement that using generators. And when we know what we really need, maybe considering more suitable Syntax.
Given that we don't want to use OS threads or greenlets, but we're happy to use generators, all that's left is to find bearable patterns for doing so. -- Greg
On 15.10.12 07:34, Greg Ewing wrote:
Christian Tismer wrote:
My approach would be to first find out how async operations should be modelled the best under the assumption that we have a coroutine concept that works without headaches about yielding in and out from something to whatnot.
I think we already know that. People like Dijkstra and Hoare figured it all out decades ago.
That's what my generator-oriented approach is based on -- using standard techniques for managing concurrency.
Sure, the theory is clear and well-known. Not so clear is which of the concepts to implement and in what detail, and things like the C10K problem are still a challenge to solve efficiently for Python: http://www.kegel.com/c10k.html

I think it is necessary to take these considerations into account at least, and to think about updating large sets of waiting channels efficiently, using appropriate data structures.
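For a concrete sense of the bookkeeping involved, a minimal Linux-only sketch using epoll, which keeps the watched set in the kernel and reports only the ready descriptors instead of rescanning the whole set the way select() does:

    import select
    import socket

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 8888))
    server.listen(1000)
    server.setblocking(False)

    poller = select.epoll()
    poller.register(server.fileno(), select.EPOLLIN)
    connections = {}

    while True:
        for fd, events in poller.poll():          # only ready descriptors come back
            if fd == server.fileno():
                conn, _ = server.accept()
                conn.setblocking(False)
                connections[conn.fileno()] = conn
                poller.register(conn.fileno(), select.EPOLLIN)
            elif events & select.EPOLLIN:
                data = connections[fd].recv(4096)
                if not data:                      # peer closed the connection
                    poller.unregister(fd)
                    connections.pop(fd).close()
                # else: handle `data` here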
After that is settled and gets consensus, then I would think about bearable patterns to implement that using generators. And when we know what we really need, maybe considering more suitable Syntax.
Given that we don't want to use OS threads or greenlets, but we're happy to use generators, all that's left is to find bearable patterns for doing so.
Question: Is it already given that something like greenlets is out of consideration? I did not find a final say on that (but I'm bad at searching...)

Is the whole discussion about what would be best, or just "you can choose any implementation provided it's generators" ? :-)

cheers - Chris
On Mon, Oct 15, 2012 at 10:24 AM, Christian Tismer <tismer@stackless.com>wrote:
Question: Is it already given that something like greenlets is out of consideration? I did not find a final say on that (but I'm bad at searching...)
I think a number of people have expressed a distaste for implicit task switching. That doesn't mean "no", but I'm guessing what's going to happen is having some kind of explicit, generator based thing, with an underlying API that makes implementing greenlets pretty easy.
Is the whole discussion about what would be best, or just "you can choose any implementation provided it's generators" ? :-)
cheers - Chris
-- cheers lvh
On 15.10.12 11:29, Laurens Van Houtven wrote:
On Mon, Oct 15, 2012 at 10:24 AM, Christian Tismer <tismer@stackless.com <mailto:tismer@stackless.com>> wrote:
Question: Is it already given that something like greenlets is out of consideration? I did not find a final say on that (but I'm bad at searching...)
I think a number of people have expressed a distaste for implicit task switching. That doesn't mean "no", but I'm guessing what's going to happen is having some kind of explicit, generator based thing, with an underlying API that makes implementing greenlets pretty easy.
Is the whole discussion about what would be best, or just "you can choose any implementation provided it's generators" ? :-)
Thanks for your reply.

Just one thing that I don't get. What do you mean by 'implicit task switching'? There is no such thing in greenlet, if you really meant that library from Armin Rigo. greenlets do everything explicitly, no pre-emption at all.

So, is there a general understanding what a greenlet is and what not? Just to make sure that the discussed terms are clearly defined.

cheers - Chris
On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer <tismer@stackless.com> wrote:
Just one thing that I don't get. What do you mean by 'implicit task switching'? There is no such thing in greenlet, if you really meant that library from Armin Rigo.
greenlets do everything explicitly, no pre-emption at all.
So, is there a general understanding what a greenlet is and what not? Just to make sure that the discussed terms are clearly defined.
With greenlets, your potential switching points are every function call (because you can call switch() from anywhere, and you can't reliably know the name of *every* IO operation, or operation that implicitly invokes an IO operation). With generators, there is always an explicit *local* marker within the generator body of the potential switching points: yield expressions (including yield from). Ordinary function calls cannot cause the function to be suspended.

So greenlets give you the scalability benefits of microthreading (as almost any OS supports a couple of orders of magnitude more sockets than it can threads), but without the same benefits of locally visible suspension points that are provided by generators and explicit callbacks.

That's the philosophical reason. As a *practical* matter, there's still the problem you described in more detail elsewhere that CPython relies too much on the C stack to support suspension of arbitrary call chains without the stack switching assembly code in Stackless/greenlets.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
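A small runnable sketch of the distinction (io_read and helper are made-up stand-ins; with greenlets, the equivalent of io_read would look like an ordinary call that switches internally, so nothing in task() would mark the suspension point):

    def io_read():
        # Stand-in for an async primitive: yields once to the scheduler,
        # then "returns" the data.
        yield 'WAIT_FOR_SOCKET'
        return b'data'

    def helper():
        # An ordinary function: it can never suspend the task that calls it.
        return 41 + 1

    def task():
        n = helper()                 # plain call, runs atomically
        data = yield from io_read()  # the *only* place this task can be suspended
        return (n, data)

    t = task()
    print(next(t))                   # -> 'WAIT_FOR_SOCKET' (task is now suspended)
    try:
        t.send(None)                 # resume once the "I/O" has completed
    except StopIteration as stop:
        print(stop.value)            # -> (42, b'data')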
Hey Nick, On 15.10.12 14:18, Nick Coghlan wrote:
On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer <tismer@stackless.com> wrote:
Just one thing that I don't get. What do you mean by 'implicit task switching'? There is no such thing in greenlet, if you really meant that library from Armin Rigo.
greenlets do everything explicitly, no pre-emption at all.
So, is there a general understanding what a greenlet is and what not? Just to make sure that the discussed terms are clearly defined.
With greenlets, your potential switching points are every function call (because you can call switch() from anywhere, and you can't reliably know the name of *every* IO operation, or operation that implicitly invokes an IO operation).
That's true, and you will wonder: I never liked that! See below (you'll wonder even more)
With generators, there is always an explicit *local* marker within the generator body of the potential switching points: yield expressions (including yield from). Ordinary function calls cannot cause the function to be suspended.
So greenlets give you the scalability benefits of microthreading (as almost any OS supports a couple of orders of magnitude more sockets than it can threads), but without the same benefits of locally visible suspension points that are provided by generators and explicit callbacks.
Yes, I understand that a lot better now. The nice trick of the (actually a bit ugly) explicit down-chaining of the locally visible switching points is the one thing that makes a huge difference, both for Stackless and Greenlets. Because we could never know the exact switching points, things became so difficult to handle.
That's the philosophical reason. As a *practical* matter, there's still the problem you described in more detail elsewhere that CPython relies too much on the C stack to support suspension of arbitrary call chains without the stack switching assembly code in Stackless/greenlets.
Right, CPython still keeps unnecessary crap on the C stack. But that's not the point right now, because on the other hand, in the context of a possible yield (from or not), the C stack is clean, and this enables switching. And actually in such clean positions, Stackless Python (as opposed to Greenlets) does soft-switching, which is very similar to what the generators are doing - there is no assembly stuff involved at all.

So in the context of switching, CPython is presumably more efficient than greenlet (because of stack slicing), and a bit less efficient than stackless because of the generator chaining.

I have begun studying the code for YIELD_FROM. As it is written, every next iteration elevates the chain of generators once up and down. Maybe that can be avoided by changing the frame chain, so this can become a cheaper O(1) operation.

Alternatively I could also imagine to write real generators or coroutines as an extension module. It would use the same concept as generators, internally. No big deal, not changing the interpreter, maybe adding a bit. I think this would make Greenlet and even Stackless obsolete in most cases which are of real use.

I would like to discuss this and maybe do a prototype.

cheers - chris
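For illustration, the shape of the cost being described (pure Python, nothing CPython-specific):

    def leaf():
        for i in range(3):
            yield i

    def nested(depth):
        # Build a chain of `depth` delegating generators above leaf().
        if depth == 0:
            yield from leaf()
        else:
            yield from nested(depth - 1)

    # Each next() on the outermost generator is forwarded through all
    # `depth` intermediate frames before it reaches leaf(), so producing
    # one item costs O(depth); the O(1) idea above would resume leaf()
    # directly.
    print(list(nested(100)))   # [0, 1, 2]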
On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer <tismer@stackless.com> wrote:
So in the context of switching, CPython is presumably more efficient than greenlet (because of stack slicing), and a bit less efficient than stackless because of the generator chaining.
I have begun studying the code for YIELD_FROM. As it is written, every next iteration elevates the chain of generators once up and down. Maybe that can be avoided by changing the frame chain, so this can become a cheaper O(1) operation.
Yes, we certainly talked about that, but I don't believe anyone came up with the code needed to make it behave itself properly when unwinding the stack. (Either that or someone *did* try it, and then undid it because it broke the test suite, which amounts to the same thing. Mercurial could say for sure.)
Alternatively I could also imagine to write real generators or coroutines as an extension module. It would use the same concept as generators, internally. No big deal, not changing the interpreter, maybe adding a bit.
Tangentially related, there are some patches [1,2] on the tracker looking to shuffle a few things related to generator state around to get them out of the frame objects and into the generator objects where they belong. There are definitely a few things that could do with cleaning up in this space. [1] http://bugs.python.org/issue13897 [2] http://bugs.python.org/issue13607
I think this would make Greenlet and even Stackless obsolete in most cases which are of real use.
The "take this synchronous code and magically make it scale better" aspect is still a nice feature of greenlets & gevent.
I would like to discuss this and maybe do a prototype.
Sure, I think there's several things we can do better here, and I think the test suite is comprehensive enough to keep us honest. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 15.10.12 16:46, Nick Coghlan wrote:
On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer <tismer@stackless.com> wrote: ...
Alternatively I could also imagine to write real generators or coroutines as an extension module. It would use the same concept as generators, internally. No big deal, not changing the interpreter, maybe adding a bit.
Tangentially related, there are some patches [1,2] on the tracker looking to shuffle a few things related to generator state around to get them out of the frame objects and into the generator objects where they belong. There are definitely a few things that could do with cleaning up in this space.
[1] http://bugs.python.org/issue13897 [2] http://bugs.python.org/issue13607
Thanks for pointing me at that. I think Mark Shannon has quite similar thoughts. I need to talk to him.
I think this would make Greenlet and even Stackless obsolete in most cases which are of real use.
The "take this synchronous code and magically make it scale better" aspect is still a nice feature of greenlets & gevent.
I had a deeper look into gevent and how it uses greenlet and does its monkey-patching. Indeed, cute!

My assumption was that I could write a surrogate greenlet using the advanced generators. But I overlooked that for this to work, everything must behave like generators. Not only the surrogate greenlet, but also the code that it wants to switch. Argh...

A work-around for gevent would be a rewrite of all supported modules to patch. Not a cakewalk.

Thanks, you gave me a lot of insight!
I would like to discuss this and maybe do a prototype.
Sure, I think there's several things we can do better here, and I think the test suite is comprehensive enough to keep us honest.
Cheers, Nick.
Cheers - Chris
On 15.10.12 15:57, Christian Tismer wrote:
Right, CPython still keeps unnecessary crap on the C stack. But that's not the point right now, because on the other hand, in the context of a possible yield (from or not), the C stack is clean, and this enables switching. And actually in such clean positions, Stackless Python (as opposed to Greenlets) does soft-switching, which is very similar to what the generators are doing - there is no assembly stuff involved at all.
I'm sorry about the expression "crap". Please read this as "stuff". I was not aware of the unfriendliness of this word and will be more careful next time.

cheers - Chris
Christian Tismer wrote:
Right, CPython still keeps unnecessary crap on the C stack.
It's not just Python leaving stuff on the stack that's a problem, it's external C code that calls back into Python.
But that's not the point right now, because on the other hand, in the context of a possible yield (from or not), the C stack is clean, and this enables switching.
And actually in such clean positions, Stackless Python (as opposed to Greenlets) does soft-switching, which is very similar to what the generators are doing - there is no assembly stuff involved at all.
But the assembly code still needs to be there to handle the cases where you *can't* do soft switching. It's the presence of the code that's the issue here, not how frequently it gets called.
I have begun studying the code for YIELD_FROM. As it is written, every next iteration elevates the chain of generators once up and down. Maybe that can be avoided by changing the frame chain, so this can become a cheaper O(1) operation.
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks. There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day. -- Greg
On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day.
Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi Nick, On 16.10.12 03:49, Nick Coghlan wrote:
On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day.
Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it.
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.

Now I found the original implementation. That looks very much as I'm thinking it should be. Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.

Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now. I will spend my time at PyCon de for sprinting on "yield from".

cheers - chris
On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tismer@stackless.com> wrote:
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
Yeah, if we can get Greg's original optimised behaviour while still supporting introspection properly, that's really where we want to be. That's the main reason I'm a fan of Mark's other patches moving more of the generator state from the frame objects out into the generator objects - my suspicion is that generator objects themselves need to be maintaining a full "generator stack" independent of the frame stack in the main eval loop in order to get the best of both worlds (i.e. optimised suspend/resume with confusing debuggers). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
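A rough Python-level sketch of that "generator stack" idea (a hypothetical trampoline, not CPython's implementation): the driver owns the stack of nested generators itself, so each resume goes straight to the innermost one instead of being forwarded through every delegating frame.

    class Call:
        # Sentinel: "run this sub-generator and send me its return value".
        def __init__(self, gen):
            self.gen = gen

    def run(main):
        stack = [main]    # explicit generator stack, independent of the C frame stack
        value = None
        while stack:
            try:
                yielded = stack[-1].send(value)
            except StopIteration as stop:
                stack.pop()                # sub-task finished: hand its value to the caller
                value = stop.value
                continue
            if isinstance(yielded, Call):
                stack.append(yielded.gen)  # descend into the callee; O(1) per step
                value = None
            else:
                value = None               # a plain yield; a real loop would suspend here
        return value

    def add_one(x):
        return x + 1
        yield                              # unreachable, but makes this a generator

    def outer():
        n = yield Call(add_one(41))
        return n

    print(run(outer()))                    # -> 42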
On 19/10/2012 14:56, Nick Coghlan wrote:
On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tismer@stackless.com> wrote:
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
Yeah, if we can get Greg's original optimised behaviour while still supporting introspection properly, that's really where we want to be. That's the main reason I'm a fan of Mark's other patches moving more of the generator state from the frame objects out into the generator objects - my suspicion is that generator objects themselves need to be maintaining a full "generator stack" independent of the frame stack in the main eval loop in order to get the best of both worlds (i.e. optimised suspend/resume with confusing debuggers).
Cheers, Nick.
There's nothing like confusing debuggers or have I read that wrong? :) -- Cheers. Mark Lawrence.
On Sat, Oct 20, 2012 at 12:05 AM, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
There's nothing like confusing debuggers or have I read that wrong? :)
Yeah, that was the main issue that resulted in the design change - the optimised approach confused a lot of the introspection machinery. So the challenge is to restore the optimisation while *also* adding in mechanisms to preserve the introspection support. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 19.10.12 15:56, Nick Coghlan wrote:
On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tismer@stackless.com> wrote:
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
Yeah, if we can get Greg's original optimised behaviour while still supporting introspection properly, that's really where we want to be. That's the main reason I'm a fan of Mark's other patches moving more of the generator state from the frame objects out into the generator objects - my suspicion is that generator objects themselves need to be maintaining a full "generator stack" independent of the frame stack in the main eval loop in order to get the best of both worlds (i.e. optimised suspend/resume with confusing debuggers).
That may be very true in order to get real generators. The storm in my brain is quite intense these last days...

Actually I would like to have a python context where it gets into "async mode" and interprets all functions defined in that mode as generators. In that mode, generators are not meant as generators, but as async-enabled functions. I see "yield from" as a low-level construct that should not even be exposed, but be applied automatically in async mode. That way, we could write normal functions and could implement a real "Yield" without the "yield from" helper visible everywhere. Not sure how to do that right. I'm playing with AST a bit to get a feeling for this.

To give you an idea where my thoughts are meandering around, I would like to point you at http://doc.pypy.org/en/latest/stackless.html That is an implementation that comes close to what I'm thinking. The drawback of the current PyPy implementation is that it used greenlet style for its underlying switching. That is what I want to replace with some "yield from" construct.

cheers - chris
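One very rough way to play with that "async mode" idea at the AST level (hypothetical sketch; a real version would rewrite only functions explicitly marked as async-enabled and would leave ordinary calls such as len() untouched):

    import ast
    import textwrap

    class AsyncModeTransformer(ast.NodeTransformer):
        # Rewrite every call expression into "yield from <call>", so
        # plain-looking functions become delegating generators.
        def visit_Call(self, node):
            self.generic_visit(node)
            return ast.YieldFrom(value=node)

    source = textwrap.dedent("""
        def fetch_twice(url):
            a = fetch(url)
            b = fetch(url)
            return a, b
    """)

    tree = AsyncModeTransformer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    # fetch_twice is now equivalent to:
    #     def fetch_twice(url):
    #         a = yield from fetch(url)
    #         b = yield from fetch(url)
    #         return a, b
    code = compile(tree, '<async-mode>', 'exec')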
Christian Tismer wrote:
Actually I would like to have a python context where it gets into "async mode" and interprets all functions defined in that mode as generators.
That sounds somewhat similar to another idea I proposed a while ago: There would be a special kind of function called a "cofunction", that you define using "codef" instead of "def". A cofunction is essentially a generator, but with a special property: when one cofunction calls another, the call is implicitly made as a "yield from" call. This scheme wouldn't be completely transparent, since the cofunctions have to be defined in a special way. But the calls would look like ordinary calls. There's a PEP describing a variation on the idea here: http://www.python.org/dev/peps/pep-3152/ In that version, calls to cofunctions are specially marked using a "cocall" keyword. But since writing that, I've come to believe that my original idea (where the cocalls are implicit) was better. -- Greg
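Roughly, the contrast in spelling (the codef form below is hypothetical PEP 3152-style syntax, shown only in a comment; fetch_page and parse are stand-in async-enabled generators):

    # Hypothetical cofunction spelling -- a call from one cofunction to
    # another is implicitly a suspension point:
    #
    #     codef crawl(url):
    #         page = fetch_page(url)
    #         return parse(page)
    #
    # The explicit generator spelling of the same thing:

    def fetch_page(url):
        yield 'waiting for ' + url         # stand-in suspension point
        return '<html>' + url + '</html>'

    def parse(page):
        yield 'parsing'
        return page.upper()

    def crawl(url):
        page = yield from fetch_page(url)
        return (yield from parse(page))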
On Sat, Oct 20, 2012 at 10:50 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Christian Tismer wrote:
Actually I would like to have a python context where it gets into "async mode" and interprets all functions defined in that mode as generators.
That sounds somewhat similar to another idea I proposed a while ago:
There would be a special kind of function called a "cofunction", that you define using "codef" instead of "def". A cofunction is essentially a generator, but with a special property: when one cofunction calls another, the call is implicitly made as a "yield from" call.
This scheme wouldn't be completely transparent, since the cofunctions have to be defined in a special way. But the calls would look like ordinary calls.
Please don't lose sight of the fact that yield-based suspension points looking like something other than an ordinary function call is a *feature*, not a bug. The idea is that the flow control, especially the fact that "other code may run here, so the world may have changed before we get to the next expression", is visible *locally* in each function, rather than relying on global knowledge of which calls may lead to a task switch. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active.

I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented. I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys".

The write-up is here: http://stevedower.id.au/blog/async-api-for-python/

I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at http://stevedower.id.au/blog/async-api-for-python/#thecode or http://stevedower.id.au/downloads/PythonAsync.zip (I based my example on Greg's socket spam, so thanks for that!)

And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth.

Let the tearing apart of my approach commence! :)

Cheers,
Steve
On Fri, Oct 19, 2012 at 7:41 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active.
I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented.
I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys".
The write-up is here: http://stevedower.id.au/blog/async-api-for-python/
I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at http://stevedower.id.au/blog/async-api-for-python/#thecode or http://stevedower.id.au/downloads/PythonAsync.zip (I based my example on Greg's socket spam, so thanks for that!)
And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth.
Let the tearing apart of my approach commence! :)
Couple of questions and comments.

- You mention a query interface a few times but there are no details in your example code; can you elaborate? (Or was that a typo for queue?)

- This is almost completely isomorphic with NDB's tasklets, except that you borrow the Future class implementation from concurrent.futures -- I think that's the wrong building block to start with, because it is linked too closely to threads.

- There is a big speed difference between yield from <generator> and yield <future>. With yield <future>, the scheduler has to do significant work for each yield at an intermediate level, whereas with yield from, the scheduler is only involved when actual blocking needs to be performed. In my experience, real code has lots of intermediate levels. Therefore I would like to use yield from. You can already do most things with yield from that you can do with Futures; there are a few operations that need a helper (in particular spawning truly concurrent tasks), but the helper code can be much simpler than the Future object, and isn't needed as often, so it's still a bare win.

- Nit: I don't like calling the event loop context; there are too many things called context (e.g. context managers in Python), so I prefer to call it what it is -- event loop or I/O loop.

- Nittier nit: you have a few stray colons, e.g. "it = iter(fn(*args, **kwargs)):" .

--
--Guido van Rossum (python.org/~guido)
- Nit: I don't like calling the event loop context; there are too many things called context (e.g. context managers in Python), so I prefer to call it what it is -- event loop or I/O loop.
The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.
- You mention a query interface a few times but there are no details in your example code; can you elaborate? (Or was that a typo for queue?)
I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type. The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)
- This is almost completely isomorphic with NDB's tasklets, except that you borrow the Future class implementation from concurrent.futures -- I think that's the wrong building block to start with, because it is linked too closely to threads.
As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can.
- There is a big speed difference between yield from <generator> and yield <future>. With yield <future>, the scheduler has to do significant work for each yield at an intermediate level, whereas with yield from, the schedule is only involved when actual blocking needs to be performed. In my experience, real code has lots of intermediate levels. Therefore I would like to use yield from. You can already do most things with yield from that you can do with Futures; there are a few operations that need a helper (in particular spawning truly concurrent tasks), but the helper code can be much simpler than the Future object, and isn't needed as often, so it's still a bare win.
I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created. The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task. yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)
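For reference, a stripped-down sketch of the @async/Future-driving machinery being described (hypothetical code, not the actual implementation from the write-up; it assumes the decorated function is a generator that only ever yields Future objects, and error handling and cancellation are left out):

    from concurrent.futures import Future

    def async_(fn):
        # Wrap a generator function that yields Futures. Calling the wrapper
        # starts the generator immediately and returns a Future for its result.
        def wrapper(*args, **kwargs):
            outcome = Future()
            gen = fn(*args, **kwargs)
            def step(value=None):
                try:
                    fut = gen.send(value)           # run until the next yielded Future
                except StopIteration as stop:
                    outcome.set_result(stop.value)  # generator finished
                    return
                fut.add_done_callback(lambda f: step(f.result()))
            step()
            return outcome
        return wrapper

    # usage sketch (recv_future is an assumed helper returning a Future that
    # some event loop or executor will complete):
    #
    #   @async_
    #   def read_async(sock):
    #       data = yield recv_future(sock, 1024)
    #       return data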
On Sat, Oct 20, 2012 at 12:31 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
- Nit: I don't like calling the event loop context; there are too many things called context (e.g. context managers in Python), so I prefer to call it what it is -- event loop or I/O loop.
The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.
I'm happy to settle on event loop. (Terminology in this area seems fraught with conflicting conventions; Twisted calls it a reactor, after the reactor pattern, but I've been chided by others for using this term without explanation; Tornado calls it I/O loop.)
- You mention a query interface a few times but there are no details in your example code; can you elaborate? (Or was that a typo for queue?)
I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type.
Then what is the use? What *is* its contract?
The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)
I'm all for declaring select() an implementation detail. It doesn't scale on any platform; on Windows it only works for sockets; the properly scaling alternative varies per platform. (It is IOCP on Windows, right?) This *probably* also means that the concept of file descriptor is out the window (even though Tornado apparently cannot do anything without it -- it's probably not used on Windows at all). And I suspect that it means that the implementation of the socket abstraction will vary per platform. The collection of other implementations of the same abstraction available, and even available other abstractions, will also vary per platform -- on Unix, there are pseudo ttys, pipes, named pipes, and unix domain sockets; I don't recall the set available on Windows, but I'm sure it is different. Then there is SSL/TLS, which feels like it requires special handling but in the end implements an abstraction similar to sockets.

I assume that in many cases it is easy to bridge from the various platform-specific abstractions and implementations to more cross-platform abstractions; this is where the notions of transports and protocols seem most important. I haven't explored those enough, sadly.

One note inspired by my mention of SSL, but also by discussions about GUI event loops in other threads: it is easy to think that everything is reducible to a file descriptor, but often it is not that easy. E.g. with something like SSL, you can't just select on the underlying socket, and then when it's ready call the read() method of the SSL layer -- it's possible that the read() will still block because the socket didn't have enough bytes to be able to decrypt the next block of data. Similar for sockets associated with e.g. GUI event management (e.g. X).
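The SSL point in code (a blocking-style sketch using the stdlib ssl module with a non-blocking SSL socket; a real event loop would register for readability rather than calling select() inline):

    import select
    import ssl

    def recv_decrypted(sslsock, nbytes):
        # Readiness of the raw socket is not enough: the SSL layer may need
        # more ciphertext before it can hand back any plaintext, and a
        # renegotiation may even require a write before a read can succeed.
        # Assumes sslsock is non-blocking.
        while True:
            try:
                return sslsock.recv(nbytes)
            except ssl.SSLWantReadError:
                select.select([sslsock], [], [])
            except ssl.SSLWantWriteError:
                select.select([], [sslsock], [])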
- This is almost completely isomorphic with NDB's tasklets, except that you borrow the Future class implementation from concurrent.futures -- I think that's the wrong building block to start with, because it is linked too closely to threads.
As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can.
I wish it was true. But the Future class contains a condition variable, and the Waiter class used by the implementation uses an event. Both are directly imported from the threading module, and if you block on either of these, it is a hard block (not even interruptible by a signal).

Don't worry too much about this -- it's just the particular implementation (concurrent.futures.Future). We can define a better Future class for our purposes elsewhere, with the same interface (or a subset -- I don't care much for the whole cancellation feature) but without references to threading.

For those Futures, we'll have to decide what should happen if you call result() when the Future isn't done yet -- raise an error (similar to EWOULDBLOCK), or somehow block, possibly running a recursive event loop? (That's what NDB does, but not everybody likes it.)

I think the normal approach would be to ask the scheduler to suspend the current task until the Future is ready -- it can easily arrange for that by adding a callback. In NDB this is spelled "yield <future>". In the yield-from <generator> world we could spell it that way too (i.e. yield, not yield from), or we could make it so that we can write yield from <future>, or perhaps we need a helper call: yield from wait(<future>) or maybe a method on the Future class (since it is our own), yield from <future>.wait(). These are API design details.

(I also have a need to block for the Futures returned by ThreadPoolExecutor and ProcessPoolExecutor -- those are handy when you really can't run something inline in the event loop -- the simplest example being getaddrinfo(), which may block for DNS.)
- There is a big speed difference between yield from <generator> and yield <future>. With yield <future>, the scheduler has to do significant work for each yield at an intermediate level, whereas with yield from, the schedule is only involved when actual blocking needs to be performed. In my experience, real code has lots of intermediate levels. Therefore I would like to use yield from. You can already do most things with yield from that you can do with Futures; there are a few operations that need a helper (in particular spawning truly concurrent tasks), but the helper code can be much simpler than the Future object, and isn't needed as often, so it's still a bare win.
I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created.
IIUC every yield must pass a Future, and every time that happens the scheduler gets it and must arrange for a callback on that Future which resumes the generator. I have code like that in NDB and you have very similar code like that in your version (wrapper in @async, and later _Awaiter._step()).
The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task.
Interesting that you always run the first step immediately. I don't do this in NDB. Can you explain why you think you need it? (It may simply be an optimization I've overlooked. :-)
yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)
Yeah, this is what I do in NDB, as I mentioned above (the recursive event loop call). But I suspect it would be very easy to write a helper function that you give a generator and which runs it to completion. It would also have to invoke the event loop, but that seems unavoidable, and otherwise the event loop isn't running in interactive mode, right? (Unless it runs in a separate thread, in which case the helper function should just communicate with that thread.)

Final remark: I keep wondering if it's better to try and stay "pure" in the public API and use only yield from, plus some helpers like spawn(), join() and par(), or if a decent, pragmatic public API can offer a combination. I worry that most users will have a hard time remembering when to use yield and when yield from.

--
--Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
In the yield-from <generator> world we could spell it that way too (i.e. yield, not yield from), or we could make it so that we can write yield from <future>, or perhaps we need a helper call: yield from wait(<future>) or maybe a method on the Future class (since it is our own), yield from <future>.wait().
This will depend to some extent on whether Futures are considered part of the tasks layer or part of the callbacks layer. If they're considered part of the callbacks layer, they shouldn't have any methods that must be called with yield-from.
Final remark: I keep wondering if it's better to try and stay "pure" in the public API and use only yield from, plus some helpers like spawn(), join() and par(), or if a decent, pragmatic public API can offer a combination. I worry that most users will have a hard time remembering when to use yield and when yield from.
As I've said, I think it would be better to have only 'yield from' calls in the public API, because it gives the implementation the greatest freedom. -- Greg
Greg Ewing wrote:
This will depend to some extent on whether Futures are considered part of the tasks layer or part of the callbacks layer. If they're considered part of the callbacks layer, they shouldn't have any methods that must be called with yield-from.
I put Futures very firmly in the callbacks layer (I guess the easiest reasoning for this is the complete lack of threading/async code in their implementation). Every time someone suggests "yielding a sentinel value" it seems that a Future is ideal for this - it even provides the other thread/tasklet/coroutine with a way to reactivate the original one, whether the two functions were written with knowledge of each other or not.
As I've said, I think it would be better to have only 'yield from' calls in the public API, because it gives the implementation the greatest freedom.
I agree with this, though I still feel that we should be aiming for only 'yield' in the public API and leaving 'yield from' as a generalisation of this. For example, the two following pieces of code are basically equivalent:

    @async
    def task1():
        yield do_something_async_returning_a_future()

    @async
    def task2():
        yield task1()
        yield task1()

    @async
    def task3():
        yield task2()

    task3().result()

And doing the same thing with yield from:

    def task1():
        yield do_something_async_returning_a_future()

    def task2():
        yield from task1()
        yield from task1()

    @async
    def task3():
        yield from task2()

    task3().result()

This is also equivalent to this code:

    @async
    def task3():
        yield do_something_async_returning_a_future()
        yield do_something_async_returning_a_future()

    task3().result()

And this:

    def task():
        f = Future()
        do_something_async_returning_a_future().add_done_callback(
            lambda _: do_something_async_returning_a_future().add_done_callback(
                lambda _: f.set_result(None)))
        return f

My point is that once we are using yield, yield from automatically becomes an option for composing operations. Teaching and validating this style is also easier, because the rule can be 'always use @async/yield in public APIs and just yield from in private APIs', and the biggest problem with not using yield from is that more Future objects are created. (The upsides were in my essay, but include compatibility with other Future-based APIs and composability between code from different sources.)

Cheers,
Steve
On Oct 21, 2012 9:48 AM, "Steve Dower" <Steve.Dower@microsoft.com> wrote:
Greg Ewing wrote:
This will depend to some extent on whether Futures are considered part of the tasks layer or part of the callbacks layer. If they're considered part of the callbacks layer, they shouldn't have any methods that must be called with yield-from.
I put Futures very firmly in the callbacks layer (I guess the easiest reasoning for this is the complete lack of threading/async code in their implementation).
Did you check the source? That's simply incorrect. It uses locks, of the threading variety. (However one could write an implementation with the same interface that doesn't.)
Every time someone suggests "yielding a sentinel value" it seems that a Future is ideal for this - it even provides the other thread/tasklet/coroutine with a way to reactivate the original one, whether the two functions were written with knowledge of each other or not.
This I like.
As I've said, I think it would be better to have only 'yield from' calls in the public API, because it gives the implementation the greatest freedom.
I agree with this, though I still feel that we should be aiming for only 'yield' in the public API and leaving 'yield from' as a generalisation of this. For example, the two following pieces of code are basically equivalent:
    @async
    def task1():
        yield do_something_async_returning_a_future()

    @async
    def task2():
        yield task1()
        yield task1()

    @async
    def task3():
        yield task2()

    task3().result()

And doing the same thing with yield from:

    def task1():
        yield do_something_async_returning_a_future()

    def task2():
        yield from task1()
        yield from task1()

    @async
    def task3():
        yield from task2()

    task3().result()

This is also equivalent to this code:

    @async
    def task3():
        yield do_something_async_returning_a_future()
        yield do_something_async_returning_a_future()

    task3().result()

And this:

    def task():
        f = Future()
        do_something_async_returning_a_future().add_done_callback(
            lambda _: do_something_async_returning_a_future().add_done_callback(
                lambda _: f.set_result(None)))
        return f
My point is that once we are using yield, yield from automatically becomes an option for composing operations. Teaching and validating this style is also easier, because the rule can be 'always use @async/yield in public APIs and just yield from in private APIs', and the biggest problem with not using yield from is that more Future objects are created. (The upsides were in my essay, but include compatibility with other Future-based APIs and composability between code from different sources.)
Hm. I think it'll be confusing. And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
--Guido van Rossum (sent from Android phone)
Did you check the source? That's simply incorrect. It uses locks, of the threading variety.
Yes, I've spent a lot of time in the source for Future while working on this. It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.
( However one could write an implementation with the same interface that doesn't.)
And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted. (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops from banning CPU threads, but the entire API should allow their existence.) (*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.)
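A minimal sketch of what such a substitute could look like (hypothetical; it assumes a single-threaded event loop, so there are no locks, and result() simply refuses to block):

    class EventLoopFuture:
        _UNSET = object()

        def __init__(self):
            self._result = self._UNSET
            self._exception = None
            self._callbacks = []

        def done(self):
            return self._result is not self._UNSET or self._exception is not None

        def result(self):
            if not self.done():
                raise RuntimeError('future is not ready')  # no blocking in an event loop
            if self._exception is not None:
                raise self._exception
            return self._result

        def exception(self):
            return self._exception

        def add_done_callback(self, fn):
            if self.done():
                fn(self)                # already finished: call back immediately
            else:
                self._callbacks.append(fn)

        def set_result(self, result):
            self._result = result
            self._run_callbacks()

        def set_exception(self, exception):
            self._exception = exception
            self._run_callbacks()

        def _run_callbacks(self):
            callbacks, self._callbacks = self._callbacks, []
            for fn in callbacks:
                fn(self)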
Hm. I think it'll be confusing.
I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.
And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.) Cheers, Steve
On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Did you check the source? That's simply incorrect. It uses locks, of the threading variety.
Yes, I've spent a lot of time in the source for Future while working on this.
Sorry, I should have realized this, since your code example contained monkey-patching that Future class...
It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.
I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread.

Now, it's possible that in Windows, when using IOCP, the philosophy is different -- I think I've read in http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that there can be multiple threads reading events from a single queue. But AFAIK, in Twisted and Tornado and similar systems, and probably even in gevent and Stackless, there is a strong culture around having only a single thread handling events (at least only one thread at a time), since the assumption is that as long as you don't suspend, you can trust that the world doesn't change, and that assumption becomes invalid when other threads may also be handling events from the same queue.

It's possible to design a world where different threads have their own event queues, and this assumption would only be valid for events belonging to the same queue; however that seems complicated. And you still don't want to ever attempt to acquire a *threading* lock, because you end up blocking the entire event loop.
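A sketch of that hand-off (names such as EventLoop and call_soon_threadsafe are invented here for illustration): the only threading lock lives inside the loop's queue, and the other thread never runs callbacks itself -- it just schedules one onto the loop's thread:

import collections
import threading

class EventLoop:
    def __init__(self):
        self._ready = collections.deque()
        self._lock = threading.Lock()   # the one place a threading lock appears

    def call_soon_threadsafe(self, callback, *args):
        # Safe to call from any thread: just enqueue work for the loop's thread.
        with self._lock:
            self._ready.append((callback, args))

    def run_once(self):
        # Runs only in the event loop's own thread.
        with self._lock:
            ready, self._ready = self._ready, collections.deque()
        for callback, args in ready:
            callback(*args)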
( However one could write an implementation with the same interface that doesn't.)
And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted.
Yes, here I think we are in (possibly violent :-) agreement.
(Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops from banning CPU threads, but the entire API should allow their existence.)
Perhaps. Lots of possibilities in this design space.
(*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.)
Agreed. I don't see much use for the cancellation stuff and all the extra complexity that adds to the interface. BTW, I think concurrent.futures.Future doesn't stop you from calling set_result() or set_exception() more than once, which I think is a mistake -- I do enforce that in NDB's Futures. [Here you snipped some context. You proposed having public APIs that use "yield <future>" and leaving "yield from <generator>" as something the user can use in her own program. To which I replied:]
Hm. I think it'll be confusing.
I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.
Let's agree to disagree on this. I think they are both valid design choices with different trade-offs. We should explore both directions further so as to form a better opinion.
And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)
Surely "whatever you like" is constrained by whatever the Task type defines. Maybe it looks like a Future and has a blocking method to wait for the result, like .result() on concurrent.futures.Future? If you want that functionality for generators you just have to call some function, passing it the generator as an argument. Remember, Python doesn't consider that an inferior choice of API design compared to making something a method of the object itself -- witness len(), repr() and many others. FWIW, if I may sound antagonistic, I actually think that we're mostly in violent agreement, and I think we're getting closer to coming up with a sensible set of requirements and possibly even an API proposal. Keep it coming! -- --Guido van Rossum (python.org/~guido)
On 10/21/2012 8:23 PM, Guido van Rossum wrote:
I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread.
Now, it's possible that in Windows, when using IOCP, the philosophy is different -- I think I've read in http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that there can be multiple threads reading events from a single queue.
Correct. The typical usage of an IOCP is that you create as many threads as you have CPUs (or cores, or execution units, or whatever the kids call them these days), then they can all wait on the same IOCP. So if you have, say 4 CPUs so 4 threads, they can all be woken up to do useful work if the IOCP has work items for them. -- Eric.
On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 10/21/2012 8:23 PM, Guido van Rossum wrote:
I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread.
Now, it's possible that in Windows, when using IOCP, the philosophy is different -- I think I've read in http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that there can be multiple threads reading events from a single queue.
Correct. The typical usage of an IOCP is that you create as many threads as you have CPUs (or cores, or execution units, or whatever the kids call them these days), then they can all wait on the same IOCP. So if you have, say 4 CPUs so 4 threads, they can all be woken up to do useful work if the IOCP has work items for them.
So what's the typical way to do locking in such a system? Waiting for a lock seems bad; and you can't assume that no other callbacks may run while you are running. What synchronization primitives are typically used? -- --Guido van Rossum (python.org/~guido)
On 10/22/2012 12:10 AM, Guido van Rossum wrote:
On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith <eric@trueblade.com> wrote:
On 10/21/2012 8:23 PM, Guido van Rossum wrote:
I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread.
Now, it's possible that in Windows, when using IOCP, the philosophy is different -- I think I've read in http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that there can be multiple threads reading events from a single queue.
Correct. The typical usage of an IOCP is that you create as many threads as you have CPUs (or cores, or execution units, or whatever the kids call them these days), then they can all wait on the same IOCP. So if you have, say 4 CPUs so 4 threads, they can all be woken up to do useful work if the IOCP has work items for them.
So what's the typical way to do locking in such a system? Waiting for a lock seems bad; and you can't assume that no other callbacks may run while you are running. What synchronization primitives are typically used?
When I've done it (admittedly 10 years ago) we just used critical sections, since we weren't blocking for long (mostly memory management). I'm not sure if that's a best practice or not. The IOCP will actually let you block, then it will release another thread. So if you know you're going to block, you should create more threads than you have CPUs.

Here's the relevant paragraph from the IOCP link you posted above:

"The system also allows a thread waiting in GetQueuedCompletionStatus to process a completion packet if another running thread associated with the same I/O completion port enters a wait state for other reasons, for example the SuspendThread function. When the thread in the wait state begins running again, there may be a brief period when the number of active threads exceeds the concurrency value. However, the system quickly reduces this number by not allowing any new active threads until the number of active threads falls below the concurrency value. This is one reason to have your application create more threads in its thread pool than the concurrency value. Thread pool management is beyond the scope of this topic, but a good rule of thumb is to have a minimum of twice as many threads in the thread pool as there are processors on the system. For additional information about thread pooling, see Thread Pools."
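In rough Python terms, the shape being described is a handful of worker threads all blocking on the same queue. This is a toy illustration only -- a queue.Queue standing in for the completion port, not real IOCP usage:

import queue
import threading

completion_port = queue.Queue()          # stand-in for the IOCP

def worker():
    while True:
        packet = completion_port.get()   # all workers block on the same port
        if packet is None:               # sentinel used here for shutdown
            break
        print("handled", packet)         # the useful work happens here

threads = [threading.Thread(target=worker) for _ in range(4)]  # roughly one per CPU
for t in threads:
    t.start()

# The I/O layer would completion_port.put(packet) as operations finish,
# and put one None per worker to shut the pool down.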
(Sorry about cutting context, I'll try not to do that again, but I also try to avoid reposting an entire email.)
It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.
I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread.
Agreed, but when you're waiting for another thread to stop reading its queue so you can add to it, how are you supposed to queue an event while you wait? The lock in Future is only an issue in result() where we wait for another thread to complete the event, but that is the entire point of that function. FWIW I don't see any scheduler ever calling result(), but there are valid situations for a user to call it (REPL, already on a worker thread, unit tests). Everywhere else the lock is required for thread safety. It could be a different lock to the one in result, but I don't think anything is gained from that. Rewriting Future in C and using CPU CAS primitives might be possible, but probably only of limited value.
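For reference, the blocking wait under discussion has roughly this shape (a cut-down sketch with cancellation, timeouts and exception handling omitted; not the actual concurrent.futures code):

import threading

class BlockingFuture:
    def __init__(self):
        self._condition = threading.Condition()
        self._done = False
        self._result = None

    def set_result(self, value):
        with self._condition:
            self._result, self._done = value, True
            self._condition.notify_all()   # wake any thread blocked in result()

    def result(self):
        with self._condition:
            while not self._done:
                self._condition.wait()     # blocks the calling thread until set_result()
            return self._result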
Now, it's possible that in Windows, when using IOCP, the philosophy is different -- I think I've read in http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that there can be multiple threads reading events from a single queue. But AFAIK, in Twisted and Tornado and similar systems, and probably even in gevent and Stackless, there is a strong culture around having only a single thread handling events (at least only one thread at a time), since the assumption is that as long as you don't suspend, you can trust that the world doesn't change, and that assumption becomes invalid when other threads may also be handling events from the same queue.
This is true, and my understanding is that IOCP is basically just a thread pool, and the 'single queue' means that all the threads are waiting on all the events and you can't guarantee which thread will get which. This is better than creating a new thread for each file, but I think that's all it is meant to be. We can easily write a single thread that can wait on all I/O, scheduling callbacks on the main thread, if necessary. I'm pretty sure that all platforms have better ways to do this though, but because they're all different it will need different implementations.
It's possible to design a world where different threads have their own event queues, and this assumption would only be valid for events belonging to the same queue; however that seems complicated. And you still don't want to ever attempt to acquire a *threading* lock, because you end up blocking the entire event loop.
Multiple threads with independent queues should be okay, though definitely an advanced scenario. I'm sure this would be preferable to having multiple processes with one thread/queue each in some cases. In any case, this is easy enough to implement with TLS.
(*I'm inclined to define [the required Future interface] as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.)
Agreed. I don't see much use for the cancellation stuff and all the extra complexity that adds to the interface. BTW, I think concurrent.futures.Future doesn't stop you from calling set_result() or set_exception() more than once, which I think is a mistake -- I do enforce that in NDB's Futures.
I agree, there should be no way to set the result or exception more than once.

On cancellation, while there is some complexity involved I do think we can make use of 'cancel' and 'cancelled' functions to pass a signal back into the worker:

op = do_something_async()  # not yielded
button.on_click += lambda: op.cancel()
try:
    result = yield op
except CancelledError:
    return False

def do_something_async():
    f = Future()
    def threadproc():
        total = 0
        for i in range(10000):
            if f.cancelled():
                raise CancelledError
            total += i
        f.set_result(total)
    Thread(target=threadproc).start()  # start(), so the work runs on the new thread
    return f

I certainly would not want to see the CancelledError be raised automatically - this is no thread.abort() call - but it may be convenient to have an interface for "self._cancelled = True" and "return self._cancelled" that at least saves people from coming up with their own way of passing it in. The worker may completely ignore it, or complete anyway, but for long running operations it may be very handy.

(I'll stop before I start thinking about partial results... :) )
[Here you snipped some context. You proposed having public APIs that use "yield <future>" and leaving "yield from <generator>" as something the user can use in her own program. To which I replied:]
Hm. I think it'll be confusing.
I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.
Let's agree to disagree on this. I think they are both valid design choices with different trade-offs. We should explore both directions further so as to form a better opinion.
Probably we need some working implementations to code against.
And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)
Surely "whatever you like" is constrained by whatever the Task type defines. Maybe it looks like a Future and has a blocking method to wait for the result, like .result() on concurrent.futures.Future? If you want that functionality for generators you just have to call some function, passing it the generator as an argument. Remember, Python doesn't consider that an inferior choice of API design compared to making something a method of the object itself -- witness len(), repr() and many others.
I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value. (And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.)
FWIW, if I may sound antagonistic, I actually think that we're mostly in violent agreement, and I think we're getting closer to coming up with a sensible set of requirements and possibly even an API proposal. Keep it coming!
I do my best work when someone is arguing with me :) Cheers, Steve
On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower <Steve.Dower@microsoft.com> wrote: [Stuff about Futures and threads]

Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel.

IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)
Surely "whatever you like" is constrained by whatever the Task type defines. Maybe it looks like a Future and has a blocking method to wait for the result, like .result() on concurrent.futures.Future? If you want that functionality for generators you just have to call some function, passing it the generator as an argument. Remember, Python doesn't consider that an inferior choice of API design compared to making something a method of the object itself -- witness len(), repr() and many others.
I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value.
Sorry, probably just got distracted (I was reading on three different devices while on a family outing :-). But my answer is short: to me, the PEP 380 style is perfectly portable and composable. If you think it isn't, please elaborate.
(And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.)
Same here. -- --Guido van Rossum (python.org/~guido)
Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel.
Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O. That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea. (Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization. [ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ]
To me, the PEP 380 style is perfectly portable and composable. If you think it isn't, please elaborate.
I think the abstract for PEP 380 sums it up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function.

By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread). Here, the user still gets the benefits like:

def not_an_async_func():
    ops = list(map(get_url_async, list_of_urls))
    # all URLs are now downloading in parallel, let's do some other synchronous stuff
    results = list(map(Future.result, ops))

Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running.

(And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)

However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).

Cheers, Steve
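A rough sketch of the kind of driver being described -- calling the decorated function immediately starts stepping the generator, each yielded Future schedules the next step via add_done_callback(), and the caller gets back a Future for the overall result. The decorator is spelled async_task below, error propagation from a failed step is glossed over, and this is a guess at the shape rather than Steve's actual @async code:

from concurrent.futures import Future

def async_task(gen_func):
    def wrapper(*args, **kwargs):
        overall = Future()                 # result of the whole task
        gen = gen_func(*args, **kwargs)

        def step(value=None):
            try:
                yielded = gen.send(value)            # run until the next yielded Future
            except StopIteration as stop:
                overall.set_result(stop.value)       # PEP 380: the generator's return value
            except Exception as exc:
                overall.set_exception(exc)
            else:
                yielded.add_done_callback(lambda f: step(f.result()))

        step()
        return overall
    return wrapper

# usage:
# @async_task
# def task():
#     value = yield some_future
#     return value * 2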
On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel.
Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.
I'm curious what the Twisted folks have to say about this. Or the folks using gevent.

I think your world view is colored by Windows; that's fine, we need input from experienced Windows users. But I can certainly imagine other ways of dealing with this.

For example, in CPython, at least, a callback that is called directly by the OS cannot call straight into Python anyway -- you have to acquire the GIL first. This pretty much means that an unconstrained callback directly from the OS cannot call straight into Python -- it has to put something into a queue, and the bytecode interpreter will eventually call it (possibly in another thread). This is how signal handlers are invoked too.
That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code
If interrupt-scheduled means what I think it means, this can only be C code. For the Python callback, see above.
queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.
I don't actually believe we should avoid all locks. I do believe that there should be a separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" threading world can hand off data to each other. It is probably unavoidable that the implementation of this mechanism touches a threading lock. But this does not mean that the rest of the "pure" world should need to use a Future class that touches threading locks.
(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)
This sounds like a potentially fatal clash in terminology. In the way I use 'async', Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API where there is no non-deterministic switching in sight -- in fact, they all carefully fence off the part that does interact with threads. For example, the Twisted folks have argued that one of the big advantages of using Twisted's Deferred class is that while a callback is running, the state of the world remains constant (except for actions made by the callback itself, obviously). What other term should we use to encompass this world view (which IMO is a perfectly valid abstraction for a lot of I/O-related concurrency)?
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.
It does sound like this may be unique to Windows, or at least not shared with most of the UNIX world (UNIX ports of IOCP notwithstanding).
[ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ]
To me, the PEP 380 style is perfectly portable and composable. If you think it isn't, please elaborate.
I think the abstract for PEP 380 sums is up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function.
And that is by design -- Greg *wants* it to be that way, and so far I haven't found a reason to disagree with him. It seems you just fundamentally disagree with the design, but your arguments come from a fundamentally different world view.
By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread).
Note that in the YF world, there are also ways to stop the yield to bubble all the way to the top. You simply call the generator function, which gives you a generator object, and the scheduler module or class can offer a variety of APIs to do things with it -- e.g. run it without waiting for it (yet), run several of these in parallel until one of them (or all of them) completes, etc.
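For instance (purely illustrative -- scheduler, spawn(), wait_all() and fetch_url() are invented names, not an existing API), such a scheduler could accept generator objects directly, so nothing above this code needs to be a generator:

def plain_function(urls):
    tasks = [scheduler.spawn(fetch_url(u)) for u in urls]   # start them; don't wait yet
    # ... other synchronous work while the tasks run under the event loop ...
    return scheduler.wait_all(tasks)                        # or wait_first(tasks), etc.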
Here, the user still gets the benefits like:
def not_an_async_func():
    ops = list(map(get_url_async, list_of_urls))
    # all URLs are now downloading in parallel, let's do some other synchronous stuff
    results = list(map(Future.result, ops))
And in the YF world you can do that too.
Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running.
I don't see that. The user just has to be able to get a reference to the schedule, which should be part of the scheduler's API (e.g. a function in its module that returns the current scheduler instance).
(And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)
With YF it doesn't have to be the base of the stack. It just usually is. I feel we are going around in circles.
However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).
I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add a primitive operation to the YF scheduler that says "run this in a threadpool and wake me up when it produces a result". The public API for that primitive can certainly use YF itself -- the messy interface with threads can be completely hidden from view. IMO a YF scheduler worth using for real work must provide such a primitive (it was one of the first things I had to do in my own prototype, to be able to call socket.getaddrinfo()). -- --Guido van Rossum (python.org/~guido)
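Such a primitive might look roughly like this, using concurrent.futures for the pool; how the scheduler resumes the task is scheduler-specific, so the "yield f" protocol below is an assumption, not a fixed API:

import concurrent.futures

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def in_threadpool(func, *args):
    # Run a blocking call in the pool; the scheduler resumes us when it's done.
    f = _pool.submit(func, *args)
    result = yield f            # hand the pool future to the scheduler
    return result

# From inside a task:
#     addrinfo = yield from in_threadpool(socket.getaddrinfo, host, port)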
Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel.
Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.
I'm curious what the Twisted folks have to say about this. Or the folks using gevent.
So am I, but my guess would be that as long as you stay within their 'world' everything is fine (I haven't seen any Twisted code to make me believe otherwise, but happy to accept examples - I have no experience with it directly, though I believe I've used similar concepts before). This is fine for a library or framework, but I don't think it's appropriate for a standard library - maybe this is where our views differ?
I think your world view is colored by Windows; that's fine, we need input from experienced Windows users. But I can certainly imagine other ways of dealing with this.
Coloured by threads is probably more accurate, but then again, throwing threads around wildly is definitely a Windows thing :). I also have a background in microcontrollers, including writing my own pre-emptive and cooperative schedulers that worked with external devices, so I'm trying to draw on that as much as my Windows experience.
For example, in CPython, at least, a callback that is called directly by the OS cannot call straight into Python anyway -- you have to acquire the GIL first. This pretty much means that an unconstrained callback directly from the OS cannot call straight into Python -- it has to put something into a queue, and the bytecode interpreter will eventuall call it (possibly in another thread). This is how signal handlers are invoked too.
I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).
That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code
If interrupt-scheduled means what I think it means, this can only be C code. For the Python callback, see above.
I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it.
queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.
I don't actually believe we should avoid all locks. I do believe that there should be a separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" threading world can hand off data to each other. It is probably unavoidable that the implementation of this mechanism touches a threading lock. But this does not mean that the rest of the "pure" world should need to use a Future class that touches threading locks.
We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.
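A concrete (and entirely illustrative) shape for that idea -- the loop advertises the Future class it wants its users to construct; the class names below are placeholders, echoing the two Future sketches earlier in this thread:

class ThreadAwareLoop:
    future_class = BlockingFuture      # e.g. a Condition-based, thread-safe Future

class PureLoop:
    future_class = SimpleFuture        # e.g. a lock-free Future for a single-threaded loop

def new_future(loop):
    # User code asks the loop for a future instead of naming a Future class directly.
    return loop.future_class()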
(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)
This sounds like a potentially fatal clash in terminology. In the way I use 'async', Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API where there is no non-deterministic switching in sight -- in fact, they all carefully fence off the part that does interact with threads. For example, the Twisted folks have argued that one of the big advantages of using Twisted's Deferred class is that while a callback is running, the state of the world remains constant (except for actions made by the callback itself, obviously).
What other term should we use to encompass this world view (which IMO is a perfectly valid abstraction for a lot of I/O-related concurrency)?
It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometimes use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.
It does sound like this may be unique to Windows, or at least not shared with most of the UNIX world (UNIX ports of IOCP notwithstanding).
IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.
However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).
I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add a primitive operation to the YF scheduler that says "run this in a threadpool and wake me up when it produces a result". The public API for that primitive can certainly use YF itself -- the messing interface with threads can be completely hidden from view. IMO YF scheduler worth using for real work must provide such a primitive (it was one of the first things I had to do in my own prototype, to be able to call socket.getaddrinfo()).
Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these). Cheers, Steve
On Mon, Oct 22, 2012 at 12:18 PM, Steve Dower <Steve.Dower@microsoft.com> wrote: [Quoting me]
For example, in CPython, at least, a callback that is called directly by the OS cannot call straight into Python anyway -- you have to acquire the GIL first. This pretty much means that an unconstrained callback directly from the OS cannot call straight into Python -- it has to put something into a queue, and the bytecode interpreter will eventuall call it (possibly in another thread). This is how signal handlers are invoked too.
I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).
Ok, forget the GIL (though PyPy has one). Anyway, the existing mechanism I was referring to does *not* guarantee that the callback keeps the GIL as long as it runs. The GIL is used to emulate preemptive scheduling while still protecting CPython's internal data structures from concurrent access. It makes no guarantees for user data. Even "x = d[key]" may release the GIL if the dict contains keys whose __eq__ is implemented in Python. But the crucial point of the mechanism is that you don't call straight into Python from the OS-level callback (which is written in C or some other low-level language). You arrange for the interpreter to call the Python-level callback at some later time. So you might as well use this to enforce single-threading, if that's the way of your world.
That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code
If interrupt-scheduled means what I think it means, this can only be C code. For the Python callback, see above.
I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it.
At least in CPython, it's not just the GIL. The queue I'm talking about above must exist even in a CPython version that has no threading support (and hence no GIL). You still cannot call into Python from a signal handler or other callback called directly by the OS kernel. You must delay it until the bytecode interpreter is at a good stopping point. Check out this code: http://hg.python.org/cpython/file/daad150b4670/Python/ceval.c#l496 (AddPendingCall and friends).
queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.
I don't actually believe we should avoid all locks. I do believe that there should be a separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" threading world can hand off data to each other. It is probably unavoidable that the implementation of this mechanism touches a threading lock. But this does not mean that the rest of the "pure" world should need to use a Future class that touches threading locks.
We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.
Yes, this sounds fine. I note that the existing APIs already encourage leaving the creation of the Future to library code -- you don't construct a Future, typically, but call an executor's submit() method.
(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)
This sounds like a potentially fatal clash in terminology. In the way I use 'async', Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API where there is no non-deterministic switching in sight -- in fact, they all carefully fence off the part that does interact with threads. For example, the Twisted folks have argued that one of the big advantages of using Twisted's Deferred class is that while a callback is running, the state of the world remains constant (except for actions made by the callback itself, obviously).
What other term should we use to encompass this world view (which IMO is a perfectly valid abstraction for a lot of I/O-related concurrency)?
It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometime use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)
Hm. I guess you are talking about the low-level (or should I say OS-kernel-called) callback; most event frameworks for Python (except perhaps gevent?) use user-level callbacks extensively -- in fact that's where Twisted wants you to do all the work. So, again a clash of terminology... (Aside: please don't use 'continuation' for 'task'. The use of this term in Scheme has forever tainted the word for me.)
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.
It does sound like this may be unique to Windows, or at least not shared with most of the UNIX world (UNIX ports of IOCP notwithstanding).
IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.
So maybe IOCP is not all that relevant. Very early on in this discussion, IOCP was brought up as an important example of a system for async I/O that had a significantly *different* API than the typical select/poll/etc.-based systems found on UNIX platforms. But its relevance may well decompose into a few separable concerns:

- Don't assume everything is a file descriptor.
- On some systems, the natural way to do async I/O is *not* to wait until the socket (or other event source) is ready, but to ask it to perform a specific operation in "overlapping" (or async) mode, and you will get an event back when it is done.
- Event queues are powerful.
- You cannot ignore threads everywhere.
However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).
I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add a primitive operation to the YF scheduler that says "run this in a threadpool and wake me up when it produces a result". The public API for that primitive can certainly use YF itself -- the messing interface with threads can be completely hidden from view. IMO YF scheduler worth using for real work must provide such a primitive (it was one of the first things I had to do in my own prototype, to be able to call socket.getaddrinfo()).
Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these).
But @async must be imported from *somewhere*, and that's where the decisions are made on how the scheduler works. If you want to use a different scheduler you still have to import a different @async. (TBH I don't understand your with_options() thing. If that's how you propose switching scheduler implementations, there's still a default behavior that you'd have to change on a per-call basis.) And about threads and futures: I am taking a principled stance that you shouldn't have to use threads, and you shouldn't have to use a future implementation that's tied to threads. But maybe we should hear from some Twisted folks... -- --Guido van Rossum (python.org/~guido)
Steve, I realize that continued point-by-point rebuttals probably are getting pointless. Maybe your enthusiasm and energy would be better spent trying to propose and implement (a prototype) of an API in the style that you prefer? Maybe we can battle it out in code more easily... -- --Guido van Rossum (python.org/~guido)
Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples.

Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples.
On Mon, Oct 22, 2012 at 2:32 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples.
Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples.
I have a prototype implementing an async web client that fetches a page given a URL. Primitives I have in mind include running several of these concurrently and waiting for the first to come up with a result, or waiting for all results, or getting the results as they are ready. I have an event loop that can use select, poll, epoll, and kqueue (though I've only lightly tested it, on Linux and OSX, so I'm sure I've missed some corner cases and optimization possibilities). The fetcher calls socket.getaddrinfo() in a threadpool. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
(Aside: please don't use 'continuation' for 'task'. The use of this term in Scheme has forever tainted the word for me.)
It has a broader meaning than the one in Scheme; essentially it's a synonym for "callback". I agree it shouldn't be used as a synonym for "task", though. In any of its forms, a continuation isn't an entire task, it's something that you call to cause the resumption of a task from a particular suspension point. -- Greg
On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
(Aside: please don't use 'continuation' for 'task'. The use of this term in Scheme has forever tainted the word for me.)
It has a broader meaning than the one in Scheme; essentially it's a synonym for "callback".
(Off-topic:) But does that meaning apply to Scheme? If so, I wish someone would have told me 15 years ago...
I agree it shouldn't be used as a synonym for "task", though. In any of its forms, a continuation isn't an entire task, it's something that you call to cause the resumption of a task from a particular suspension point.
I guess that was just Steve showing off. :-) -- --Guido van Rossum (python.org/~guido)
Alertable I/O (<http://msdn.microsoft.com/en-us/library/windows/desktop/aa363772.aspx>) and overlapped I/O are two alternatives to IOCP on Windows.
I agree [continuation] shouldn't be used as a synonym for "task", though. In any of its forms, a continuation isn't an entire task, it's something that you call to cause the resumption of a task from a particular suspension point.
I guess that was just Steve showing off. :-)
Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...).
On Mon, Oct 22, 2012 at 3:49 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Alertable I/O (<http://msdn.microsoft.com/en-us/library/windows/desktop/aa363772.aspx>) and overlapped I/O are two alternatives to IOCP on Windows.
I agree [continuation] shouldn't be used as a synonym for "task", though. In any of its forms, a continuation isn't an entire task, it's something that you call to cause the resumption of a task from a particular suspension point.
I guess that was just Steve showing off. :-)
Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...).
Yeah, I have the same terminology hang-up with the term "continuation-passing-style" for web callbacks. Reading back what you wrote, you were indeed trying to distinguish between the "callback" (which you consider the thing that's directly invoked by the OS) and "the rest of the task" (e.g. the code that runs when the yield is resumed), which you were referring to as "continuation". I'd just use "the rest of the task" here. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
It has a broader meaning than the one in Scheme; essentially it's a synonym for "callback".
(Off-topic:) But does that meaning apply to Scheme? If so, I wish someone would have told me 15 years ago...
It does, in the sense that a continuation appears to the Scheme programmer as a callable object. The connection goes deeper as well. There's a style of programming called "continuation-passing style", in which nothing ever returns -- every function is passed another function to be called with its result. In a language such as Scheme that supports tail calls, you can use this style extensively without fear of overflowing the call stack. You're using this style whenever you chain callbacks together using Futures or Deferreds. The callbacks don't return values; instead, each callback arranges for another callback to be called, passing it the result. This is also the way monadic I/O works in Haskell. None of the I/O functions ever return, they just call another function and pass it the result. A combination of currying and syntactic sugar is used to hide the fact that you're passing callbacks -- aka continuations -- around all over the place. Now, it turns out that you can define all the semantics of Scheme, including its continuations, by writing a Scheme interpreter in Scheme that doesn't itself use Scheme continuations. You do it by writing the whole interpreter in continuation-passing style, and it becomes clear that at that level, the "continuations" are just ordinary functions, relying on lexical scoping to capture all of the necessary state.
I guess that was just Steve showing off. :-)
Not really -- to someone with a Scheme or FP background, it's near-impossible to look at something like a chain of Deferred callbacks without the word "continuation" springing to mind. I agree that it's not helpful to anyone without such a background, however. -- Greg
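(To make the continuation-passing style Greg describes concrete, here is a tiny illustrative Python sketch with made-up names: nothing returns a value; each step is handed a function to call with its result.)

    # Continuation-passing style: each function takes a 'cont' callback and
    # delivers its result by calling it, instead of returning.
    def add_cps(a, b, cont):
        cont(a + b)

    def square_cps(x, cont):
        cont(x * x)

    def pipeline(a, b, cont):
        # (a + b) squared, expressed as a chain of continuations.
        add_cps(a, b, lambda s: square_cps(s, cont))

    pipeline(3, 4, print)   # prints 49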
And, predictably, that gave me a headache... :-) --Guido van Rossum (sent from Android phone) On Oct 22, 2012 4:49 PM, "Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
It has a broader meaning than the one in Scheme; essentially
it's a synonym for "callback".
(Off-topic:) But does that meaning apply to Scheme? If so, I wish someone would have told me 15 years ago...
It does, in the sense that a continuation appears to the Scheme programmer as a callable object.
The connection goes deeper as well. There's a style of programming called "continuation-passing style", in which nothing ever returns -- every function is passed another function to be called with its result. In a language such as Scheme that supports tail calls, you can use this style extensively without fear of overflowing the call stack.
You're using this style whenever you chain callbacks together using Futures or Deferreds. The callbacks don't return values; instead, each callback arranges for another callback to be called, passing it the result.
This is also the way monadic I/O works in Haskell. None of the I/O functions ever return, they just call another function and pass it the result. A combination of currying and syntactic sugar is used to hide the fact that you're passing callbacks -- aka continuations -- around all over the place.
Now, it turns out that you can define all the semantics of Scheme, including its continuations, by writing a Scheme interpreter in Scheme that doesn't itself use Scheme continuations. You do it by writing the whole interpreter in continuation-passing style, and it becomes clear that at that level, the "continuations" are just ordinary functions, relying on lexical scoping to capture all of the necessary state.
I guess that was just Steve showing off. :-)
Not really -- to someone with a Scheme or FP background, it's near-impossible to look at something like a chain of Deferred callbacks without the word "continuation" springing to mind. I agree that it's not helpful to anyone without such a background, however.
-- Greg
On 23.10.12 00:35, Guido van Rossum wrote:
On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
(Aside: please don't use 'continuation' for 'task'. The use of this term in Scheme has forever tainted the word for me.) It has a broader meaning than the one in Scheme; essentially it's a synonym for "callback". (Off-topic:) But does that meaning apply to Scheme? If so, I wish someone would have told me 15 years ago...
As used quite often, the definition is more like "half a coroutine", meaning the part that can resume it at some point. Sticking two together, you get a coroutine (tasklet, greenlet etc). These are one-shot continuations; they are gone after resuming. The meaning in Scheme is much weirder, and you were right to be scared. In Scheme, these beasts survive their reactivation as a constant. My big design error in 1998 was to implement exactly those full continuations for Python. I'm scared myself when I recall that ... ;-) ciao - chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On 10/23/12 5:24 PM, Christian Tismer wrote:
As used quite often, the definition is more like "half a coroutine", meaning the part that can resume it at some point. Sticking two together, you get a coroutine (tasklet, greenlet etc). These are one-shot continuations; they are gone after resuming.
The meaning in Scheme is much weirder, and you were right to be scared. In Scheme, these beasts survive their reactivation as a constant. My big design error in 1998 was to implement exactly those full continuations for Python.
I'm scared myself when I recall that ... ;-)
Come on Christian, take the red pill and see how far down the rabbit hole goes... 8^) https://github.com/samrushing/irken-compiler I never noticed before, but there really are two different meanings to 'continuation': 1) in the phrase 'continuation-passing style', it means a 'callback' of sorts. 2) as a separate term, it means an object that represents the future of a computation. Like Greg said, you can apply the CPS transformation to any bit of code (or write it that way from the start), and when you do you might be tempted to refer to your callbacks as 'continuations'. -Sam
Since I was the one to first mention the term 'continuation' in this discussion, I'll clarify that I meant it as the "'callback' of sorts", and specifically in the situation where the person writing it does not realise that it is a callback. For example:

    @async
    def my_func():
        # part a
        x = yield y
        # part b

Part B is the continuation here - the piece of code that continues after 'y' completes. There are various other pieces involved (a callback and a generator, and possibly others, depending on the implementation) so rather than muddying the waters with adjectives I muddied the waters with a noun. "The rest of the task" is close enough (when used in context) that I'm happy to stick to that. "Callback" is an implementation detail IMO, and not one that is necessary to leak through our abstraction. (I also didn't realise people were so traumatised by the C-word, or I would have picked another one. Add this to the list of reasons to not learn functional programming... :) )
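(Steve leaves @async undefined above. A minimal sketch of how such a driver could work, assuming the generator yields PEP 3148-style Futures -- a guess at the shape, not Steve's implementation; the decorator is spelled async_ here only because async later became a keyword.)

    import concurrent.futures

    def async_(func):
        # Drives a generator that yields Futures; "part b" of the body runs
        # inside the done-callback, i.e. it *is* the continuation.
        def wrapper(*args, **kwargs):
            gen = func(*args, **kwargs)
            result = concurrent.futures.Future()

            def step(value=None):
                try:
                    fut = gen.send(value)          # run up to the next yield
                except StopIteration as e:
                    result.set_result(getattr(e, "value", None))
                    return
                fut.add_done_callback(lambda f: step(f.result()))

            step()
            return result
        return wrapper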
Sam Rushing wrote:
1) in the phrase 'continuation-passing style', it means a 'callback' of sorts. 2) as a separate term, it means an object that represents the future of a computation.
They're not really different things. When you call a continuation function in a continuation-passing style program, you're effectively invoking *all* of the rest of the computation, not just the part represented by that function. This becomes particularly clear if you're able to make the continuation calls using tail calls. Then it's not so much a "callback" as a "callforward". Nothing ever returns, so forward is the only way to go! -- Greg
Guido van Rossum wrote:
On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.
I think there's an assumption behind this whole async tasks discussion that the tasks being scheduled are I/O bound. We're trying to overlap CPU activity with I/O, and different I/O activities with each other. We're *not* trying to achieve concurrency of CPU-bound tasks -- the GIL prevents that anyway for pure Python code. The whole Windows IOCP thing, on the other hand, seems to be geared towards having a pool of threads, any of which can handle any I/O operation. That's not helpful for us; when one of our tasks blocks waiting for I/O, the completion of that I/O must wake up *that particular task*, and it must be run using the same OS thread that was running it before. I gather that Windows provides a way of making an async I/O request and specifying a callback for that request. If that's the case, do we need to bother with an IOCP at all? Just have the callback wake up the associated task directly. -- Greg
On Mon, Oct 22, 2012 at 3:09 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I think there's an assumption behind this whole async tasks discussion that the tasks being scheduled are I/O bound. We're trying to overlap CPU activity with I/O, and different I/O activities with each other. We're *not* trying to achieve concurrency of CPU-bound tasks -- the GIL prevents that anyway for pure Python code.
Right. Of course.
The whole Windows IOCP thing, on the other hand, seems to be geared towards having a pool of threads, any of which can handle any I/O operation. That's not helpful for us; when one of our tasks blocks waiting for I/O, the completion of that I/O must wake up *that particular task*, and it must be run using the same OS thread that was running it before.
The reason we can't ignore IOCP is that it is apparently the *only* way to do async I/O in a scalable way. The only other polling primitive available is select() which does not scale. (Or so it is asserted by many folks; I haven't tested this, but I believe the argument against select() scaling in general.)
I gather that Windows provides a way of making an async I/O request and specifying a callback for that request. If that's the case, do we need to bother with an IOCP at all? Just have the callback wake up the associated task directly.
AFAICT the way to do that goes through IOCP... -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
The reason we can't ignore IOCP is that it is apparently the *only* way to do async I/O in a scalable way. The only other polling primitive available is select() which does not scale.
There seems to be an alternative to polling, though. There are functions called ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when the operation completes: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%2... http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%2... Is there some reason that this doesn't scale either? -- Greg
On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
The reason we can't ignore IOCP is that it is apparently the *only* way to do async I/O in a scalable way. The only other polling primitive available is select() which does not scale.
There seems to be an alternative to polling, though. There are functions called ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when the operation completes:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%2... http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%2...
Is there some reason that this doesn't scale either?
I don't know, we've reached territory I don't know at all. Are there also similar calls for Accept() and Connect() on sockets? Those seem the other major blocking primitives that are frequently used. FWIW, here is where I read about IOCP being the only scalable way on Windows: http://google-opensource.blogspot.com/2010/01/libevent-20x-like-libevent-14x... -- --Guido van Rossum (python.org/~guido)
On Mon, Oct 22, 2012 at 4:09 PM, Guido van Rossum <guido@python.org> wrote:
On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
The reason we can't ignore IOCP is that it is apparently the *only* way to do async I/O in a scalable way. The only other polling primitive available is select() which does not scale.
There seems to be an alternative to polling, though. There are functions called ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when the operation completes:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%2... http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%2...
Is there some reason that this doesn't scale either?
I don't know, we've reached territory I don't know at all. Are there also similar calls for Accept() and Connect() on sockets? Those seem the other major blocking primitives that are frequently used.
FWIW, here is where I read about IOCP being the only scalable way on Windows: http://google-opensource.blogspot.com/2010/01/libevent-20x-like-libevent-14x...
It's been years since I've looked at this stuff, but I believe that you want to use AcceptEx and ConnectEx in conjunction with IOCP. event_iocp.c and listener.c in libevent 2.0.x could help shed some light on the details.
Greg wrote:
Guido van Rossum wrote:
The reason we can't ignore IOCP is that it is apparently the *only* way to do async I/O in a scalable way. The only other polling primitive available is select() which does not scale.
There seems to be an alternative to polling, though. There are functions called ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when the operation completes:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%29.aspx http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx
Is there some reason that this doesn't scale either?
I suspect it's because the completion routine is invoked on the same thread that issued the I/O. The thread has to first block in an alertable wait (e.g. WaitForMultipleObjectsEx or WSAWaitForMultipleEvents). So you'll only get one thread doing I/O and CPU work, vs. IOCP where many threads can share both workloads.
Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.
I think there's an assumption behind this whole async tasks discussion that the tasks being scheduled are I/O bound. We're trying to overlap CPU activity with I/O, and different I/O activities with each other. We're *not* trying to achieve concurrency of CPU-bound tasks -- the GIL prevents that anyway for pure Python code.
Sure, but it's easy enough to slip it in for (nearly) free. The only other option is complete exclusion of CPU-bound concurrency, which also rules out running C functions (outside the GIL) on a separate thread.
The whole Windows IOCP thing, on the other hand, seems to be geared towards having a pool of threads, any of which can handle any I/O operation. That's not helpful for us; when one of our tasks blocks waiting for I/O, the completion of that I/O must wake up *that particular task*, and it must be run using the same OS thread that was running it before.
I gather that Windows provides a way of making an async I/O request and specifying a callback for that request. If that's the case, do we need to bother with an IOCP at all? Just have the callback wake up the associated task directly.
IOCP is probably not useful at all, and as Guido said, it was brought up as an example of a non-select style of waiting. APIs like ReadFileEx/WriteFileEx let you provide the callback directly without using IOCP. In any case, even if we did use IOCP it would be an implementation detail and would not affect how the API is exposed. (Also, love your work on PEP 380. Despite my hesitation about using yield from for this API, I do really like using it with generators.) Cheers, Steve
On Mon, Oct 22, 2012 at 9:55 AM, Steve Dower <Steve.Dower@microsoft.com> wrote:
I think the abstract for PEP 380 sums it up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator.
Rather, the callee must be some iterable:

    def f():
        yield from [1, 2, 3]

    for x in f():
        print(x)

-eric
On Oct 22, 2012, at 4:59 PM, Guido van Rossum <guido@python.org> wrote:
On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower <Steve.Dower@microsoft.com> wrote: [Stuff about Futures and threads]
Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel.
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
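(A toy sketch of what "all interaction with threads is mediated by the event loop" could look like -- illustrative only, not a proposed API. Worker threads never touch task callbacks or Futures directly; they only hand the loop a callable, and the loop runs everything on its own single thread, so task-level code never needs a lock.)

    import collections
    import threading

    class MiniLoop:
        def __init__(self):
            self._ready = collections.deque()
            self._lock = threading.Lock()   # the one lock, hidden inside the loop

        def call_soon_threadsafe(self, callback, *args):
            # Safe to call from any thread.
            with self._lock:
                self._ready.append((callback, args))

        def run_once(self):
            with self._lock:
                batch, self._ready = self._ready, collections.deque()
            for callback, args in batch:
                callback(*args)             # always runs on the loop's thread

    loop = MiniLoop()
    worker = threading.Thread(
        target=lambda: loop.call_soon_threadsafe(print, "hello from a thread"))
    worker.start()
    worker.join()
    loop.run_once()                         # prints on the loop's thread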
Maybe it is worth having a look at libuv and the way it mixes threads and an event loop [1]. Libuv is one of the good event loops around, able to use IOCP and other event systems on other platforms (kqueue, …), and while reading all these exchanges I was thinking it would fit our case perfectly. Or at least something like it:

- It provides a common API for I/O watchers: read, write, writelines, readable, writeable, which could probably be extended to remote systems.
- It has a job queue system for threads that works mostly like Futures, but using the event loop.

In any case there is a pyuv binding [2] if someone wants to test it. There is even a Twisted reactor [3].

I am myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv -- both the scheduler and the way I/O is handled:

- In Go all coroutines are independent from each other and can only communicate via channels. This has the advantage of allowing them to run on different threads when one blocks. In the normal case they work mostly like greenlets on a single thread and are simply scheduled in a round-robin way (mostly like in Stackless). The difference is that goroutines can be executed in parallel: when one blocks, another thread will be created to handle the other goroutines in the runnable queue.

- For I/O there is a common API for all Connections and Listeners (Conn & Listen classes) that generally relies on a poll server. This poll server's only task is to register FDs and wake up the goroutines that wait on read or fd events. Since this poll server runs in a blocking loop, it is automatically left by the scheduler in a thread. This poll server could likely be replaced by an event loop if someone wanted.

In my opinion the Go concurrency & memory model [5] could fit perfectly in the Python world, and I'm surprised no one has spoken about it already.

In flower, greenlets could probably be replaced by generators, but I like the API offered by any coroutine pattern. I wonder if continulets [6] couldn't be ported to CPython to handle that…

- benoît

[1] http://nikhilm.github.com/uvbook/threads.html & http://github.com/joyent/libuv [2] https://github.com/saghul/pyuv [3] https://github.com/saghul/twisted-pyuv [4] https://github.com/benoitc/flower [5] http://golang.org/ref/mem [6] http://doc.pypy.org/en/latest/stackless.html#continulets
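(A rough, generator-based sketch of the model Benoit describes -- "coroutines" scheduled round-robin that talk only through a channel. Purely illustrative; flower itself uses greenlets, and a real channel would block and reschedule instead of busy-waiting.)

    import collections

    class Channel:
        def __init__(self):
            self.items = collections.deque()

        def send(self, value):
            self.items.append(value)

        def recv(self):
            # A real implementation would suspend the receiver; here the
            # scheduler simply keeps resuming it until a value is available.
            return self.items.popleft() if self.items else None

    def run(tasks):
        queue = collections.deque(tasks)
        while queue:
            task = queue.popleft()
            try:
                next(task)          # run until the task yields control
                queue.append(task)  # round-robin: back of the line
            except StopIteration:
                pass

    def producer(chan):
        for i in range(3):
            chan.send(i)
            yield               # give other tasks a turn

    def consumer(chan):
        seen = 0
        while seen < 3:
            value = chan.recv()
            if value is not None:
                print("got", value)
                seen += 1
            yield

    chan = Channel()
    run([producer(chan), consumer(chan)])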
Thanks for the pointer to and description of libuv; it had come up in my research already, but so far I have not looked into it actively. Now I will. Also thanks for your reminder of the Goroutine model -- this is definitely something to look at for inspiration as well. (Though does Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-) --Guido On Tue, Oct 23, 2012 at 12:19 AM, Benoit Chesneau <benoitc@gunicorn.org> wrote:
On Oct 22, 2012, at 4:59 PM, Guido van Rossum <guido@python.org> wrote:
On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower <Steve.Dower@microsoft.com> wrote: [Stuff about Futures and threads]
Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel.
IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events.
Maybe it is worth to have a look on libuv and the way it mixes threads and and event loop [1]. Libuv is one of the good event loop around able to use IOCP and other events systems on other arch (kqueue, …) and I was thinking when reading all the exchange around that it would perfectly fit in our cases. Or at least something like it:
- It provides a common api for IO watchers: read, write, writelines, readable, writeable that can probably be extend over remote systems - Have a job queue system for threds that is working mostly like the Futures but using the event loop
In any case there is a pyuv binding [2] if some want to test. Even a twisted reactor [3]
I myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are handled:
- In Go all coroutines are independent from each others and can only communicate via channel. Which has the advantage to allows them to run on different threads when one is blocking. In normal case they are mostly working like grrenlets on a single thread and are simply scheduled in a round-robin way. (mostly like in stackless). On the difference that goroutines can be executed in parallel. When one is blocking another thread will be created to handle other goroutines in the runnable queue.
- For I/Os it exists a common api to all Connections and Listeners (Conn & Listen classes) that generally ask on a poll server. This poll server has for only task to register FDs and wake up the groutines that wait on read or fd events. This this poll server is running in a blocking loop it is automatically let by the scheduler in a thread. This pol server could be likely be replaced by an event loop if someone want.
In my opinion the Go concurrency & memory model [5] could perfectly fit in the Python world and I'm surprised none already spoke about it.
In flower greenlets could probably be replaced by generators but i like the API proposed by any coroutine pattern. I wonder if continulets [6] couldn't be ported in cpython to handle that…
- benoît
[1] http://nikhilm.github.com/uvbook/threads.html & http://github.com/joyent/libuv [2] https://github.com/saghul/pyuv [3] https://github.com/saghul/twisted-pyuv [4] https://github.com/benoitc/flower [5] http://golang.org/ref/mem [6] http://doc.pypy.org/en/latest/stackless.html#continulets
-- --Guido van Rossum (python.org/~guido)
Go is available for Windows: http://golang.org/doc/install#windows On Tue, Oct 23, 2012 at 10:48 AM, Guido van Rossum <guido@python.org> wrote:
Thanks for the pointer to and description of libuv; it had come up in my research yet but so far I have not looked it up actively. Now I will. Also thanks for your reminder of the Goroutine model -- this is definitely something to look at for inspiration as well. (Though does Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-)
--Guido
But does it let you use any Windows APIs? On Tue, Oct 23, 2012 at 9:09 AM, Brett Cannon <brett@python.org> wrote:
Go is available for Windows: http://golang.org/doc/install#windows
On Tue, Oct 23, 2012 at 10:48 AM, Guido van Rossum <guido@python.org> wrote:
Thanks for the pointer to and description of libuv; it had come up in my research yet but so far I have not looked it up actively. Now I will. Also thanks for your reminder of the Goroutine model -- this is definitely something to look at for inspiration as well. (Though does Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-)
--Guido
-- --Guido van Rossum (python.org/~guido)
On Tue, Oct 23, 2012 at 12:54 PM, Guido van Rossum <guido@python.org> wrote:
But does it let you use any Windows APIs?
That I don't know.
Guido van Rossum <guido@...> writes:
But does it let you use any Windows APIs?
It seems you can: https://github.com/AllenDang/w32 Quote from that page: "w32 is a wrapper of windows apis for the Go Programming Language. It wraps win32 apis to "Go style" to make them easier to use." Regards, Vinay Sajip
On 10/21/12, Guido van Rossum <guido@python.org> wrote:
On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.
I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread.
That (with or without rescheduling this thread to actually process the event) is a perfectly reasonable solution, but I'm not sure how obvious it is. People willing to deal with the conventions and contortions of Twisted are likely to just use Twisted. A general API should have a straightforward way to wait for a result; even explicitly calling wait() may be too much to ask if you want to keep assuming that other events will cooperate.
Perhaps. Lots of possibilities in this design space.
(*I'm inclined to define this [the Future interface] as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.)
Agreed. I don't see much use for the cancellation stuff and all the extra complexity that adds to the interface.
wait_for_any may well be launching different strategies to solve the same problem, and intending to ignore all but the fastest. It makes sense to go ahead and cancel the slower strategies. (That said, I agree that the API shouldn't guarantee that other tasks are actually cancelled, let alone that they are cancelled before side effects occur.) -jJ
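(A simplified illustration of Jim's point, using the existing concurrent.futures machinery -- threads only because they make the cancellation semantics easy to show; the helper name is made up.)

    import concurrent.futures

    def first_result(callables):
        # Run several strategies for the same problem; take the fastest answer
        # and *try* to cancel the rest. Cancellation is best effort: strategies
        # that are already running keep going, side effects and all.
        with concurrent.futures.ThreadPoolExecutor(max_workers=len(callables)) as pool:
            futures = [pool.submit(c) for c in callables]
            done, pending = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED)
            for fut in pending:
                fut.cancel()
            return next(iter(done)).result()

    print(first_result([lambda: 6 * 7, lambda: sum(range(10**7))]))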
On Tue, Oct 23, 2012 at 12:34 AM, Jim Jewett <jimjjewett@gmail.com> wrote:
On 10/21/12, Guido van Rossum <guido@python.org> wrote:
On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.
I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread.
That (with or without rescheduling this thread to actually process the event) is a perfectly reasonable solution, but I'm not sure how obvious it is. People willing to deal with the conventions and contortions of twisted are likely to just use twisted.
I think part of my point is that we can package all this up in a way that is a lot less scary than Twisted's reputation. And remember, there are many other frameworks that use similar machinery. There's Tornado, Monocle (which runs on top of Tornado *or* Twisted), and of course the stdlib's asyncore, which is antiquated but still much used -- AFAIK Zope is still built around it.
A general API should have a straightforward way to wait for a result; even explicitly calling wait() may be too much to ask if you want to keep assuming that other events will cooperate.
Here I have some real world relevant experience: NDB, App Engine's new Datastore API (which I wrote). It is async under the hood (yield + its own flavor of Futures), and users who want the most performance from their app are encouraged to use the async APIs directly -- but users who don't care can ignore their existence completely. There are thousands of users, and I've seen people explain the async stuff to each other on StackOverflow, so I think it is quite accessible.
Agreed. I don't see much use for the cancellation stuff and all the extra complexity that adds to the interface.
wait_for_any may well be launching different strategies to solve the same problem, and intending to ignore all but the fastest. It makes sense to go ahead and cancel the slower strategies. (That said, I agree that the API shouldn't guarantee that other tasks are actually cancelled, let alone that they are cancelled before side effects occur.)
Agreed. And it's not hard to implement a custom cancellation mechanism either. -- --Guido van Rossum (python.org/~guido)
Nick Coghlan wrote:
Please don't lose sight of the fact that yield-based suspension points looking like something other than an ordinary function call is a *feature*, not a bug.
People keep asserting that, but I don't think we have enough experience with the explicit-switching-point-markers-all-the-way-up style of coding to tell whether it's a good idea or not. My gut feeling is that the explicit markers will help at the lowest levels, where you're trying to protect a critical section, but at the upper levels they will just be noise that causes unnecessary worry. In one of Guido's earlier posts (which I can't find now, unfortunately), he said something that made it sound like he was coming around to that point of view too, but he seems to have gone back on that recently. -- Greg
On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Nick Coghlan wrote:
Please don't lose sight of the fact that yield-based suspension points looking like something other than an ordinary function call is a *feature*, not a bug.
(Ironically, Christian just revived an old thread where Nick was of a different opinion.)
People keep asserting that, but I don't think we have enough experience with the explicit-switching-point-markers-all-the-way-up style of coding to tell whether it's a good idea or not.
Hm. I would say I have a lot of real world experience with this style: App Engine's NDB. It's in use by 1000s of people. I've written a lot of complicated database code in it (integrating several layers of caching). This style really does hold up well. Now, I think that if it could yield from, NDB would be even better, but for most code it would be a very minimal change, and the issue here (the requirement to mark suspension points) is the same. In C# they also have a lot of experience with this style -- functions declared as async must be waited for using await, and the type checker enforces that it's await all the way up (I think a function using await must be declared as async, too).
My gut feeling is that the explicit markers will help at the lowest levels, where you're trying to protect a critical section, but at the upper levels they will just be noise that causes unnecessary worry.
Actually, an earlier experience (like 25 years earlier :-) suggests that it's at the higher levels where you get in trouble without the markers -- because you still have critical sections in end user code, but it's impossible to remember which functions you call may cause a task switch.
In one of Guido's earlier posts (which I can't find now, unfortunately), he said something that made it sound like he was coming around to that point of view too, but he seems to have gone back on that recently.
I was probably more waxing philosophically on the reasons why people like greenlets/gevent (if they like it). I feel I am pretty consistently in favor of marking switch points, at least in the context we are currently discussing (where high-speed async event handling is the thing to do). For less-performant situations I'm fine with writing classic synchronous-looking code, and running it in multiple OS threads for concurrency reasons. But the purpose of designing a new async API is to break the barrier of one thread per connection. -- --Guido van Rossum (python.org/~guido)
On 20.10.12 18:30, Guido van Rossum wrote:
On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Nick Coghlan wrote:
Please don't lose sight of the fact that yield-based suspension points looking like something other than an ordinary function call is a *feature*, not a bug. (Ironically, Christian just revived an old thread where Nick was of a different opinion.)
People keep asserting that, but I don't think we have enough experience with the explicit-switching-point-markers-all-the-way-up style of coding to tell whether it's a good idea or not. Hm. I would say I have a lot of real world experience with this style: App Engine's NDB. It's in use by 1000s of people. I've written a lot of complicated database code in it (integrating several layers of caching). This style really does hold up well.
Now, I think that if it could yield from, NDB would be even better, but for most code it would be a very minimal change, and the issue here (the requirement to mark suspension points) is the same.
In C# they also have a lot of experience with this style -- functions declared as async must be waited for using await, and the type checker enforces that it's await all the way up (I think a function using await must be declared as async, too).
My gut feeling is that the explicit markers will help at the lowest levels, where you're trying to protect a critical section, but at the upper levels they will just be noise that causes unnecessary worry. Actually, an earlier experience (like 25 years earlier :-) suggests that it's at the higher levels where you get in trouble without the markers -- because you still have critical sections in end user code, but it's impossible to remember which functions you call may cause a task switch.
In one of Guido's earlier posts (which I can't find now, unfortunately), he said something that made it sound like he was coming around to that point of view too, but he seems to have gone back on that recently. I was probably more waxing philosophically on the reasons why people like greenlets/gevent (if they like it). I feel I am pretty consistently in favor of marking switch points, at least in the context we are currently discussing (where high-speed async event handling is the thing to do).
For less-performant situations I'm fine with writing classic synchronous-looking code, and running it in multiple OS threads for concurrency reasons. But the purpose of designing a new async API is to break the barrier of one thread per connection.
It is of course a bit confusing to find out who thought what and when ;-) And yes, I see your point, but I have difficulty seeing how it is best done. If I take Stackless as an example, well, there everything would potentially be marked as some codef, because it is simply enabled everywhere. But the mere fact that something _supports_ suspension or switching is IMHO not enough reason to clutter the code everywhere. What I think is needed is a way to distinguish critical code paths. I'm not sure how this should look; maybe it's my limited understanding. The generator-based functions do not get switched away from on their own. If they want to do that, they call some special function, and I would mark them for doing that. But all the way up the tree? I can't see much benefit in that. Maybe it would be less verbose to have decorators that assert something does _not_ switch, like guards? Or maybe add properties? I agree that one needs certain information about the program, and that it should be easy to extract. Perhaps this could be done with an analysing tool. This tool would only need additional hints if things are very dynamic, like variables holding certain constructs which are known only at runtime. -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
Maybe it would help if I was more straightforward.

I do not want to solve this problem by introducing yet another language mechanism. This rules out codef, greenlets, and Stackless.

I want to solve it using what we have in Python 3.3. And I want to solve it for all platforms where Python 3.3 runs and for all Python 3.3 implementations (assuming Jython, IronPython and PyPy will eventually get there).

Basically this leaves as options OS threads, callbacks (+ Deferred), or yield [from].

Using OS threads the problem is solved without writing any code, but does not scale, so it does not really solve the problem.

To scale we need either callbacks or yield [from], or both.

I accept that some people prefer to use callbacks and Deferred. I want to respect this choice and I want to see integration with this style at the event loop level.

But for myself, I know that I want to write *most* code without callbacks (or Deferreds), and I am quite comfortable to use yield or yield from instead. (I have written a lot of code using yield <future>, and I am now practicing yield from <generator> -- the transition is quite easy and I like what I see.)

If you are not happy with what we can do in (portable) Python 3.3, we are not talking about solving the same problem.

If you are happy using OS threads, we are also not talking about solving the same problem. (To be sure, there is a place for them in my solution -- but it is only needed for things we cannot run asynchronously, such as socket.getaddrinfo().)

If you are not happy using callbacks/Deferred nor using yield[from], you're welcome to use greenlets or Stackless. But they will not be in the standard library.

-- --Guido van Rossum (python.org/~guido)
I'm curious now... you keep mentioning Futures and Deferreds like they're two separate entities. What distinction between the two do you see? On Sat, Oct 20, 2012 at 2:29 PM, Guido van Rossum <guido@python.org> wrote:
Maybe it would help if I was more straightforward.
I do not want to solve this problem by introducing yet another language mechanism. This rules out codef, greenlets, and Stackless.
I want to solve it using what we have in Python 3.3. And I want to solve it for all platforms where Python 3.3 runs and for all Python 3.3 implementations (assuming Jython, IronPython and PyPy will eventually get there).
Basically this leaves as options OS threads, callbacks (+ Deferred), or yield [from].
Using OS threads the problem is solved without writing any code, but does not scale, so it does not really solve the problem.
To scale we need either callbacks or yield [from], or both.
I accept that some people prefer to use callbacks and Deferred. I want to respect this choice and I want to see integration with this style at the event loop level.
But for myself, I know that I want to write *most* code without callbacks (or Deferreds), and I am quite comfortable to use yield or yield from instead. (I have written a lot of code using yield <future>, and I am now practicing yield from <generator> -- the transition is quite easy and I like what I see.)
If you are not happy with what we can do in (portable) Python 3.3, we are not talking about solving the same problem.
If you are happy using OS threads, we are also not talking about solving the same problem. (To be sure, there is a place for them in my solution -- but it is only needed for things we cannot run asynchronously, such as socket.getaddrinfo().)
If you are not happy using callbacks/Deferred nor using yield[from], you're welcome to use greenlets or Stackless. But they will not be in the standard library.
-- --Guido van Rossum (python.org/~guido)
-- Jasper
On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre <jstpierre@mecheye.net> wrote:
I'm curious now... you keep mentioning Futures and Deferreds like they're two separate entities. What distinction between the two do you see?
They have different interfaces and you end up using them differently. In particular, quoting myself from another thread, here is how I use the terms: - Future: something with roughly the interface but not necessarily the implementation of PEP 3148. - Deferred: the Twisted Deferred class or something with very similar functionality (there are some in the JavaScript world). The big difference between Futures and Deferreds is that Deferreds can easily be chained together to create multiple stages, and each callback is called with the value returned from the previous stage; also, Deferreds have separate callback chains for regular values and errors. -- --Guido van Rossum (python.org/~guido)
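(For readers unfamiliar with either API, a rough side-by-side sketch of the difference Guido describes; the Deferred half requires Twisted and is shown only for contrast, not as part of any proposal here.)

    import concurrent.futures

    # PEP 3148 style: a done-callback receives the *future* itself,
    # and pulls the value out of it.
    fut = concurrent.futures.Future()
    fut.add_done_callback(lambda f: print("future gave us", f.result()))
    fut.set_result(21)

    # Twisted style: each callback receives the value returned by the
    # *previous* callback, and errors flow down a separate errback chain.
    try:
        from twisted.internet.defer import Deferred
        d = Deferred()
        d.addCallback(lambda x: x * 2)                  # stage 1: transform value
        d.addCallback(lambda x: print("deferred gave us", x))
        d.addErrback(lambda failure: print("failed:", failure))
        d.callback(21)                                  # fires the chain -> 42
    except ImportError:
        pass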
On Sat, Oct 20, 2012 at 6:38 PM, Guido van Rossum <guido@python.org> wrote:
On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre <jstpierre@mecheye.net> wrote:
I'm curious now... you keep mentioning Futures and Deferreds like they're two separate entities. What distinction between the two do you see?
They have different interfaces and you end up using them differently.
Who is "you" supposed to refer to?
In particular, quoting myself from another thread, here is how I use the terms:
- Future: something with roughly the interface but not necessarily the implementation of PEP 3148.
- Deferred: the Twisted Deferred class or something with very similar functionality (there are some in the JavaScript world).
The big difference between Futures and Deferreds is that Deferreds can easily be chained together to create multiple stages, and each callback is called with the value returned from the previous stage; also, Deferreds have separate callback chains for regular values and errors.
Chaining is an add-on to the system and not necessarily required. Dojo's Deferreds, modelled directly after Twisted's, don't have direct chaining with multiple callbacks per Deferred, but instead addCallback returns a new Deferred, which it may pass on to. This means that each Deferred has one result, and chaining is done slightly differently. The whole point of chaining is just convenience of mutating a value before it's passed to the caller. It's possible to live without it. Compare:

    from async_http_client import fetch_page
    from some_xml_library import parse_xml

    def fetch_xml(url):
        d = fetch_page(url)
        d.add_callback(parse_xml)
        return d

with:

    def fetch_xml(url):
        def parse_page(result):
            d.callback(parse_xml(result))

        d = Deferred()
        page = fetch_page(url)
        page.add_callback(parse_page)
        return d

The two functions, treated as a black box, are equivalent. The distinction is convenience.
-- --Guido van Rossum (python.org/~guido)
-- Jasper
On Mon, Oct 22, 2012 at 9:46 AM, Jasper St. Pierre <jstpierre@mecheye.net> wrote:
On Sat, Oct 20, 2012 at 6:38 PM, Guido van Rossum <guido@python.org> wrote:
On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre <jstpierre@mecheye.net> wrote:
I'm curious now... you keep mentioning Futures and Deferreds like they're two separate entities. What distinction between the two do you see?
They have different interfaces and you end up using them differently.
Who is "you" supposed to refer to?
In particular, quoting myself from another thread, here is how I use the terms:
- Future: something with roughly the interface but not necessarily the implementation of PEP 3148.
- Deferred: the Twisted Deferred class or something with very similar functionality (there are some in the JavaScript world).
The big difference between Futures and Deferreds is that Deferreds can easily be chained together to create multiple stages, and each callback is called with the value returned from the previous stage; also, Deferreds have separate callback chains for regular values and errors.
Chaining is an add-on to the system and not necessarily required. Dojo's Deferreds, modelled directly after Twisted's, don't have direct chaining with multiple callbacks per Deferred, but instead addCallback returns a new Deferred, which it may pass on to. This means that each Deferred has one result, and chaining is done slightly differently.
The whole point of chaining is just convenience of mutating a value before it's passed to the caller. It's possible to live without it. Compare:
from async_http_client import fetch_page
from some_xml_library import parse_xml

def fetch_xml(url):
    d = fetch_page(url)
    d.add_callback(parse_xml)
    return d

with:

def fetch_xml(url):
    def parse_page(result):
        d.callback(parse_xml(result))
    d = Deferred()
    page = fetch_page(url)
    page.add_callback(parse_page)
    return d

The two functions, treated as a black box, are equivalent. The distinction is convenience.
Jasper, I don't know you. You may be a wizard-level Twisted user, or maybe you once saw a Twisted tutorial. All I know is that when I started this discussion I used the term Future thinking Deferreds were just Futures, and then Twisted core developers started explaining to me that Deferreds are so much more than Futures (I think it may have been Glyph himself, in one of his longer posts). So please go argue the distinction or similarity with the Twisted core developers, not with me. -- --Guido van Rossum (python.org/~guido)
On Sun, Oct 21, 2012 at 2:30 AM, Guido van Rossum <guido@python.org> wrote:
On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Nick Coghlan wrote:
Please don't lose sight of the fact that yield-based suspension points looking like something other than an ordinary function call is a *feature*, not a bug.
(Ironically, Christian just revived an old thread where Nick was of a different opinion.)
I like greenlets too, just for the ease of converting the scaling constraints of existing concurrent code from number-of-threads-per-process to number-of-open-sockets-per-process. I've come to the conclusion that they're no substitute for explicitly asynchronous code, though, and the assembler magic needed to make them work with arbitrary C code (either in the language core or in C extensions) makes them a poor fit for the standard library. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Hi Greg, On 20.10.12 02:50, Greg Ewing wrote:
Christian Tismer wrote:
Actually I would like to have a python context where it gets into "async mode" and interprets all functions defined in that mode as generators.
That sounds somewhat similar to another idea I proposed a while ago:
There would be a special kind of function called a "cofunction", that you define using "codef" instead of "def". A cofunction is essentially a generator, but with a special property: when one cofunction calls another, the call is implicitly made as a "yield from" call.
Wow, I had a look at the patch. Without talking about the syntax, this is very close to what I'm trying without a patch. No, it is almost identical.
This scheme wouldn't be completely transparent, since the cofunctions have to be defined in a special way. But the calls would look like ordinary calls.
There's a PEP describing a variation on the idea here:
http://www.python.org/dev/peps/pep-3152/
In that version, calls to cofunctions are specially marked using a "cocall" keyword. But since writing that, I've come to believe that my original idea (where the cocalls are implicit) was better.
Yes, without the keyword it looks better. Would you raise an exception if something is called that is not a cofunction? Or would that be an ordinary call? The only difference is that I'm not aiming at coroutines in the first place, but just having the concept of a *suspendable* function. What has happened to the PEP, was it rejected? ciao - chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer <tismer@stackless.com> wrote:
What has happened to the PEP, was it rejected?
No, it's still open. We just wanted to give the yield from PEP a chance to see some use on its own before we started trying to take it any further, and Greg was amenable to that approach. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Thanks! Then I have another short one... Sent from my Ei4Steve On Oct 20, 2012, at 14:16, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer <tismer@stackless.com> wrote:
What has happened to the PEP, was it rejected?
No, it's still open. We just wanted to give the yield from PEP a chance to see some use on its own before we started trying to take it any further, and Greg was amenable to that approach.
I often see the phrase "coroutine" but without explicit mention if symmetric (greenlet) or asymmetric (Lua). Then when I hear myself quibbling about "full generators" then I mean generators that are made up of a few functions that all can yield to the caller of this generator. Question: is "full generators" equivalent to "asymmetric coroutine"? Question 2: when people talk about coroutines, is "asymmetric" the default here? The reason that I ask is my fear to create problems when answering to messages with different default meanings in mind. Thanks - chris
On Sat, Oct 20, 2012 at 1:06 PM, Christian Tismer <tismer@stackless.com> wrote:
I often see the phrase "coroutine" but without explicit mention if symmetric (greenlet) or asymmetric (Lua).
Then when I hear myself quibbling about "full generators" then I mean generators that are made up of a few functions that all can yield to the caller of this generator.
Question: is "full generators" equivalent to "asymmetric coroutine"?
Question 2: when people talk about coroutines, is "asymmetric" the default here?
The reason that I ask is my fear to create problems when answering to messages with different default meanings in mind.
I believe I just answered this is another thread (that you also started). -- --Guido van Rossum (python.org/~guido)
On 20.10.12 13:52, Christian Tismer wrote:
Hi Greg,
On 20.10.12 02:50, Greg Ewing wrote:
Christian Tismer wrote:
Actually I would like to have a python context where it gets into "async mode" and interprets all functions defined in that mode as generators.
That sounds somewhat similar to another idea I proposed a while ago:
There would be a special kind of function called a "cofunction", that you define using "codef" instead of "def". A cofunction is essentially a generator, but with a special property: when one cofunction calls another, the call is implicitly made as a "yield from" call.
Wow, I had a look at the patch. Without talking about the syntax, this is very close to what I'm trying without a patch. No, it is almost identical.
This scheme wouldn't be completely transparent, since the cofunctions have to be defined in a special way. But the calls would look like ordinary calls.
There's a PEP describing a variation on the idea here:
http://www.python.org/dev/peps/pep-3152/
In that version, calls to cofunctions are specially marked using a "cocall" keyword. But since writing that, I've come to believe that my original idea (where the cocalls are implicit) was better.
Yes, without the keyword it looks better. Would you raise an exception if something is called that is not a cofunction? Or would that be an ordinary call?
The only difference is that I'm not aiming at coroutines in the first place, but just having the concept of a *suspendable* function.
What has happened to the PEP, was it rejected?
I just saw that it is in flux and did not entirely please you either.

A rough idea would be to start the whole interpreter in suspendable mode. Maybe that's too much. I'm seeking a way to tell a whole bunch of functions that they should be suspendable.

What if we had a flag (how exactly is unclear) saying that function calls should now behave like cofunction calls? That flag would be initiated by a root call and then propagated to the callees, without any syntax change. In any case it should be possible to inquire/assert that a function is running as a cofunction. -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
Christian Tismer wrote:
A rough idea would be to start the whole interpreter in suspendable mode. Maybe that's too much. I'm seeking a way to tell a whole bunch of functions that they should be suspendable.
I'm not sure it's really feasible to do that. It seems easy enough at first sight, but keep in mind that it would only work for pure Python code called directly from other pure Python code. There are many corners where it wouldn't work -- all the operator methods, for example, and anything else called through a type slot -- unless you went to a *lot* of work to provide alternative suspendable versions of all the type slots. -- Greg
Christian Tismer wrote:
Would you raise an exception if something is called that is not a cofunction? Or would that be an ordinary call?
A cofunction calling a non-cofunction is fine, it just makes an ordinary call. But if a non-cofunction tries to call a cofunction using an ordinary call, an exception is raised. Effectively, cofunctions do *not* implement __call__ (instead they implement a new protocol, __cocall__).
The only difference is that I'm not aiming at coroutines in the first place, but just having the concept of a *suspendable* function.
I'm not sure what the distinction is.
What has happened to the PEP, was it rejected?
No, its status is still listed as "draft". It's probably too soon to consider whether it should be accepted or rejected; we need more experience with yield-from based task systems first. -- Greg
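(For illustration only: PEP 3152 proposes dedicated syntax, and nothing below is part of it. But the calling rules described above could be approximated today with a wrapper that hides __call__ and exposes a separate __cocall__ hook, plus a small helper used with yield from.)

    # Rough approximation of the cofunction calling rules (not PEP 3152 itself).
    import functools

    class cofunction:
        def __init__(self, genfunc):
            functools.update_wrapper(self, genfunc)
            self._genfunc = genfunc

        def __call__(self, *args, **kwargs):
            # An ordinary call of a cofunction is an error.
            raise TypeError("cofunctions must be invoked via 'yield from cocall(...)'")

        def __cocall__(self, *args, **kwargs):
            return self._genfunc(*args, **kwargs)   # hand back the generator

    def cocall(f, *args, **kwargs):
        # A cofunction calling a plain function would just call it normally;
        # this helper is only for cofunction-to-cofunction calls.
        return f.__cocall__(*args, **kwargs)

    @cofunction
    def add_one(x):
        yield                        # a suspension point
        return x + 1

    @cofunction
    def main():
        value = yield from cocall(add_one, 41)
        return value

    g = cocall(main)
    try:
        while True:
            next(g)
    except StopIteration as err:
        print(err.value)             # -> 42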
On 21.10.12 01:52, Greg Ewing wrote:
Christian Tismer wrote: ...
The only difference is that I'm not aiming at coroutines in the first place, but just having the concept of a *suspendable* function.
I'm not sure what the distinction is.
This maybe comes from my use of 'coroutine', 'tasklet', 'generator' etc., which differs from the meaning others have in mind. I'm mostly talking in the PyPy and Stackless community, which creates confusion. In that world, 'generator' for instance means a whole bunch of functions that can play together and yield to the caller of _the_ generator. The same holds for coroutines in that world. In python-world, things seem to be more often made of single functions.

Switching context to that: 'coroutine' implies thinking about coroutines, the intended action. 'suspendable', instead, is neutral, without any a-priori intent to switch. It just states the ability to be suspended. That sounds more like a property. A 'suspendable' is meant as a building block for higher-level things, including for instance coroutines (in any flavor). Technically it is the same thing, when we're talking about one single function that implements it. -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
Nick Coghlan wrote:
my suspicion is that generator objects themselves need to be maintaining a full "generator stack" independent of the frame stack in the main eval loop in order to get the best of both worlds (i.e. optimised suspend/resume without confusing debuggers).
The f_yieldfrom chain effectively *is* a generator stack, it's just linked in the opposite direction to the way stacks normally are. While you probably could move f_yieldfrom out of the frame object and into the generator-iterator object, I don't see how it would make any difference to the traceback issue. I'm not even sure why my original implementation was getting tracebacks wrong. What *should* happen is that if an exception comes out of a generator being yielded from, the tail is chopped off the f_yieldfrom chain and the exception is thrown into the next frame up, thereby adding its frame to the traceback. It may simply be that there was a minor bug in my implementation that could be fixed without ditching the whole f_yieldfrom idea. I may look into this if I find time. -- Greg
On 19/10/12 13:55, Christian Tismer wrote:
Hi Nick,
On 16.10.12 03:49, Nick Coghlan wrote:
On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day. Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it.
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
The current implementation may not be much slower than Greg's original version. One of the main costs of making a call is the creation of a new frame. But calling into a generator does not need a new frame, so the cost will be reduced. Unless anyone has evidence to the contrary :) Rather than increasing the performance of this special case, I would suggest that improving the performance of calls & returns in general would be a more worthwhile goal. Calls and returns ought to be cheap. Cheers, Mark
On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon <mark@hotpy.org> wrote:
On 19/10/12 13:55, Christian Tismer wrote:
Hi Nick,
On 16.10.12 03:49, Nick Coghlan wrote:
On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day.
Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it.
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
The current implementation may not be much slower than Greg's original version. One of the main costs of making a call is the creation of a new frame. But calling into a generator does not need a new frame, so the cost will be reduced. Unless anyone has evidence to the contrary :)
Rather than increasing the performance of this special case, I would suggest that improving the performance of calls & returns in general would be a more worthwhile goal. Calls and returns ought to be cheap.
I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level). Output on my MacBook Pro: plain 2097151 0.5880069732666016 coro. 2097151 1.2958409786224365 This was a Python 3.3 built a few days ago from the 3.3 branch. -- --Guido van Rossum (python.org/~guido)
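(The attachment is not reproduced in this archive. A rough reconstruction of the kind of comparison described -- the coroutine is the one Jim Jewett quotes later in the thread, the rest is guessed -- would be:)

    import time

    def plain(n):                        # ordinary recursive function
        if n <= 0:
            return 1
        return plain(n - 1) + 1 + plain(n - 1)

    def coroutine(n):                    # same computation via PEP 380 delegation
        if n <= 0:
            return 1
        l = yield from coroutine(n - 1)
        r = yield from coroutine(n - 1)
        return l + 1 + r

    def run(gen):
        # Drive the outermost generator; its return value arrives via StopIteration.
        try:
            while True:
                next(gen)
        except StopIteration as err:
            return err.value

    N = 20
    t0 = time.time(); print('plain', plain(N), time.time() - t0)
    t0 = time.time(); print('coro.', run(coroutine(N)), time.time() - t0)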
Hi Guido, Marc, all, this is a very promising result, telling me that the big Oh can in fact be neglected in real applications. 20 for twi is good! I will of course do an analysis and find the parameters of the quadratic, but my concern is pretty much tamed. For me that means there will soon be a library that contains real generators and more building blocks. I think using those would simplify the design of the async API quite a lot. I suggest to regard current generator constructs as low-level helpers for implementing the real concurrency building blocks. Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract from this now and think in terms of higher level constructs. Let's assume generators and coroutines, and model concurrency from that. I believe this unwinds the brains and clarifies things a lot. I will provide some classes for that at the pycon.de sprint, unless somebody implements it earlier (please don't). This email was written in non-linear order, so please ignore logic inversions. Cheers - chris Sent from my Ei4Steve On Oct 19, 2012, at 23:31, Guido van Rossum <guido@python.org> wrote:
On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon <mark@hotpy.org> wrote:
On 19/10/12 13:55, Christian Tismer wrote:
Hi Nick,
On 16.10.12 03:49, Nick Coghlan wrote:
On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day.
Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it.
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
The current implementation may not be much slower than Greg's original version. One of the main costs of making a call is the creation of a new frame. But calling into a generator does not need a new frame, so the cost will be reduced. Unless anyone has evidence to the contrary :)
Rather than increasing the performance of this special case, I would suggest that improving the performance of calls & returns in general would be a more worthwhile goal. Calls and returns ought to be cheap.
I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level).
Output on my MacBook Pro:
plain 2097151 0.5880069732666016 coro. 2097151 1.2958409786224365
This was a Python 3.3 built a few days ago from the 3.3 branch.
-- --Guido van Rossum (python.org/~guido) <p3time.py>
s/twi/two/ Sent from my Ei4Steve On Oct 20, 2012, at 0:31, Christian Tismer <tismer@stackless.com> wrote:
Hi Guido, Marc, all,
this is a very promising result, telling me that the big Oh can in fact be neglected in real applications. 20 for twi is good!
I will of course do an analysis and find the parameters of the quadratic, but my concern is pretty much tamed.
For me that means there will soon be a library that contains real generators and more building blocks.
I think using those would simplify the design of the async API quite a lot.
I suggest to regard current generator constructs as low-level helpers for implementing the real concurrency building blocks.
Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract from this now and think in terms of higher level constructs.
Let's assume generators and coroutines, and model concurrency from that. I believe this unwinds the brains and clarifies things a lot.
I will provide some classes for that at the pycon.de sprint, unless somebody implements it earlier (please don't).
This email was written in non-linear order, so please ignore logic inversions.
Cheers - chris
Sent from my Ei4Steve
On Oct 19, 2012, at 23:31, Guido van Rossum <guido@python.org> wrote:
On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon <mark@hotpy.org> wrote:
On 19/10/12 13:55, Christian Tismer wrote:
Hi Nick,
On 16.10.12 03:49, Nick Coghlan wrote:
On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day.
Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it.
I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Mark Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
The current implementation may not be much slower than Greg's original version. One of the main costs of making a call is the creation of a new frame. But calling into a generator does not need a new frame, so the cost will be reduced. Unless anyone has evidence to the contrary :)
Rather than increasing the performance of this special case, I would suggest that improving the performance of calls & returns in general would be a more worthwhile goal. Calls and returns ought to be cheap.
I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level).
Output on my MacBook Pro:
plain 2097151 0.5880069732666016 coro. 2097151 1.2958409786224365
This was a Python 3.3 built a few days ago from the 3.3 branch.
-- --Guido van Rossum (python.org/~guido) <p3time.py>
Errhm...., On 20.10.12 00:31, Christian Tismer wrote:
Hi Guido, Marc, all,
this is a very promising result, telling me that the big Oh can in fact be neglected in real applications. 20 for twi is good!
I will of course do an analysis and find the parameters of the quadratic, but my concern is pretty much tamed.
For me that means there will soon be a library that contains real generators and more building blocks.
I think using those would simplify the design of the async API quite a lot.
I suggest to regard current generator constructs as low-level helpers for implementing the real concurrency building blocks.
Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract from this now and think in terms of higher level constructs.
Let's assume generators and coroutines, and model concurrency from that. I believe this unwinds the brains and clarifies things a lot.
I will provide some classes for that at the pycon.de sprint, unless somebody implements it earlier (please don't).
This email was written in non-linear order, so please ignore logic inversions.
Cheers - chris
Sent from my Ei4Steve
On Oct 19, 2012, at 23:31, Guido van Rossum <guido@python.org> wrote:
On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon <mark@hotpy.org> wrote:
Hi Nick,
On 16.10.12 03:49, Nick Coghlan wrote:
On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day. Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it. I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Marc Shannon, patched by Benjamin.
Now I found the original implementation. That looks very much as I'm thinking it should be.
Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently.
Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now.
I will spend my time at PyCon de for sprinting on "yield from".
The current implementation may not be much slower than Greg's original version. One of the main costs of making a call is the creation of a new frame. But calling into a generator does not need a new frame, so the cost will be reduced. Unless anyone has evidence to the contrary :)
Rather than increasing the performance of this special case, I would suggest that improving the performance of calls & returns in general would be a more worthwhile goal. Calls and returns ought to be cheap. I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level).
Output on my MacBook Pro:
plain 2097151 0.5880069732666016 coro. 2097151 1.2958409786224365
This was a Python 3.3 built a few days ago from the 3.3 branch.
What you are comparing seems to have a constant factor of about 2.5.

minimax:py3 tismer$ python3 p3time.py
plain 0 1 0.00000   coro. 0 1 0.00001   relat 0 1 8.50000
plain 1 3 0.00000   coro. 1 3 0.00001   relat 1 3 2.77778
plain 2 7 0.00000   coro. 2 7 0.00001   relat 2 7 3.62500
plain 3 15 0.00000   coro. 3 15 0.00001   relat 3 15 2.87500
plain 4 31 0.00001   coro. 4 31 0.00002   relat 4 31 2.42424
plain 5 63 0.00002   coro. 5 63 0.00004   relat 5 63 2.46032
plain 6 127 0.00003   coro. 6 127 0.00007   relat 6 127 2.52542
plain 7 255 0.00006   coro. 7 255 0.00014   relat 7 255 2.38272
plain 8 511 0.00011   coro. 8 511 0.00028   relat 8 511 2.49356
plain 9 1023 0.00022   coro. 9 1023 0.00055   relat 9 1023 2.50327
plain 10 2047 0.00042   coro. 10 2047 0.00106   relat 10 2047 2.50956
plain 11 4095 0.00083   coro. 11 4095 0.00204   relat 11 4095 2.44699
plain 12 8191 0.00167   coro. 12 8191 0.00441   relat 12 8191 2.64792
plain 13 16383 0.00340   coro. 13 16383 0.00855   relat 13 16383 2.51881
plain 14 32767 0.00876   coro. 14 32767 0.01823   relat 14 32767 2.08106
plain 15 65535 0.01419   coro. 15 65535 0.03507   relat 15 65535 2.47131
plain 16 131071 0.02669   coro. 16 131071 0.06874   relat 16 131071 2.57515
plain 17 262143 0.05448   coro. 17 262143 0.13699   relat 17 262143 2.51467
plain 18 524287 0.10843   coro. 18 524287 0.27395   relat 18 524287 2.52660
plain 19 1048575 0.21310   coro. 19 1048575 0.54573   relat 19 1048575 2.56095
plain 20 2097151 0.42802   coro. 20 2097151 1.06199   relat 20 2097151 2.48114
plain 21 4194303 0.86531   coro. 21 4194303 2.19048   relat 21 4194303 2.53143

ciao - chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On 10/19/2012 5:31 PM, Guido van Rossum wrote:
I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level).
Output on my MacBook Pro:
plain 2097151 0.5880069732666016 coro. 2097151 1.2958409786224365
This was a Python 3.3 built a few days ago from the 3.3 branch.
At the top level, the coroutine version adds 2097151 next() calls. Suspecting that that, not the addition of 'yield from', was responsible for most of the extra time, I added

def trivial():
    for i in range(2097151):
        yield
    raise StopIteration(2097151)
...
t0 = time.time()
try:
    g = trivial()
    while True:
        next(g)
except StopIteration as err:
    k = err.value
t1 = time.time()
print('triv.', k, t1-t0)

The result supports the hypothesis.

plain 2097151 0.4590260982513428
coro. 2097151 0.9180529117584229
triv. 2097151 0.39902305603027344

I don't know what to make of this in the context of asynch operations, but in 'traditional' use, the generator would not replace a function returning a single number but one returning a list (of, in this case, 2097151 numbers), so each next replaces a .append method call. -- Terry Jan Reedy
On 10/19/12, Guido van Rossum <guido@python.org> wrote:
I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level).
Note that the co-routine code (copied below) does not involve a scheduler that unwraps futures; there is no scheduler, and nothing runs concurrently.

def coroutine(n):
    if n <= 0:
        return 1
    l = yield from coroutine(n-1)
    r = yield from coroutine(n-1)
    return l + 1 + r

I like the above code; my concern was that yield might get co-opted for use with scheduler loops, which would have to track the parent task explicitly, and prevent it from being rescheduled too early. -jJ
On Tue, Oct 23, 2012 at 12:17 AM, Jim Jewett <jimjjewett@gmail.com> wrote:
On 10/19/12, Guido van Rossum <guido@python.org> wrote:
I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level).
Note that the co-routine code (copied below) does not involve a scheduler that unwraps futures; there is no scheduler, and nothing runs concurrently.
def coroutine(n):
    if n <= 0:
        return 1
    l = yield from coroutine(n-1)
    r = yield from coroutine(n-1)
    return l + 1 + r
I like the above code; my concern was that yield might get co-opted for use with scheduler loops, which would have to track the parent task explicitly, and prevent it from being rescheduled too early.
Don't worry. There is no way that a scheduler can change the meaning of yield from. All its power stems from its ability to decide when to call next(), and that is the same power that the app has itself. -- --Guido van Rossum (python.org/~guido)
Hi Greg, coming back to this after quite a storm in my brain... On 16.10.12 02:44, Greg Ewing wrote:
Christian Tismer wrote:
Right, CPython still keeps unneccessary crap on the C stack.
It's not just Python leaving stuff on the stack that's a problem, it's external C code that calls back into Python.
That is something I intend to ignore. Of course there are quite a few situations where callbacks into Python are a problem, but I don't want to put this into Python. There are ways to cope with this, for instance using greenlet as an optional extension module that handles these cases. Alternatively, Python itself can do it with strictly controlled threads. But I think leaving that out will simplify matters a lot and keep Python clean. In the end, I want to model something that is likely to be accepted.
But that's not the point right now, because on the other hand, in the context of a possible yield (from or not), the C stack is clean, and this enables switching.
And actually in such clean positions, Stackless Python (as opposed to Greenlets) does soft-switching, which is very similar to what the generators are doing - there is no assembly stuff involved at all.
But the assembly code still needs to be there to handle the cases where you *can't* do soft switching. It's the presence of the code that's the issue here, not how frequently it gets called.
No, I'm intending to really rip that out. Or better, I want to do a rewrite of a subset of Stackless, actually the functionality that allows one to implement greenlets or multi-level generators, task scheduling and so on. In effect, I want to find something that enables some extended switching. Emulated, without hacking the kernel in the first place. Generators are restricted to call/yield/return positions, and I think that's fine. What can be switched is totally clear by definition, and I like that. I'm talking of exactly that. What I dislike is a different topic ;-)
I have begun studying the code for YIELD_FROM. As it is written, every next iteration elevates the chain of generators once up and down. Maybe that can be avoided by changing the frame chain, so this can become a cheaper O(1) operation.
My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks.
There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day.
Ok, let's ignore this O(n) problem for now. _yield from_ is anyway probably faster by more than an order of magnitude, so it will serve your purpose (nesting generators) pretty well.

My problem is different because I want a scaling building block for building higher level structures, and I would love to build them using _yield from_ . There are a few things which completely contradict my thinking:

- _yield from_ works from the top. That is, if I have five nested iterators and I want to drive them, then I have to call the root generator?! I see that that works, but it is against everything I'm used to. How can I inject the info that I want to switch context?

- generators always yield to the caller, and also return values to the caller. What I'm looking for is to express a switch to something else.

- generators are able to free the stack, when they yield. But when they are active, they use the full stack. At least when I follow the pattern "generator is calling sub-generator". A deeply nested recursion is therefore something to avoid. :-(

Now I'm playing around with different approaches to model something flexible that gives me more freedom. Right now I'm trying a slightly perverse approach to give me an _unwindable_, merely a frame-like object that can vanish on demand. I'm also experimenting with emulating different kinds of _yield_. Since there is only one kind of yield, to one target, I have the problem of distinguishing it for different purposes.

Example: I can take a set of nested functions in their native form. Then replacing ordinary calls by _yield from_ and inserting proper yields before actually returning, I now have the equivalent of a nested function call, that I can drive with another _yield from_ . This is now a function that permanently releases the stack.

Now I would like to give one of the nested functions the ability to transfer execution somewhere else. The support is there insofar as the stack is freed all the time. But the function that wants to switch needs to pass the fact that it wants to switch, plus the target, somewhere. As I understood it, I would need to yield that to the driver function. In order to do that, I would need to yield a tuple or a bound object. This is a different purpose than the simple driver functionality. Do you see it?

In my understanding, a switch would not be driven from the top and then dispatched upon, but a called function below the function to be switched would modify something that leads to a switch as a result. In this example, the generator-emulated function would be driven by thousands of yields, kind of polled to catch the one event that needs to be supported by a switching action. This looks wrong to me, like doing things upside down.

To shorten this: I have your very efficient yield collector, but I need to dispatch on what is intended by the yield, instead of initiating a reaction from where I am. All in all, I can't get rid of the thought "un-pythonic".

So I'm still thinking of a frame-like object that allows me to control its execution, let it vanish, and so on, and use it as a building block. As always, I'm feeling bad when going down this road, because I want to use the efficient _yield from_ as much as possible. But it may be my missing experience.

Do you understand, and maybe see where I have the wrong brain shortcuts? How do you write something composable that scales? Cheers -- Chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! 
Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
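(One common answer to the "where does the switch information go" question -- a generic sketch of the pattern, not code taken from this thread -- is to have the deeply nested generator yield a request object; every intermediate yield from passes it through untouched, so the top-level driver sees it and can send a reply back down.)

    # Sketch: a request yielded at the bottom of a yield-from chain reaches the
    # top-level driver unchanged, and the driver's reply travels back down.
    class Request:
        def __init__(self, what):
            self.what = what

    def leaf():
        reply = yield Request("switch")      # ask the driver for something
        return reply

    def middle():
        return (yield from leaf())           # passes the Request straight through

    def top():
        return (yield from middle())

    def drive(gen):
        try:
            req = next(gen)
            while True:
                # The driver sees every Request, however deep it originated.
                req = gen.send("handled:" + req.what)
        except StopIteration as err:
            return err.value

    print(drive(top()))                      # -> handled:switch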
Christian Tismer wrote:
- generators are able to free the stack, when they yield. But when they are active, they use the full stack. At least when I follow the pattern "generator is calling sub-generator". A deeply nested recursion is therefore something to avoid. :-(
Only if yield-from chains aren't optimised the way they used to be. In any case, for the application we're talking about here, the difference will probably not be noticeable.
But this function that wants to switch needs to pass the fact that it wants to switch, plus the target somewhere. As I understood it, I would need to yield that to the driver function.
You understand incorrectly. In my scheduler, the yields don't send or receive values at all. Communicating with the scheduler, for example to tell it to allow another task to run, is done by calling functions. A yield must be done to actually allow a switch, but the yield itself doesn't send any information.
Do you see it? In my understanding, a switch would not be driven from the top and then dispatched upon, but a called function below the function to be switched would modify something that leads to a switch as a result.
That's pretty much what happens in my scheduler.
Do you understand, and maybe see where I have the wrong brain shortcuts? How do you write something composable that scales?
I think you should study my scheduler tutorial. If you can understand how that works, I think it will answer many of your questions. http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/ -- Greg
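(The style being described -- bare yields as the only switch points, with everything else done through ordinary function calls on the scheduler -- can be sketched roughly as follows; this is an illustration, not the code from the tutorial.)

    from collections import deque

    class Scheduler:
        def __init__(self):
            self.ready = deque()

        def schedule(self, task):            # an ordinary function call, no yield
            self.ready.append(task)

        def run(self):
            while self.ready:
                task = self.ready.popleft()
                try:
                    next(task)               # resume until the task's next bare yield
                except StopIteration:
                    continue                 # task finished
                self.ready.append(task)      # round-robin: put it back

    sched = Scheduler()

    def counter(name, n):
        for i in range(n):
            print(name, i)
            yield                            # switch point; carries no value

    sched.schedule(counter("a", 3))
    sched.schedule(counter("b", 3))
    sched.run()

Blocking operations work the same way in that style: a task calls a scheduler function that moves it onto some wait list instead of the ready list, and then yields.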
On 19.10.12 07:15, Greg Ewing wrote:
Christian Tismer wrote:
- generators are able to free the stack, when they yield. But when they are active, they use the full stack. At least when I follow the pattern "generator is calling sub-generator". A deeply nested recursion is therefore something to avoid. :-(
Only if yield-from chains aren't optimised the way they used to be.
Does that mean a very deep recursion would be efficient? I'm trying to find that change in the hg history right now. Can you give me a hint how your initial implementation works, the initial patch source?
...
But this function that wants to switch needs to pass the fact that it wants to switch, plus the target somewhere. As I understood it, I would need to yield that to the driver function.
You understand incorrectly. In my scheduler, the yields don't send or receive values at all. Communicating with the scheduler, for example to tell it to allow another task to run, is done by calling functions. A yield must be done to actually allow a switch, but the yield itself doesn't send any information.
I have studied that yesterday already in depth and like that quite much. It is probably just the problem that I had with generators from their beginning. -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tismer@stackless.com> wrote:
On 19.10.12 07:15, Greg Ewing wrote:
Christian Tismer wrote:
- generators are able to free the stack, when they yield. But when they are active, they use the full stack. At least when I follow the pattern "generator is calling sub-generator". A deeply nested recursion is therefore something to avoid. :-(
Only if yield-from chains aren't optimised the way they used to be.
Does that mean a very deep recursion would be efficient?
TBH, I am not interested in making very deep recursion work at all. If you need that, you're doing it wrong in my opinion.
I'm trying to find that change in the hg history right now.
Can you give me a hint how your initial implementation works, the initial patch source?
...
But this function that wants to switch needs to pass the fact that it wants to switch, plus the target somewhere. As I understood it, I would need to yield that to the driver function.
You understand incorrectly. In my scheduler, the yields don't send or receive values at all. Communicating with the scheduler, for example to tell it to allow another task to run, is done by calling functions. A yield must be done to actually allow a switch, but the yield itself doesn't send any information.
I have studied that yesterday already in depth and like that quite much. It is probably just the problem that I had with generators from their beginning.
-- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
-- --Guido van Rossum (python.org/~guido)
On 19.10.12 18:07, Guido van Rossum wrote:
On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tismer@stackless.com> wrote:
On 19.10.12 07:15, Greg Ewing wrote:
Christian Tismer wrote:
- generators are able to free the stack, when they yield. But when they are active, they use the full stack. At least when I follow the pattern "generator is calling sub-generator". A deeply nested recursion is therefore something to avoid. :-(
Only if yield-from chains aren't optimised the way they used to be.
Does that mean a very deep recursion would be efficient? TBH, I am not interested in making very deep recursion work at all. If you need that, you're doing it wrong in my opinion.
Misunderstanding I think. Of course I don't want to use deep recursion. But people might write things that happen several levels deep and then iterating over lots of stuff. A true generator would have no problem with that. Assume just five layers of generators that have to be re-invoked for a tight yielding loop is quite some overhead that can be avoided. The reason why I care is that existing implementations that use greenlet style could be turned into pure python, given that I manage to write the right support functions, and replace all functions by generators that emulate functions with async behavior. It would just be great if that worked at the same speed, independent from at which stack level an iteration happens. Agreed that new code like that would be bad style. ciao - chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer <tismer@stackless.com> wrote:
On 19.10.12 18:07, Guido van Rossum wrote:
On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tismer@stackless.com> wrote:
On 19.10.12 07:15, Greg Ewing wrote:
Christian Tismer wrote:
- generators are able to free the stack, when they yield. But when they are active, they use the full stack. At least when I follow the pattern "generator is calling sub-generator". A deeply nested recursion is therefore something to avoid. :-(
Only if yield-from chains aren't optimised the way they used to be.
Does that mean a very deep recursion would be efficient?
TBH, I am not interested in making very deep recursion work at all. If you need that, you're doing it wrong in my opinion.
Misunderstanding I think. Of course I don't want to use deep recursion. But people might write things that happen several levels deep and then iterating over lots of stuff. A true generator would have no problem with that.
Okay, good. I agree that this use case should be as fast as possible -- as long as we still see every frame involved when a traceback is printed.
Assume just five layers of generators that have to be re-invoked for a tight yielding loop is quite some overhead that can be avoided.
The reason why I care is that existing implementations that use greenlet style could be turned into pure python, given that I manage to write the right support functions, and replace all functions by generators that emulate functions with async behavior.
It would just be great if that worked at the same speed, independent from at which stack level an iteration happens.
Yup.
Agreed that new code like that would be bad style.
Like "what"? -- --Guido van Rossum (python.org/~guido)
On 19.10.12 19:18, Guido van Rossum wrote:
On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer <tismer@stackless.com> wrote:
On 19.10.12 18:07, Guido van Rossum wrote:
...
TBH, I am not interested in making very deep recursion work at all. If you need that, you're doing it wrong in my opinion. ... Agreed that new code like that would be bad style. Like "what"?
Like code that exercises deep recursion thoughtlessly ;-) in contrast to code that happens to be quite nested because of a systematic transformation. So correctness first, big Oh later. -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/
Christian Tismer wrote:
Can you give me a hint how your initial implementation works, the initial patch source?
You can find my initial patches here: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yield_from.htm... Essentially, an extra field f_yieldfrom is added to frame objects. When a 'yield from' is started, the f_yieldfrom field of the calling frame is set to point to the called frame. The __next__ method of a generator first traverses the f_yieldfrom chain to find the frame at the end, and then resumes that frame. So most of the time, only the innermost frame of a nested yield-from chain is actually entered in response to a next() call. (There are some complications due to the fact that you can 'yield from' something that's not a generator, but the above is effectively what happens when all the objects in the chain are generators.) -- Greg
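(A pure-Python way to get a similar effect is to keep the generator stack explicitly in a trampoline, so each resumption enters only the innermost generator. This is only an analogue of the C-level mechanism described above, with sub-calls requested by yielding the sub-generator instead of using yield from.)

    # Trampoline keeping the "generator stack" explicitly -- an analogue of the
    # f_yieldfrom chain.  Each step resumes only the innermost generator.
    import types

    def run(root):
        stack = [root]
        value = None
        while stack:
            try:
                result = stack[-1].send(value)
            except StopIteration as err:
                stack.pop()                      # innermost finished: chop the tail
                value = err.value                # deliver its return value upward
                continue
            if isinstance(result, types.GeneratorType):
                stack.append(result)             # "call" into a sub-generator
                value = None
            else:
                value = None                     # a plain yield; a real scheduler
                                                 # would hand `result` to its caller
        return value

    def subtask(n):
        yield                                    # a suspension point
        return n * 2

    def task():
        doubled = yield subtask(21)              # instead of "yield from"
        return doubled

    print(run(task()))                           # -> 42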
Christian Tismer wrote:
Question: Is it already given that something like greenlets is out of consideration?
Greenlets will always be available to those who want and are able to use them. But there's a desire to have something in the standard library that is completely portable and doesn't rely on any platform dependent techniques or tricks. That's what we're talking about here. -- Greg
Guido van Rossum wrote:
I think it's too early to start proposing new syntax for a problem we don't even know is common at all.
Greg Ewing's proposal works for me:
r = yield from par(f1(), f2())
Also, whoever's proposing this needs to understand that even if the suggested change to yield-from were made, it would NOT automatically result in par() behaviour. It would just be another way of sequentially calling two sub-generators. -- Greg
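(To make that concrete: at the generator level, the best such a helper can do is drive its arguments one after the other, as in the sketch below; genuinely concurrent behaviour needs a scheduler, which is what Greg's actual par() relies on. The helper name here is made up.)

    # Sequential "gather": drives each sub-generator to completion in turn.
    def seq(*gens):
        results = []
        for g in gens:
            results.append((yield from g))   # f2 does not start until f1 is done
        return tuple(results)

    # used inside some other coroutine as:
    #     r = yield from seq(f1(), f2())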
participants (20)
- Benoit Chesneau
- Brett Cannon
- Christian Tismer
- Dino Viehland
- Eric Snow
- Eric V. Smith
- Greg Ewing
- Guido van Rossum
- Jasper St. Pierre
- Jim Jewett
- Laurens Van Houtven
- Mark Lawrence
- Mark Shannon
- Nick Coghlan
- Sam Rushing
- Steve Dower
- Steven D'Aprano
- Terry Reedy
- Tom Hoover
- Vinay Sajip