Add a generic async IO poller/reactor to select module

Including an established async IO framework such as Twisted, gevent or Tornado in the Python stdlib has always been a controversial subject. PEP-3153 (http://www.python.org/dev/peps/pep-3153/) tried to face this problem in the most agnostic way possible, and it's a good starting point IMO. Nevertheless, it's still vague about what the actual API should look like and AFAIK it has remained stagnant so far.

There's one thing in the whole async stack which is basically the same for all implementations though: the poller/reactor. Could it make sense to add something similar to the select module? Unlike PEP-3153, providing such a layer on top of select(), poll() & co. is easier and could possibly be an incentive to avoid such code duplication.

I'm coming up with this because I recently did something similar in pyftpdlib as a hack on top of asyncore to add support for epoll() and kqueue(), using the excellent Tornado's io loop as source of inspiration:
http://code.google.com/p/pyftpdlib/issues/detail?id=203
http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop....

The way I imagine it:

The handler is supposed to provide 3 methods:
- handle_read_event
- handle_write_event
- handle_error_event

Users willing to support multiple event loops such as wx, gtk etc can do:

Basically, this would be the whole API. Thoughts?

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/
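For readers following along, here is a minimal sketch of what a handler-based poller of this kind might look like. It is not the code from the original message or from pyftpdlib: the class name `Poller`, the `READ`/`WRITE` flag values and the `Receiver` handler are all assumptions made for illustration, and it sits on plain select.select() rather than epoll() or kqueue() to stay portable.

```python
import select
import socket


class Poller:
    """Hypothetical reactor mapping fds to handler objects (not a stdlib API)."""
    READ = 1
    WRITE = 2

    def __init__(self):
        self._handlers = {}  # fd -> (handler, events)

    def register(self, fd, events, handler):
        self._handlers[fd] = (handler, events)

    def unregister(self, fd):
        self._handlers.pop(fd, None)

    def poll(self, timeout=1.0):
        """Wait once, then call handle_*_event() on the ready handlers."""
        readers = [fd for fd, (_, ev) in self._handlers.items() if ev & self.READ]
        writers = [fd for fd, (_, ev) in self._handlers.items() if ev & self.WRITE]
        if not (readers or writers):
            return
        watched = list({*readers, *writers})
        r, w, x = select.select(readers, writers, watched, timeout)
        for fds, method in ((x, "handle_error_event"),
                            (r, "handle_read_event"),
                            (w, "handle_write_event")):
            for fd in fds:
                # A handler may have been unregistered by an earlier callback.
                if fd in self._handlers:
                    getattr(self._handlers[fd][0], method)()


class Receiver:
    """Example handler implementing the three-method contract."""

    def __init__(self, sock):
        self.sock = sock
        self.received = b""

    def handle_read_event(self):
        self.received += self.sock.recv(1024)

    def handle_write_event(self):
        pass

    def handle_error_event(self):
        self.sock.close()


a, b = socket.socketpair()
poller = Poller()
receiver = Receiver(b)
poller.register(b.fileno(), Poller.READ, receiver)
a.sendall(b"ping")
poller.poll(timeout=1.0)  # dispatches to receiver.handle_read_event()
print(receiver.received)
a.close()
b.close()
```

The handler only ever sees "you can read now" or "you can write now"; which system call produced that notification stays hidden inside poll().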

2012/5/24 Giampaolo Rodolà <g.rodola@gmail.com>:
Further note: this is the approach I used in pyftpdlib. An even more abstracted approach would be having poller.poll() return a dict of {fd: events, fd: events, ...}, similarly to what Tornado currently does. This way we wouldn't be forcing the user to provide a handler class with the 3 methods described above.
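A rough sketch of the dict-returning variant, under the assumption that the caller keeps its own {fd: wanted_events} mapping. The function name `poll_once` and the `READ`/`WRITE` constants are hypothetical, and select.select() stands in for whichever system call a real implementation would pick.

```python
import select
import socket

READ, WRITE = 1, 2  # hypothetical event flags


def poll_once(interest, timeout=1.0):
    """Return {fd: events} for the fds in `interest` that are ready."""
    readers = [fd for fd, ev in interest.items() if ev & READ]
    writers = [fd for fd, ev in interest.items() if ev & WRITE]
    r, w, _ = select.select(readers, writers, [], timeout)
    ready = {}
    for fd in r:
        ready[fd] = ready.get(fd, 0) | READ
    for fd in w:
        ready[fd] = ready.get(fd, 0) | WRITE
    return ready


a, b = socket.socketpair()
fd_b = b.fileno()
a.sendall(b"x")

# The caller, not the poller, decides what to do with each ready fd.
ready = poll_once({fd_b: READ})
data = b""
for fd, events in ready.items():
    if events & READ:
        data = b.recv(1024)

a.close()
b.close()
```

Here no handler contract is needed: dispatching is left entirely to the code that called poll_once().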

2012/5/24 Ronald Oussoren <ronaldoussoren@mac.com>:
poller.poll serves the same purpose as asyncore.loop, yes, but this is supposed to be independent of asyncore.

On Thu, 24 May 2012 13:50:27 +0200 Giampaolo Rodolà <g.rodola@gmail.com> wrote:
I agree with Ronald that it looks like a less-braindead version of asyncore. I don't think the select module is the right place. Also, I don't know why you would specify poller.READ or poller.WRITE explicitly. Usually you are interested in all events, no? Regards Antoine.

2012/5/24 Antoine Pitrou <solipsis@pitrou.net>:
Yeah, probably. Usually when I post here I'm the first one not being sure whether what I propose is a good idea or not. =) Anyway, it must be clear that what I have in mind is not related to asyncore per-se. The proposal is to add a *generic* poller/reactor to select module as an abstraction layer on top of select(), poll(), epoll() and kqueue(), that's all.
Also, I don't know why you would specify poller.READ or poller.WRITE explicitly. Usually you are interested in all events, no?
Nope, that's what asyncore does and that's why it is significantly slower compared to more modern and clever async loops (independently of the lack of epoll() / kqueue() support in asyncore). You should only be interested in reading for accepting sockets (servers) or when you want to receive data. You should only be interested in writing for connecting sockets (clients) or when you want to send data. Being interested in both when, say, you only intend to receive data is a considerable waste of time, especially when there are many concurrent connections. The performance degradation if you wildly look for both read and write events is *huge*, see benchmarks referring to old vs. new select() implementation here (~8.5x slowdown with 200 concurrent clients): http://code.google.com/p/pyftpdlib/issues/detail?id=203#c6
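The mechanism behind this can be seen in a small sketch (it illustrates the effect only and does not reproduce the linked benchmark). An idle connected socket with free send-buffer space is always reported as writable, so asking for write events when there is nothing to send makes select() return immediately instead of sleeping. The 0.2 second timeout is an arbitrary value chosen for the demonstration.

```python
import select
import socket
import time

a, b = socket.socketpair()

# Read interest only: nothing has been sent, so select() sleeps
# for the whole timeout and reports nothing.
start = time.monotonic()
r, w, _ = select.select([b], [], [], 0.2)
read_only_wait = time.monotonic() - start

# Read + write interest: the socket is writable right away, so
# select() returns at once even though there is no work to do.
start = time.monotonic()
r2, w2, _ = select.select([b], [b], [], 0.2)
both_wait = time.monotonic() - start

print("read only:    ready=%r, waited %.3fs" % (r + w, read_only_wait))
print("read + write: ready=%r, waited %.3fs" % (r2 + w2, both_wait))

a.close()
b.close()
```

In a loop over many connections, the second form wakes up on every iteration for every socket, which is the wasted work being described.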

On 24 May, 2012, at 14:03, Antoine Pitrou wrote:
What worries me most is that it might only look like a better version of asyncore. I'd much rather see something based on the event-handling core of Twisted because that code base is used in production and is hence more likely to be correct w.r.t. odd real-world conditions. IIRC doing this was discussed at the language summit in 2011, but as Nick mentions that doesn't seem to be the focus of PEP 3153. I am by the way not using Twisted myself, I'm at this time still using homebrew select loops and asyncore.
Also, I don't know why you would specify poller.READ or poller.WRITE explicitly. Usually you are interested in all events, no?
You're not always interested in write events, those are only interesting when you have data that must be written to a socket. Ronald

2012/5/24 Ronald Oussoren <ronaldoussoren@mac.com>:
Please, forget about asyncore: this has nothing to do with it per-se as it's just a reactor - it doesn't aim to provide any connection handling. Given the poor asyncore API I doubt it would be even integrable with it without breaking backward compatibility.

On 24May2012 14:03, Antoine Pitrou <solipsis@pitrou.net> wrote:
| Also, I don't know why you would specify poller.READ or poller.WRITE
| explicitly. Usually you are interested in all events, no?

Personally, I would want specificity. If I only care about write (eg I'm only sending), I would only specify poller.WRITE and have my handler only know and care about that. Possibly it would be good to be able to raise an exception for events I hadn't handled, but I'd be half inclined to have my handler do that, were it wanted (yes, there is some tension in this sentence). Unless I'm missing something here.

Just my 2c,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
I just didn't give up, not riding it out wasn't an option. You don't crash, until you do. The longer you ride it out the more likely you are to ride it out. Throwing it away, saves nothing. - J. Pridmore

On Thu, May 24, 2012 at 9:50 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
poller.poll serves the same purpose of asyncore.loop, yes, but this is supposed to be independent from asyncore.
I'd actually like to see something like this pitched as a "concurrent.eventloop" PEP. PEP 3153 really wasn't what I was expecting after the discussions at the PyCon US 2011 language summit - I was expecting "here's a common event loop all the async frameworks can hook into", but instead we got something a *lot* more ambitious that tried to merge the entire IO stack for the async frameworks, rather than just provide a standard way for their event loops to cooperate. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, May 24, 2012 at 10:37 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
See the final section of my notes here: http://www.boredomandlaziness.org/2011/03/python-language-summit-rough-notes... Turns out the idea of a PEP 3153 level API *was* raised at the summit, but I'd still like to see a competing PEP that targets the reactor level API directly. Cheers, Nick.

2012/5/24 Nick Coghlan <ncoghlan@gmail.com>:
It's not clear to me what such a PEP should address in particular, anyway here's a bunch of semi-random ideas.

=== Idea #1 ===

4 classes (SelectPoller, PollPoller, EpollPoller, KqueuePoller) within concurrent.eventloop namespace all sharing the same API:

- register(fd, events, callback)  # callback gets called with events as arg
- modify(fd, events)
- unregister(fd)
- call_later(timeout, callback, errback=None)
- call_every(timeout, callback, errback=None)
- poll(timeout=1.0, blocking=True)
- close()

call_later() and call_every() can return an object having cancel() and reset() methods. The user willing to register a new handler will do:

    poller.register(sock.fileno(), poller.READ | poller.WRITE, callback)

...then, in the callback:

    def callback(events):
        if events & poller.ERROR and not events & poller.READ:
            disconnect()
        else:
            if events & poller.READ:
                read()
            if events & poller.WRITE:
                write()

- pros: highly customizable
- cons: too low level, requires manual handling

=== Idea #2 ===

same as #1 except:

- register(fd, events)
- poll(timeout=1.0)  # doesn't block, returns {fd:events, fd:events, ...}

=== Idea #3 ===

same as #1 except:

- register(fd, events, handler)
- poll(timeout=1.0, blocking=True)

...poll() will call handler.handle_X_event() depending on the current event (READ, WRITE or ERROR). An internal map such as {fd:handler, fd:handler} will be maintained internally.

- pros: easier to use
- cons: more rigid, requires a "contract" with the handler
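To make Idea #1 more concrete, here is one possible sketch of a SelectPoller with register(), modify(), unregister(), call_later() and poll(). It is only an illustration under several assumptions: the flag values are invented, call_later() takes no errback and returns nothing (so no cancel() or reset()), call_every() and close() are left out, and the internals (a dict plus a heapq of timers) are one choice among many.

```python
import heapq
import itertools
import select
import socket
import time


class SelectPoller:
    """Hypothetical sketch of Idea #1 on top of select.select()."""
    READ, WRITE, ERROR = 1, 2, 4

    def __init__(self):
        self._fds = {}     # fd -> [events, callback]
        self._timers = []  # heap of (deadline, sequence number, callback)
        self._seq = itertools.count()

    def register(self, fd, events, callback):
        self._fds[fd] = [events, callback]

    def modify(self, fd, events):
        self._fds[fd][0] = events

    def unregister(self, fd):
        del self._fds[fd]

    def call_later(self, timeout, callback):
        deadline = time.monotonic() + timeout
        heapq.heappush(self._timers, (deadline, next(self._seq), callback))

    def poll(self, timeout=1.0):
        # Never sleep past the next timer deadline.
        if self._timers:
            until_next = self._timers[0][0] - time.monotonic()
            timeout = max(0.0, min(timeout, until_next))
        r = [fd for fd, (ev, _) in self._fds.items() if ev & self.READ]
        w = [fd for fd, (ev, _) in self._fds.items() if ev & self.WRITE]
        if r or w:
            rr, ww, xx = select.select(r, w, list({*r, *w}), timeout)
        else:
            time.sleep(timeout)
            rr = ww = xx = []
        # Merge the three result lists into one bitmask per fd.
        ready = {}
        for fds, flag in ((rr, self.READ), (ww, self.WRITE), (xx, self.ERROR)):
            for fd in fds:
                ready[fd] = ready.get(fd, 0) | flag
        for fd, events in ready.items():
            if fd in self._fds:
                self._fds[fd][1](events)
        # Run every timer whose deadline has passed.
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            heapq.heappop(self._timers)[2]()


a, b = socket.socketpair()
poller = SelectPoller()
log = []


def callback(events):
    if events & poller.ERROR and not events & poller.READ:
        log.append("error")
    elif events & poller.READ:
        log.append(b.recv(1024))


poller.register(b.fileno(), poller.READ, callback)
poller.call_later(0.05, lambda: log.append("timer"))
a.sendall(b"hello")
poller.poll(timeout=1.0)  # data is ready, so the callback runs
time.sleep(0.06)
poller.poll(timeout=0.0)  # by now the timer is due
print(log)
a.close()
b.close()
```

Swapping select.select() for poll(), epoll() or kqueue() would change only the body of poll(), which is the point of sharing one API across the four classes.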

On 5/24/2012 2:40 PM, Giampaolo Rodolà wrote:
It's not clear to me what such a PEP should address in particular, anyway here's a bunch of semi-random ideas.
I have been reading for perhaps a decade how bad asyncore is. So I hope you stick with trying to thrash out something different, even if the discussion gets tedious or contentious.
For new classes, the first question is what concept (and data/function grouping) they and their instances represent. As a naive event loop user, I might think in terms of event sources (or sets of sources) and corresponding handlers. For events generated by 'file' polling, the particular method would seem like a secondary issue. Your proposed classes are named after methods and you give no initialization api. This suggests to me that you mean for all files being polled by the same method to be grouped together. If so, there would only need to be 0 or 1 instance of each 'class', in which case, they could just as well be modules. In other words, I am unsure what concept these classes would represent. I am perhaps thinking at too high a level.
-- Terry Jan Reedy

On 24 May, 2012, at 20:40, Giampaolo Rodolà wrote:
All of these are probably too low level to be the only API because they don't encapsulate error handling. A slightly higher level API would have a callback with received data and a buffered API for sending data. That way the networking library can deal with low-level socket API errors and translate them to useful abstract errors. It would also handle some errors, like an EAGAIN error, itself. Also: how would you use SSL with these APIs? The API would probably end up with functionality similar to Twisted's reactor and transport APIs (and possibly endpoints but I don't know how stable that API is). Ronald
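As a rough illustration of the "buffered API for sending data" idea, the sketch below queues outgoing bytes and absorbs EAGAIN so the caller never sees it. The class name `BufferedSender` and its methods are hypothetical and are not taken from Twisted or any other library; SSL and error translation are left out entirely.

```python
import socket


class BufferedSender:
    """Hypothetical helper: queues outgoing bytes and hides EAGAIN."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.buffer = bytearray()

    def write(self, data):
        """Queue data and try to send it straight away."""
        self.buffer += data
        self.flush()

    def flush(self):
        """Send what the kernel accepts; return True once the buffer is empty."""
        while self.buffer:
            try:
                sent = self.sock.send(self.buffer)
            except BlockingIOError:  # EAGAIN / EWOULDBLOCK: retry later
                return False
            del self.buffer[:sent]
        return True

    @property
    def wants_write(self):
        # A reactor would register WRITE interest only while this is True.
        return bool(self.buffer)


a, b = socket.socketpair()
b.setblocking(False)
sender = BufferedSender(a)
payload = b"x" * (1024 * 1024)  # likely larger than the socket buffer
sender.write(payload)

received = bytearray()
while len(received) < len(payload):
    sender.flush()
    try:
        received += b.recv(65536)
    except BlockingIOError:
        pass  # nothing to read yet

a.close()
b.close()
```

The busy loop at the end exists only to keep the example short; in a real reactor, flush() would be driven by write events, and wants_write would decide when to ask for them.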

On Wed, May 23, 2012 at 9:32 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
Frankly, I don't think this deserves a PEP at all, or even to consider one *yet*. Building a new API and a new library from scratch compares poorly with testing a library in the real world, it having real uses, and then being incorporated into the stdlib. The problem here, of course, is that all the real-world solutions (i.e., Twisted) include far more than the reactor.
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy

On Fri, May 25, 2012 at 5:32 AM, Calvin Spealman <ironfroggy@gmail.com> wrote:
To be fair, PEP-3153 was built based largely on experience from the Twisted project and input from Twisted developers, who know what they are talking about and how to build a useful system. The entire transport/protocol separation is lifted directly out of it. -- Devin

On Fri, May 25, 2012 at 6:33 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
My comments were in response to this post, not PEP-3153

On May 25, 2012 7:33 PM, "Calvin Spealman" <ironfroggy@gmail.com> wrote:
No, the specific call at the PyCon US 2011 language summit was for a PEP that proposed a *new* event loop for the standard library that:

1. Provides simple event loop functionality in the standard library, as an improved alternative to asyncore for small apps that don't require the full power of a framework like Twisted (think things like little IRC bots, TCP echo servers, or testing of async components)
2. Provides a clean migration path to a production grade reactor like Twisted's
3. Makes it easier for multiple event loop based frameworks (e.g. tkinter, wxPython, PySide, Twisted) to all cooperate within the same process

What we're after is something for the stdlib that is to event loops/reactors as wsgiref is to production grade WSGI servers like mod_wsgi and nginx. asyncore isn't it, because the migration path isn't clean. PEP 3153 currently spends a lot of time talking about transports and protocols, but doesn't answer those 3 core questions:

1. How do I write a simple IRC bot or TCP echo server?
2. How do I migrate my simple app to a production grade reactor like Twisted's?
3. How do I run two different concurrent.eventloop compatible reactors in the same process?

As far as I can tell, PEP 3153 wants to handle all that by merging the I/O stacks of all the frameworks first, which strikes me as being *way* too ambitious for a first step. If we can't even figure out a common abstraction for the reactor level (ala WSGI), how are we ever going to agree on a standard async I/O abstraction?

Cheers, Nick.

On Fri, May 25, 2012 at 9:53 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Obviously, for a man with many opinions I miss out on too many conversations and too many potential actions. I should make steps to correct this in the future. Thanks for clearing this up.
Cheers, Nick.

participants (8)
- Antoine Pitrou
- Calvin Spealman
- Cameron Simpson
- Devin Jeanpierre
- Giampaolo Rodolà
- Nick Coghlan
- Ronald Oussoren
- Terry Reedy