asyncore: included batteries don't fit

hello python-ideas,

i'd like to start a discussion about the state of asyncore/asynchat's adoption in the python standard library, with the intention of finding a roadmap for how to improve things, and of kicking off and coordinating implementations. here's the problem (as previously described in [issue15978] and redirected here, with some additions):

the asyncore module would be much more useful if it were well integrated in the standard library. in particular, it should be supported by:

* subprocess
* BaseHTTPServer / http.server (and thus socketserver)
* urllib2 / urllib, http.client
* probably many other network libraries (except smtpd, which already uses asyncore)
* third party libraries (if the stdlib leads the way, the ecosystem will follow; e.g. pyserial)

without widespread asyncore support, it is not possible to easily integrate different servers and services with each other; with asyncore support, it's just a matter of creating the objects and entering the main loop. (e.g., an http server for controlling a serial device, with a telnet-like debugging interface.)

some examples of the changes required:

* the socketserver documentation states that it would like to have such a framework ("Future work: [...] Standard framework for select-based multiplexing"). due to the nature of socketserver-based implementations (blocking reads), we can't just "add glue so it works", but there could be extensions so that implementations can be ported to asynchronous socketservers. i've done it for a particular case (a port of SimpleHTTPServer, but it's a mess of monkey-patching and intermediate StringIOs).
* for subprocess, there's a bunch of recipes at [1].
* pyserial (not in the standard library, but it might as well become part of it) can be ported quite easily [2].

this touches several modules whose implementations can be handled independently of each other; i'd implement some of them myself. terry.reedy redirected me from the issue tracker to this list, hoping for controversy and alternatives. if you'd like to discuss, throw in questions, and we'll find a solution. if you think talk is cheap, i can try to work out first sketches.

python already has batteries for nonblocking operation included, and i say it's doing it right -- let's just make sure the batteries fit in the other gadgets!

yours truly
chrysn

[1] http://code.activestate.com/recipes/576957-asynchronous-subprocess-using-asy...
[2] http://sourceforge.net/tracker/?func=detail&aid=3559321&group_id=46487&atid=446305
[issue15978] http://bugs.python.org/issue15978

-- Es ist nicht deine Schuld, dass die Welt ist, wie sie ist -- es wär' nur deine Schuld, wenn sie so bleibt. (You are not to blame for the state of the world, but you would be if that state persisted.) -- Die Ärzte
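
(for concreteness, a minimal sketch of the "create the objects and enter the main loop" usage described above, using only the existing asyncore API; the echo protocol and port number are arbitrary:)

    import asyncore
    import socket

    class EchoHandler(asyncore.dispatcher_with_send):
        # called by the loop whenever the connection has data to read
        def handle_read(self):
            data = self.recv(8192)
            if data:
                self.send(data)

    class EchoServer(asyncore.dispatcher):
        def __init__(self, host, port):
            asyncore.dispatcher.__init__(self)
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind((host, port))
            self.listen(5)

        # called by the loop when a client connects
        def handle_accept(self):
            pair = self.accept()
            if pair is not None:
                sock, addr = pair
                EchoHandler(sock)

    EchoServer('localhost', 8007)
    asyncore.loop()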

Hi! On Sat, Sep 22, 2012 at 06:31:06PM +0200, chrysn <chrysn@fsfe.org> wrote:
It seems you want Twisted, no? Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sat, Sep 22, 2012 at 08:52:53PM +0400, Oleg Broytman wrote:
if these considerations end in twisted being consecrated as the new asyncore, i'd consider that a valid solution too. then again, subprocess and the onboard servers should work well with *that* out of the box. best regards chrysn -- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

On Sat, Sep 22, 2012 at 08:27:10PM +0200, chrysn <chrysn@fsfe.org> wrote:
If you mean that Twisted will be included in the standard library -- then no, I'm sure it will not. Python comes with batteries included, but Twisted is not a battery, it's rather a power plant. I am sure it will always be developed and distributed separately. And developing asyncore to the level of Twisted would be a duplication of effort.
If you want subprocess and Twisted to work together -- you know where to send patches.

PS. In my not so humble opinion, what the standard library really lacks in this area is a way to combine a few asynchronous libraries with different mainloops. Think about wxPython+Twisted in one program. But I don't have the slightest idea how to approach the problem. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sat, Sep 22, 2012 at 09:50:39PM +0200, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
And wxPython has a means to extend its main loop. But these are only partial solutions. There are many more libraries with mainloops: D-Bus/GLib, for example. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sat, Sep 22, 2012 at 10:52:10PM +0400, Oleg Broytman wrote:
well, what about python including a battery and a battery plug, then? asyncore could be the battery, and an interface between asynchronous libraries the battery plug. users could start developing with batteries, and when the project grows, just plug it into a power plant.

less analogy, more technical: the asyncore dispatcher-to-main-loop interface is pretty thin -- there is a (global or explicitly passed) "map" (a dictionary) mapping file descriptors to objects that can be readable or writable (or acceptable; not sure if that detail is really needed that far down). a dispatcher registers itself in a map, and then the main loop select()s for events on all files and dispatches them accordingly.

it won't be as easy as just taking that interface, e.g. because it lacks timeouts, but i think it can be the "way to combine a few asynchronous libraries".

(to avoid asyncore becoming a power plant itself, it could choose not to implement some features for simplicity. for example, if asyncore chose to still not implement timeouts, registering timeouts with an asyncore-based main loop would just result in a NotImplementedError telling the user to get a more powerful main loop.)

i don't want to claim i know how that could work in detail or even if it could work at all, but if this is interesting for enough people that it will be used, i'd like to find out.
no, actually -- for now, it'd be a patch to twisted (who'd reply with "we already have a way of dealing with it"). if asyncore's interface becomes the battery plug, it'd be a patch to subprocess. thanks for sharing your ideas chrysn -- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

On Sat, 22 Sep 2012 18:31:06 +0200 chrysn <chrysn@fsfe.org> wrote:
SSL support is also lacking: http://bugs.python.org/issue10084 Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

I still think this proposal is too vaguely defined and any effort towards adding async IO support to existing batteries is premature for different reasons, first of which the inadequacy of asyncore as the base async framework to fulfill the task you're proposing. asyncore is so old and difficult to fix/enhance without breaking backward compatibility (see for example http://bugs.python.org/issue11273#msg156439) that relying on it for any modern work is inevitably a bad idea.

From a chronological standpoint, I still think the best thing to do in order to fix the "python async problem" once and for all is to first define and possibly implement an "async WSGI interface" describing what a standard async IO loop/reactor should look like (in terms of API) and how to integrate with it; see:
http://mail.python.org/pipermail/python-ideas/2012-May/015223.html
http://mail.python.org/pipermail/python-ideas/2012-May/015235.html
In my mind this is the ideal long-term scenario but even managing to define an "async WSGI interface" alone would be a big step forward. Again, at this point in time what you're proposing looks too vague, ambitious and premature to me. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ 2012/9/22 chrysn <chrysn@fsfe.org>

Temporarily un-lurking to reply to this thread (which I'll actually be reading). Giampaolo and I talked about this for a bit over the weekend, and I have to say that I agree with his perspective. In particular, to get something better than asyncore, there must be something minimally better to build upon. I don't really have an opinion on what that minimally better thing should be named, but I do agree that having a simple reactor API that has predictable behavior over the variety of handlers (select, poll, epoll, kqueue, WSAEvent in Windows, etc.) is necessary.

Now, let's get to brass tacks...

1. Whatever reactors are available, you need to be able to instantiate multiple different types of reactors and multiple instances of the same type of reactor simultaneously (to support multiple threads handling different groups of reactors, or different reactors for different types of objects on certain platforms). While this allows for insanity in the worst case, we're all consenting adults here, so we shouldn't be limited by reactor singletons. There should be a default reactor class, which is defined on module/package import (use the "best" one for the platform).

2. The API must be simple. I am not sure that it can get easier than Idea #3 from: http://mail.python.org/pipermail/python-ideas/2012-May/015245.html I personally like it because it offers a simple upgrade path for asyncore users (create your asyncore-derived classes, pass them into the new reactor), while simultaneously defining a relatively easy API for any 3rd party to integrate with. By offering an easy-to-integrate method for 3rd parties (that is also sane), there is the added bonus that 3rd parties are more likely to integrate, rather than replace, which means more use in the "real world", better bug reports, etc. To simplify integration further, make the API register(fd, handler, events=singleton). Passing no events from the caller means "register me for all events", which will help 3rd parties that aren't great at handling read/write registration.

3. I don't have a 3rd tack, you can hang things on the wall with 2 ;)

Regards,
- Josiah

On Mon, Sep 24, 2012 at 3:31 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
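
(for concreteness, a bare-bones sketch of the register(fd, handler, events=...) shape described in point 2 above -- this is not Giampaolo's ioloop; the READ/WRITE/ALL_EVENTS names and the handle_read()/handle_write() handler methods are assumptions made purely for illustration:)

    import select

    READ, WRITE = 1, 2
    ALL_EVENTS = READ | WRITE      # "no events passed" means register for everything

    class SelectReactor:
        def __init__(self):
            self._handlers = {}    # fd -> (handler, events)

        def register(self, fd, handler, events=ALL_EVENTS):
            self._handlers[fd] = (handler, events)

        def unregister(self, fd):
            self._handlers.pop(fd, None)

        def poll(self, timeout=None):
            readers = [fd for fd, (h, ev) in self._handlers.items() if ev & READ]
            writers = [fd for fd, (h, ev) in self._handlers.items() if ev & WRITE]
            ready_r, ready_w, _ = select.select(readers, writers, [], timeout)
            for fd in ready_r:
                self._handlers[fd][0].handle_read()
            for fd in ready_w:
                if fd in self._handlers:   # a read handler may have unregistered it
                    self._handlers[fd][0].handle_write()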

On Mon, Sep 24, 2012 at 03:31:37PM -0700, Giampaolo Rodolà wrote:
i wasn't aware that pep 3153 exists. given that, my original intention for this thread should be re-worded into "let's get pep 3153 along!".

i'm not convinced by the api suggested in the first mail, as it sounds very unix-centric (poll, read/write/error). i rather imagined leaving the details of the callbackable/mainloop interaction to be platform details. (a win32evtlog event source just couldn't possibly register with a select()-based main loop.) i'd prefer to keep the part that registers with the main loop concentrated to a very low-level common denominator. for unix, that'd mean that there is a basic callbackable for "things that receive events because they have a fileno". everything above that, e.g. the distinction whether an "r" event means that we can read() or that we must accept(), could happen above that and wouldn't have to be concerned with main loop integration any more.

in case (pseudo)code gets the idea across better:

    class UnixFilehandle(object):
        def __init__(self, fileno):
            self._fileno = fileno

        def register_with_main_loop(self, mainloop):
            # it might happen that the main loop doesn't support unix
            # filenos. tough luck, in that case -- the developer should
            # select a more suitable main loop.
            mainloop.register_unix_fileno(self._fileno, self)

        def handle_r_event(self):
            raise NotImplementedError("Not configured to receive that sort of event")
        # if you're sure you'd never receive any of them anyway, you can
        # un-register them by setting them to None in the subclass
        handle_w_event = handle_e_event = handle_r_event

    class SocketServer(UnixFilehandle):
        def __init__(self, socket):
            self._socket = socket
            UnixFilehandle.__init__(self, socket.fileno())

        def handle_r_event(self):
            # a listening socket becomes readable when a connection is pending
            self.handle_accept_event(self._socket.accept())

other interfaces parallel to the file handle interface would, for example, handle unix signals. (built atop of that, like the accept-handling socket server, could be an interface that deals with child processes.) the interface for android might look different again, because there is no main loop and select never gets called by the application.
i'd welcome such an interface. if asyncore can then be retrofitted to accept that interface too w/o breaking compatibility, it'd be nice, but if not, it's asyncore2, then.
Again, at this point in time what you're proposing looks too vague, ambitious and premature to me.
please don't get me wrong -- i'm not proposing anything for immediate action, i just want to start a thinking process towards a better integrated stdlib. On Mon, Sep 24, 2012 at 05:02:08PM -0700, Josiah Carlson wrote:
i think that's already common. with asyncore, you can have different maps (just one is installed globally as default). with the gtk main loop, it's a little tricky (the gtk.main() function doesn't simply take an argument), but the underlying glib can do that afaict.
it's good that the necessities of call_later and call_every are mentioned here, i'd have forgotten about them.

we've talked about many things we'd need in a python asynchronous interface (not implementation), so what are the things we *don't* need? (so we won't start building a framework like twisted.) i'll start:

* high-level protocol handling (can be extra modules atop of it)
* ssl
* something like the twisted delayed framework (not sure about that, i guess the twisted people will have good reason to use it, but i don't see compelling reasons for such a thing in a minimal interface from my limited pov)
* explicit connection handling (retries, timeouts -- would be up to the user as well, e.g. urllib might want to set up a timeout and retries for asynchronous url requests)

best regards
chrysn

-- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

On Wed, Sep 26, 2012 at 1:17 AM, chrysn <chrysn@fsfe.org> wrote:
Go ahead and read PEP 3153, we will wait. A careful reading of PEP 3153 will tell you that the intent is to make a "light" version of Twisted built into Python. There isn't any discussion as to *why* this is a good idea, it just lays out the plan of action. Its ideas were gathered from the experience of the Twisted folks. Their experience is substantial, but in the intervening 1.5+ years since Pycon 2011, only the barest of abstract interfaces has been defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py), and no discussion has taken place as to forward migration of the (fairly large) body of existing asyncore code.
Of course not, but then again no one would attempt to do as much. They would use a WSAEvent reactor, because that's the only thing that it would work with. That said, WSAEvent should arguably be the default on Windows, so this issue shouldn't even come up there. Also, worrying about platform-specific details like "what if someone uses a source that is relatively uncommon on the platform" is a red-herring; get the interface/api right, build it, and start using it. To the point, Giampaolo already has a reactor that implements the interface (more or less "idea #3" from his earlier message), and it's been used in production (under staggering ftp(s) load). Even better, it offers effectively transparent replacement of the existing asyncore loop, and supports existing asyncore-derived classes. It is available: https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop...
That is, incidentally, what Giampaolo has implemented already. I encourage you to read the source I linked above.
Easily done, because it's already been done ;)
I am curious as to what you mean by "a better integrated stdlib". A new interface that doesn't allow people to easily migrate from an existing (and long-lived, though flawed) standard library is not better integration. Better integration requires allowing previous users to migrate, while encouraging new users to join in with any later development. That's what Giampaolo's suggested interface offers on the lowest level; something to handle file-handle reactors, combined with a scheduler.
Remember that a reactor isn't just a dictionary of file handles to do stuff on, it's the thing that determines what underlying platform mechanics will be used to multiplex across channels. But that level of detail will be generally unused by most people, as most people will only use one at a time. The point of offering multiple reactors is to allow people to be flexible if they choose (or to pick from the different reactors if they know that one is faster for their number of expected handles).
I disagree with the last 3. If you have an IO loop, more often than not you want an opportunity to do something later in the same context. This is commonly the case for bandwidth limiting, connection timeouts, etc., which are otherwise *very* difficult to do at a higher level (which are the reasons why schedulers are built into IO loops). Further, SSL in async can be tricky to get right. Having the 20-line SSL layer as an available class is a good idea, and will save people time by not having them re-invent it (poorly or incorrectly) every time. Regards, - Josiah

On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote:
it doesn't look like twisted-light to me, more like an interface suggestion for a small subset of twisted. in particular, it doesn't talk about main loops / reactors / registration-in-the-first-place. you mention interaction with the twisted people. is there willingness, from the twisted side, to use a standard python middle layer, once it exists and has sufficiently high quality?
i've had a look at it, but honestly can't say more than that it's good to have a well-tested asyncore compatible main loop with scheduling support, and i'll try it out for my own projects.
a new interface won't make integration automatically happen, but it's something the standard library components can evolve on. whether, for example, urllib2 will then automatically work asynchronously in that framework or whether we'll wait for urllib3, we'll see when we have it. @migrate from an existing standard library: is there a big user base for the current asyncore framework? my impression is that it is not very well known among python users, and most who could use it use twisted.
i see; those should be provided, then. i'm afraid i don't completely get the point you're making, sorry for that; maybe i've missed important statements, or i lack sufficiently deep knowledge of the topics affected and got lost in the details. what is your opinion on the state of asynchronous operations in python, and what would you like it to be? thanks for staying with this topic chrysn -- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

On Wed, Oct 3, 2012 at 7:43 AM, chrysn <chrysn@fsfe.org> wrote:
Things don't "automatically work" without work. You can't just make urllib2 work asynchronously unless you do the sort of greenlet-style stack switching that lies to you about what is going on, or unless you redesign it from scratch to do so. That's not to say that greenlets are bad, they are great. But expecting that a standard library implementing an updated async spec will all of a sudden hook itself into a synchronous socket client? I think that expectation is unreasonable.
"Well known" is an interesting claim. I believe it actually known of by quite a large part of the community, but due to a (perhaps deserved) reputation (that may or may not still be the case), isn't used as often as Twisted. But along those lines, there are a few questions that should be asked: 1. Is it desirable to offer users the chance to transition from asyncore-derived stuff to some new thing? 2. If so, what is necessary for an upgrade/replacement for asyncore/asynchat in the long term? 3. Would 3rd parties use this as a basis for their libraries? 4. What are the short, mid, and long-term goals? For my answers: 1. I think it is important to offer people who are using a standard library module to continue using a standard library module if possible. 2. A transition should offer either an adapter or similar-enough API equivalency between the old and new. 3. I think that if it offers a reasonable API, good functionality, and examples are provided - both as part of the stdlib and outside the stdlib, people will see the advantages of maintaining less of their own custom code. To the point: would Twisted use *whatever* was in the stdlib? I don't know the answer, but unless the API is effectively identical to Twisted, that transition may be delayed significantly. 4. Short: get current asyncore people transitioned to something demonstrably better, that 3rd parties might also use. Mid: pull parsers/logic out of cores of methods and make them available for sync/async/3rd party parsing/protocol handling (get the best protocol parsers into the stdlib, separated from the transport). Long: everyone contributes/updates the stdlib modules because it has the best parsers for protocols/formats, that can be used from *anywhere* (sync or async). My long-term dream (which has been the case for 6+ years, since I proposed doing it myself on the python-dev mailing list and was told "no") is that whether someone uses urllib2, httplib2, smtpd, requests, ftplib, etc., they all have access to high-quality protocol-level protocol parsers. So that once one person writes the bit that handles http 30X redirects, everyone can use it. So that when one person writes the gzip + chunked transfer encoding/decoding, everyone can use it.
I think it is functional, but flawed. I also think that every 3rd party that does network-level protocols is a different mix of functional and flawed. I think that there is a repeated and often-times wasted effort where folks are writing different and invariably crappy (to some extent) protocol parsers and network handlers. I think that whenever possible, that should stop, and the highest-quality protocol parsing functions/methods should be available in the Python standard library, available to be called from any library, whether sync, async, stdlib, or 3rd party.

Now, my discussions in the context of asyncore-related upgrades may seem like a strange leap, but some of these lesser-quality parsing routines exist in asyncore-derived classes, as well as non-asyncore-derived classes. But if we make an effort on the asyncore side of things, under the auspices of improving one stdlib module and offering additional functionality, the need for protocol-level parsers shared among sync/async should become obvious to *everyone*. (That it isn't obvious now is, I suspect, because the communities either don't spend a lot of time cross-pollinating, people like writing parsers - I do too ;) - or the sync folks end up going the greenlet route if/when threading bites them on the ass.)

Regards,
- Josiah
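
(a toy sketch of the "protocol parsers separated from the transport" idea above: a line parser that never touches a socket, so blocking code, asyncore dispatchers, and third-party loops could all feed it bytes; the class and method names below are invented for illustration:)

    class LineParser:
        """Accumulates bytes and hands back complete CRLF-terminated lines."""
        def __init__(self):
            self._buffer = b""

        def feed(self, data):
            # append raw bytes and split off every complete line
            self._buffer += data
            lines = self._buffer.split(b"\r\n")
            self._buffer = lines.pop()   # the trailing partial line, if any
            return lines

    # a blocking caller:       lines = parser.feed(sock.recv(4096))
    # an asyncore dispatcher:  the same call, made from handle_read()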

On Fri, 5 Oct 2012 11:51:21 -0700 Josiah Carlson <josiah.carlson@gmail.com> wrote:
I'm not sure what you're talking about: what were you told "no" about, specifically? Your proposal sounds reasonable and (ideally) desirable to me. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 5, 2012 at 1:09 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I've managed to find the email where I half-way proposed it (though not as pointed as what I posted above): http://mail.python.org/pipermail/python-dev/2004-November/049827.html Phillip J. Eby said in a reply that policy would kill it. My experience at the time told me that policy was a tough nut to crack, and my 24-year-old self wasn't confident enough to keep pushing (even though I had the time). Now, my 32-year-old self has the confidence and the knowledge to do it (or advise how to do it), but not the time (I'm finishing up my first book, doing a conference tour, running a startup, and preparing for my first child). One of the big reasons why I like and am pushing Giampaolo's ideas (and existing code) is my faith that he *can* and *will* do it, if he says he will. Regards, - Josiah

This is an incredibly important discussion.

I would like to contribute despite my limited experience with the various popular options. My own async explorations are limited to the constraints of the App Engine runtime environment, where a rather unique type of reactor is required. I am developing some ideas around separating reactors, futures, and yield-based coroutines, but they take more thinking and probably some experimental coding before I'm ready to write it up in any detail. For a hint on what I'm after, you might read up on monocle (https://github.com/saucelabs/monocle) and my approach to building coroutines on top of Futures (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets...).

In the meantime I'd like to bring up a few higher-order issues:

(1) How important is it to offer a compatibility path for asyncore? I would have thought that offering an integration path forward for Twisted and Tornado would be more important.

(2) We're at a fork in the road here. On the one hand, we could choose to deeply integrate greenlets/gevents into the standard library. (It's not monkey-patching if it's integrated, after all. :-) I'm not sure how this would work for other implementations than CPython, or even how to address CPython on non-x86 architectures. But users seem to like the programming model: write synchronous code, get async operation for free. It's easy to write protocol parsers that way. On the other hand, we could reject this approach: the integration would never be completely smooth, there's the issue of other implementations and architectures, and it probably would never work smoothly even for CPython/x86 when 3rd party extension modules are involved. Callback-based APIs don't have these downsides, but they are harder to program; however, we can make programming them easier by using yield-based coroutines. Even Twisted offers those (inline callbacks).

Before I invest much more time in these ideas I'd like to at least have (2) sorted out.

-- --Guido van Rossum (python.org/~guido)
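
(a very rough sketch of the "coroutines on top of Futures" idea mentioned above -- this is not the NDB tasklets code linked there; it resumes a generator from each yielded Future's done-callback, and uses thread-pool Futures only so that something actually completes:)

    from concurrent.futures import ThreadPoolExecutor
    import threading
    import time

    pool = ThreadPoolExecutor(max_workers=2)

    def slow_add(a, b):
        time.sleep(0.1)
        return a + b

    def run(gen, finished):
        """Drive a generator that yields Futures; resume it from done-callbacks."""
        def step(value=None, exc=None):
            try:
                future = gen.throw(exc) if exc else gen.send(value)
            except StopIteration:
                finished.set()
                return
            def resume(f):
                if f.exception() is None:
                    step(f.result(), None)
                else:
                    step(None, f.exception())
            future.add_done_callback(resume)
        step()

    def tasklet():
        x = yield pool.submit(slow_add, 1, 2)   # suspend until the Future is done
        y = yield pool.submit(slow_add, x, 10)
        print("result:", y)                     # prints 13

    finished = threading.Event()
    run(tasklet(), finished)
    finished.wait()
    pool.shutdown()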

On Sat, 6 Oct 2012 15:00:54 -0700 Guido van Rossum <guido@python.org> wrote:
greenlets/gevents only get you half the advantages of single-threaded "async" programming: they get you scalability in the face of a high number of concurrent connections, but they don't get you the robustness of cooperative multithreading (because it's not obvious when reading the code where the possible thread-switching points are). (I don't actually understand the attraction of gevent, except for extreme situations; threads should be cheap on a decent OS) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I used to think that too, long ago, until I discovered that as you add abstraction layers, cooperative multithreading is untenable -- sooner or later you will lose track of where the threads are switched.
(I don't actually understand the attraction of gevent, except for extreme situations; threads should be cheap on a decent OS)
I think it's the observation that the number of sockets you can realistically have open in a single process or machine is always 1-2 orders of magnitude larger than the number of threads you can have -- and this makes sense, since the total amount of memory (kernel and user) needed to represent a socket is just much smaller than that needed for a thread. Just check the configuration limits of your typical Linux kernel if you don't believe me. :-) -- --Guido van Rossum (python.org/~guido)

On Sat, 6 Oct 2012 17:23:48 -0700 Guido van Rossum <guido@python.org> wrote:
Even with an explicit notation like "yield" / "yield from"? Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
If you strictly adhere to using those you should be safe (though distinguishing between the two may prove challenging) -- but in practice it's hard to get everyone and every API to use this style. So you'll have some blocking API calls hidden deep inside what looks like a perfectly innocent call to some helper function. IIUC in Go this is solved by mixing threads and lighter-weight constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the rest of the system continues to make progress by spawning another thread. My own experience with NDB is that it's just too hard to make everyone use the async APIs all the time -- so I gave up and made async APIs an optional feature, offering a blocking and an async version of every API. I didn't start out that way, but once I started writing documentation aimed at unsophisticated users, I realized that it was just too much of an uphill battle to bother. So I think it's better to accept this and deal with it, possibly adding locking primitives into the mix that work well with the rest of the framework. Building a lock out of a tasklet-based (i.e. non-threading) Future class is easy enough. -- --Guido van Rossum (python.org/~guido)
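
(a rough sketch of the "lock out of a non-threading Future" idea above; MiniFuture and AsyncLock are invented names, and the Future here is the callback-only, event-loop kind rather than a PEP 3148 thread-backed one:)

    import collections

    class MiniFuture:
        """A tiny callback-only future: no threads, no blocking result()."""
        def __init__(self):
            self._done = False
            self._callbacks = []

        def add_done_callback(self, callback):
            if self._done:
                callback(self)
            else:
                self._callbacks.append(callback)

        def set_result(self):
            self._done = True
            for callback in self._callbacks:
                callback(self)

    class AsyncLock:
        """acquire() returns a MiniFuture that fires once the lock is yours."""
        def __init__(self):
            self._locked = False
            self._waiters = collections.deque()

        def acquire(self):
            future = MiniFuture()
            if not self._locked:
                self._locked = True
                future.set_result()           # the lock is free: resolve immediately
            else:
                self._waiters.append(future)  # otherwise wait in line
            return future

        def release(self):
            if self._waiters:
                # hand ownership straight to the next waiter
                self._waiters.popleft().set_result()
            else:
                self._locked = False

A yield-based coroutine would then simply "yield lock.acquire()" and call lock.release() when it is done with the resource.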

Hi Guido and folks, On 07.10.12 17:04, Guido van Rossum wrote:
I'm digging in, a bit late. Still trying to read the myriad of messages. For now just a word: Guido: How much I would love to use your time machine and invite you to discuss Pythons future in 1998. Then we would have tossed greenlet/stackless and all that crap. Entering a different context could have been folded deeply into Python, by making it able to pickle program state in certain positions. Just dreaming out loud :-) It is great that this discussion is taking place, and I'll try to help. cheers - Chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

+1000. Can we dream of gevent integrated into standard cpython? That would be a fantastic path for 3.4 :) And I would definitely have to move to 3.x, because for web programming I just can't think of another way to program in python. I'm seeing some people move to other languages where async is easier, like Go (some are trying Erlang). Async is a MUST HAVE for web programming these days...

In my experience, I've found that the "robustness of cooperative multithreading" comes at the price of code that is difficult to maintain. And with single threading you never easily reach the benefits of SMP. That's why Erlang shines: it abstracts away the hard work of keeping the switching under control. Gevent walks the same line: it makes the programmer's life easier.

-- Carlo Pires

2012/10/6 Guido van Rossum <guido@python.org>

On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum <guido@python.org> wrote:
Yield-based coroutines like monocle are the simplest way to do multi-paradigm in the same code. Whether you have a async-style reactor, greenlet-style stack switching, cooperatively scheduled generator trampolines, or just plain blocking threaded sockets; that style works with all of them (the futures and wrapper around everything just looks a little different). That said, it forces everyone to drink the same coroutine-styled kool-aid. That doesn't bother me. But I understand it, and have built similar systems before. I don't have an intuition about whether 3rd parties will like it or will migrate to it. Someone want to ping the Twisted and Tornado folks about it?
Combining your responses to #1 and now this, are you proposing a path forward for Twisted/Tornado to be greenlets? That's an interesting approach to the problem, though I can see the draw. ;) I have been hesitant on the Twisted side of things for an arbitrarily selfish reason. After 2-3 hours of reading over a codebase (which I've done 5 or 6 times in the last 8 years), I ask myself whether I believe I understand 80+% of how things work; how data flows, how callbacks/layers are invoked, and whether I could add a piece of arbitrary functionality to one layer or another (or to determine the proper layer in which to add the functionality). If my answer is "no", then my gut says "this is probably a bad idea". But if I start figuring out the layers before I've finished my 2-3 hours, and I start finding bugs? Well, then I think it's a much better idea, even if the implementation is buggy. Maybe something like Monocle would be better (considering your favor for that style, it obviously has a leg-up on the competition). I don't know. But if something like Monocle can merge it all together, then maybe I'd be happy. Incidentally, I can think of a few different styles of wrappers that would actually let people using asyncore-derived stuff use something like Monocle. So maybe that's really the right answer? Regards, - Josiah P.S. Thank you for weighing in on this Guido. Even if it doesn't end up the way I had originally hoped, at least now there's discussion.

On Sat, Oct 6, 2012 at 7:22 PM, Josiah Carlson <josiah.carlson@gmail.com> wrote:
Glad I'm not completely crazy here. :-)
They should be reading this. Or maybe we should bring it up on python-dev before too long.
Can't tell whether you're serious, but that's not what I meant. Surely it will never fly for Twisted. Tornado apparently already works with greenlets (though maybe through a third party hack). But personally I'd be leaning towards rejecting greenlets, for the same reasons I've kept the doors tightly shut for Stackless -- I like it fine as a library, but not as a language feature, because I don't see how it can be supported on all platforms where Python must be supported. However I figured that if we define the interfaces well enough, it might be possible to use (a superficially modified version of) Twisted's reactors instead of the standard ones, and, orthogonally, Twisted's deferred's could be wrapped in the standard Futures (or the other way around?) when used with a non-Twisted reactor. Which would hopefully open the door for migrating some of their more useful protocol parsers into the stdlib.
Can't figure what you're implying here. On which side does Twisted fall for you?
My worry is that monocle is too simple and does not cater for advanced needs. It doesn't seem to have caught on much outside the company where it originated.
I still don't really think asyncore is going to be a problem. It can easily be separated into a reactor and callbacks.
Hm, there seemed to be plenty of discussion before... -- --Guido van Rossum (python.org/~guido)
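
(as a concrete illustration of the "Deferreds wrapped in the standard Futures" direction mentioned above -- Deferred.addCallbacks and Failure.value are real Twisted APIs, while the helper name is invented; note that handing the result to the Future consumes it for any later callbacks on the Deferred:)

    from concurrent.futures import Future

    def future_from_deferred(deferred):
        """Expose a Twisted Deferred's outcome through a PEP 3148 Future."""
        future = Future()
        deferred.addCallbacks(
            future.set_result,                                     # success -> result
            lambda failure: future.set_exception(failure.value))   # failure -> exception
        return future

    # reactor-side code can then use future.add_done_callback(...); blocking on
    # future.result() from inside the reactor thread would of course deadlock.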

On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum <guido@python.org> wrote:
I thought futures were meant for thread and process pools? The blocking methods make them a bad fit for an asynchronous networking toolset. The Twisted folks have discussed integrating futures and Twisted (see also the reply, which has some corrections): http://twistedmatrix.com/pipermail/twisted-python/2011-January/023296.html -- Devin

On Saturday, October 6, 2012, Devin Jeanpierre wrote:
The specific Future implementation in the py3k stdlib uses threads and is indeed meant for thread and process pools. But the *concept* of futures works fine in event-based systems, see the link I posted into the NDB sources. I'm not keen on cancellation and threadpools FWIW.
-- --Guido van Rossum (python.org/~guido)

On Oct 06, 2012, at 03:00 PM, Guido van Rossum wrote:
This is an incredibly important discussion.
Indeed. If Python gets it right, it could be yet another killer reason for upgrading to Python 3, at least for the growing subset of event-driven applications.
(1) How important is it to offer a compatibility path for asyncore?
I've written and continue to use asyncore-based code. I don't personally care much about compatibility. I've used asyncore because it was the simplest and most stdlibby of the options for the Python versions I can use, but I have no love for it. If there were a better, more readable and comprehensible way to do it, I'd ditch the asyncore-based versions as soon as possible.
I would have thought that offering an integration path forward for Twisted and Tornado would be more important.
Agreed. I share the same dream that someone else in this thread mentioned. It would be really fantastic if the experts in a particular protocol could write support for that protocol Just Once and have it as widely shared as possible. Maybe this is an unrealistic dream, but now's the time to have such dreams anyway. Even something like the email package could benefit from this. The FeedParser is our attempt to support asynchronous reading of email data for parsing. I'm not so sure that the asynchronous part of that is very useful. -Barry

I'll have to put in my .02€ here … Guido van Rossum <guido@...> writes:
(2) We're at a fork in the road here. On the one hand, we could choose to deeply integrate greenlets/gevents into the standard library.
Yes. I have two and a half reasons for this.

(½) Ultimately I think that switching stacks around is always going to be faster than unwinding and re-winding things with yield().

(1) It's a whole lot easier to debug a problem with gevent than with anything which uses yield / Deferreds / asyncore / whatever. With gevent, you get a standard stack trace. With anything else, the "where did this call come from" information is not part of the call chain and thus is either unavailable, or will have to be carried around preemptively (with the associated overhead).

(2) Nothing against Twisted or any other async framework, but writing any nontrivial program with one requires warping my brain into something that's *not* second nature in Python, and never going to be. Python is not Javascript; if you want to use the "loads of callbacks" programming style, use node.js.

Personal experience: I have written an interpreter for an asynchronous and vaguely Pythonic language which I use for home automation, my lawn sprinklers, and related stuff (which I should probably release in some form). The code was previously based on Twisted and was impossible to debug. It now uses gevent and Just Works.

-- Matthias Urlichs

Ok I'll add a buck... On 16.10.12 20:40, Matthias Urlichs wrote:
If you are emulating things in Python, that may be true. Also, if you are really only switching stacks, that may be true. But neither assumption holds; see below.
I'm absolutely with you on the ease of straightforward coding. But this new, efficient "yield from" is a big step in that direction; see Greg's reply.
Same here.
You are using gevent, which uses greenlet! That means no pure stack switching; instead the stack is sliced and moved onto the heap. But that technique (originally from Stackless 2.0) is known to be 5-10 times slower, compared to cooperative context switching that is built into the interpreter. This story is by far not over. Even PyPy with all its advanced technology still depends on stack slicing when it emulates concurrency. Python 3.3 has made a huge move, because this efficient nesting of generators can deeply influence how people code, maybe with the effect that stack tricks lose more of their importance. I expect more like this to come. Greenlets are great. Stack inversion is faster. -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

Do you use gevent's monkeypatch-the-stdlib feature? On Tue, Oct 16, 2012 at 8:40 PM, Matthias Urlichs <matthias@urlichs.de>wrote:
That seems like something that can be factually proven or disproven.
gevent uses stack slicing, which IIUC is pretty expensive. Why is it not subject to the performance overhead you mention? Can you give an example of such a crappy stack trace in twisted? I develop in it all day, and get pretty decent stack traces. The closest thing I have to a crappy stack trace is when doing functional tests with an RPC API -- obviously on the client side all I'm going to see is a fairly crappy just-an-exception. That's okay, I also get the server side exception that looks like a plain old Python traceback to me and tells me exactly where the problem is from.
Which ones are you thinking about other than twisted? It seems that the issue you are describing is one of semantics, not so much of whether or not it actually does things asynchronously under the hood, as e.g gevent does too.
Python is not Javascript; if you want to use the "loads of callbacks" programming style, use node.js.
None of the solutions on the table have node.js-style "loads of callbacks". Everything has some way of structuring them. It's either implicit switches (as in "can happen in the caller"), explicit switches (as in yield/yield from) or something like deferreds, some options having both of the latter.
If you have undebuggable code samples from that I'd love to take a look.
-- cheers lvh

Oh my. This is a very long thread that I probably should have replied to a long time ago. It is intensely long right now, and tonight is the first chance I've had to try and go through it comprehensively. I'll try to reply to individual points made in the thread -- if I missed yours, please don't be offended, I promise it's my fault :)

FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka async-pep. First of all, I'm glad to see that there's some more "let's get that pep along" movement. I tabled it because: a) I didn't have enough time to contribute, and b) a lot of promised contributions ended up not happening when it came down to it, which was incredibly demotivating. The combination of this thread, plus the fact that I was strong-armed at Pycon ZA by a bunch of community members that shall not be named (Alex, Armin, Maciej, Larry ;-)), has got me exploring this thing again.

To be clear, I don't feel async-pep is an attempt at twisted-light in the stdlib. Other than separation of transport and protocol, there's not really much there that even smells of twisted (especially since right now I'd probably throw consumers/producers out) -- and that separation is simply good practice. Twisted does the same thing, but it didn't invent it. Furthermore, the advantages seem clear: reusability and testability are more than enough for me. If there's one takeaway idea from async-pep, it's reusable protocols.

The PEP should probably be a number of PEPs. At first sight, it seems that this number is at least four:

1. Protocol and transport abstractions, making no mention of asynchronous IO (this is what I want 3153 to be, because it's small, manageable, and virtually everyone appears to agree it's a fantastic idea)
2. A base reactor interface
3. A way of structuring callbacks: probably deferreds with a built-in inlineCallbacks for people who want to write synchronous-looking code with explicit yields for asynchronous procedures
4+ adapting the stdlib tools to using these new things

Re: forward path for existing asyncore code. I don't remember this being raised as an issue. If anything, it was mentioned in passing, and I think the answer to it was something to the tune of "asyncore's API is broken, fixing it is more important than backwards compat". Essentially I agree with Guido that the important part is an upgrade path to a good third-party library, which is the part about asyncore that REALLY sucks right now. Regardless, an API upgrade is probably a good idea. I'm not sure if it should go in the first PEP: given the separation I've outlined above (which may be too spread out...), there's no obvious place to put it besides it being a new PEP.

Re: base reactor interface: drawing maximally from the lessons learned in twisted, I think IReactorCore (start, stop, etc.), IReactorTime (call later, etc.), asynchronous-looking name lookup, and fd handling are the important parts. call_every can be implemented in terms of call_later on a separate object, so I think it should be (e.g. twisted.internet.task.LoopingCall).

One thing that is apparently forgotten about is event loop integration. The prime way of having two event loops cooperate is *NOT* "run both in parallel", it's "have one call the other". Even though not all loops support this, I think it's important to get this as part of the interface (raise an exception for all I care if it doesn't work).

cheers
lvh
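
(a small sketch of the "call_every in terms of call_later on a separate object" point above, in the spirit of twisted.internet.task.LoopingCall; reactor.call_later here is the interface being proposed, not an existing stdlib API:)

    class Every:
        """Repeatedly run func(*args) every `interval` seconds via call_later."""
        def __init__(self, reactor, interval, func, *args):
            self.reactor = reactor
            self.interval = interval
            self.func = func
            self.args = args
            self.running = False

        def start(self):
            self.running = True
            self._schedule()

        def stop(self):
            self.running = False

        def _schedule(self):
            self.reactor.call_later(self.interval, self._tick)

        def _tick(self):
            if not self.running:
                return
            self.func(*self.args)
            self._schedule()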

On Tue, Oct 9, 2012 at 11:00 AM, Laurens Van Houtven <_@lvh.cc> wrote:
No problem, I'm running behind myself...
FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka async-pep.
I suppose that's your pet name for it. :-) For most everyone else it's PEP 3153.
Is there a newer version that what's on http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any specific proposals, after spending a lot of time giving a rationale and defining some terms. The version on https://github.com/lvh/async-pep doesn't seem to be any more complete.
But the devil is in the details. *What* specifically are you proposing? How would you write a protocol handler/parser without any reference to I/O? Most protocols are two-way streets -- you read some stuff, and you write some stuff, then you read some more. (HTTP may be the exception here, if you don't keep the connection open.)
2. A base reactor interface
I agree that this should be a separate PEP. But I do think that in practice there will be dependencies between the different PEPs you are proposing.
Your previous two ideas sound like you're not tied to backward compatibility with Tornado and/or Twisted (not even via an adaptation layer). Given that we're talking Python 3.4 here that's fine with me (though I think we should be careful to offer a path forward for those packages and their users, even if it means making changes to the libraries). But Twisted Deferred is pretty arcane, and I would much rather not use it as the basis of a forward-looking design. I'd much rather see what we can mooch off PEP 3148 (Futures).
4+ adapting the stdlib tools to using these new things
We at least need to have an idea for how this could be done. We're talking serious rewrites of many of our most fundamental existing synchronous protocol libraries (e.g. httplib, email, possibly even io.TextIOWrapper), most of which have had only scant updates even through the Python 3 transition, apart from complications to deal with the bytes/str dichotomy.
I have the feeling that the main reason asyncore sucks is that it requires you to subclass its Dispatcher class, which has a rather treacherous interface.
Aren't all your proposals API upgrades?
That actually sounds more concrete than I'd like a reactor interface to be. In the App Engine world, there is a definite need for a reactor, but it cannot talk about file descriptors at all -- all I/O is defined in terms of RPC operations which have their own (several layers of) async management but still need to be plugged in to user code that might want to benefit from other reactor functionality such as scheduling and placing a call at a certain moment in the future.
This is definitely one of the things we ought to get right. My own thoughts are slightly (perhaps only cosmetically) different again: ideally each event loop would have a primitive operation to tell it to run for a little while, and then some other code could tie several event loops together. Possibly the primitive operation would be something like "block until either you've got one event ready, or until a certain time (possibly 0) has passed without any events, and then give us the events that are ready and a lower bound for when you might have more work to do" -- or maybe instead of returning the event(s) it could just call the associated callback (it might have to if it is part of a GUI library that has callbacks written in C/C++ for certain events like screen refreshes). Anyway, it would be good to have input from representatives from Wx, Qt, Twisted and Tornado to ensure that the *functionality* required is all there (never mind the exact signatures of the APIs needed to provide all that functionality). -- --Guido van Rossum (python.org/~guido)
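
(a very rough sketch of how a coordinating piece of code could use the "run for a little while" primitive described above; every name below is assumed -- none of this is an existing API:)

    def drive(loop_a, loop_b):
        # interleave two loops; each call blocks until one event is ready or
        # the other loop's next deadline arrives, dispatches, then returns
        while loop_a.has_pending_work() or loop_b.has_pending_work():
            loop_a.run_until_event_or(deadline=loop_b.next_deadline())
            loop_b.run_until_event_or(deadline=loop_a.next_deadline())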

On Thu, Oct 11, 2012 at 5:18 PM, Guido van Rossum <guido@python.org> wrote:
Could you be more specific? I've never heard Deferreds in particular called "arcane". They're very popular in e.g. the JS world, and possibly elsewhere. Moreover, they're extremely similar to futures, so if one is arcane so is the other. Maybe if you could elaborate on features of their designs that are better/worse?

As far as I know, they mostly differ in that:

- Callbacks are added in a pipeline, rather than "in parallel"
- Deferreds pass in values along the pipeline, rather than self (and have a separate pipeline for error values).

Neither is clearly better or more obvious than the other. If anything I generally find deferred composition more useful than deferred tee-ing, so I feel like composition is the correct base operator, but you could pick another. Either way, each is implementable in terms of the other (ish?). The pipeline approach is particularly nice for the errback pipeline, because it allows chained exception (Failure) handling on the deferred to be very simple.

The larger issue is that futures don't make chaining easy at all, even if it is theoretically possible. For example, look at the following Twisted code: http://bpaste.net/show/RfEwoaflO0qY76N8NjHx/ , and imagine how that might generalize to more realistic error handling scenarios. The equivalent Futures code would involve creating one Future per callback in the pipeline and manually hooking them up with a special callback that passes values to the next future. And if we add that to the futures API, the API will almost certainly be somewhat similar to what Twisted has with deferreds and chaining and such. So then, equally arcane.

To my mind, it is Futures that need to mooch off of Deferreds, not the other way around. Twisted's Deferreds have a lot of history with making asynchronous computation pleasant, and Futures are missing a lot of good tools.

-- Devin

On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Really? Twisted is used in the JS world? Or do you just mean the pervasiveness of callback style async programming? That's one of the things I am desperately trying to keep out of Python, I find that style unreadable and unmanageable (whenever I click on a button in a website and nothing happens I know someone has a bug in their callbacks). I understand you feel different; but I feel the general sentiment is that callback-based async programming is even harder than multi-threaded programming (and nobody is claiming that threads are easy :-).
and possibly elsewhere. Moreover, they're extremely similar to futures, so if one is arcane so is the other.
I love Futures, they represent a nice simple programming model. But I especially love that you can write async code using Futures and yield-based coroutines (what you call inlineCallbacks) and never have to write an explicit callback function. Ever.
These two combined are indeed what mostly feels arcane to me.
If you're writing long complicated chains of callbacks that benefit from these features, IMO you are already doing it wrong. I understand that this is a matter of style where I won't be able to convince you. But style is important to me, so let's agree to disagree.
But as soon as you switch from callbacks to yield-based coroutines the chaining becomes natural, error handling is just a matter of try/except statements (or not if you want the error to bubble up) and (IMO) the code becomes much more readable.
Looks fine to me. I have a lot of code like that in NDB and it works great. (Note that NDB's Futures are not the same as PEP 3148 Futures, although they have some things in common; in particular NDB Futures are not tied to threads.)
The *implementation* of this stuff in NDB is certainly hairy; I already posted the link to the code: http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets... However, this is internal code and doesn't affect the Future API at all.
I am totally open to learning from Twisted's experience. I hope that you are willing to share even if the end result might not look like Twisted at all -- after all, in Python 3.3 we have "yield from" and return from a generator and many years of experience with different styles of async APIs. In addition to Twisted, there's Tornado and Monocle, and then there's the whole greenlets/gevent and Stackless/microthreads community that we can't completely ignore. I believe somewhere there is an ideal async architecture, and I hope you can help us discover it. (For example, I am very interested in Twisted's experiences writing real-world performant, robust reactors.) -- --Guido van Rossum (python.org/~guido)
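
(to make the "chaining becomes natural" point above concrete, a small sketch using Twisted's real defer.inlineCallbacks and task.deferLater; the deferLater call stands in for any operation that returns a Deferred:)

    from twisted.internet import defer, reactor, task

    @defer.inlineCallbacks
    def fetch_and_report():
        try:
            result = yield task.deferLater(reactor, 0.1, lambda: 6 * 7)
            print("got", result)
        except Exception as exc:          # failures arrive as ordinary exceptions
            print("failed:", exc)

    fetch_and_report().addCallback(lambda _: reactor.stop())
    reactor.run()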

First of all, sorry for not snipping the reply I made previously. Noticed that only after I sent it :( On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum <guido@python.org> wrote:
Ah, I mean Deferreds. I attended a talk earlier this year all about deferreds in JS, and not a single reference to Python or Twisted was made! These are the examples I remember mentioned in the talk:

- http://api.jquery.com/category/deferred-object/ (not very twistedish at all, ill-liked by the speaker)
- http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe not a good example, mochikit tries to be "python in JS")
- http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html
- https://github.com/kriskowal/q (also includes an explanation of why the author likes deferreds)

There were a few more that the speaker mentioned, but didn't cover. One of his points was that the various systems of deferreds are subtly different, some very badly so, and that it was a mess, but that deferreds were still awesome. JS is a language where async programming is mainstream, so lots of people try to make it easier, and they all do it slightly differently.
:S There are (at least?) four different styles of asynchronous computation used in Twisted, and you seem to be confused as to which ones I'm talking about.

1. Explicit callbacks: For example, reactor.callLater(t, lambda: print("woo hoo"))

2. Method dispatch callbacks: Similar to the above, the reactor or somebody has a handle on your object, and calls methods that you've defined when events happen, e.g. IProtocol's dataReceived method

3. Deferred callbacks: When you ask for something to be done, it's set up, and you get an object back, to which you can add a pipeline of callbacks that will be called whenever whatever happens, e.g. twisted.internet.threads.deferToThread(print, "x").addCallback(print, "x was printed in some other thread!")

4. Generator coroutines: These are a syntactic wrapper around deferreds. If you yield a deferred, you will be sent the result if the deferred succeeds, or an exception if the deferred fails. e.g. the examples from my previous message

I don't see a reason for the first to exist at all, the second one is kind of nice in some circumstances (see below), but perhaps overused. I feel like you're railing on the first and second when I'm talking about the third and fourth. I could be wrong.
The reason explicit non-deferred callbacks are involved in Twisted is because of situations in which deferreds are not present, because of past history in Twisted. It is not at all a limitation of deferreds or something futures are better at, best as I'm aware. (In case that's what you're getting at.)

Anyway, one big issue is that generator coroutines can't really effectively replace callbacks everywhere. Consider the GUI button example you gave. How do you write that as a coroutine? I can see it being written like this:

    def mycoroutine(gui):
        while True:
            clickevent = yield gui.mybutton1.on_click()
            # handle clickevent

But that's probably worse than using callbacks.
This is more than a matter of style, so at least for now I'd like to hold off on calling it even. In my day-to-day silly, synchronous, python code, I do lots of synchronous requests. For example, it's not unreasonable for me to want to load two different files from disk, or make several database interactions, etc. If I want to make this asynchronous, I have to find a way to execute multiple things that could hypothetically block, at the same time. If I can't do that easily, then the asynchronous solution has failed, because its entire purpose is to do everything that I do synchronously, except without blocking the main thread.

Here's an example with lots of synchronous requests in Django:

    def view_paste(request, filekey):
        try:
            fileinfo = Pastes.objects.get(key=filekey)
        except DoesNotExist:
            t = loader.get_template('pastebin/error.html')
            return HttpResponse(t.render(Context(dict(error='File does not exist'))))

        f = open(fileinfo.filename)
        fcontents = f.read()
        t = loader.get_template('pastebin/paste.html')
        return HttpResponse(t.render(Context(dict(file=fcontents))))

How many blocking requests are there? Lots. This is, in a word, a long, complicated chain of synchronous requests. This is also very similar to what actual django code might look like in some circumstances. Even if we might think this is unreasonable, some subset or alteration of this is reasonable. Certainly we should be able to, say, load multiple (!) objects from the database, and open the template (possibly from disk), all potentially-blocking operations.

This is inherently a long, complicated chain of requests, whether we implement it asynchronously or synchronously, or use Deferreds or Futures, or write it in Java or Python. Some parts can be done at any time before the end (loader.get_template(...)), some need to be done in a certain order, and there's branching depending on what happens in different cases. In order to even write this code _at all_, we need a way to chain these IO actions together. If we can't chain them together, we can't produce that final synthesis of results at the end. We _need_ a pipeline or something computationally equivalent or more powerful. Results from past "deferred computations" need to be passed forward into future "deferred computations", in order to implement this at all.

This is not a style issue, this is an issue of needing to be able to solve problems that involve more than one computation where the results of every computation matter somewhere. It's just that in this case, some of the computations are computed asynchronously.
For that stuff, you'd have to speak to the main authors of Twisted. I'm just a twisted user. :( In the end it really doesn't matter what API you go with. The Twisted people will wrap it up so that they are compatible, as far as that is possible. I hope I haven't detracted too much from the main thrust of the surrounding discussion. Futures/deferreds are a pretty big tangent, so sorry. I justified it to myself by figuring that it'd probably come up anyway, somehow, since these are useful abstractions for asynchronous programming. -- Devin

On Fri, 12 Oct 2012 00:29:05 -0400 Devin Jeanpierre <jeanpierreda@gmail.com> wrote:
Mochikit has been dead for years. As for the others, just because they are called "Deferred" doesn't mean they are the same thing. None of them seems to look like Twisted's Deferred abstraction.
A Deferred can only be called once, but a dataReceived method can be called any number of times. So you can't use a Deferred for dataReceived unless you introduce significant hackery.
Agreed. And that's precisely because your GUI button handler is a dataReceived-alike :-) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, 12 Oct 2012 09:14:54 +0200 Antoine Pitrou <solipsis@pitrou.net> wrote:
Correction: actually, some of them do :-) I should have looked a bit better. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 12, 2012 at 3:14 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Mochikit has been dead for years.
Last update to the github repository was a few months ago. That said, looking at their APIs now, I'm pretty sure mochikit was not in that presentation. Its API isn't jQuery-like.
They have separate callbacks for error and success, which are passed values. That is the same. The callback chains are formed from sequences of deferreds. That's different. If a callback returns a deferred, then the rest of the chain is only called once that deferred resolves -- that's the same, and super important. There are some API differences, like .addCallbacks() --> .then(); and .callback() --> .resolve(). And IIRC jQuery had other differences, but maybe it's just that you use .pipe() to chain deferreds because .then() returns a Promise instead of a Deferred? I don't remember what was weird about jQuery, it's been a while since that talk. :(
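
For readers who haven't seen that "super important" behaviour, a tiny sketch (assuming Twisted is installed) of a callback returning a Deferred and thereby pausing the rest of the chain until it fires:

    from twisted.internet import defer

    def show(value):
        print('chain resumed with: %r' % value)

    inner = defer.Deferred()

    d = defer.Deferred()
    d.addCallback(lambda _: inner)     # returning a Deferred here pauses the rest of the chain
    d.addCallback(show)

    d.callback('start')                # first callback runs; the chain now waits on `inner`
    inner.callback('inner result')     # the chain resumes; show() receives 'inner result'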
Haha, oops! I was being dumb and only thinking of minor cases when callbacks are used, rather than major cases. Some people complain that Twisted's protocols (and dataReceived) should be like that GUI button example, though. Not major hackery, just somewhat nasty and bug-prone. -- Devin

I am going to start some new threads on this topic, to avoid going over 100 messages. Topics will be roughly: - reactors - protocol implementations - Twisted (esp. Deferred) - Tornado - yield from vs. Futures It may be a while (hours, not days). -- --Guido van Rossum (python.org/~guido)

On 10/11/2012 5:18 PM, Guido van Rossum wrote:
And of course tk/tkinter (tho perhaps we can represent that). It occurs to me that while i/o (file/socket) events can be added to a user (mouse/key) event loop, and I suspect that some tk/tkinter apps do so, it might be sensible to keep the two separate. A master loop could tell the user-event loop to handle all user events and then the i/o loop to handle one i/o event. This all depends on the relative speed of the handler code. -- Terry Jan Reedy

On Thu, Oct 11, 2012 at 5:29 PM, Terry Reedy <tjreedy@udel.edu> wrote:
You should talk to a Tcl/Tk user (if there are any left :-). They actually really like the unified event loop that's used for both widget events and network events. Tk is probably also a good example of a hybrid GUI system, where some of the callbacks (e.g. redraw events) are implemented in C. -- --Guido van Rossum (python.org/~guido)

On Thu, Oct 11, 2012 at 7:34 PM, Guido van Rossum <guido@python.org> wrote:
Here's the thing: the underlying OS is always handling two major I/O channels at any given time and it needs all its attention to do this: the GUI and one of the following (network, file) I/O. You can shuffle these around all you want, but somewhere the OS kernel is going to have to be involved, which means either portability or speed is sacrificed if one is going to pursue an abstract, unified async API.
You should talk to a Tcl/Tk user (if there are any left :-).
I used to be one of those :) mark

On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido@python.org> wrote:
So are you thinking of something like reactor.add_event_listener(event_type, event_params, func)? One thing to keep in mind is that file descriptors are somewhat special (at least in a level-triggered event loop), because of the way the event will keep firing until the socket buffer is drained or the event is unregistered. I'd be inclined to keep file descriptors in the interface even if they just raise an error on app engine, since they're fairly fundamental to the (unixy) event loop. On the other hand, I don't have any experience with event loops outside the unix/network world so I don't know what other systems might need for their event loops.
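
For readers less familiar with level-triggered loops, a bare-bones sketch (invented names, plain select(), no error handling): as long as unread bytes remain in a socket's buffer, select() keeps reporting its fd as readable, so the handler is called again on every pass until it drains the buffer or unregisters the fd.

    import select

    readers = {}                       # fd -> (sock, callback): the registered descriptors

    def add_reader(sock, callback):
        readers[sock.fileno()] = (sock, callback)

    def remove_reader(sock):
        readers.pop(sock.fileno(), None)

    def run_once(timeout=1.0):
        if not readers:
            return
        ready, _, _ = select.select(list(readers), [], [], timeout)
        for fd in ready:
            sock, callback = readers[fd]
            callback(sock)             # will fire again next pass if data is still buffered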
That doesn't work very well - while one loop is waiting for its timeout, nothing can happen on the other event loop. You have to switch back and forth frequently to keep things responsive, which is inefficient. I'd rather give each event loop its own thread; you can minimize the thread-synchronization concerns by picking one loop as "primary" and having all the others just pass callbacks over to it when their events fire. -Ben

On Thu, Oct 11, 2012 at 11:18 PM, Guido van Rossum <guido@python.org> wrote:
Correct. If I had to change it today, I'd throw out consumers and producers and just stick to a protocol API. Do you feel that there should be less talk about rationale?
It's not that there's *no* reference to IO: it's just that that reference is abstracted away in data_received and the protocol's transport object, just like Twisted's IProtocol.
Absolutely.
I'm assuming that by previous ideas you mean points 1, 2: protocol interface + reactor interface. I don't see why twisted's IProtocol couldn't grow an adapter for stdlib Protocols. Ditto for Tornado. Similarly, the reactor interface could be *provided* (through a fairly simple translation layer) by different implementations, including twisted.
I think this needs to be addressed in a separate mail, since more stuff has been said about deferreds in this thread.
I certainly agree that this is a very large amount of work. However, it has obvious huge advantages in terms of code reuse. I'm not sure if I understand the technical barrier though. It should be quite easy to create a blocking API with a protocol implementation that doesn't care; just call data_received with all your data at once, and presto! (Since transports in general don't provide guarantees as to how bytes will arrive, existing Twisted IProtocols have to do this already anyway, and that seems to work fine.)
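
A minimal sketch of that point (the LineProtocol class below is invented for illustration; data_received follows the protocol-style API being discussed, not an existing stdlib class). The parser never touches a socket, so an event loop, a unit test, or a blocking wrapper can all drive it; the blocking case really is just one call:

    class LineProtocol(object):
        """Collects bytes and reports complete lines; knows nothing about I/O."""

        def __init__(self):
            self._buffer = b''
            self.lines = []

        def data_received(self, data):
            self._buffer += data
            while b'\n' in self._buffer:
                line, self._buffer = self._buffer.split(b'\n', 1)
                self.lines.append(line)

    # Blocking use: just feed it the whole payload in one call.
    p = LineProtocol()
    p.data_received(b'GET / HTTP/1.0\r\nHost: example.org\r\n\r\n')
    print(p.lines)     # three complete lines, each still carrying its trailing '\r'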
There's at least a few others, but sure, that's an obvious one. Many of the objections I can raise however don't matter if there's already an *existing working solution*. I mean, sure, it can't do SSL, but if you have code that does what you want right now, then obviously SSL isn't actually needed.
Sorry, that was incredibly poor wording. I meant something more of an adapter: an upgrade path for existing asyncore code to new and shiny 3153 code.
I have a hard time understanding how that would work well outside of something like GAE. IIUC, that level of abstraction was chosen because it made sense for GAE (and I don't disagree), but I'm not sure it makes sense here. In this example, where would eg the select/epoll/whatever calls happen? Is it something that calls the reactor that then in turn calls whatever?
As an API, that's pretty close to Twisted's IReactorCore.iterate, I think. It'd work well enough. The issue is only with event loops that don't cooperate so well. Possibly the primitive operation would be something like "block until
-- --Guido van Rossum (python.org/~guido)
-- cheers lvh

On Sat, Sep 22, 2012 at 08:27:10PM +0200, chrysn <chrysn@fsfe.org> wrote:
If you mean that Twisted will be included in the standard library -- then no, I'm sure it will not. Python comes with batteries included, but Twisted is not a battery, it's rather a power plant. I am sure it will always be developed and distributed separately. And developing asyncore to the level of Twisted would be a duplication of effort.
If you want subprocess and Twisted to work together -- you know where to send patches. PS. In my not so humble opinion what the standard library really lacks in this area is a way to combine a few asynchronous libraries with different mainloops. Think about wxPython+Twisted in one program. But I have no slightest idea how to approach the problem. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sat, Sep 22, 2012 at 09:50:39PM +0200, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
And wxPython has a means of extending its main loop. But these are only partial solutions. There are many more libraries with main loops -- D-Bus/GLib, e.g. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Sat, Sep 22, 2012 at 10:52:10PM +0400, Oleg Broytman wrote:
well, what about python including a battery and a battery plug, then? asyncore could be the battery, and an interface between asynchronous libraries the battery plug. users could start developing with batteries, and when the project grows, just plug it into a power plant.

less analogy, more technical: the asyncore dispatcher to main loop interface is pretty thin -- there is a (global or explicitly passed) "map" (a dictionary), mapping file descriptors to objects that can be readable or writable (or acceptable, not sure if that detail is really needed that far down). a dispatcher registers to a map, and then the main loop select()s for events on all files and dispatches them accordingly. it won't be as easy as just taking that interface, eg because it lacks timeouts, but i think it can be the "way to combine a few asynchronous libraries".

(to avoid asyncore becoming a powerplant itself, it could choose not to implement some features for simplicity. for example, if asyncore chose to still not implement timeouts, registering timeouts to an asyncore based main loop would just result in a NotImplementedError telling the user to get a more powerful main loop.)

i don't want to claim i know how that could work in detail or even if it could work at all, but if this is interesting for enough people that it will be used, i'd like to find out.
no, actually -- for now, it'd be a patch to twisted (who'd reply with "we already have a way of dealing with it"). if asyncore's interface becomes the battery plug, it'd be a patch to subprocess. thanks for sharing your ideas chrysn -- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

On Sat, 22 Sep 2012 18:31:06 +0200 chrysn <chrysn@fsfe.org> wrote:
SSL support is also lacking: http://bugs.python.org/issue10084 Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

I still think this proposal is too vaguely defined and any effort towards adding async IO support to existing batteries is premature for different reasons, first of which the inadequacy of asyncore as the base async framework to fulfill the task you're proposing. asyncore is so old and difficult to fix/enhance without breaking backward compatibility (see for example http://bugs.python.org/issue11273#msg156439) that relying on it for any modern work is inevitably a bad idea.

From a chronological standpoint I still think the best thing to do in order to fix the "python async problem" once and for all is to first define and possibly implement an "async WSGI interface" describing what a standard async IO loop/reactor should look like (in terms of API) and how to integrate with it, see: http://mail.python.org/pipermail/python-ideas/2012-May/015223.html http://mail.python.org/pipermail/python-ideas/2012-May/015235.html
In my mind this is the ideal long-term scenario but even managing to define an "async WSGI interface" alone would be a big step forward. Again, at this point in time what you're proposing looks too vague, ambitious and premature to me. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ 2012/9/22 chrysn <chrysn@fsfe.org>

Temporarily un-lurking to reply to this thread (which I'll actually be reading). Giampaolo and I talked about this for a bit over the weekend, and I have to say that I agree with his perspective. In particular, to get something better than asyncore, there must be something minimally better to build upon. I don't really have an opinion on what that minimally better thing should be named, but I do agree that having a simple reactor API that has predictable behavior over the variety of handlers (select, poll, epoll, kqueue, WSAEvent in Windows, etc.) is necessary.

Now, let's get to brass tacks...

1. Whatever reactors are available, you need to be able to instantiate multiple of different types of reactors and multiple instances of the same type of reactor simultaneously (to support multiple threads handling different groups of reactors, or different reactors for different types of objects on certain platforms). While this allows for insanity in the worst-case, we're all consenting adults here, so shouldn't be limited by reactor singletons. There should be a default reactor class, which is defined on module/package import (use the "best" one for the platform).

2. The API must be simple. I am not sure that it can get easier than Idea #3 from: http://mail.python.org/pipermail/python-ideas/2012-May/015245.html I personally like it because it offers a simple upgrade path for asyncore users (create your asyncore-derived classes, pass it into the new reactor), while simultaneously defining a relatively easy API for any 3rd party to integrate with. By offering an easy-to-integrate method for 3rd parties (that is also sane), there is the added bonus that 3rd parties are more likely to integrate, rather than replace, which means more use in the "real world", better bug reports, etc. To simplify integration further, make the API register(fd, handler, events=singleton). Passing no events from the caller means "register me for all events", which will help 3rd parties that aren't great with handling read/write registration (see the sketch after this message).

3. I don't have a 3rd tack, you can hang things on the wall with 2 ;)

Regards, - Josiah

On Mon, Sep 24, 2012 at 3:31 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
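
A rough illustration of the register(fd, handler, events=singleton) idea (the class and names below are invented for this sketch, not Giampaolo's actual ioloop code; handlers are assumed to expose asyncore-style handle_read()/handle_write() methods):

    import select

    ALL_EVENTS = object()      # the "singleton" default: register for everything

    class SelectReactor(object):
        """Toy select()-based reactor with register(fd, handler, events=ALL_EVENTS)."""

        def __init__(self):
            self._readers = {}
            self._writers = {}

        def register(self, fd, handler, events=ALL_EVENTS):
            if events is ALL_EVENTS or 'r' in events:
                self._readers[fd] = handler
            if events is ALL_EVENTS or 'w' in events:
                self._writers[fd] = handler

        def unregister(self, fd):
            self._readers.pop(fd, None)
            self._writers.pop(fd, None)

        def poll(self, timeout=1.0):
            if not self._readers and not self._writers:
                return
            r, w, _ = select.select(list(self._readers), list(self._writers), [], timeout)
            for fd in r:
                self._readers[fd].handle_read()
            for fd in w:
                self._writers[fd].handle_write()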

On Mon, Sep 24, 2012 at 03:31:37PM -0700, Giampaolo Rodolà wrote:
i wasn't aware that pep 3153 exists. given that, my original intention of this thread should be re-worded into "let's get pep3153 along!". i'm not convinced by the api suggested in the first mail, as it sounds very unix centric (poll, read/write/error). i rather imagined leaving the details of the callbackable/mainloop interaction to be platform details. (a win32evtlog event source just couldn't possibly register with a select() based main loop). i'd prefer to keep the part that registers with a main loop concentrated to a very lowlevel common denominator. for unix, that'd mean that there is a basic callbackable for "things that receive events because they have a fileno". everything above that, eg the distinction whether a "w" event means that we can write() or that we must accept(), could happen above that and wouldn't have to be concerned with the main loop integration any more. in case (pseudo)code gets the idea over better:

    class UnixFilehandle(object):
        def __init__(self, fileno):
            self._fileno = fileno

        def register_with_main_loop(self, mainloop):
            # it might happen that the main loop doesn't support unix
            # filenos. tough luck, in that case -- the developer should
            # select a more suitable main loop.
            mainloop.register_unix_fileno(self._fileno, self)

        def handle_r_event(self):
            raise NotImplementedError("Not configured to receive that sort of event")
        # if you're sure you'd never receive any anyway, you can
        # not-register them by setting them to None in the subclass
        handle_w_event = handle_e_event = handle_r_event

    class SocketServer(UnixFilehandle):
        def __init__(self, socket):
            self._socket = socket
            UnixFilehandle.__init__(self, socket.fileno())

        def handle_w_event(self):
            self.handle_accept_event(self._socket.accept())

other interfaces parallel to the file handle interface would, for example, handle unix signals. (built atop of that, like the accept-handling socket server, could be one that deals with child processes.) the interface for android might look different again, because there is no main loop and select never gets called by the application.
i'd welcome such an interface. if asyncore can then be retrofitted to accept that interface too w/o breaking compatibility, it'd be nice, but if not, it's asyncore2, then.
Again, at this point in time what you're proposing looks too vague, ambitious and premature to me.
please don't get me wrong -- i'm not proposing anything for immediate action, i just want to start a thinking process towards a better integrated stdlib. On Mon, Sep 24, 2012 at 05:02:08PM -0700, Josiah Carlson wrote:
i think that's already common. with asyncore, you can have different maps (just one is installed globally as default). with the gtk main loop, it's a little tricky (the gtk.main() function doesn't simply take an argument), but the underlying glib can do that afaict.
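
For anyone who hasn't used that corner of asyncore, a small sketch with the standard asyncore API (the HelloServer class and port numbers are made up; handle_accepted requires Python 3.2+):

    import asyncore, socket

    class HelloServer(asyncore.dispatcher):
        def __init__(self, port, socket_map):
            asyncore.dispatcher.__init__(self, map=socket_map)   # register in an explicit map
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind(('127.0.0.1', port))
            self.listen(5)

        def handle_accepted(self, sock, addr):
            sock.sendall(b'hello\n')
            sock.close()

    map_a, map_b = {}, {}
    HelloServer(8881, map_a)       # each map is an independent group of channels
    HelloServer(8882, map_b)

    # each map can be driven by its own loop (e.g. in its own thread):
    # asyncore.loop(timeout=1.0, map=map_a)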
it's good that the necessities of call_later and call_every are mentioned here, i'd have forgotten about them. we've talked about many things we'd need in a python asynchronous interface (not implementation), so what are the things we *don't* need? (so we won't start building a framework like twisted). i'll start:

* high-level protocol handling (can be extra modules atop of it)
* ssl
* something like the twisted delayed framework (not sure about that, i guess the twisted people will have good reason to use it, but i don't see compelling reasons for such a thing in a minimal interface from my limited pov)
* explicit connection handling (retries, timeouts -- would be up to the user as well, eg urllib might want to set up a timeout and retries for asynchronous url requests)

best regards chrysn -- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

On Wed, Sep 26, 2012 at 1:17 AM, chrysn <chrysn@fsfe.org> wrote:
Go ahead and read PEP 3153, we will wait. A careful reading of PEP 3153 will tell you that the intent is to make a "light" version of Twisted built into Python. There isn't any discussion as to *why* this is a good idea, it just lays out the plan of action. Its ideas were gathered from the experience of the Twisted folks. Their experience is substantial, but in the intervening 1.5+ years since Pycon 2011, only the barest of abstract interfaces has been defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py), and no discussion has taken place as to forward migration of the (fairly large) body of existing asyncore code.
Of course not, but then again no one would attempt to do as much. They would use a WSAEvent reactor, because that's the only thing that it would work with. That said, WSAEvent should arguably be the default on Windows, so this issue shouldn't even come up there. Also, worrying about platform-specific details like "what if someone uses a source that is relatively uncommon on the platform" is a red-herring; get the interface/api right, build it, and start using it. To the point, Giampaolo already has a reactor that implements the interface (more or less "idea #3" from his earlier message), and it's been used in production (under staggering ftp(s) load). Even better, it offers effectively transparent replacement of the existing asyncore loop, and supports existing asyncore-derived classes. It is available: https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop...
That is, incidentally, what Giampaolo has implemented already. I encourage you to read the source I linked above.
Easily done, because it's already been done ;)
I am curious as to what you mean by "a better integrated stdlib". A new interface that doesn't allow people to easily migrate from an existing (and long-lived, though flawed) standard library is not better integration. Better integration requires allowing previous users to migrate, while encouraging new users to join in with any later development. That's what Giampaolo's suggested interface offers on the lowest level; something to handle file-handle reactors, combined with a scheduler.
Remember that a reactor isn't just a dictionary of file handles to do stuff on, it's the thing that determines what underlying platform mechanics will be used to multiplex across channels. But that level of detail will be generally unused by most people, as most people will only use one at a time. The point of offering multiple reactors is to allow people to be flexible if they choose (or to pick from the different reactors if they know that one is faster for their number of expected handles).
I disagree with the last 3. If you have an IO loop, more often than not you want an opportunity to do something later in the same context. This is commonly the case for bandwidth limiting, connection timeouts, etc., which are otherwise *very* difficult to do at a higher level (which are the reasons why schedulers are built into IO loops). Further, SSL in async can be tricky to get right. Having the 20-line SSL layer as an available class is a good idea, and will save people time by not having them re-invent it (poorly or incorrectly) every time. Regards, - Josiah

On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote:
it doesn't look like twisted-light to me, more like an interface suggestion for a small subset of twisted. in particular, it doesn't talk about main loops / reactors / registration-in-the-first-place. you mention interaction with the twisted people. is there willingness, from the twisted side, to use a standard python middle layer, once it exists and has sufficiently high quality?
i've had a look at it, but honestly can't say more than that it's good to have a well-tested asyncore compatible main loop with scheduling support, and i'll try it out for my own projects.
a new interface won't make integration automatically happen, but it's something the standard library components can evolve on. whether, for example, urllib2 will then automatically work asynchronously in that framework or whether we'll wait for urllib3, we'll see when we have it. @migrate from an existing standard library: is there a big user base for the current asyncore framework? my impression is that it is not very well known among python users, and most that could use it use twisted.
i see; those should be provided, then. i'm afraid i don't completely get the point you're making, sorry for that, maybe i've missed important statements or lack sufficiently deep knowledge of topics affected and got lost in details. what is your opinion on the state of asynchronous operations in python, and what would you like it to be? thanks for staying with this topic chrysn -- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom

On Wed, Oct 3, 2012 at 7:43 AM, chrysn <chrysn@fsfe.org> wrote:
Things don't "automatically work" without work. You can't just make urllib2 work asynchronously unless you do the sorts of greenlet-style stack switching that lies to you about what is going on, or unless you redesign it from scratch to do such. That's not to say that greenlets are bad, they are great. But expecting that a standard library implementing an updated async spec will all of a sudden hook itself into a synchronous socket client? I think that expectation is unreasonable.
"Well known" is an interesting claim. I believe it actually known of by quite a large part of the community, but due to a (perhaps deserved) reputation (that may or may not still be the case), isn't used as often as Twisted. But along those lines, there are a few questions that should be asked: 1. Is it desirable to offer users the chance to transition from asyncore-derived stuff to some new thing? 2. If so, what is necessary for an upgrade/replacement for asyncore/asynchat in the long term? 3. Would 3rd parties use this as a basis for their libraries? 4. What are the short, mid, and long-term goals? For my answers: 1. I think it is important to offer people who are using a standard library module to continue using a standard library module if possible. 2. A transition should offer either an adapter or similar-enough API equivalency between the old and new. 3. I think that if it offers a reasonable API, good functionality, and examples are provided - both as part of the stdlib and outside the stdlib, people will see the advantages of maintaining less of their own custom code. To the point: would Twisted use *whatever* was in the stdlib? I don't know the answer, but unless the API is effectively identical to Twisted, that transition may be delayed significantly. 4. Short: get current asyncore people transitioned to something demonstrably better, that 3rd parties might also use. Mid: pull parsers/logic out of cores of methods and make them available for sync/async/3rd party parsing/protocol handling (get the best protocol parsers into the stdlib, separated from the transport). Long: everyone contributes/updates the stdlib modules because it has the best parsers for protocols/formats, that can be used from *anywhere* (sync or async). My long-term dream (which has been the case for 6+ years, since I proposed doing it myself on the python-dev mailing list and was told "no") is that whether someone uses urllib2, httplib2, smtpd, requests, ftplib, etc., they all have access to high-quality protocol-level protocol parsers. So that once one person writes the bit that handles http 30X redirects, everyone can use it. So that when one person writes the gzip + chunked transfer encoding/decoding, everyone can use it.
I think it is functional, but flawed. I also think that every 3rd party library that does network-level protocols is a different mix of functional and flawed. I think that there is a repeated and often-times wasted effort where folks are writing different and invariably crappy (to some extent) protocol parsers and network handlers. I think that whenever possible, that should stop, and the highest-quality protocol parsing functions/methods should be available in the Python standard library, available to be called from any library, whether sync, async, stdlib, or 3rd party.

Now, my discussions in the context of asyncore-related upgrades may seem like a strange leap, but some of these lesser-quality parsing routines exist in asyncore-derived classes, as well as non-asyncore-derived classes. But if we make an effort on the asyncore side of things, under the auspices of improving one stdlib module and offering additional functionality, the need for protocol-level parsers shared between sync and async should become obvious to *everyone* (that it isn't the case now is, I suspect, because the communities either don't spend a lot of time cross-pollinating, people like writing parsers - I do too ;) - or the sync folks end up going the greenlet route if/when threading bites them on the ass). Regards, - Josiah

On Fri, 5 Oct 2012 11:51:21 -0700 Josiah Carlson <josiah.carlson@gmail.com> wrote:
I'm not sure what you're talking about: what were you told "no" about, specifically? Your proposal sounds reasonable and (ideally) desirable to me. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 5, 2012 at 1:09 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I've managed to find the email where I half-way proposed it (though not as pointed as what I posted above): http://mail.python.org/pipermail/python-dev/2004-November/049827.html Phillip J. Eby said in a reply that policy would kill it. My experience at the time told me that policy was a tough nut to crack, and my 24-year old self wasn't confident enough to keep pushing (even though I had the time). Now, my 32-year old self has the confidence and the knowledge to do it (or advise how to do it), but not the time (I'm finishing up my first book, doing a conference tour, running a startup, and preparing for my first child). One of the big reasons why I like and am pushing Giampaolo's ideas (and existing code) is my faith that he *can* and *will* do it, if he says he will. Regards, - Josiah

This is an incredibly important discussion. I would like to contribute despite my limited experience with the various popular options. My own async explorations are limited to the constraints of the App Engine runtime environment, where a rather unique type of reactor is required. I am developing some ideas around separating reactors, futures, and yield-based coroutines, but they take more thinking and probably some experimental coding before I'm ready to write it up in any detail. For a hint on what I'm after, you might read up on monocle (https://github.com/saucelabs/monocle) and my approach to building coroutines on top of Futures (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets...).

In the mean time I'd like to bring up a few higher-order issues:

(1) How important is it to offer a compatibility path for asyncore? I would have thought that offering an integration path forward for Twisted and Tornado would be more important.

(2) We're at a fork in the road here. On the one hand, we could choose to deeply integrate greenlets/gevents into the standard library. (It's not monkey-patching if it's integrated, after all. :-) I'm not sure how this would work for other implementations than CPython, or even how to address CPython on non-x86 architectures. But users seem to like the programming model: write synchronous code, get async operation for free. It's easy to write protocol parsers that way. On the other hand, we could reject this approach: the integration would never be completely smooth, there's the issue of other implementations and architectures, it probably would never work smoothly even for CPython/x86 when 3rd party extension modules are involved. Callback-based APIs don't have these downsides, but they are harder to program; however we can make programming them easier by using yield-based coroutines. Even Twisted offers those (inline callbacks).

Before I invest much more time in these ideas I'd like to at least have (2) sorted out. -- --Guido van Rossum (python.org/~guido)
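
As a rough illustration of the coroutines-on-Futures idea mentioned above (a stripped-down sketch, not NDB's actual tasklet code; run_coroutine and async_fetch are invented names), the scheduler below resumes a generator whenever the Future it yielded gets its result, all in one thread:

    from concurrent.futures import Future

    def run_coroutine(gen):
        """Step a generator that yields Futures; send each result back in."""
        def step(value):
            try:
                fut = gen.send(value)
            except StopIteration:
                return
            fut.add_done_callback(lambda f: step(f.result()))
        step(None)

    # Toy "async" operation: returns a Future that something else resolves later.
    pending = []
    def async_fetch(key):
        fut = Future()
        pending.append((key, fut))
        return fut

    def my_task():
        a = yield async_fetch('a')          # suspends until the Future has a result
        b = yield async_fetch('b')
        print('got %s and %s' % (a, b))

    run_coroutine(my_task())
    # Somewhere else (the "event loop" / RPC layer) resolves the Futures:
    while pending:
        key, fut = pending.pop(0)
        fut.set_result(key.upper())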

On Sat, 6 Oct 2012 15:00:54 -0700 Guido van Rossum <guido@python.org> wrote:
greenlets/gevents only get you half the advantages of single-threaded "async" programming: they get you scalability in the face of a high number of concurrent connections, but they don't get you the robustness of cooperative multithreading (because it's not obvious when reading the code where the possible thread-switching points are). (I don't actually understand the attraction of gevent, except for extreme situations; threads should be cheap on a decent OS) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I used to think that too, long ago, until I discovered that as you add abstraction layers, cooperative multithreading is untenable -- sooner or later you will lose track of where the threads are switched.
(I don't actually understand the attraction of gevent, except for extreme situations; threads should be cheap on a decent OS)
I think it's the observation that the number of sockets you can realistically have open in a single process or machine is always 1-2 orders of magnitude larger than the number of threads you can have -- and this makes sense since the total amount of memory (kernel and user) to represent a socket is just much smaller than needed for a thread. Just check the configuration limits of your typical Linux kernel if you don't believe me. :-) -- --Guido van Rossum (python.org/~guido)

On Sat, 6 Oct 2012 17:23:48 -0700 Guido van Rossum <guido@python.org> wrote:
Even with an explicit notation like "yield" / "yield from"? Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
If you strictly adhere to using those you should be safe (though distinguishing between the two may prove challenging) -- but in practice it's hard to get everyone and every API to use this style. So you'll have some blocking API calls hidden deep inside what looks like a perfectly innocent call to some helper function. IIUC in Go this is solved by mixing threads and lighter-weight constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the rest of the system continues to make progress by spawning another thread. My own experience with NDB is that it's just too hard to make everyone use the async APIs all the time -- so I gave up and made async APIs an optional feature, offering a blocking and an async version of every API. I didn't start out that way, but once I started writing documentation aimed at unsophisticated users, I realized that it was just too much of an uphill battle to bother. So I think it's better to accept this and deal with it, possibly adding locking primitives into the mix that work well with the rest of the framework. Building a lock out of a tasklet-based (i.e. non-threading) Future class is easy enough. -- --Guido van Rossum (python.org/~guido)

Hi Guido and folks, On 07.10.12 17:04, Guido van Rossum wrote:
I'm digging in, a bit late. Still trying to read the myriad of messages. For now just a word: Guido: How much I would love to use your time machine and invite you to discuss Pythons future in 1998. Then we would have tossed greenlet/stackless and all that crap. Entering a different context could have been folded deeply into Python, by making it able to pickle program state in certain positions. Just dreaming out loud :-) It is great that this discussion is taking place, and I'll try to help. cheers - Chris -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

+1000 Can we dream of gevent integrated into standard CPython? This would be a fantastic path for 3.4 :) And I definitely should move to 3.x, because for web programming I just can't think of another way to program using Python. I'm seeing some people going to other languages where async is easier, like Go (some are trying Erlang). Async is a MUST HAVE for web programming these days... In my experience, I've found that the "robustness of cooperative multithreading" comes at the price of code that is difficult to maintain. And with single threading you never easily reach the benefits of SMP. That's why Erlang shines... it abstracts away the hard work of keeping the switching under control. Gevent walks the same line: it makes the programmer's life easier. -- Carlo Pires 2012/10/6 Guido van Rossum <guido@python.org>

On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum <guido@python.org> wrote:
Yield-based coroutines like monocle are the simplest way to do multi-paradigm in the same code. Whether you have a async-style reactor, greenlet-style stack switching, cooperatively scheduled generator trampolines, or just plain blocking threaded sockets; that style works with all of them (the futures and wrapper around everything just looks a little different). That said, it forces everyone to drink the same coroutine-styled kool-aid. That doesn't bother me. But I understand it, and have built similar systems before. I don't have an intuition about whether 3rd parties will like it or will migrate to it. Someone want to ping the Twisted and Tornado folks about it?
Combining your responses to #1 and now this, are you proposing a path forward for Twisted/Tornado to be greenlets? That's an interesting approach to the problem, though I can see the draw. ;) I have been hesitant on the Twisted side of things for an arbitrarily selfish reason. After 2-3 hours of reading over a codebase (which I've done 5 or 6 times in the last 8 years), I ask myself whether I believe I understand 80+% of how things work; how data flows, how callbacks/layers are invoked, and whether I could add a piece of arbitrary functionality to one layer or another (or to determine the proper layer in which to add the functionality). If my answer is "no", then my gut says "this is probably a bad idea". But if I start figuring out the layers before I've finished my 2-3 hours, and I start finding bugs? Well, then I think it's a much better idea, even if the implementation is buggy. Maybe something like Monocle would be better (considering your favor for that style, it obviously has a leg-up on the competition). I don't know. But if something like Monocle can merge it all together, then maybe I'd be happy. Incidentally, I can think of a few different styles of wrappers that would actually let people using asyncore-derived stuff use something like Monocle. So maybe that's really the right answer? Regards, - Josiah P.S. Thank you for weighing in on this Guido. Even if it doesn't end up the way I had originally hoped, at least now there's discussion.

On Sat, Oct 6, 2012 at 7:22 PM, Josiah Carlson <josiah.carlson@gmail.com> wrote:
Glad I'm not completely crazy here. :-)
They should be reading this. Or maybe we should bring it up on python-dev before too long.
Can't tell whether you're serious, but that's not what I meant. Surely it will never fly for Twisted. Tornado apparently already works with greenlets (though maybe through a third party hack). But personally I'd be leaning towards rejecting greenlets, for the same reasons I've kept the doors tightly shut for Stackless -- I like it fine as a library, but not as a language feature, because I don't see how it can be supported on all platforms where Python must be supported. However I figured that if we define the interfaces well enough, it might be possible to use (a superficially modified version of) Twisted's reactors instead of the standard ones, and, orthogonally, Twisted's deferred's could be wrapped in the standard Futures (or the other way around?) when used with a non-Twisted reactor. Which would hopefully open the door for migrating some of their more useful protocol parsers into the stdlib.
Can't figure what you're implying here. On which side does Twisted fall for you?
My worry is that monocle is too simple and does not cater for advanced needs. It doesn't seem to have caught on much outside the company where it originated.
I still don't really think asyncore is going to be a problem. It can easily be separated into a reactor and callbacks.
Hm, there seemed to be plenty of discussion before... -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum <guido@python.org> wrote:
I thought futures were meant for thread and process pools? The blocking methods make them a bad fit for an asynchronous networking toolset. The Twisted folks have discussed integrating futures and Twisted (see also the reply, which has some corrections): http://twistedmatrix.com/pipermail/twisted-python/2011-January/023296.html -- Devin

On Saturday, October 6, 2012, Devin Jeanpierre wrote:
The specific Future implementation in the py3k stdlib uses threads and is indeed meant for thread and process pools. But the *concept* of futures works fine in event-based systems, see the link I posted into the NDB sources. I'm not keen on cancellation and threadpools FWIW.
-- --Guido van Rossum (python.org/~guido)
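
A tiny sketch of that distinction, using the stdlib concurrent.futures.Future directly with no executor: nothing forces a consumer to block on result(); an event loop can attach a callback and fill the Future in later from the same thread.

    from concurrent.futures import Future

    def on_done(fut):
        print('callback ran with: %r' % fut.result())   # result is already set, no blocking

    fut = Future()
    fut.add_done_callback(on_done)

    # ... later, from the event loop / reactor, still in the same thread:
    fut.set_result(42)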

On Oct 06, 2012, at 03:00 PM, Guido van Rossum wrote:
This is an incredibly important discussion.
Indeed. If Python gets it right, it could be yet another killer reason for upgrading to Python 3, at least for the growing subset of event-driven applications.
(1) How importance is it to offer a compatibility path for asyncore?
I've written and continue to use asyncore-based code. I don't personally care much about compatibility. I've used asyncore because it was the simplest and most stdlibby of the options for the Python versions I can use, but I have no love for it. If there were a better, more readable and comprehensible way to do it, I'd ditch the asyncore-based versions as soon as possible.
I would have thought that offering an integration path forward for Twisted and Tornado would be more important.
Agreed. I share the same dream as someone else in this thread mentioned. It would be really fantastic if the experts in a particular protocol could write support for that protocol Just Once and have it as widely shared as possible. Maybe this is an unrealistic dream, but now's the time to have them anyway. Even something like the email package could benefit from this. The FeedParser is our attempt to support asynchronous reading of email data for parsing. I'm not so sure that the asynchronous part of that is very useful. -Barry

I'll have to put in my .02€ here … Guido van Rossum <guido@...> writes:
(2) We're at a fork in the road here. On the one hand, we could choose to deeply integrate greenlets/gevents into the standard library.
Yes. I have two and a half reasons for this.

(½) Ultimately I think that switching stacks around is always going to be faster than unwinding and re-winding things with yield().

(1) It's a whole lot easier to debug a problem with gevent than with anything which uses yield / Deferreds / asyncore / whatever. With gevent, you get a standard stack trace. With anything else, the "where did this call come from" information is not part of the call chain and thus is either unavailable, or will have to be carried around preemptively (with associated overhead).

(2) Nothing against Twisted or any other async frameworks, but writing any nontrivial program in one requires warping my brain into something that's *not* second nature in Python, and never going to be. Python is not Javascript; if you want to use the "loads of callbacks" programming style, use node.js.

Personal experience: I have written an interpreter for an asynchronous and vaguely Pythonic language which I use for home automation, my lawn sprinklers, and related stuff (which I should probably release in some form). The code was previously based on Twisted and was impossible to debug. It now uses gevent and Just Works. -- -- Matthias Urlichs

Ok I'll add a buck... On 16.10.12 20:40, Matthias Urlichs wrote:
If you are emulating things in Python, that may be true. Also if you are really only switching stacks, that may be true. But neither assumption holds; see below.
I'm absolutely with you on the ease of coding straightforwardly. But this new, efficient "yield from" is a big step in that direction; see Greg's reply.
Same here.
You are using gevent, which uses greenlet! That means no pure stack switching, but the stack is sliced and moved onto the heap. But that technique (originally from Stackless 2.0) is known to be 5-10 times slower, compared to cooperative context switching that is built into the interpreter. This story is by far not over. Even PyPy with all its advanced technology still depends on stack slicing when it emulates concurrency. Python 3.3 has done a huge move, because this efficient nesting of generators can deeply influence how people are coding, maybe with the effect that stack tricks lose more of their importance. I expect more like this to come. Greenlets are great. Stack inversion is faster. -- Christian Tismer :^) <mailto:tismer@stackless.com> Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/

Do you use gevent's monkeypatch-the-stdlib feature? On Tue, Oct 16, 2012 at 8:40 PM, Matthias Urlichs <matthias@urlichs.de>wrote:
That seems like something that can be factually proven or counterproven.
gevent uses stack slicing, which IIUC is pretty expensive. Why is it not subject to the performance overhead you mention? Can you give an example of such a crappy stack trace in twisted? I develop in it all day, and get pretty decent stack traces. The closest thing I have to a crappy stack trace is when doing functional tests with an RPC API -- obviously on the client side all I'm going to see is a fairly crappy just-an-exception. That's okay, I also get the server side exception that looks like a plain old Python traceback to me and tells me exactly where the problem is from.
Which ones are you thinking about other than twisted? It seems that the issue you are describing is one of semantics, not so much of whether or not it actually does things asynchronously under the hood, as e.g gevent does too.
Python is not Javascript; if you want to use the "loads of callbacks" programming style, use node.js.
None of the solutions on the table have node.js-style "loads of callbacks". Everything has some way of structuring them. It's either implicit switches (as in "can happen in the caller"), explicit switches (as in yield/yield from) or something like deferreds, some options having both of the latter.
If you have undebuggable code samples from that I'd love to take a look.
-- cheers lvh

Oh my me. This is a very long thread that I probably should have replied to a long time ago. This thread is intensely long right now, and tonight is the first chance I've had to try and go through it comprehensively. I'll try to reply to individual points made in the thread -- if I missed yours, please don't be offended, I promise it's my fault :)

FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka async-pep. First of all, I'm glad to see that there's some more "let's get that pep along" movement. I tabled it because: a) I didn't have enough time to contribute, b) a lot of promised contributions ended up not happening when it came down to it, which was incredibly demotivating. The combination of this thread and the fact that I was strong-armed at Pycon ZA by a bunch of community members that shall not be named (Alex, Armin, Maciej, Larry ;-)) got me exploring this thing again.

First of all, I don't feel async-pep is an attempt at twisted light in the stdlib. Other than separation of transport and protocol, there's not really much there that even smells of twisted (especially since right now I'd probably throw consumers/producers out) -- and that separation is simply good practice. Twisted does the same thing, but it didn't invent it. Furthermore, the advantages seem clear: reusability and testability are more than enough for me. If there's one take away idea from async-pep, it's reusable protocols.

The PEP should probably be a number of PEPs. At first sight, it seems that this number is at least four:

1. Protocol and transport abstractions, making no mention of asynchronous IO (this is what I want 3153 to be, because it's small, manageable, and virtually everyone appears to agree it's a fantastic idea)
2. A base reactor interface
3. A way of structuring callbacks: probably deferreds with a built-in inlineCallbacks for people who want to write synchronous-looking code with explicit yields for asynchronous procedures
4+ adapting the stdlib tools to using these new things

Re: forward path for existing asyncore code. I don't remember this being raised as an issue. If anything, it was mentioned in passing, and I think the answer to it was something to the tune of "asyncore's API is broken, fixing it is more important than backwards compat". Essentially I agree with Guido that the important part is an upgrade path to a good third-party library, which is the part about asyncore that REALLY sucks right now. Regardless, an API upgrade is probably a good idea. I'm not sure if it should go in the first PEP: given the separation I've outlined above (which may be too spread out...), there's no obvious place to put it besides it being a new PEP.

Re base reactor interface: drawing maximally from the lessons learned in twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, etc), asynchronous-looking name lookup, fd handling are the important parts. call_every can be implemented in terms of call_later on a separate object, so I think it should be (eg twisted.internet.task.LoopingCall).

One thing that is apparently forgotten about is event loop integration. The prime way of having two event loops cooperate is *NOT* "run both in parallel", it's "have one call the other". Even though not all loops support this, I think it's important to get this as part of the interface (raise an exception for all I care if it doesn't work).

cheers lvh

On Tue, Oct 9, 2012 at 11:00 AM, Laurens Van Houtven <_@lvh.cc> wrote:
No problem, I'm running behind myself...
FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka async-pep.
I suppose that's your pet name for it. :-) For most everyone else it's PEP 3153.
Is there a newer version than what's on http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any specific proposals, after spending a lot of time giving a rationale and defining some terms. The version on https://github.com/lvh/async-pep doesn't seem to be any more complete.
But the devil is in the details. *What* specifically are you proposing? How would you write a protocol handler/parser without any reference to I/O? Most protocols are two-way streets -- you read some stuff, and you write some stuff, then you read some more. (HTTP may be the exception here, if you don't keep the connection open.)
2. A base reactor interface
I agree that this should be a separate PEP. But I do think that in practice there will be dependencies between the different PEPs you are proposing.
Your previous two ideas sound like you're not tied to backward compatibility with Tornado and/or Twisted (not even via an adaptation layer). Given that we're talking Python 3.4 here that's fine with me (though I think we should be careful to offer a path forward for those packages and their users, even if it means making changes to the libraries). But Twisted Deferred is pretty arcane, and I would much rather not use it as the basis of a forward-looking design. I'd much rather see what we can mooch off PEP 3148 (Futures).
4+ adapting the stdlib tools to using these new things
We at least need to have an idea for how this could be done. We're talking serious rewrites of many of our most fundamental existing synchronous protocol libraries (e.g. httplib, email, possibly even io.TextIOWrapper), most of which have had only scant updates even through the Python 3 transition apart from complications to deal with the bytes/str dichotomy.
I have the feeling that the main reason asyncore sucks is that it requires you to subclass its Dispatcher class, which has a rather treacherous interface.
Aren't all your proposals API upgrades?
That actually sounds more concrete than I'd like a reactor interface to be. In the App Engine world, there is a definite need for a reactor, but it cannot talk about file descriptors at all -- all I/O is defined in terms of RPC operations which have their own (several layers of) async management but still need to be plugged in to user code that might want to benefit from other reactor functionality such as scheduling and placing a call at a certain moment in the future.
This is definitely one of the things we ought to get right. My own thoughts are slightly (perhaps only cosmetically) different again: ideally each event loop would have a primitive operation to tell it to run for a little while, and then some other code could tie several event loops together. Possibly the primitive operation would be something like "block until either you've got one event ready, or until a certain time (possibly 0) has passed without any events, and then give us the events that are ready and a lower bound for when you might have more work to do" -- or maybe instead of returning the event(s) it could just call the associated callback (it might have to if it is part of a GUI library that has callbacks written in C/C++ for certain events like screen refreshes). Anyway, it would be good to have input from representatives from Wx, Qt, Twisted and Tornado to ensure that the *functionality* required is all there (never mind the exact signatures of the APIs needed to provide all that functionality). -- --Guido van Rossum (python.org/~guido)
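
A toy sketch of that composition idea (the run_once() primitive and TimerLoop class are invented for illustration, not any real GUI or reactor API): each loop exposes "run at most one ready event or give up at a deadline", and a master loop simply alternates between the two.

    from __future__ import print_function
    import time, heapq

    class TimerLoop(object):
        """Toy event loop: timers only, exposing a run_once(deadline) primitive."""

        def __init__(self):
            self._timers = []              # heap of (when, seq, callback)
            self._seq = 0

        def call_later(self, delay, callback):
            self._seq += 1
            heapq.heappush(self._timers, (time.time() + delay, self._seq, callback))

        def run_once(self, deadline):
            """Run at most one ready event, or return once `deadline` has passed."""
            while time.time() < deadline:
                if self._timers and self._timers[0][0] <= time.time():
                    _, _, callback = heapq.heappop(self._timers)
                    callback()
                    return
                time.sleep(0.01)

    def run_both(loop_a, loop_b, duration):
        """Master loop: interleave the two loops' run_once primitives."""
        end = time.time() + duration
        while time.time() < end:
            loop_a.run_once(time.time() + 0.05)
            loop_b.run_once(time.time() + 0.05)

    a, b = TimerLoop(), TimerLoop()
    a.call_later(0.1, lambda: print('event from loop a'))
    b.call_later(0.2, lambda: print('event from loop b'))
    run_both(a, b, 0.5)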
participants (19):
- Amaury Forgeot d'Arc
- Antoine Pitrou
- Barry Warsaw
- Ben Darnell
- Carlo Pires
- Christian Tismer
- chrysn
- Devin Jeanpierre
- Duncan McGreggor
- Giampaolo Rodolà
- Greg Ewing
- Guido van Rossum
- Josiah Carlson
- Laurens Van Houtven
- Mark Adam
- Massimo DiPierro
- Matthias Urlichs
- Oleg Broytman
- Terry Reedy