Re: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object

I think that Glyph hit the nail on the head when he said that "you can go from any arbitrary Future to a full-featured Deferred, but not the other way around."

This is exactly my concern, and the reason why I think it's important for Python to standardize on an async result type that is general enough to accommodate the different kinds of async semantics in common use in the Python world today.

If you don't think this is a problem, just Google for "twisted vs. tornado". While the debate is sometimes passionate and rude, it points to the fragmentation that has occurred in the Python async space due to the lack of direction from the standard library. And there's a real cost to this fragmentation -- it's not easy to build an application that uses different async frameworks when there's no standardized result object or reactor model.

My concern is that PEP 3148 was really designed for the purpose of thread and process pooling, and that the Future object is designed with the minimum functionality required to achieve this end. The problem is that the Future object starts to look like a stripped-down version of a Twisted Deferred. And that raises the question: why are we standardizing on the special case and not the general case?

Wouldn't it be better to break this into two problems:

* Develop a full-featured standard async result type and reactor model to facilitate interoperability of different async libraries. This would consist of a standard async result type and an abstract base class for a reactor model.

* Let PEP 3148 focus on the problem of thread and process pooling and build on the above async result type.

The semantics that a general async type should support include:

1. Semantics that allow you to define a callback channel for results and optionally a separate channel for exceptions as well.

2. Semantics that offer the flexibility of working with async results at the callback level or at the generator level (having a separate channel for exceptions makes it easy for the generator decorator implementation (that facilitates "yield function_returning_async_object()") to dispatch exceptions into the caller).

3. Semantics that can easily be used to pass results and exceptions back from thread or process pools.

4. Semantics that allow for aggregate processing of parallel asynchronous results, such as "fire async result when all of the async results in an async set have fired" or "fire async result when the first result from an async set has fired."

Deferreds presently support all of the above. My point here is not so much that Deferreds should be the standard, but that whatever standard is chosen, its semantics should be general enough that different async Python libraries/platforms can interoperate.

James
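To make points 1 and 4 above concrete, here is a minimal sketch of what such a result type might look like. The names AsyncResult, add_callbacks and when_all are hypothetical and chosen only for illustration; this is not Twisted's Deferred and not the PEP 3148 Future, and it leaves out threading, chaining and the generator integration from point 2.

```python
# Hypothetical sketch: AsyncResult and when_all are illustrative names,
# not part of the stdlib, PEP 3148, or Twisted.

class AsyncResult:
    """Holds a pending value and notifies listeners on one of two channels."""

    _PENDING, _OK, _FAILED = range(3)

    def __init__(self):
        self._state = self._PENDING
        self._value = None
        self._listeners = []  # list of (on_result, on_error) pairs

    def add_callbacks(self, on_result, on_error=None):
        """Register a result handler and, optionally, an exception handler."""
        if self._state == self._PENDING:
            self._listeners.append((on_result, on_error))
        else:
            # Already fired: deliver immediately on the matching channel.
            self._dispatch(on_result, on_error)

    def set_result(self, value):
        self._fire(self._OK, value)

    def set_exception(self, exc):
        self._fire(self._FAILED, exc)

    def _fire(self, state, value):
        if self._state != self._PENDING:
            raise RuntimeError("async result already fired")
        self._state, self._value = state, value
        listeners, self._listeners = self._listeners, []
        for on_result, on_error in listeners:
            self._dispatch(on_result, on_error)

    def _dispatch(self, on_result, on_error):
        if self._state == self._OK:
            on_result(self._value)
        elif on_error is not None:
            on_error(self._value)


def when_all(results):
    """Return an AsyncResult that fires with a list once every input has fired.

    The first exception is forwarded on the error channel instead.
    """
    combined = AsyncResult()
    values = [None] * len(results)
    remaining = [len(results)]
    failed = [False]

    def make_handlers(index):
        def on_result(value):
            values[index] = value
            remaining[0] -= 1
            if remaining[0] == 0 and not failed[0]:
                combined.set_result(values)

        def on_error(exc):
            if not failed[0]:
                failed[0] = True
                combined.set_exception(exc)

        return on_result, on_error

    if not results:
        combined.set_result([])
    for index, result in enumerate(results):
        result.add_callbacks(*make_handlers(index))
    return combined
```

A "fire when the first result arrives" aggregate would follow the same pattern, with the combined result fired from the first on_result call instead of the last.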

On Mon, Sep 20, 2010 at 2:41 PM, James Yonan <james@openvpn.net> wrote:
Where by "go from X to Y" you mean "take a program written using X and change it to use Y", right?
I think I get your gist. Unfortunately there's only a small number of people who know enough about async semantics in order to write the PEP that is needed.
> If you don't think this is a problem, just Google for "twisted vs. tornado". While the debate is sometimes passionate and rude,
Is it ever distanced and polite? :-)
But, circularly, the reason for the lack of direction from the standard library is that nobody has contributed an async framework to the standard library since asyncore was added in, oh, 1999.
Because we could reach agreement fairly quickly on PEP 3148. There are some core contributors who know threads and processes inside out, and after several rounds of comments (a lot, really) they were satisfied. At this point it is probably best to forget about PEP 3148 if you want to improve the async situation in the stdlib, and start thinking about that async PEP instead.
Unless you want to propose to include Twisted into the stdlib, this is not going to be ready for inclusion into Python 3.2.
> * Let PEP 3148 focus on the problem of thread and process pooling and leverage on the above async result type.
But PEP 3148 *is* ready for inclusion in Python 3.2. So you've got the ordering wrong. It doesn't make sense to hold up PEP 3148, waiting for the perfect solution to appear. In fact, the changes that were made to PEP 3148 at Glyph's suggestion are probably all you are going to get regarding PEP 3148.
Do you want to champion a PEP? I hope you do -- it will be a long march but rewarding, especially if you get the Tornado folks to participate and contribute.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum wrote: [...]
> Unless you want to propose to include Twisted into the stdlib, this is not going to be ready for inclusion into Python 3.2.
I don't think anyone has suggested "include Twisted". What is being suggested is "include twisted.internet.defer, or something about as useful."

Let's consider just how hard it would be to add twisted/internet/defer.py to the stdlib (possibly as 'deferred.py'). It's already almost a standalone module, especially if pared back to just the Deferred class and maybe one or two of the most useful helpers (e.g. gatherResults, to take a list of Deferreds and turn them into a single Deferred that fires when they have all fired).

The two most problematic dependencies would be:

1) twisted.python.log, which for these purposes could be replaced with a call to a user-replaceable hook whenever an unhandled error occurs (similar to sys.excepthook).

2) twisted.python.failure... this one is harder. As glyph said, it provides "an object that represent[s] an exception as raised at a particular point, associated with a particular stack". But also, as he said, it's a mess and could use a clean up. Cleaning it up or thinking of a simpler replacement is not insurmountable, but probably too ambitious for Python 3.2's schedule.

My point is that adding the Deferred abstraction to the stdlib is a *much* smaller and more reasonable proposition than "include Twisted."

-Andrew.
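For comparison, a gatherResults-style helper can be sketched on top of the PEP 3148 Future alone, using only add_done_callback. This is a rough illustration, not Twisted's implementation: the name gather_results is made up here, cancellation is not handled, and constructing a Future by hand is something the concurrent.futures documentation reserves for executors and tests.

```python
import threading
from concurrent.futures import Future


def gather_results(futures):
    """Return a Future resolving to a list of results once all inputs finish.

    Illustrative sketch: the first exception seen is propagated and any
    later outcomes are ignored. Cancelled inputs are not handled.
    """
    futures = list(futures)
    combined = Future()
    results = [None] * len(futures)
    lock = threading.Lock()
    state = {"remaining": len(futures), "done": False}

    if not futures:
        combined.set_result([])
        return combined

    def make_callback(index):
        def on_done(fut):
            exc = fut.exception()  # does not block: fut has already finished
            with lock:
                if state["done"]:
                    return
                if exc is None:
                    results[index] = fut.result()
                    state["remaining"] -= 1
                    if state["remaining"]:
                        return  # still waiting on other inputs
                state["done"] = True
            # Fire the combined future outside the lock.
            if exc is None:
                combined.set_result(results)
            else:
                combined.set_exception(exc)
        return on_done

    for index, fut in enumerate(futures):
        fut.add_done_callback(make_callback(index))
    return combined
```

The same function accepts futures returned by ThreadPoolExecutor.submit(), since they expose the same add_done_callback interface.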

On Tue, Sep 21, 2010 at 1:39 AM, Andrew Bennetts <andrew@bemusement.org> wrote:
No one was seriously proposing including twisted wholesale. There has been discussion, off and on *for years*, about including a stripped down deferred object; and yet no one has stepped up to *do it*. So it might be hilariously easy, it might be a 40 line module, but it doesn't matter if no one steps up to do the PEP, and commit the code, and commit to maintaining it.

jesse

On Tue, Sep 21, 2010 at 11:25 PM, Jesse Noller <jnoller@gmail.com> wrote:
Indeed. Thread and process pools had similarly been talked about for quite some time before Brian stepped up to actually do the work of writing and championing PEP 3148.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 20Sep2010 15:41, James Yonan <james@openvpn.net> wrote:
[...]
| * Develop a full-featured standard async result type and reactor
| model to facilitate interoperability of different async libraries.
| This would consist of a standard async result type and an abstract
| base class for a reactor model.
|
| * Let PEP 3148 focus on the problem of thread and process pooling
| and leverage on the above async result type.
|
| The semantics that a general async type should support include:
|
| 1. Semantics that allow you to define a callback channel for results
| and optionally a separate channel for exceptions as well.
|
| 2. Semantics that offer the flexibility of working with async
| results at the callback level or at the generator level (having a
| separate channel for exceptions makes it easy for the generator
| decorator implementation (that facilitates "yield
| function_returning_async_object()") to dispatch exceptions into the
| caller).
|
| 3. Semantics that can easily be used to pass results and exceptions
| back from thread or process pools.
[...]

Just to address this particular aspect (return types and notification), I have my own futures-like module, where the equivalent of a Future is called a LateFunction.

There are only 3 basic types of return in my model: there's a .report() method in the main (Executor equivalent) class that yields LateFunctions as they complete. A LateFunction has two basic get-the-result methods. Having made a LateFunction:

    LF = Later.defer(func)

You can either go:

    result = LF()

This waits for func's completion and returns func's return value. If func raises an exception, this raises that exception.

Or you can go:

    result, exc_info = LF.wait()

which returns:

    result, None

if func completed without exception and

    None, exc_info

if an exception was raised, where exc_info is a 3-tuple as from sys.exc_info().

At any rate, when looking for completion you can either get LateFunctions as they complete via .report(), or function results plain (that may raise exceptions) or function (results xor exceptions).

This makes implementing the separate streams (results vs exceptions) models trivial if it is desired while keeping the LateFunction interface simple (few interface methods).

Yes, I know there's no timeout stuff in there :-(

Cheers,
--
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

By God, Mr. Chairman, at this moment I stand astonished at my own moderation!
        - Baron Robert Clive of Plassey
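The interface described above can be approximated in a few lines. What follows is a guess at one possible implementation, not Cameron's module: it assumes one thread per deferred function, that Later is instantiated before use, and that .report() stops once nothing is outstanding.

```python
import queue
import sys
import threading


class LateFunction:
    """Runs func in a background thread; a sketch of the described interface."""

    def __init__(self, func, completed):
        self._done = threading.Event()
        self._result = None
        self._exc_info = None
        thread = threading.Thread(
            target=self._run, args=(func, completed), daemon=True
        )
        thread.start()

    def _run(self, func, completed):
        try:
            self._result = func()
        except Exception:
            self._exc_info = sys.exc_info()
        finally:
            self._done.set()
            completed.put(self)  # announce completion to Later.report()

    def wait(self):
        """Block until func finishes; return (result, None) or (None, exc_info)."""
        self._done.wait()
        return self._result, self._exc_info

    def __call__(self):
        """Block until func finishes; return its value or re-raise its exception."""
        result, exc_info = self.wait()
        if exc_info is not None:
            raise exc_info[1].with_traceback(exc_info[2])
        return result


class Later:
    """Executor equivalent (assumed shape): hands out LateFunctions."""

    def __init__(self):
        self._completed = queue.Queue()
        self._outstanding = 0

    def defer(self, func):
        self._outstanding += 1
        return LateFunction(func, self._completed)

    def report(self):
        """Yield LateFunctions in completion order until none are outstanding."""
        while self._outstanding:
            late_function = self._completed.get()
            self._outstanding -= 1
            yield late_function
```

With this sketch, "later = Later(); LF = later.defer(func)" followed by either LF() or LF.wait() behaves as described in the message, and separate result and exception streams can be built by iterating over later.report() and branching on the second element of wait().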

I'm not an expert on this subject by any stretch, but have been following the discussion with interest.

One of the more interesting ideas out of Microsoft in the last few years is their Reactive Framework (http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx), which implements IObserver and IObservable as the dual to IEnumerator and IEnumerable. This makes operators on events just as composable as operators on enumerables. It also comes after several other attempts to formalize a standard async programming pattern. The ideas seem pretty generic, since they've released a javascript version of the approach as well.

The basic interface is very simple, consisting of a subscribe method on IObservable and on_next, on_completed, and on_error methods for IObserver. The power comes from the extension methods, similar to itertools, defined in the Observable class (http://bit.ly/acBhbP). These methods provide a huge range of composable functionality.

For instance, using a chaining style, consider an async webclient module that takes a bunch of urls:

    responses = webclient.get(['http://www1.cnn.com', 'http://www2.cnn.com'])
    responses.filter(lambda x: x.status == 200).first().do(lambda x: print(x.body))

The filter is nonblocking and returns another observable. The first() blocks and returns after the first document is received. The do calls a method.

Multiple async streams can be composed together in all sorts of ways. For instance,

    http = webclient.get(['http://www.cnn.com', 'http://www.nyt.com'])
    https = webclient.get(['https://www.cnn.com', 'https://www.nyt.com'])
    http.zip(https).filter(lambda x, y: x.status == 200 and y.status == 200).start(lambda x, y: slow_save(x, y))

This never blocks. It downloads both the https and http versions of web pages, zips them into a new observable, keeps the sites where both the http and https requests succeeded, and then saves the remaining sites asynchronously. I personally find this easy to reason about, and much easier than manually specifying a callback chain. Errors and completed events propagate through these chains intuitively. "Marble diagrams" help with intuition here (http://bit.ly/cl7Oad).

All you need to do is implement the observable interface and you get all the composability for free. Or you can just use any number of simple methods to convert things to observables (http://bit.ly/7VMnKv), such as observable.start(lambda: print("hi")). Or use decorators. If the observable interface became standard, all future async libraries would be composable, and there would also be a growing collection of observabletools.

As somebody who is new to async programming, I quite quickly grasped this reactive approach even though I was otherwise completely unfamiliar with C#. While it may be due to my lack of experience, I still get confused when thinking about callback chains and error channels. For instance, I have no idea how to zip an async http call and a mongodb call into a simple observable that returns a tuple when both respond and then alerts the user. This would be as simple as

    webclient.get().zip(mongodb.get()).start(flash_completed_message)

or maybe it's more pythonic to write

    obstools.start(obstools.zip(mongodb.get(), webclient.get()), flash_completed_message)

although I've never liked this inside out style.

But perhaps I missed the point of this thread?

Tristan

On Wed, Sep 22, 2010 at 6:31 PM, Cameron Simpson <cs@zip.com.au> wrote:
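The observable interface described in this message is small enough to sketch in plain Python. The code below is an assumption-laden illustration, not the Rx API: Observable, Subject and the signatures of subscribe, filter and zip are invented for the example, delivery is synchronous, and zip emits tuples (so a predicate takes one pair rather than two arguments) and completes only after both sources have completed.

```python
class Observable:
    """Push-based stream (illustrative): the source calls the observer's handlers."""

    def __init__(self, subscribe):
        # subscribe is a function taking (on_next, on_error, on_completed)
        self._subscribe = subscribe

    def subscribe(self, on_next, on_error=None, on_completed=None):
        noop = lambda *args: None
        self._subscribe(on_next, on_error or noop, on_completed or noop)

    def filter(self, predicate):
        """Return a new observable that forwards only matching values."""
        def subscribe(on_next, on_error, on_completed):
            def forward(value):
                try:
                    keep = predicate(value)
                except Exception as exc:
                    on_error(exc)  # predicate failures travel the error channel
                    return
                if keep:
                    on_next(value)
            self.subscribe(forward, on_error, on_completed)
        return Observable(subscribe)

    def zip(self, other):
        """Pair values from both sources in arrival order, as tuples."""
        def subscribe(on_next, on_error, on_completed):
            left, right = [], []
            completed = [0]

            def push(buffer):
                def handler(value):
                    buffer.append(value)
                    while left and right:
                        on_next((left.pop(0), right.pop(0)))
                return handler

            def complete():
                completed[0] += 1
                if completed[0] == 2:  # simplification: wait for both sources
                    on_completed()

            self.subscribe(push(left), on_error, complete)
            other.subscribe(push(right), on_error, complete)
        return Observable(subscribe)


class Subject(Observable):
    """An observable you push values into by hand; stands in for a real event source."""

    def __init__(self):
        self._observers = []
        super().__init__(lambda *handlers: self._observers.append(handlers))

    def on_next(self, value):
        for on_next, _, _ in list(self._observers):
            on_next(value)

    def on_error(self, exc):
        for _, on_error, _ in list(self._observers):
            on_error(exc)

    def on_completed(self):
        for _, _, on_completed in list(self._observers):
            on_completed()
```

In this sketch the http-plus-mongodb case becomes http.zip(db).subscribe(flash_completed_message), where http and db are Subject instances fed by whatever async library produced the responses.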

I should note that it should be possible to convert twisted, eventlet, monocle, and other existing async libraries to observables pretty easily. The Javascript Rx library, for instance, already wraps the events from dojo, extjs, google maps, jquery, google translate, microsoft translate, mootools, prototype, raphael, virtualearth, and yui3, and keeps adding others to enable composability between different event driven widgets/frameworks.

Tristan

On Thu, Sep 23, 2010 at 12:41 AM, Tristan Zajonc <tristanz@gmail.com> wrote:

participants (7)
- Andrew Bennetts
- Cameron Simpson
- Guido van Rossum
- James Yonan
- Jesse Noller
- Nick Coghlan
- Tristan Zajonc