Re: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object
I think that Glyph hit the nail on the head when he said that "you can go from any arbitrary Future to a full-featured Deferred, but not the other way around." This is exactly my concern, and the reason why I think it's important for Python to standardize on an async result type that is sufficiently general that it can accommodate the different kinds of async semantics in common use in the Python world today.

If you don't think this is a problem, just Google for "twisted vs. tornado". While the debate is sometimes passionate and rude, it points to the fragmentation that has occurred in the Python async space due to the lack of direction from the standard library. And there's a real cost to this fragmentation -- it's not easy to build an application that uses different async frameworks when there's no standardized result object or reactor model.

My concern is that PEP 3148 was really designed for the purpose of thread and process pooling, and that the Future object is designed with the minimum functionality required to achieve this end. The problem is that the Future object starts to look like a stripped-down version of a Twisted Deferred. And that begs the question of why we are standardizing on the special case and not the general case? Wouldn't it be better to break this into two problems:

* Develop a full-featured standard async result type and reactor model to facilitate interoperability of different async libraries. This would consist of a standard async result type and an abstract base class for a reactor model.

* Let PEP 3148 focus on the problem of thread and process pooling and leverage the above async result type.

The semantics that a general async type should support include:

1. Semantics that allow you to define a callback channel for results and optionally a separate channel for exceptions as well.

2. Semantics that offer the flexibility of working with async results at the callback level or at the generator level (having a separate channel for exceptions makes it easy for the generator decorator implementation (that facilitates "yield function_returning_async_object()") to dispatch exceptions into the caller).

3. Semantics that can easily be used to pass results and exceptions back from thread or process pools.

4. Semantics that allow for aggregate processing of parallel asynchronous results, such as "fire async result when all of the async results in an async set have fired" or "fire async result when the first result from an async set has fired."

Deferreds presently support all of the above. My point here is not so much that Deferreds should be the standard, but that whatever standard is chosen, the semantics should be general enough that different async Python libraries/platforms can interoperate.

James
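Semantic 4 is easy to state but worth seeing concretely. Here is a sketch using a hypothetical minimal AsyncResult class (a stand-in I've invented for illustration, not Twisted's Deferred or PEP 3148's Future) showing both aggregate firing modes:

```python
class AsyncResult:
    """Hypothetical minimal async result, used only to illustrate
    aggregate firing; not any real library's type."""
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._value = None

    def add_callback(self, cb):
        if self._fired:
            cb(self._value)
        else:
            self._callbacks.append(cb)

    def fire(self, value):
        self._fired = True
        self._value = value
        for cb in self._callbacks:
            cb(value)

def when_all(results):
    """Fire the returned AsyncResult once every input has fired."""
    combined = AsyncResult()
    values = [None] * len(results)
    remaining = [len(results)]
    for i, r in enumerate(results):
        def one(value, i=i):
            values[i] = value
            remaining[0] -= 1
            if remaining[0] == 0:
                combined.fire(list(values))
        r.add_callback(one)
    return combined

def when_first(results):
    """Fire the returned AsyncResult on the first input to fire."""
    combined = AsyncResult()
    for r in results:
        r.add_callback(lambda v: None if combined._fired else combined.fire(v))
    return combined
```

This is the shape of Twisted's gatherResults and of "fire on first result" helpers; the point is only that both compose out of a plain callback channel.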
Thanks for the ping about this (I don't think I subscribe to python-ideas, so someone may have to moderate my post in). Sorry for the delay in responding, but I've been kinda busy and cooking up these examples took a bit of thinking.
And thanks, James, for restarting this discussion. I obviously find it interesting :).
I'm going to mix in some other stuff I found on the web archives, since it's easiest just to reply in one message. I'm sorry that this response is a bit sprawling and doesn't have a single clear narrative, the thread thus far didn't seem to lend it to one.
For those of you who don't want to read my usual novel-length post, you can probably stop shortly after the end of the first block of code examples.
On Sep 11, 2010, at 10:26 PM, Guido van Rossum wrote:
although he didn't say what deferreds really added beyond what futures provide, and why the "add_done_callback" method isn't adequate to provide interoperability between futures and deferreds (which would be odd, since Brian made changes to that part of PEP 3148 to help with that interoperability after discussions with Glyph).
Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope for standardisation in this space though.
Cheers, Nick.
That was my initial reaction as well, but I'm more than open to hearing from Jean Paul/Glyph and the other twisted folks on this.
But thinking about this more I don't know that it will be easy to mix PEP 3148, which is solidly thread-based, with a PEP 342 style scheduler (whether or not the PEP 380 enhancements are applied, or even PEP 3152). And if we take the OP's message at face value, his point isn't so much that Twisted is great, but that in order to benefit maximally from PEP 342 there needs to be a standard way of using callbacks. I think that's probably true. And comparing the blog's examples to PEP 3148, I find Twisted's terminology rather confusing compared to the PEP's clean Futures API (where IMO you can ignore almost everything except result()).
That blog post was written to demonstrate why programs using generators are "... far easier to read and write ..." than ones using Deferreds, so it stands to reason it would choose an example where that helps :).
When you want to write systems that manage varying levels of parallelism within a single computation, generators can start to get pretty hairy and the "normal" Deferred way of doing things looks more straightforward.
Thinking in terms of asynchronicity is tricky, and generators can be a useful tool for promoting that understanding, but they only make it superficially easier. For example:
    def serial():
        results = set()
        for x in ...:
            results.add((yield do_something_async(x)))
        return results
If you're writing an application whose parallelism calls for an asynchronous approach, after all, you presumably don't want to be standing around waiting for each network round trip to complete. How do you re-write this so that there are always at least N outstanding do_something_async calls running in parallel?
You can sorta do it like this:
    def parallel(N):
        results = set()
        outstanding = []
        for x in ...:
            if len(outstanding) > N:
                results.add((yield outstanding.pop(0)))
            else:
                outstanding.append(do_something_async(x))
but that will always block on one particular do_something_async, when you really want to say "let me know when any outstanding call is complete". So I could handwave about 'yield any_completed(outstanding)'...
    def parallel(N):
        results = set()
        outstanding = set()
        for x in ...:
            if len(outstanding) > N:
                results.add((yield any_completed(outstanding)))
            else:
                outstanding.add(do_something_async(x))
but that just begs the question of how you implement any_completed(), and I can't think of a way to do that with generators, without getting into the specifics of some Deferred-or-Future-like asynchronous result object. You could implement such a function with such primitives, and here's what it looks like with Deferreds:
    def any_completed(setOfDeferreds):
        d = Deferred()
        called = []
        def fireme(result, whichDeferred):
            if not called:
                called.append(True)
                setOfDeferreds.remove(whichDeferred)
                d.callback(result)
            return result
        for subd in setOfDeferreds:
            subd.addBoth(fireme, subd)
        return d
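For comparison, PEP 3148 futures can express "first of a set" directly via the module-level wait() function. This is a blocking sketch, so it's an analogy rather than an equivalent of the Deferred version above, which fires a callback instead of blocking:

```python
import concurrent.futures as cf

def any_completed(future_set):
    """Block until some future in the set completes; remove it from
    the set and return its result.  Uses PEP 3148's wait() with
    FIRST_COMPLETED; note this blocks the calling thread, which the
    callback-based Deferred version deliberately avoids."""
    done, _pending = cf.wait(future_set, return_when=cf.FIRST_COMPLETED)
    finished = done.pop()
    future_set.discard(finished)
    return finished.result()
```

With a ThreadPoolExecutor feeding the set, a parallel(N) loop could consume results in completion order this way.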
Here's how you do the top-level task in Twisted, without generators, in the truly-parallel fashion (keep in mind this combines the functionality of 'any_completed' and 'parallel', so it's a bit shorter):
    def parallel(N):
        ds = DeferredSemaphore(N)
        l = []
        def release(result):
            ds.release()
            return result
        def after(sem, it):
            return do_something_async(it)
        for x in ...:
            l.append(ds.acquire().addCallback(after, x).addBoth(release))
        return gatherResults(l).addCallback(set)
Some informal benchmarking has shown this method to be considerably faster (on the order of 1/2 to 1/3 as much CPU time) than at least our own inlineCallbacks generator-scheduling method. Take this with the usual fist-sized grain of salt that you do any 'informal' benchmarks, but the difference is significant enough that I do try to refactor into this style in my own code, and I have seen performance benefits from doing this on more specific benchmarks.
This is all untested, and that's far too many lines of code to expect to work without testing, but hopefully it gives a pretty good impression of the differences in flavor between the different styles.
Yeah, please do explain why Twisted has so much machinery to handle exceptions?
There are a lot of different implied questions here, so I'll answer a few of those.
Why does twisted.python.failure exist? The answer to that is that we wanted an object that represented an exception as raised at a particular point, associated with a particular stack, that could live on without necessarily capturing all the state in that stack. If you're going to report failures asynchronously, you don't necessarily want to hold a reference to every single thing in a potentially giant stack while you're waiting to send it to some network endpoint. Also, in 1.5.2 we had no way of chaining exceptions, and this code is that old. Finally, even if you can chain exceptions, it's a serious performance hit to have to re-raise and re-catch the same exception 4 or 5 times in order to translate it or handle it at many different layers of the stack, so a Failure is intended to encapsulate that state such that it can just be returned, in performance-sensitive areas. (This is sort of a weak point though, since the performance of Failure itself is so terrible, for unrelated reasons.)
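The "don't pin the giant stack" motivation can be illustrated in a few lines. This is a stand-in I've written for the idea behind twisted.python.failure, not its actual API: snapshot the exception's type, value, and a *formatted* traceback at raise time, then drop the frame references:

```python
import sys
import traceback

class ExceptionSnapshot:
    """Illustrative stand-in (not Failure's API): capture enough of an
    exception to report it later, without keeping the stack alive."""

    def __init__(self):
        etype, value, tb = sys.exc_info()
        self.type = etype
        self.value = value
        # Render the traceback to text now...
        self.text = "".join(traceback.format_exception(etype, value, tb))
        # ...then sever the reference that pins every local variable in
        # every frame of the stack while the failure waits to be sent.
        value.__traceback__ = None

    def raise_(self):
        # Re-raise later, e.g. into a generator-based caller.
        raise self.value

def capture_current_failure():
    try:
        {}["missing"]
    except KeyError:
        return ExceptionSnapshot()
```

The real Failure also supports trapping specific exception types and printing itself, but the space/performance trade-off is the part described above.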
Why is twisted.python.failure such a god damned mess? The answer to that is ... uh, sorry. Yes, it is. We should clean it up. It was written a long time ago and the equivalent module now could be _much_ shorter, simpler, and less of a performance problem. It just never seems to be the highest priority. Maybe after we're done porting to py3 :). My one defense here is that it's still a slight improvement over the stdlib 'traceback' module ;-).
Why do Deferreds have an errback chain rather than just handing you an exception object in the callback chain? Basically, this is for the same reason that Python has exceptions instead of just making you check return codes. We wanted it to be easy to say:
    d = getPage("http://...")
    def ok(page):
        doSomething(...)
    d.addCallback(ok)
and know that the argument to 'ok' would always be what getPage promised (you don't need to typecheck it for exception-ness) and the default error behavior would be to simply bail out with a traceback, not to barrel through your success-path code wreaking havoc.
ISTM that the main difference is that add_done_callback() isn't meant for callbacks that return a value.
add_done_callback works fine with callbacks that return a value. If it didn't, I'd be concerned, because then it would have the barrel-through-the-success-path flaw. But, I assume the idiomatic asynchronous-code-using-Futures would look like this:
    f = some_future_thing(...)
    def my_callback(future):
        result = future.result()
        do_something(result)
    f.add_done_callback(my_callback)
This is one extra line of code as compared to the Twisted version, and chaining involves a bit more gymnastics (somehow creating more futures to return further up the stack, I guess, I haven't thought about it too hard), but it does allow you to handle exceptions with a simple 'except:', rather than calling some exception-handling methods, so I can see why some people would prefer it.
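Fleshed out against PEP 3148's executor API, the idiom runs like this (a sketch; the lambda stands in for whatever `some_future_thing` submits, and results are collected into a list rather than printed so the flow is visible):

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def do_something(result):
    results.append(result)

executor = ThreadPoolExecutor(max_workers=1)
f = executor.submit(lambda: 6 * 7)  # stands in for some_future_thing(...)

def my_callback(future):
    # future.result() re-raises any exception from the task, so a plain
    # try/except here is the entire error channel -- the point made above.
    try:
        result = future.result()
    except Exception as e:
        results.append(e)
    else:
        do_something(result)

f.add_done_callback(my_callback)
executor.shutdown(wait=True)  # by here the callback has run
```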
Maybe it's possible to write a little framework that lets you create Futures using either threads, processes (both supported by PEP 3148) or generators. But I haven't tried it. And maybe the need to use 'yield' for everything that may block when using generators, but not when using threads or processes, will make this awkward.
You've already addressed the main point that I really wanted to mention here, but I'd like to emphasize it. Blocking and not-blocking are fundamentally different programming styles, and if you sometimes allow blocking on asynchronous results, that means you are effectively always programming in the blocking-and-threaded style and not getting much benefit from the code which does choose to be politely non-blocking.
I was somewhat pleased with the changes made to the Futures PEP because you could use them as an asynchronous result, and have things that implemented the Future API but raised an exception if you tried to wait on them. That would at least allow some layer of stdlib compatibility. If you are disciplined and careful, this would let you write async code which used a common interoperability mechanism, and if you weren't careful, it would blow up when you tried to use it the wrong way.
But - and I am guessing that this is the main thrust of this discussion - I do think that having Deferred in the standard library would be much, much better if we can do that.
So maybe we'll be stuck with at least two Future-like APIs: PEP 3148 and something else, generator-based.
Having something "generator-based" is, in my opinion, an abstraction inversion. The things which you are yielding from these generators are asynchronous results. There should be a specific type for asynchronous results which can be easily interacted with. Generators are syntactic sugar for doing that interaction in a way which doesn't involve defining tons of little functions. This is useful, and it makes the concept more accessible, so I don't say "just" syntactic sugar: but nevertheless, the generators need to be 'yield'ing something, and the type of thing that they're yielding is a Deferred-or-something-like-it.
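The "syntactic sugar" relationship can be made concrete with a toy trampoline: a driver that resumes a generator each time the asynchronous result it yielded fires. This is a minimal sketch in the spirit of Twisted's inlineCallbacks, written against a hypothetical PendingResult stand-in rather than any real library, and ignoring the error channel for brevity:

```python
class PendingResult:
    """Hypothetical stand-in for a Deferred/Future-like object."""
    def __init__(self):
        self._cb = None
        self._fired = False
        self._value = None

    def add_callback(self, cb):
        if self._fired:
            cb(self._value)
        else:
            self._cb = cb

    def fire(self, value):
        self._fired = True
        self._value = value
        if self._cb is not None:
            self._cb(value)

def drive(gen):
    """Drive a generator that yields PendingResults; return a
    PendingResult for the generator's final value.  The generator is
    the sugar; the async result is still the thing being yielded."""
    final = PendingResult()

    def step(value):
        try:
            yielded = gen.send(value)
        except StopIteration as stop:
            final.fire(stop.value)
        else:
            yielded.add_callback(step)

    step(None)  # prime the generator
    return final
```

Note that drive() itself is written entirely in terms of the async result's callback interface, which is the abstraction-ordering point being made here.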
I don't think that this is really two 'Future-like APIs'. At least, they're not redundant, any more than having both socket.makefile() and socket.recv() is redundant.
If Future had a deferred() method rather than an add_done_callback() method, then it would always be very clear whether you had a synchronous-but-possibly-not-ready or a purely-asynchronous result. Although it would be equally easy to just have a function that turned a Future into a Deferred by calling add_done_callback(). You can go from any arbitrary Future to a full-featured Deferred, but not the other way around.
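The one-way conversion is mechanical. Here is a sketch using a stub MiniDeferred of my own in place of twisted.internet.defer.Deferred; real code would use the real class, handle the errback path, and route the firing through reactor.callFromThread, since add_done_callback may fire in a worker thread:

```python
from concurrent.futures import ThreadPoolExecutor

class MiniDeferred:
    """Stub with just enough of the Deferred shape for this sketch:
    a callback chain that can fire before or after callbacks attach."""
    def __init__(self):
        self._callbacks = []
        self._result = None
        self._fired = False

    def addCallback(self, cb):
        if self._fired:
            self._result = cb(self._result)
        else:
            self._callbacks.append(cb)
        return self

    def callback(self, result):
        self._fired = True
        self._result = result
        for cb in self._callbacks:
            self._result = cb(self._result)

def future_to_deferred(future):
    """Any Future can feed a Deferred via add_done_callback; the
    reverse requires a way to block, which Deferreds deliberately lack."""
    d = MiniDeferred()
    future.add_done_callback(lambda f: d.callback(f.result()))
    return d
```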
Or maybe PEP 3152.
I don't like PEP 3152 aesthetically on many levels, but I can't deny that it would do the job. 'cocall', though, really? It would be nice if it read like an actual word, i.e. "yield to" or "invoke" or even just "call" or something.
In another message, where Guido is replying to Antoine:
I think the main reason, though, that people find Deferreds inconvenient is that they force you to think in terms of asynchronicity (...)
Actually I think the main reason is historic: Twisted introduced callback-based asynchronous (thread-less) programming when there was no alternative in Python, and they invented both the mechanisms and the terminology as they were figuring it all out. That is no mean feat. But with PEP 342 (generator-based coroutines) and especially PEP 380 (yield from) there *is* an alternative, and while Twisted has added APIs to support generators, it hasn't started to deprecate its other APIs, and its terminology becomes hard to follow for people (like me, frankly) who first learned this stuff through PEP 342.
I really have to go with Antoine on this one: people were confused about Deferreds long before PEP 342 came along :). Given that Javascript environments have mostly adopted the Twisted terminology (oddly, Node.js doesn't, but Dojo and MochiKit both have pretty literal-minded Deferred translations), there are plenty of people who are familiar with the terminology but still get confused.
See the beginning of the message for why we're not deprecating our own APIs.
Once again, sorry for not compressing this down further! If you got this far, you win a prize :).
On Mon, Sep 20, 2010 at 2:41 PM, James Yonan
I think that Glyph hit the nail on the head when he said that "you can go from any arbitrary Future to a full-featured Deferred, but not the other way around."
Where by "go from X to Y" you mean "take a program written using X and change it to use Y", right?
This is exactly my concern, and the reason why I think it's important for Python to standardize on an async result type that is sufficiently general that it can accommodate the different kinds of async semantics in common use in the Python world today.
I think I get your gist. Unfortunately there's only a small number of people who know enough about async semantics in order to write the PEP that is needed.
If you don't think this is a problem, just Google for "twisted vs. tornado". While the debate is sometimes passionate and rude,
Is it ever distanced and polite? :-)
it points to the fragmentation that has occurred in the Python async space due to the lack of direction from the standard library. And there's a real cost to this fragmentation -- it's not easy to build an application that uses different async frameworks when there's no standardized result object or reactor model.
But, circularly, the lack of direction from the standard library is that nobody has contributed an async framework to the standard library since asyncore was added in, oh, 1999.
My concern is that PEP 3148 was really designed for the purpose of thread and process pooling, and that the Future object is designed with the minimum functionality required to achieve this end. The problem is that the Future object starts to look like a stripped-down version of a Twisted Deferred. And that begs the question of why are we standardizing on the special case and not the general case?
Because we could reach agreement fairly quickly on PEP 3148. There are some core contributors who know threads and processes inside out, and after several rounds of comments (a lot, really) they were satisfied. At this point it is probably best to forget about PEP 3148 if you want to improve the async situation in the stdlib, and start thinking about that async PEP instead.
Wouldn't it be better to break this into two problems:
* Develop a full-featured standard async result type and reactor model to facilitate interoperability of different async libraries. This would consist of a standard async result type and an abstract base class for a reactor model.
Unless you want to propose to include Twisted into the stdlib, this is not going to be ready for inclusion into Python 3.2.
* Let PEP 3148 focus on the problem of thread and process pooling and leverage on the above async result type.
But PEP 3148 *is* ready for inclusion in Python 3.2. So you've got the ordering wrong. It doesn't make sense to hold up PEP 3148, waiting for the perfect solution to appear. In fact, the changes that were made to PEP 3148 at Glyph's suggestion are probably all you are going to get regarding PEP 3148.
The semantics that a general async type should support include:
1. Semantics that allow you to define a callback channel for results and optionally a separate channel for exceptions as well.
2. Semantics that offer the flexibility of working with async results at the callback level or at the generator level (having a separate channel for exceptions makes it easy for the generator decorator implementation (that facilitates "yield function_returning_async_object()") to dispatch exceptions into the caller).
3. Semantics that can easily be used to pass results and exceptions back from thread or process pools.
4. Semantics that allow for aggregate processing of parallel asynchronous results, such as "fire async result when all of the async results in an async set have fired" or "fire async result when the first result from an async set has fired."
Deferreds presently support all of the above. My point here is not so much that Deferreds should be the standard, but that whatever standard is chosen, that the semantics be general enough that different async Python libraries/platforms can interoperate.
Do you want to champion a PEP? I hope you do -- it will be a long march but rewarding, especially if you get the Tornado folks to participate and contribute. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote: [...]
Unless you want to propose to include Twisted into the stdlib, this is not going to be ready for inclusion into Python 3.2.
I don't think anyone has suggested "include Twisted". What is being suggested is "include twisted.internet.defer, or something about as useful."

Let's consider just how hard it would be to just add twisted/internet/defer.py to the stdlib (possibly as 'deferred.py'). It's already almost a standalone module, especially if pared back to just the Deferred class and maybe one or two of the most useful helpers (e.g. gatherResults, to take a list of Deferreds and turn them into a single Deferred that fires when they have all fired).

The two most problematic dependencies would be:

1) twisted.python.log, which for these purposes could be replaced with a call to a user-replaceable hook whenever an unhandled error occurs (similar to sys.excepthook).

2) twisted.python.failure... this one is harder. As glyph said, it provides "an object that represent[s] an exception as raised at a particular point, associated with a particular stack". But also, as he said, it's a mess and could use a clean up. Cleaning it up or thinking of a simpler replacement is not insurmountable, but probably too ambitious for Python 3.2's schedule.

My point is that adding the Deferred abstraction to the stdlib is a *much* smaller and more reasonable proposition than "include Twisted."

-Andrew.
On Tue, Sep 21, 2010 at 1:39 AM, Andrew Bennetts
Guido van Rossum wrote: [...]
Unless you want to propose to include Twisted into the stdlib, this is not going to be ready for inclusion into Python 3.2.
I don't think anyone has suggested "include Twisted". What is being suggested is "include twisted.internet.defer, or something about as useful."
Let's consider just how hard it would be to just add twisted/internet/defer.py to the stdlib (possibly as 'deferred.py'). It's already almost a standalone module, especially if pared back to just the Deferred class and maybe one or two of the most useful helpers (e.g. gatherResults, to take a list of Deferreds and turn them into a single Deferred that fires when they have all fired).
The two most problematic dependencies would be:
1) twisted.python.log, which for these purposes could be replaced with a call to a user-replaceable hook whenever an unhandled error occurs (similar to sys.excepthook).

2) twisted.python.failure... this one is harder. As glyph said, it provides "an object that represent[s] an exception as raised at a particular point, associated with a particular stack". But also, as he said, it's a mess and could use a clean up. Cleaning it up or thinking of a simpler replacement is not insurmountable, but probably too ambitious for Python 3.2's schedule.
My point is that adding the Deferred abstraction to the stdlib is a *much* smaller and more reasonable proposition than "include Twisted."
-Andrew.
No one was seriously proposing including Twisted wholesale. There has been discussion, off and on *for years*, about including a stripped-down deferred object; and yet no one has stepped up to *do it*. It might be hilariously easy, it might be a 40-line module, but it doesn't matter if no one steps up to do the PEP, commit the code, and commit to maintaining it. jesse
On Tue, Sep 21, 2010 at 11:25 PM, Jesse Noller
There has been discussion, off and on *for years*, about including a stripped-down deferred object; and yet no one has stepped up to *do it*. It might be hilariously easy, it might be a 40-line module, but it doesn't matter if no one steps up to do the PEP, commit the code, and commit to maintaining it.
Indeed. Thread and process pools had similarly been talked about for quite some time before Brian stepped up to actually do the work of writing and championing PEP 3148. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 20Sep2010 15:41, James Yonan
I'm not an expert on this subject by any stretch, but have been
following the discussion with interest.
One of the more interesting ideas out of Microsoft in the last few
years is their Reactive Framework
(http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx), which
implements IObserver and IObservable as the dual to IEnumerator and
IEnumerable. This makes operators on events just as composable as
operators on enumerables. It also comes after several other attempts
to formalize a standard async programming pattern. The ideas seem
pretty generic, since they've released a javascript version of the
approach as well.
The basic interface is very simple, consisting of a subscribe method
on IObservable and on_next, on_completed, and on_error methods for
IObserver. The power comes from the extension methods, similar to
itertools, defined in the Observable class (http://bit.ly/acBhbP).
These methods provide a huge range of composable functionality.
For instance, using a chaining style, consider an async webclient
module that takes a bunch of urls:
responses = webclient.get(['http://www1.cnn.com', 'http://www2.cnn.com'])
responses.filter(lambda x: x.status == 200).first().do(lambda x: print(x.body))
The filter is nonblocking and returns another observable. The first()
blocks and returns after the first document is received. The do calls
a method. Multiple async streams can be composed together in all sorts
of ways. For instance,
http = webclient.get(['http://www.cnn.com', 'http://www.nyt.com'])
https = webclient.get(['https://www.cnn.com', 'https://www.nyt.com'])
http.zip(https).filter(lambda x, y: x.status == 200 and y.status == 200).start(lambda x, y: slow_save(x, y))
This never blocks. It downloads both the https and http versions of
web pages, zips them into a new observable, filters sites with both
http and https, and then saves asynchronously the remaining sites. I
personally find this easy to reason about, and much easier than
manually specifying a callback chain. Errors and completed events
propagate through these chains intuitively. "Marble diagrams" help
with intuition here (http://bit.ly/cl7Oad).
All you need to do is implement the observable interface and you get
all the composibility for free. Or you can just use any number of
simple methods to convert things to observables
(http://bit.ly/7VMnKv), such as observable.start(lambda: print("hi")).
Or use decorators. If the observable interface became standard, all
future async libraries would be composable, and there would also be a
growing collection of observabletools.
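To make the shape concrete in Python: a toy Observable of my own with subscribe plus one operator — an illustration of the pattern, not the actual Rx API or a real library:

```python
class Observable:
    """Toy push-based observable: a subscriber supplies on_next and
    on_completed callables, and operators wrap the subscription."""
    def __init__(self, subscribe):
        self._subscribe = subscribe

    def subscribe(self, on_next, on_completed=lambda: None):
        self._subscribe(on_next, on_completed)

    def filter(self, predicate):
        # Returns a new Observable that forwards only matching items;
        # completion propagates through unchanged.
        def subscribe(on_next, on_completed):
            self._subscribe(
                lambda item: on_next(item) if predicate(item) else None,
                on_completed,
            )
        return Observable(subscribe)

def from_iterable(items):
    """Convert a plain iterable into a (synchronous) Observable."""
    def subscribe(on_next, on_completed):
        for item in items:
            on_next(item)
        on_completed()
    return Observable(subscribe)
```

The composability claim is that every operator (zip, first, do, ...) is written once against subscribe and then works for any source implementing it, whether backed by a socket, a timer, or a list as here.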
As somebody who is new to async programming, I quite quickly grasped
this reactive approach even though I was otherwise completely
unfamiliar with C#. While it may be due to my lack of experience, I
still get confused when thinking about callback chains and error
channels. For instance, I have no idea how to zip an async http call
and a mongodb call into a simple observable that returns a tuple when
both respond and then alerts the user. This would be as simple as
webclient.get().zip(mongodb.get()).start(flash_completed_message)
or maybe it's more pythonic to write
obstools.start(obstools.zip(mongodb.get(), webclient.get),
flash_completed_message)
although I've never liked this inside-out style.
But perhaps I missed the point of this thread?
Tristan
On Wed, Sep 22, 2010 at 6:31 PM, Cameron Simpson
On 20Sep2010 15:41, James Yonan
wrote: [...]
| * Develop a full-featured standard async result type and reactor
| model to facilitate interoperability of different async libraries.
| This would consist of a standard async result type and an abstract
| base class for a reactor model.
|
| * Let PEP 3148 focus on the problem of thread and process pooling
| and leverage the above async result type.
|
| The semantics that a general async type should support include:
|
| 1. Semantics that allow you to define a callback channel for results
| and optionally a separate channel for exceptions as well.
|
| 2. Semantics that offer the flexibility of working with async
| results at the callback level or at the generator level (having a
| separate channel for exceptions makes it easy for the generator
| decorator implementation (that facilitates "yield
| function_returning_async_object()") to dispatch exceptions into the
| caller).
|
| 3. Semantics that can easily be used to pass results and exceptions
| back from thread or process pools.
[...]

Just to address this particular aspect (return types and notification), I have my own futures-like module, where the equivalent of a Future is called a LateFunction.
There are only 3 basic types of return in my model:

There's a .report() method in the main (Executor equivalent) class that yields LateFunctions as they complete.

A LateFunction has two basic get-the-result methods. Having made a LateFunction:

    LF = Later.defer(func)

you can either go:

    result = LF()

This waits for func's completion and returns func's return value. If func raises an exception, this raises that exception.

Or you can go:

    result, exc_info = LF.wait()

which returns (result, None) if func completed without exception, and (None, exc_info) if an exception was raised, where exc_info is a 3-tuple as from sys.exc_info().
At any rate, when looking for completion you can either get LateFunctions as they complete via .report(), get plain function results (that may raise exceptions), or get (result, exc_info) pairs (results xor exceptions).
This makes implementing the separate streams (results vs exceptions) models trivial if it is desired while keeping the LateFunction interface simple (few interface methods).
Yes, I know there's no timeout stuff in there :-(
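For concreteness, here is a hypothetical rendering of the interface described above on top of a PEP 3148 executor. The names follow Cameron's message; the implementation is my own sketch, and timeouts could be added by passing one through to Future.result():

```python
import sys
from concurrent.futures import ThreadPoolExecutor

class LateFunction:
    """Sketch of the two get-the-result styles, backed by a Future."""
    def __init__(self, future):
        self._future = future

    def __call__(self):
        # Wait for completion; return the value or re-raise the exception.
        return self._future.result()

    def wait(self):
        # (result, None) on success; (None, exc_info 3-tuple) on failure.
        try:
            return self._future.result(), None
        except Exception:
            return None, sys.exc_info()

class Later:
    """Minimal Executor-equivalent; a real one would also offer
    .report() yielding LateFunctions in completion order."""
    def __init__(self, workers=4):
        self._executor = ThreadPoolExecutor(max_workers=workers)

    def defer(self, func, *args):
        return LateFunction(self._executor.submit(func, *args))
```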
Cheers,
--
Cameron Simpson DoD#743 http://www.cskk.ezoshosting.com/cs/

  By God, Mr. Chairman, at this moment I stand astonished at my own
  moderation! - Baron Robert Clive of Plassey

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
http://mail.python.org/mailman/listinfo/python-ideas
I should note that it should be possible to convert the twisted,
eventlet, monocle, and other existing async libraries to
observables pretty easily. The Javascript Rx library, for instance,
already wraps the events from dojo, extjs, google maps, jquery, google
translate, microsoft translate, mootools, prototype, raphael,
virtualearth, and yui3, and keeps adding others to enable
composability between different event driven widgets/frameworks.
Tristan
On Thu, Sep 23, 2010 at 12:41 AM, Tristan Zajonc
I'm not an expert on this subject by any stretch, but have been following the discussion with interest.
One of the more interesting ideas out of Microsoft in the last few years is their Reactive Framework (http://msdn.microsoft.com/en-us/devlabs/ee794896.aspx), which implements IObserver and IObservable as the dual to IEnumerator and IEnumerable. This makes operators on events just as composable as operators on enumerables. It also comes after several other attempts to formalize a standard async programming pattern. The ideas seam pretty generic, since they've released a javascript version of the approach as well.
The basic interface is very simple, consisting of a subscribe method on IObservable and on_next, on_completed, and on_error methods for IObserver. The power comes from the extension methods, similar to itertools, defined in the Observable class (http://bit.ly/acBhbP). These methods provide a huge range of composable functionality.
For instance, using a chaining style, consider a async webclient module that takes a bunch of urls:
responses = webclient.get(['http://www1.cnn.com', 'http://www2.cnn.com']) responses.filter(lambda x: x.status == 200).first().do(lambda x: print(x.body))
The filter is nonblocking and returns another observable. The first() blocks and returns after the first document is received. The do calls a method. Multiple async streams can be composed together in all sorts of ways. For instance,
http = webclient.get(['http://www.cnn.com', 'http://www.nyt.com']) https = webclient.get(['https://www.cnn.com', 'https://www.nyt.com']) http.zip(https).filter(lambda x, y: x.status == 200 and y.status == 200).start(lambda x, y: slow_save(x, y))
This never blocks. It downloads both the https and http versions of the web pages, zips them into a new observable, filters for the sites where both the http and https requests succeeded, and then asynchronously saves the remaining sites. I personally find this easy to reason about, and much easier than manually specifying a callback chain. Errors and completed events propagate through these chains intuitively. "Marble diagrams" help with intuition here (http://bit.ly/cl7Oad).
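The key property in the examples above is that an operator like filter returns a *new* observable without running anything, so pipelines stay nonblocking until someone actually consumes them. A stripped-down sketch of that idea (here an "observable" is simplified to a function that accepts an on_next callback; real Rx also carries the error and completion channels, and zip additionally needs buffering):

```python
# An "observable" here is just a function taking an on_next callback.
def from_list(items):
    def source(on_next):
        for item in items:
            on_next(item)
    return source

def filter_op(source, predicate):
    """Return a new observable forwarding only matching items.
    Nothing runs until someone subscribes to the result."""
    def filtered(on_next):
        source(lambda item: on_next(item) if predicate(item) else None)
    return filtered

def first(source):
    """A blocking-style first(): subscribe and keep the first item pushed."""
    box = []
    def grab(item):
        if not box:
            box.append(item)
    source(grab)
    return box[0]
```

With these three pieces, `first(filter_op(from_list([1, 2, 3, 4]), lambda x: x % 2 == 0))` yields the first even number, composed exactly the way the chained webclient example composes.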
All you need to do is implement the observable interface and you get all the composability for free. Or you can just use any number of simple methods to convert things to observables (http://bit.ly/7VMnKv), such as observable.start(lambda: print("hi")). Or use decorators. If the observable interface became standard, all future async libraries would be composable, and there would also be a growing collection of observabletools.
As somebody who is new to async programming, I quite quickly grasped this reactive approach even though I was otherwise completely unfamiliar with C#. While it may be due to my lack of experience, I still get confused when thinking about callback chains and error channels. For instance, I have no idea how to zip an async http call and a mongodb call into a simple observable that returns a tuple when both respond and then alerts the user. This would be as simple as
webclient.get().zip(mongodb.get()).start(flash_completed_message)
or maybe it's more pythonic to write
obstools.start(obstools.zip(mongodb.get(), webclient.get()), flash_completed_message)
although I've never liked this inside-out style.
But perhaps I missed the point of this thread?
Tristan
On Wed, Sep 22, 2010 at 6:31 PM, Cameron Simpson wrote:

On 20Sep2010 15:41, James Yonan wrote:
[...]
| * Develop a full-featured standard async result type and reactor
| model to facilitate interoperability of different async libraries.
| This would consist of a standard async result type and an abstract
| base class for a reactor model.
|
| * Let PEP 3148 focus on the problem of thread and process pooling
| and leverage on the above async result type.
|
| The semantics that a general async type should support include:
|
| 1. Semantics that allow you to define a callback channel for results
| and optionally a separate channel for exceptions as well.
|
| 2. Semantics that offer the flexibility of working with async
| results at the callback level or at the generator level (having a
| separate channel for exceptions makes it easy for the generator
| decorator implementation (that facilitates "yield
| function_returning_async_object()") to dispatch exceptions into the
| caller).
|
| 3. Semantics that can easily be used to pass results and exceptions
| back from thread or process pools.
[...]

Just to address this particular aspect (return types and notification), I have my own futures-like module, where the equivalent of a Future is called a LateFunction.
There are only three basic kinds of return in my model:

There's a .report() method in the main (Executor-equivalent) class that yields LateFunctions as they complete.

A LateFunction has two basic get-the-result methods. Having made a LateFunction:

  LF = Later.defer(func)

you can either go:

  result = LF()

This waits for func's completion and returns func's return value. If func raises an exception, this raises that exception.

Or you can go:

  result, exc_info = LF.wait()

which returns (result, None) if func completed without exception, and (None, exc_info) if an exception was raised, where exc_info is a 3-tuple as from sys.exc_info().

At any rate, when looking for completion you can either get LateFunctions as they complete via .report(), or plain function results (which may raise exceptions), or (result, exc_info) pairs (results xor exceptions).
This makes it trivial to implement the separate-streams (results vs. exceptions) model if desired, while keeping the LateFunction interface simple (few interface methods).
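Cameron's module isn't shown in the thread, but the two result-fetching styles he describes can be sketched in a few lines. This is my own minimal reconstruction under stated assumptions (the class name comes from the post; Later.defer() and .report() are omitted, and the work runs in a plain thread):

```python
import sys
import threading

class LateFunction:
    """Minimal sketch of the LateFunction idea from the post: run func
    in a background thread, then fetch the result either by calling the
    object (which re-raises any exception) or via .wait(), which returns
    (result, None) on success or (None, exc_info) on failure."""

    def __init__(self, func):
        self._result = None
        self._exc_info = None
        self._done = threading.Event()

        def run():
            try:
                self._result = func()
            except Exception:
                self._exc_info = sys.exc_info()
            finally:
                self._done.set()

        threading.Thread(target=run).start()

    def wait(self):
        """Block until done; return (result, None) or (None, exc_info)."""
        self._done.wait()
        return self._result, self._exc_info

    def __call__(self):
        """Block until done; return the result or re-raise the exception."""
        result, exc_info = self.wait()
        if exc_info is not None:
            raise exc_info[1]
        return result
```

Note how .wait() gives exactly the two-channel (results xor exceptions) surface that a generator-based decorator would need, while __call__ keeps the ordinary call-and-raise semantics.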
Yes, I know there's no timeout stuff in there :-(
Cheers, -- Cameron Simpson
DoD#743 http://www.cskk.ezoshosting.com/cs/
By God, Mr. Chairman, at this moment I stand astonished at my own moderation! - Baron Robert Clive of Plassey
participants (7)
- Andrew Bennetts
- Cameron Simpson
- Guido van Rossum
- James Yonan
- Jesse Noller
- Nick Coghlan
- Tristan Zajonc