Re: [Python-ideas] The async API of the future: Twisted and Deferreds

(Sorry if this doesn't end up in the right thread in mail clients; I've been reading this through a web UI and only just formally subscribed, so I can't reply directly to the correct email.)

Code that uses generators is indeed often easier to read... but the problem is that this isn't just a difference in syntax, it has a significant semantic impact. Specifically, requiring yield means that you're re-introducing context switching. In inlineCallbacks, or coroutines, or any system that uses yield as in your example above, arbitrary code may run during the context switch, and who knows what happened to the state of the world in the interim. True, it's an explicit context switch, unlike threading where it can happen at any point, but it's still a context switch, and it still increases the chance of race conditions and all the other problems threading has. (If you're omitting yield it's even worse, since you can't even tell anymore where the context switches are happening.)

Superficially such code is simpler (and in some cases I'm happy to use inlineCallbacks, in particular in unit tests), but in much the same way that threaded code is "simpler". If you're not very, very careful, it'll work 99 times and break mysteriously the 100th.

For example, consider the following code; silly, but buggy, because the context switch at the yield allows a race condition if any other code modifies counter.value while getResult() is waiting for a result:

    def addToCounter():
        counter.value = counter.value + (yield getResult())

In a Deferred callback, on the other hand, you know the only things that are going to run are functions you call. Insofar as it's possible, what happens is under the control of one function only. Less pretty, but no potential race condition:

    def add(result):
        counter.value = counter.value + result

    getResult().addCallback(add)

That being said, perhaps some changes to Python syntax could solve this; Allen Short (http://washort.twistedmatrix.com/2012/10/coroutines-reduce-readability.html) claims to have a proposal; hopefully he'll post it soon.
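For concreteness, here is a self-contained sketch of the two versions above. getResult and counter are the stand-in names from the example, given assumed toy definitions here (a Deferred that fires after a short delay, and a simple counter object) so the difference in when counter.value is read becomes visible:

    # Sketch only: getResult() and counter are stand-ins from the message
    # above, filled in with assumed toy definitions so both versions run.
    from twisted.internet import reactor, task
    from twisted.internet.defer import inlineCallbacks

    class Counter(object):
        value = 0

    counter = Counter()

    def getResult():
        # Assumed: an async operation whose Deferred fires with 1 after 0.1s.
        return task.deferLater(reactor, 0.1, lambda: 1)

    @inlineCallbacks
    def addToCounter():
        # counter.value is read *before* the yield suspends this generator;
        # anything that runs while getResult() is pending can update the
        # counter, and that update is then silently overwritten.
        counter.value = counter.value + (yield getResult())

    def add(result):
        # The read and the write both happen inside this callback, with no
        # suspension point in between.
        counter.value = counter.value + result

    def addToCounterWithCallback():
        return getResult().addCallback(add)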

In addition to the issue mentioned by Itamar, there needs to be a clear way to do two related things:

1) Actually doing things asynchronously! A good example of where this happens for me is stats logging. I log some stats, but I don't want to wait for the request to be completed before I continue on with my work:

    def callback():
        logSomeStats()
        return actuallyDoWorkCustomerCaresAbout()

logSomeStats returns a Deferred, and I probably would attach an errback to that Deferred, but I don't want to wait until I've finished logging some stats to do the rest of the work, and I CERTAINLY don't want the work the customer cares about to bomb out because my stats server is down. In current inlineCallbacks, this is equally simple: I just run the expression and do *not* yield it. If I understand the current alternative suggestions correctly, the yielding part is important for actually hooking up the I/O (whereas in @inlineCallbacks, it *only* does callback management). Perhaps I am mistaken in this belief?

2) Doing multiple things concurrently. Let's say I want to download 10 web pages and do something when all ten of them have completed. In Twisted, I can say:

    gatherResults(map(getPage, urls)).addCallback(...)

With inlineCallbacks, you can do quite similar things (just yield the result of gatherResults, since that's a Deferred that'll fire once all of them have fired):

    for body in (yield gatherResults(map(getPage, urls))):
        ...

---

How would these two look in a world where the generator/inlineCallbacks magic isn't generator backed?

cheers
lvh
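For reference, both patterns side by side as a rough @inlineCallbacks sketch. logSomeStats is the hypothetical stats call from the message above (stubbed out here), handleRequest is a made-up wrapper, while gatherResults, getPage and log are real Twisted names:

    from twisted.internet.defer import inlineCallbacks, gatherResults, succeed
    from twisted.python import log
    from twisted.web.client import getPage

    def logSomeStats():
        # Assumed stub for the stats call described above; returns a Deferred.
        return succeed(None)

    @inlineCallbacks
    def handleRequest(urls):
        # (1) Fire and forget: kick off the stats write and attach an errback
        # so a failure gets logged, but don't yield -- the customer's work is
        # neither delayed nor broken if the stats server is down.
        logSomeStats().addErrback(log.err)

        # (2) Run all the downloads concurrently; gatherResults returns a
        # Deferred that fires once every getPage Deferred has fired.
        bodies = yield gatherResults(map(getPage, urls))
        for body in bodies:
            log.msg("fetched %d bytes" % len(body))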

On Sat, Oct 13, 2012 at 9:05 PM, Laurens Van Houtven <_@lvh.cc> wrote:
Some have certainly suggested that, but not Guido. In Guido's API, the *_async() calls actually kick off the operations; the "yield" calls are the "I'm done for now, wake me when this Future I'm yielding is ready" part. This is the only way that makes sense, for the reasons you give here.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
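To illustrate the split Nick describes, a toy sketch (all names are made up, and the driver below simply blocks on the Future rather than multiplexing, purely to keep it short): the *_async()-style call starts the operation immediately and returns a Future, and the yield only means "suspend me until that Future is ready":

    import time
    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=2)

    def fetch_async(url):
        def work():
            time.sleep(0.1)                      # stand-in for real I/O
            return "<html>%s</html>" % url
        # The operation is kicked off right here, before any yield happens.
        return executor.submit(work)

    def run(gen):
        # Minimal driver: feed each finished Future's result back into
        # the generator.
        try:
            future = next(gen)
            while True:
                future = gen.send(future.result())
        except StopIteration:
            pass

    def main():
        # "I'm done for now, wake me when this Future is ready."
        body = yield fetch_async("http://example.com")
        print(len(body))

    run(main())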

On Sat, Oct 13, 2012 at 8:52 PM, Itamar Turner-Trauring <itamar@futurefoundries.com> wrote:
    def addToCounter():
        counter.value = counter.value + (yield getResult())
This is buggy code for the reasons you state. However, only improperly *embedded* yields have this problem; yields that are done in a dedicated assignment statement are fine:

    def addToCounter():
        result = yield getResult()
        # No race condition here, as we only read the counter *after* receiving the result
        counter.value = counter.value + result

(You can also make sure they're the first thing executed as part of a larger expression, but a separate assignment statement will almost always be clearer.)
This is not the same code you wrote above in the generator version. The callback equivalent of the code you wrote is this:

    bound_value = counter.value

    def add(result):
        counter.value = bound_value + result

    getResult().addCallback(add)

The generator version isn't magic, people still need to know what they're doing to properly benefit from the cooperative multithreading.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 13, 2012 at 11:46 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
True, so, let's look at this version. First, notice that it's more convoluted than the version I wrote above; i.e. you have to go out of your way to write race-condition-prone code. Second, and much more important, when reading it it's obvious that you're getting and setting counter.value at different times, whereas in the generator version you have to think about it. The generator version has you naturally writing code where things you thought were happening at the same time are actually happening very far apart; the Deferred code makes it clear which pieces of code happen separately, and so you're much more likely to notice this sort of bug.

The generator version isn't magic, people still need to know what they're doing to properly benefit from the cooperative multithreading.
I agree. And that's exactly the dimension in which Deferreds are superior to cooperative multithreading: people don't have to think about race conditions as much, and thinking about race conditions is hard enough in general. At least when you're using Deferreds, you can tell by reading the code which chunks of code can happen at different times, and the natural idioms of Python don't *encourage* race conditions the way they do with yield syntax.

-Itamar

Itamar Turner-Trauring wrote:
But at least you can *see* from the presence of the 'yield' that suspension can occur. PEP 380 preserves this, because anything that can yield has to be called using 'yield from', so the potential suspension points remain visible.
He argues there that greenlet-style coroutines are bad because suspension can occur anywhere without warning. He likes generators better, because the 'yield' warns you that suspension might occur. Generators using 'yield from' have the same property. If his proposal involves marking the suspension points somehow, then syntactically it will probably be very similar to yield-from.

--
Greg
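A small illustration of the property Greg describes, with made-up names (Python 3.3, since it relies on PEP 380): because the helper can suspend, the caller has to invoke it with 'yield from', so the suspension points stay visible at both levels:

    def read_exactly(n):
        # Each bare 'yield' stands for "wait until more bytes arrive".
        data = b""
        while len(data) < n:
            data += yield                            # suspension point in the helper
        return data

    def read_record():
        header = yield from read_exactly(1)          # visibly suspends here...
        body = yield from read_exactly(header[0])    # ...and here
        return body

    # Driving it by hand (an event loop would normally do this):
    gen = read_record()
    next(gen)
    for chunk in (b"\x03", b"ab", b"c"):
        try:
            gen.send(chunk)
        except StopIteration as done:
            print(done.value)                        # b'abc'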

On Sat, Oct 13, 2012 at 8:17 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
But at least you can *see* from the presence of the 'yield' that suspension can occur.
... He argues there that greenlet-style coroutines are bad because
Explicit suspension is certainly better than hidden suspension, yes. But by extension, no suspension at all is best.

On Sat, Oct 13, 2012 at 5:59 PM, Itamar Turner-Trauring <itamar@futurefoundries.com> wrote:
Explicit suspension is certainly better than hidden suspension, yes. But by extension, no suspension at all is best.
When using Deferreds, there are suspension points too. They just happen whenever a Deferred is blocked. Each next callback has to assume that the world may have changed. You may like that better. But for me, 9 out of 10 times, yield-based coroutines (whether using Futures or PEP 380's yield from) make the code more readable than the Deferred style.

I do appreciate that often the Deferred style is an improvement over plain callbacks -- but I believe that explicit-yielding coroutines are so much better than Deferred that I'd rather base the future standard API on a combination of plain callbacks and either Futures+yield or yield-from (the latter without Futures).

I trust that Twisted invented the best possible interface given the available machinery at the time (no yield-based coroutines at all, and not using Stackless). But now that we have explicit-yielding coroutines, I believe we should adopt a style based on those. Twisted can of course implement Deferred easily in this world using some form of adaptation, and we should ensure that this is indeed reasonable before accepting a standard.

Whether it's better to use yield-from <generator> or yield <future> remains to be seen; that debate is still ongoing in the thread "yield-from".

--
--Guido van Rossum (python.org/~guido)
