[Twisted-Python] gthreadless.py
Hello twisted folks,

I reimplemented gthreadless.py, originally implemented by Chris Armstrong. My implementation aims to be more robust and is currently used in a large project, so it is already usable.

http://www.pragma2000.com/wiki/index.php?GThreadless

gthreadless is a twisted module that makes use of greenlets to implement the very nice primitive blockOn(d), which suspends execution in the current stack frame while waiting for the deferred to fire. This way you can use a synchronous blocking-style programming model while still using twisted, all nicely integrated.

It's a great great thing. Give it a try.

cheers,
Stefano Masini
gthreadless is a twisted module that makes use of greenlets to implement the very nice primitive blockOn(d), that suspends the execution in the current stack frame while waiting for the deferred to fire. This way you can use a synchronous blocking-style programming model while still using twisted, all nicely integrated.
Since v.2.0, twisted/internet/defer.py has contained one class, waitForDeferred, and one function, deferredGenerator, that implement a similar pseudo-synchronous style, but using standard generators instead of greenlets.
Defgen: Just one more reason that Twisted freakin' rocks
http://mesozoic.geecs.org/cogito/archives/000160.html

Furthermore, PEP 342 has been accepted for v.2.5:

Coroutines via Enhanced Generators
http://www.python.org/peps/pep-0342.html

its enhancements should further simplify such a coding style in Twisted.

However, a couple of recent blog entries show that this way of "hiding" Deferreds raises some eyebrows within Twisted's inner circle:

Magical Concurrency Faeries or How I Learned To Stop Worrying and Love Deferreds
http://www.livejournal.com/users/jcalderone/9531.html

Knowing Santa Claus is Fake Doesn't Ruin Christmas
http://www.livejournal.com/users/glyf/40037.html

Personally, I think that while explicitly specifying deferreds and callbacks and errbacks can be quite verbose, and may sometimes obscure the program flow, the comfort of seeing clearly the boundaries of each uninterruptible execution unit makes it worthwhile.

--
Nicola Larosa - nico@tekNico.net

My god carries a hammer. Your god died nailed to a tree. Any questions?
 -- maxpublic on Slashdot, July 2005
On 8/30/05, Nicola Larosa <nico@teknico.net> wrote:
Since v.2.0, twisted/internet/defer.py has contained one class, waitForDeferred, and one function, deferredGenerator, that implement a similar pseudo-synchronous style, but using standard generators instead of greenlets.
I'm familiar with deferredGenerator. I've been using twisted full time for two and a half years now and have used deferredGenerator quite a lot too. That's why I reimplemented gthreadless! :)
Furthermore, PEP 342 has been accepted for v.2.5:
Coroutines via Enhanced Generators http://www.python.org/peps/pep-0342.html
its enhancements should further simplify such a coding style in Twisted.
The problem with generators and enhanced generators, as I've been discussing with a few people at Europython, is that they only allow you to jump back and forth between two stack *frames*, whereas greenlets support jumping among *full* stacks. This means that from within a @deferredGreenlet'ed function you can make calls to other functions that call blockOn, while from a @deferredGenerator'ed function you can't call a method that in turn calls waitForDeferred.

While this may seem quite a subtle difference, I think it is not if you look at it from the perspective of code readability -- let alone the horrible 3-line hack that waitForDeferred forces on you, at least until PEP 342, as we all know.
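To make the frames-versus-full-stacks point concrete, here is a minimal sketch in plain Python 3. It does not use Twisted or greenlets: MiniDeferred and run() are hypothetical stand-ins invented for this illustration, and "yield from" plus generator return values are later additions to the language (PEP 380), not part of PEP 342 itself. It only shows that a generator can suspend its own frame, so every caller in the chain has to be a generator and has to delegate explicitly.

```python
class MiniDeferred:
    """Hypothetical stand-in for a Deferred: it just carries a ready result."""
    def __init__(self, result):
        self.result = result


def run(gen):
    """Tiny trampoline: answer every yielded MiniDeferred with its result."""
    try:
        d = next(gen)
        while True:
            # PEP 342 style: the value travels back in through the yield.
            d = gen.send(d.result)
    except StopIteration as stop:
        return stop.value


def fetch(key):
    # A generator-based helper: it can only suspend its own frame.
    value = yield MiniDeferred(key.upper())
    return value


def render_explicit(keys):
    out = []
    for key in keys:
        # The caller must itself be a generator and delegate explicitly.
        out.append((yield from fetch(key)))
    return out


def render_naive(keys):
    # A plain call only builds generator objects; nothing is fetched.
    return [fetch(key) for key in keys]
```

With greenlets the helper could suspend the whole call stack, so something shaped like render_naive would be enough once the outermost function is decorated; with generators the suspension has to be spelled out at every level, which is exactly the property the rest of this thread argues about.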
However, a couple of recent blog entries show that this way of "hiding" Deferreds raises some eyebrows within Twisted's inner circle:
Magical Concurrency Faeries or How I Learned To Stop Worrying and Love Deferreds http://www.livejournal.com/users/jcalderone/9531.html
Knowing Santa Claus is Fake Doesn't Ruin Christmas http://www.livejournal.com/users/glyf/40037.html
Personally, I think that while explicitly specifying deferreds and callbacks and errbacks can be quite verbose, and may sometimes obscure the program flow, the comfort of seeing clearly the boundaries of each uninterruptible execution unit makes it worthwhile.
Believe me. I'm not one of those users in the "periphery of the Twisted community" thinking that asynchronous programming is too hard. I've been writing network code for quite a while and I matured the idea that threads get in your way back when I still didn't know python.

What I'm only concerned about now is coding *style*. Making code better looking, thus more easily maintainable. I rewrote gthreadless because having a usable implementation allowed rewriting parts of my existing (big) application in a much simpler way. And let me stress this: *parts* of it.

I think gthreadless should only be used here and there, not everywhere. One should always keep thinking of deferreds, and even inside a @deferredGreenlet'ed function one should be very clear that blockOn() really only hands a deferred back to the reactor. But at least you can debug your function without having to jump back and forth 40 lines at a time just to get to the proper callback or errback. I think sometimes this verbosity may get in the way of the pythonic spirit of keeping stuff simple.

Here is an example that does justice to this approach. It involves Perspective Broker. Think of writing a web frontend to an application on the backend that exports functionality through pb.

(the code may not work, I'm just making it up now without testing it)
(I hope everybody is familiar with nevow.stan. If not, take a look at it. It's worth it.)
    @deferredGreenlet
    def renderPage(self):
        dataList = [
            blockOn(self.backend.callRemote('getDataFromId', elementId))
            for elementId in self.idList]
        return T.html[ T.body [ 'The result:', T.br,
            [ (txt, T.br) for txt in dataList ] ] ]

Without gthreadless:

    def renderPage(self):
        dataList = []
        def fetchDataRemotely(elementList):
            def cbFetch(elementData):
                dataList.append(elementData)
                # more ids left: chain the next remote call
                if len(elementList) > 1:
                    return fetchDataRemotely(elementList[1:])
            # fetch the first id of the remaining list
            return self.backend.callRemote('getDataFromId', elementList[0]).addCallback(cbFetch)
        def cb(crap):
            return T.html[ T.body [ 'The result:', T.br,
                [ (txt, T.br) for txt in dataList ] ] ]
        return fetchDataRemotely(self.idList).addCallback(cb)

I hope everyone agrees that the level of complexity in *reading* and *understanding* what the above code snippets do is not the same.

The above code could actually have been written the same way using deferredGenerator and PEP 342, or in a slightly more verbose way without PEP 342. But the example is simple. If instead of simply calling callRemote() you had to use another method that in turn needed callRemote, maybe a couple of times (very possible if you use pb), then greenlets would have been indispensable in order to keep renderPage() the same as you saw.

Cheers,
stefano
On 8/30/05, Stefano Masini <stefano@pragma2000.com> wrote:
The problem with generators and enhanced generators, as I've been discussing with a few people at Europython, is that they allow you to jump back and forth between two stack *frames*. Whereas greenlets support jumping among *full* stacks. This means that from within a @deferredGreenlet'ed function you can make calls to other functions that call blockOn, while from a @deferredGenerator'ed function you can't call a method that in turn calls waitForDeferred.
While this may seem quite a subtle difference, I think it is not if you look at it from the perspective of code readability -- let alone the horrible 3-line hack that waitForDeferred forces on you, at least until PEP 342, as we all know.
The effect of being able to call things that do context switches without explicitly marking them as doing so reaches much farther than code readability. I consider it something of a feature that deferredGenerator forces you to know when context switches will happen at every level, and when I wrote gthreadless I did not intend it to be used in a way that didn't require that same knowledge at every level. And I don't think this extra knowledge along the line detracts from readability at all; it helps it.

As I said in another thread, I'm really looking forward to PEP 342 and I'm looking forward to obsoleting all of these other deferred + cooperative multitasking things I've written (threadless, gthreadless, and old-school-defgen) with newschool-PEP342-defgen.

And, just for onlookers, I'd like to point out that the code example below is not showing the difference between non-explicit-greenlets and explicit-defgen, but instead showing the difference between explicit-greenlets and plain old deferreds, which most of us in the know about the various deferred+cooperative multitasking integration modules already understand. No one has yet shown an example where implicit context switching is a good thing.

...
Here is an example that does justice to this approach. It involves Perspective Broker. Think of writing a web frontend to an application on the backend that exports functionality through pb.

(the code may not work, I'm just making it up now without testing it)
(I hope everybody is familiar with nevow.stan. If not, take a look at it. It's worth it.)

    @deferredGreenlet
    def renderPage(self):
        dataList = [
            blockOn(self.backend.callRemote('getDataFromId', elementId))
            for elementId in self.idList]
        return T.html[ T.body [ 'The result:', T.br,
            [ (txt, T.br) for txt in dataList ] ] ]

Without gthreadless:

    def renderPage(self):
        dataList = []
        def fetchDataRemotely(elementList):
            def cbFetch(elementData):
                dataList.append(elementData)
                # more ids left: chain the next remote call
                if len(elementList) > 1:
                    return fetchDataRemotely(elementList[1:])
            # fetch the first id of the remaining list
            return self.backend.callRemote('getDataFromId', elementList[0]).addCallback(cbFetch)
        def cb(crap):
            return T.html[ T.body [ 'The result:', T.br,
                [ (txt, T.br) for txt in dataList ] ] ]
        return fetchDataRemotely(self.idList).addCallback(cb)
--
Christopher Armstrong (Radix): International Man of Twistery
http://radix.twistedmatrix.com
Release Manager, Twisted Project
http://twistedmatrix.com
On 8/30/05, Christopher Armstrong <radeex@gmail.com> wrote:
The effect of being able to call things that do context switches without explicitly marking them as doing so reaches much farther than code readability. I consider it something of a feature that deferredGenerator forces you to know when context switches will happen at every level, and when I wrote gthreadless I did not intend it to be used in a way that didn't require that same knowledge at every level. And I don't think this extra knowledge along the line detracts from readability at all; it helps it.
I can agree on this.
And, just for onlookers, I'd like to point out that the code example below is not showing the difference between non-explicit-greenlets and explicit-defgen, but instead showing the difference between explicit-greenlets and plain old deferreds, which most of us in the
You are perfectly right. After my first post the discussion went on privately between me and Nicola, so I should post some of it here, since it contains exactly such an example.

I'll elaborate a bit on my first example, which was as follows (just as a reminder, to get started with the rest):

    @deferredGreenlet
    def renderPage(self):
        dataList = [
            blockOn(self.backend.callRemote('getDataFromId', elementId))
            for elementId in self.idList]
        return T.html[ T.body [ 'The result:', T.br,
            [ (txt, T.br) for txt in dataList ] ] ]

I basically build a list of values obtained by performing subsequent calls to callRemote(), every time passing a parameter from a list.

Let's say that instead of a list of parameters we have a list of objects, and I build the list of values by calling a method on each of these objects. Like so:

    def renderPage(self):
        dataList = [ element.getData() for element in self.elementList ]
        return T.html[ T.body [ 'The result:', T.br,
            [ (txt, T.br) for txt in dataList ] ] ]

As you can see, I took out @deferredGreenlet for now, because it's not needed. Let's say that self.elementList is made of objects defined like this:

    class ElementObject(object):
        def getData(self):
            return 1

Indeed @deferredGreenlet is not needed because I'm not even using pb.

Now, let's say that in a new version of the software I introduce objects that, in order to obtain the result of getData(), have to go and query a remote server through pb. Things get more complicated now because getData() would return a deferred for those objects, while other objects would return a straight result. This difference is very uncomfortable to live with because you don't know how to treat the result.
There are two ways out, and in both cases you have to change code you've already written:

1) return defer.succeed(1) instead of return 1
2) defer.maybeDeferred(element.getData) instead of element.getData()

In other words, as soon as a blocking method pops up among your methods, you're forced to change and treat all of them as blocking, even by making up deferreds if needed. In any case, you're also compelled to change the code of renderPage() from synchronous style to asynchronous, unless you use waitForDeferred. I often found myself propagating maybeDeferred's back up several levels in my code, and I didn't like it, to tell the truth.

On the other hand, if you use greenlets, you can keep renderPage exactly the way I wrote it the first time; you just need to decorate it with @deferredGreenlet. And those methods that perform blocking calls simply need to wrap the deferreds with blockOn, and that's it.

Yes, I agree on the following point: code that you thought was non blocking can now all of a sudden become blocking. In this respect, yes, a gthreadless implementation that forced you to decorate every method along the way could help in gaining awareness.

But let's think about it: why is it so bad that a method that was supposed to be non blocking now becomes blocking? I can't think of anything other than shared resources that can now get accessed concurrently by other parts of the code. Right? But this problem persists with pure-deferred programming style too! The problem of concurrent access to shared resources does not depend on the programming model being synchronous or asynchronous, but simply on the presence of blocking operations. You have to use locks if you want to protect a shared resource while you block on a lengthy operation, no matter what programming model you're using.

So, I hope this example makes my point a little clearer.
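The two ways out listed above can be sketched without Twisted as follows. MiniDeferred, succeed and maybe_deferred are hypothetical stand-ins written for this note; they only mimic the shape of defer.succeed and defer.maybeDeferred (the real maybeDeferred takes the callable itself plus its arguments, and also turns raised exceptions into failures, which is omitted here). LocalElement and RemoteElement are made-up names for the old and new kinds of object.

```python
class MiniDeferred:
    """Hypothetical, already-fired stand-in for a Deferred."""
    def __init__(self, result):
        self.result = result

    def addCallback(self, fn):
        # The callback's return value becomes the new result, as in Twisted.
        self.result = fn(self.result)
        return self


def succeed(value):
    """Wrap a plain value, loosely mirroring defer.succeed."""
    return MiniDeferred(value)


def maybe_deferred(fn, *args, **kwargs):
    """Call fn; wrap the result unless it is already a MiniDeferred."""
    result = fn(*args, **kwargs)
    return result if isinstance(result, MiniDeferred) else succeed(result)


class LocalElement:
    def getData(self):
        return 1                # old-style object: plain result


class RemoteElement:
    def getData(self):
        return succeed(2)       # pretend this value came back over pb


def collect(elements):
    """Treat every element uniformly, whatever getData() returns."""
    seen = []
    for element in elements:
        maybe_deferred(element.getData).addCallback(seen.append)
    return seen
```

Here collect() never needs to know which kind of element it is handling, but the price is the one described above: the caller has moved to callback style, and the wrapping tends to propagate up through every level that touches the result.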
As I was saying with Nicola, I don't think gthreadless should be used everywhere, like it was a solution to some horrible problem with asynchronous programming. Using deferreds is just great and the awareness you gain of the internals of your implementation by using them is just irreplaceable. But *some* code snippets just come out so much more naturally if you write them using a synchronous model, that being able to mix the two is just a terrific feature I think. cheers, stefano
Stefano Masini wrote:
Believe me. I'm not one of those users in the "periphery of the Twisted community" thinking that asynchronous programming is too hard. I've been writing network code for quite a while and I matured the idea that threads get in your way back when I still didn't know python.
Thanks for the example, Stefano! I think that this serves as an excellent counterpoint to my blog ramblings, and I think your ideas about where using gthreadless is appropriate are right on target. I think it might even be appropriate to include this in Twisted, with some appropriate disclaimers about needing to understand Deferreds *first*... what do you think?
On 8/30/05, Glyph Lefkowitz <glyph@divmod.com> wrote:
I think it might even be appropriate to include this in Twisted, with some appropriate disclaimers about needing to understand Deferreds *first*... what do you think?
I think I agree. After all this is a discussion. If you have anything to share with us other than rudeness, like maybe some code that will help understand your point of view, I'll be the first one to change his mind, and maybe decide that gthreadless is useless.
Stefano Masini wrote:
I think I agree. After all this is a discussion. If you have anything to share with us other than rudeness, like maybe some code that will help understand your point of view, I'll be the first one to change his mind, and maybe decide that gthreadless is useless.
Erm... You seem to have interpreted my point sarcastically. I am not sure why it seemed rude, but I was being serious and, I thought, polite.
On 8/30/05, Glyph Lefkowitz <glyph@divmod.com> wrote:
You seem to have interpreted my point sarcastically. I am not sure why
... and this is the reply I was afraid would arrive. :)

I must really apologize; it always makes you feel really stupid to mistake politeness for sarcasm. I had this doubt ever since I posted my unfortunate reply... I'm not sure either, I guess it was you describing your own blog posting as "ramblings" that made my head spin the wrong way. Sorry. :)

By the way, back on track... sure! I'd be very happy to contribute to the twisted project.

blushing,
stefano
On Tue, 2005-08-30 at 08:47 +0200, Stefano Masini wrote:
Here is an example that does justice to this approach. It involves Perspective Broker. Think of writing a web frontend to an application on the backend that exports functionality through pb.

(the code may not work, I'm just making it up now without testing it)
(I hope everybody is familiar with nevow.stan. If not, take a look at it. It's worth it.)

    @deferredGreenlet
    def renderPage(self):
        dataList = [
            blockOn(self.backend.callRemote('getDataFromId', elementId))
            for elementId in self.idList]
        return T.html[ T.body [ 'The result:', T.br,
            [ (txt, T.br) for txt in dataList ] ] ]
Let me give a counterargument to this specific example - regardless of whether it uses greenlets or just Deferreds: it's slow. Really in this case you'd want callRemote("getDataFromIds", self.idList), or at least to run all the getDataFromId() in parallel, *not* wait for one to finish before calling the other. That being said, I have seen code where something like greenlets or defgen really makes the code much easier to read, so having one of those is worthwhile.
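The sequential-versus-parallel point can be illustrated with a small sketch. It uses the standard library's asyncio purely as a stand-in, since it runs without Twisted installed; in Twisted itself the analogous tools would be defer.gatherResults or DeferredList. FakeBackend and its method are hypothetical names, and the sleep only simulates a network round trip.

```python
import asyncio


class FakeBackend:
    """Hypothetical stand-in for a remote pb backend; counts overlap."""
    def __init__(self):
        self.in_flight = 0
        self.peak = 0

    async def get_data_from_id(self, element_id):
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        await asyncio.sleep(0.01)   # simulated network round trip
        self.in_flight -= 1
        return "data-%s" % element_id


async def fetch_sequential(backend, ids):
    # One request at a time: total latency grows with len(ids).
    return [await backend.get_data_from_id(i) for i in ids]


async def fetch_parallel(backend, ids):
    # All requests issued up front; results come back in request order.
    return await asyncio.gather(*(backend.get_data_from_id(i) for i in ids))


if __name__ == "__main__":
    backend = FakeBackend()
    print(asyncio.run(fetch_parallel(backend, [1, 2, 3])))
    print("peak concurrent requests:", backend.peak)
```

Both functions return the same list; the difference is that the sequential version never has more than one request outstanding, while the parallel one has all of them outstanding at once. A single batched call such as the suggested getDataFromIds would avoid the per-request round trips altogether.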
On 8/30/05, Itamar Shtull-Trauring <itamar@itamarst.org> wrote:
Let me give a counterargument to this specific example - regardless of whether it uses greenlets or just Deferreds: it's slow. Really in this case you'd want callRemote("getDataFromIds", self.idList), or at least to run all the getDataFromId() in parallel, *not* wait for one to finish before calling the other.
Sure, I agree. That example should be taken as a proof of concept. But we can easily think of another situation, where the parties carry out some distributed computation that *does* require going back and forth with intermediate results, like some cryptography algorithm for example.
participants (5)
- Christopher Armstrong
- Glyph Lefkowitz
- Itamar Shtull-Trauring
- Nicola Larosa
- Stefano Masini