[Twisted-Python] Should I use asynchronous programming in my own modules?
Hello,

I'm rather new to Twisted and asynchronous programming in general. Overall, I think I've understood the asynchronous programming model and its implications quite well. Nevertheless, there are some remaining questions.

To give an example, I'd like to develop my own simplified document format in XML and a corresponding parser. The output of the parser (a specialized document object model) will be traversed and translated into HTML afterwards. This module could be useful outside any Twisted application, of course. Instead of generating HTML, one could develop a generator that produces LaTeX, for example. But it could also be used to render HTML pages in a Twisted web application.

The question is this: since parsing and generating large documents could block the reactor in a Twisted app, should I use any of Twisted's asynchronous programming features in this module (for better integration with Twisted), or should I rather develop it in a traditional way and run it in a thread?

The question came to my mind because somewhere I read that long-lasting operations in third-party modules should be called in a thread. This is clear. I also read that if one has the opportunity to develop an application from scratch, one should rather use Twisted's asynchronous programming features and divide long-lasting operations into small chunks. In principle, this approach is clear to me, but does it also apply to modules which are entirely independent of Twisted networking code? And if so, is there any way to decouple them from the Twisted library for reuse in other applications?

The last question is what criteria I could use to divide long-lasting operations into chunks. In almost all books about asynchronous programming I only read that if they're too big, they could block the event loop. Of course, but how big is too big? And what's the measure for it? Milliseconds, number of operations, number of lines of code - or what?
Doesn't it depend entirely on the application at hand and how reactive it has to be? Moreover, depending on the hardware used, fewer chunks can be processed in the same time on a Pentium II than on an Athlon 64, for example. And couldn't chunks also be too small, spending more time than necessary putting them into the reactor's queue, maybe sorting them, and then calling them? If the overhead involved in scheduling a chunk is bigger than the processing time of the chunk itself, the chunks are too small, aren't they?

Thanks in advance for any answers,
Jürgen
On Thu, 2007-10-18 at 14:41 +0200, Jürgen Strass wrote:
I'm rather new to twisted and asynchronous programming in general. Overall, I think I've understood the asynchronous programming model and its implications quite well. Nevertheless, there are some remaining questions.
To give some example, I'd like to develop my own simplified document format in XML and a corresponding parser. The output of the parser (a specialized document object model) will be traversed and translated into HTML afterwards. This module could be useful outside any twisted application, of course. Instead of generating HTML one could develop a generator that produces LaTeX, for example. But it could also be used to render HTML pages in a twisted web application. The question is this: since parsing and generating large documents could block the reactor in a twisted app, should I use any of twisted's asynchronous programming features in this module (for better integration with twisted) or should I rather develop it in a traditional way and run it in a thread?
What you mean by "traditional" is actually a pull parser. Parsing APIs can be pull or push (i.e. asynchronous). Well-designed parsers are always push, because push parsers can be trivially converted to blocking pull parsers, but not vice versa. Some examples of push/async parsers: Twisted's Protocol class, or the SAX API.

The key difference: a pull parser will *read* data from some data source with a blocking API. A push parser gets the data *pushed* to it by the user.

So: write a push parser. You won't need to use any Twisted facilities.

To make things a bit clearer, here's how you convert a push parser into a pull parser:

    def parse(f):
        parser = MyParser()
        for line in f:
            parser.push(line)
        return parser.result()

In Twisted, a push parser will often get data pushed to it from Protocol.dataReceived.
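To make the push style above concrete, here is a minimal sketch of a push parser plus its trivial pull wrapper. The class, its line-splitting job, and all names are invented for illustration; nothing here is a real Twisted or SAX API.

```python
class LineParser:
    """A tiny push parser: the caller pushes chunks in; it collects
    complete newline-terminated lines, regardless of chunk boundaries."""

    def __init__(self):
        self._buffer = ""
        self.lines = []

    def push(self, data):
        # Accept whatever chunk arrives; split off any complete lines.
        self._buffer += data
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            self.lines.append(line)

    def result(self):
        return self.lines


def parse_blocking(f):
    # The trivial pull wrapper: drive the push parser from a blocking source.
    parser = LineParser()
    for chunk in f:
        parser.push(chunk)
    return parser.result()
```

The same LineParser instance could just as well be fed from Protocol.dataReceived, which is the point of preferring push: the blocking wrapper is five lines, while the reverse conversion is not possible at all.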
Itamar Shtull-Trauring wrote:
[...] What you mean by "traditional" is actually a pull parser. Parsing APIs can be pull or push (i.e. asynchronous). Well-designed parsers are always push, because push parsers can be trivially converted to blocking pull parsers, but not vice-versa. Some examples of push/asynch parsers: twisted's Protocol class, or the SAX API. Sorry, I think my example was somewhat misleading and it also becomes clear to me that I haven't used the word "asynchronous" correctly. I didn't consider that one can also register callbacks with a parser, for example, and call this type of programming asynchronous. (The principle "Don't call us, we call you" would apply here, too, of course.)
No, what I really meant by "traditional" was to write parsers and generators which traverse the document as a whole in one large step, without giving the Twisted reactor a chance to process any other events.

Let's assume I got a DOM tree from Python's XML parser. First, I'd traverse that tree and build up another tree consisting of element objects. Each element object is an instance of a class corresponding to a tag; for example, for the tag <chapter> I'd create a class "chapter". This is necessary because there's not always a one-to-one correspondence between tags and my document elements, and to associate additional attributes with such elements later, for example automatically generated chapter numbers. I'd then use the generator to traverse that element tree, calling "render_element" methods on my way. For element chapter with attribute title I'd call "render_chapter( node )", which then generates "<h1>chapter_title</h1>".

Let's assume I had some element with child elements. Without knowing about Twisted at all, I'd have written a loop to process each child like this:

    for child_node in root_elem.children:
        if child_node.type == "chapter":
            processChapter( child_node )

My idea now is that depending on the number of child elements, looping could take some time. So instead I'd use Twisted's reactor, specifically its callLater method, like this (it's only pseudo code!):

    class Generator:
        def generate_html( self ):
            self.d = defer.Deferred()
            self.startProcessing()
            return self.d

        def startProcessing( self ):
            self.current_element = root_elem
            self.processNextElement()

        def processNextElement( self ):
            if more_elements_to_process:
                if self.current_element.type == "chapter":
                    reactor.callLater( 0, self.processChapter, self.current_element )
                # .....
            else:
                self.d.callback( "finished" )

In this way any Twisted user could get a Deferred from the generate_html method and get called when the Generator has generated all HTML.
The problem with this is that I couldn't ever use such code without also installing Twisted, of course. It's more or less clear to me how to divide the traversal of such a DOM tree into discrete steps, but it's not so clear how to call processNextElement with reactor.callLater from the outside. Although, after reading the other answers, it seems to me I'm not far from a solution.

I think I could also create two classes: the Generator class, which would provide a processNextElement method and wouldn't need to depend on the Twisted framework, and a TwistedGenerator class, which would do exactly the same as the code above and repeatedly call processNextElement with reactor.callLater. But the internal housekeeping of which element to process next could be more difficult than with the solution above, couldn't it? (Because instead of separate methods like "processChapter", "processList", etc. I'd only have one method to call from outside, "processNextElement" (and something like "moreElementsToProcess"). The TwistedGenerator wrapper shouldn't know about the internal state of the Generator, I think.)

Many greetings,
Jürgen
Jürgen Strass wrote:
My idea now is that depending on the number of child elements, looping could take some time. So instead I'd use twisted's reactor, specifically its callLater method like this (it's only pseudo code!):
    class Generator:
        def generate_html( self ):
            self.d = defer.Deferred()
            self.startProcessing()
            return self.d

        def startProcessing( self ):
            self.current_element = root_elem
            self.processNextElement()

        def processNextElement( self ):
            if more_elements_to_process:
                if self.current_element.type == "chapter":
                    reactor.callLater( 0, self.processChapter, self.current_element )
                # .....
            else:
                self.d.callback( "finished" )
This is not so good. You have taken away your users' option to control which thread this processing is performed in, because it has to run in the reactor's thread to avoid breaking Deferred's threading rules. Also, your users are not getting any benefit from the incremental nature of this code. For example, they can't get access to the first chunk of HTML until after callback("finished"). At least, not without more complexity. They can't decide to stop processing early because they were only interested in the HTML <head>.
It's more or less clear to me how to divide the traversal of such a dom tree into discrete steps
It seems like you are confusing this goal - the ability to perform work in incremental steps - with the use of Twisted's reactor to schedule those steps.

Set Twisted aside for a moment. I propose one good Pythonic interface to your HTML creation code may be a generate_html_iter() method, which returns an iterator over the document's HTML fragments. You can implement this using the processNextElement approach you suggested, although Python generator functions may be more convenient.

A thin wrapper around this iterator could use reactor.callLater to schedule it, then fire a callback when complete. An equally thin wrapper could use PostMessage to calculate the document in the background of a win32 GUI. Or it could feed a pull producer. Alternatively, it could be run in another thread with:

    deferToThread(lambda: ''.join(g.generate_html_iter()))

I hope this helps,
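The generate_html_iter() idea above can be sketched with a Python generator function. The Node class, its attributes, and the HTML produced are all invented for illustration; only the interface shape (an iterator over fragments) is the point.

```python
class Node:
    """Illustrative document element: a type, an optional title, children."""

    def __init__(self, type, title="", children=()):
        self.type = type
        self.title = title
        self.children = list(children)


def generate_html_iter(root):
    """Yield HTML fragments one element at a time; the caller decides
    when (and whether) to pull the next fragment."""
    if root.type == "chapter":
        yield "<h1>%s</h1>" % root.title
    for child in root.children:
        # Recurse, yielding each child's fragments as they are produced.
        for fragment in generate_html_iter(child):
            yield fragment
```

A synchronous caller simply joins everything: `''.join(generate_html_iter(doc))`. An asynchronous wrapper can pull one fragment per reactor iteration, and a caller interested only in the first fragment can stop after one `next()` call, which is exactly the early-exit benefit the Deferred-only design loses.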
Toby Dickenson wrote:
[...]
Set Twisted aside for a moment. I propose one good Pythonic interface to your HTML creation code may be a generate_html_iter() method, which returns an iterator over the document's HTML fragments. You can implement this using the processNextElement approach you suggested, although Python generator functions may be more convenient.
A thin wrapper around this iterator could use reactor.callLater to schedule it, then fire a callback when complete. An equally thin wrapper could use PostMessage to calculate the document in the background of a win32 GUI. Or it could feed a pull producer. Alternatively, it could be run in another thread with:

    deferToThread(lambda: ''.join(g.generate_html_iter()))
I hope this helps,
A few seconds before reading your message, I posted my reply to Christopher Armstrong. That message contains some code of mine that already uses a sort of wrapper class. Though it doesn't use an iterator and maybe isn't as Pythonic, it wraps a synchronous interface. Not sure if this is similar to the thin wrapper you had in mind?

Thank you very much,
Jürgen
On 10/18/07, Jürgen Strass <jrg718@gmx.net> wrote:
To give some example, I'd like to develop my own simplified document format in XML and a corresponding parser. ... This module could be useful outside any twisted application, of course. ... The question is this: since parsing and generating large documents could block the reactor in a twisted app, should I use any of twisted's asynchronous programming features in this module (for better integration with twisted) or should I rather develop it in a traditional way and run it in a thread?
You don't need to make the module depend on Twisted, but you also don't need to force users to use a thread. Just make sure the library knows how to parse and process incrementally; then your asynchronous users can pass in chunks of data as they receive them and your other users can pass in everything at once. Basically, it's a matter of inverting your library's loop that would otherwise go "read; process;" to "when process_more_data is called, process that data".
The last question is what criteria I could use to divide long lasting operations into chunks. In almost all books about asynchronous programming I only read that if they're too big, they could block the event loop. Of course, but how big is too big? And what's the measure for it? Milliseconds, number of operations, number of code lines - or what? Doesn't it depend entirely on the application at hand and how reactive it has to be?
"Bigness" here refers to time spent. And yes, it depends on the application. -- Christopher Armstrong International Man of Twistery http://radix.twistedmatrix.com/ http://twistedmatrix.com/ http://canonical.com/
Christopher Armstrong wrote:
You don't need to make the module depend on Twisted, but you also don't need to force users to use a thread. Just make sure the library knows how to parse and process incrementally; then your asynchronous users can pass in chunks of data as they receive them and your other users can pass in everything at once. Basically, it's a matter of inverting your library's loop that would otherwise go "read; process;" to "when process_more_data is called, process that data".
As I already explained in my replies to Jean-Paul Calderone and Itamar Shtull-Trauring, I'm not sure if I have fully understood how this works in particular. The example of a parser probably wasn't well chosen, because in all replies to my original posting people seem to assume that I need to do something in response to an external event (e.g. I/O events). Please think instead of a long-running, CPU-bound algorithm. From Twisted, I'd like to start that algorithm in the background and get notified when it has finished. It shouldn't ever block the application for too long. In other applications, I'd like to call the algorithm in a synchronous way.

It is most likely that my difficulty in understanding results from the fact that in such a case there is no external event that could trigger the processing of the individual steps of the algorithm. The best I've come up with so far uses a wrapper around a synchronous interface that provides methods for calling the next step of the algorithm and for testing whether the algorithm has finished. What I'd like to know is whether this basically is the style one has to use when writing algorithms divided into chunks, and whether the approach to integrate it into Twisted is well chosen.

Also, I don't fully understand the implications this programming style has on long-running loops. I have the impression that I would need to divide loops into several methods: loopInit, loopCondition, loopNextStep and loopFinished. Is this correct?

Following is my code for a counter class which also uses the wrapper class I've described above. Using the wrapper class, it is possible to run several counters concurrently. The Counter class is already decoupled from Twisted. In case I want to use the synchronous version, I'd use Counter; in case I want to use the asynchronous version with deferreds, I'd use TwistedCounter.
----
from twisted.internet import reactor, defer

class Counter:
    def __init__( self, id, limit ):
        self.id = id
        self.limit = limit
        self.count = 0

    def limitNotReached( self ):
        return self.count < self.limit

    def increment( self ):
        if self.limitNotReached():
            print "Counter %d: %d" % ( self.id, self.count )
            self.count = self.count + 1

    def run( self ):
        while self.limitNotReached():
            self.increment()

class TwistedCounter:
    def __init__( self, counter ):
        self.counter = counter

    def run( self ):
        self.d = defer.Deferred()
        self.increment()
        return self.d

    def increment( self ):
        if self.counter.limitNotReached():
            self.counter.increment()
            reactor.callLater( 0, self.increment )
        else:
            self.d.callback( "finished" )

def counters_finished( result ):
    print "twisted counters finished."
    reactor.stop()

print "-- Run traditional counter:\n"
traditional_c = Counter( 1, 130 )
traditional_c.run()

print "\n-- Run twisted counters concurrently:\n"
c1 = TwistedCounter( Counter( 2, 100 ) )
c2 = TwistedCounter( Counter( 3, 150 ) )
d1 = c1.run()
d2 = c2.run()
d = defer.DeferredList( [ d1, d2 ] )
d.addCallback( counters_finished )
reactor.run()
----

What I can't fully imagine is whether I can use this technique in all possible cases. In case I need to nest calls or algorithms, I'll run into the problem of not being able to call any of the asynchronous wrappers from the synchronous parts of my library, because those wrappers return deferreds which are part of Twisted. So I'd end up with two versions of my library again.

Jürgen
On Thu, 18 Oct 2007 14:41:38 +0200, Jürgen Strass <jrg718@gmx.net> wrote:
Hello,
I'm rather new to twisted and asynchronous programming in general. Overall, I think I've understood the asynchronous programming model and its implications quite well. Nevertheless, there are some remaining questions.
To give some example, I'd like to develop my own simplified document format in XML and a corresponding parser. The output of the parser (a specialized document object model) will be traversed and translated into HTML afterwards. This module could be useful outside any twisted application, of course. Instead of generating HTML one could develop a generator that produces LaTeX, for example. But it could also be used to render HTML pages in a twisted web application.
Have you seen Lore?
The question is this: since parsing and generating large documents could block the reactor in a twisted app, should I use any of twisted's asynchronous programming features in this module (for better integration with twisted) or should I rather develop it in a traditional way and run it in a thread?
Incremental parsing is often useful and simpler than the alternative. If you are accepting a document over the network, why buffer it yourself and then parse it when you could just be giving each piece directly to the parser? Done this way, it often is the case that even large documents can be parsed without blocking for an unreasonable amount of time.
The question came to my mind, because somewhere I read that long lasting operations in third party modules should be called in a thread. This is clear. I also read that if one has the opportunity to develop an application from scratch, one should rather go for using twisted's asynchronous programming features and divide long lasting operations into small chunks.
The CPU differs from the network. There are rarely points in a CPU-bound task where suspending to work on something else would not be an arbitrary decision. When dealing with the network, these points are obvious and not at all arbitrary. So, when dealing with the network, it's almost unarguable that you should use Twisted's APIs instead of using blocking APIs. However, Twisted doesn't provide any functionality specifically for breaking up CPU-bound tasks, primarily because any such functionality would be arbitrary.
In principle, this approach is clear to me, but does it also apply to modules which are entirely independent of Twisted networking code? And if so, is there any way to decouple them from the Twisted library for reuse in other applications?
It's typically trivial to drive code written to be used asynchronously in a synchronous manner. The opposite is rarely, if ever, true. Consider a parser API which consists of a "feed" method taking a string giving some more bytes from the input document. You can use this by passing in small chunks repeatedly until the entire document has been passed in, or you can pass in the entire document at once. Now consider an API where the entire document must be supplied at once: how do you use that without blocking?
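A hypothetical feed-style API makes this concrete: chunked callers and all-at-once callers use the same method and get the same result. The word-counting logic below is just a stand-in for real parsing; the class and method names are invented.

```python
class WordCounter:
    """Feed-style API: accepts the input document in chunks of any size."""

    def __init__(self):
        self._tail = ""   # possibly-partial word carried between chunks
        self.count = 0

    def feed(self, data):
        # Split on spaces; the last piece may be an incomplete word,
        # so hold it back until the next chunk (or close()) arrives.
        words = (self._tail + data).split(" ")
        self._tail = words.pop()
        self.count += sum(1 for w in words if w)

    def close(self):
        # Flush the final word, if any, and return the total.
        if self._tail:
            self.count += 1
            self._tail = ""
        return self.count
```

An asynchronous user calls feed() from dataReceived as bytes arrive; a synchronous user calls feed() once with the whole document. Inverting this design - a single method that demands the complete document - cannot be used without blocking, which is Jean-Paul's point.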
The last question is what criteria I could use to divide long lasting operations into chunks. In almost all books about asynchronous programming I only read that if they're too big, they could block the event loop. Of course, but how big is too big? And what's the measure for it? Milliseconds, number of operations, number of code lines - or what? Doesn't it depend entirely on the application at hand and how reactive it has to be?
Yes.
Moreover, depending on the hardware used, fewer chunks can be processed in the same time on a Pentium II than on an Athlon 64, for example.
True as well. However, is your primary goal to provide ideal scheduling behavior both on a CPU released this year and a CPU released ten years ago?
And couldn't chunks also be too small, spending more time than necessary in putting them into the reactor's queue, then maybe sorting them and then calling them? In case the overhead involved in scheduling some chunk is bigger than the processing time of the chunk itself, the chunks are too small, aren't they?
Correct again. These problems can all be mitigated, at least partially, by allowing the application to decide how much work is done at once. Parsing one byte from an input document should take less time than parsing one megabyte. Let the application decide how much work is done at a time. Size of input is only one way in which this can be controlled. You could support explicit tuning of these parameters with a dedicated API, or you could support stepwise processing and let the application explicitly step it as far as it wants to at a time. In this direction, there are some extremely primitive tools in twisted.internet.task. They will not solve the problem for you, but they may give you some ideas or save you a bit of typing.
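The stepwise idea described above can be sketched in plain Python, with no Twisted dependency. The items-per-step parameter and the work callable are made up for illustration; the point is that the *application*, not the library, decides how much work one step does.

```python
class StepwiseTask:
    """Process items at most `batch` at a time. The caller - a reactor
    loop, a GUI idle handler, or a plain while-loop - decides when the
    next step runs, so chunk size and scheduling stay tunable."""

    def __init__(self, items, work, batch=10):
        self._items = iter(items)
        self._work = work
        self._batch = batch
        self.done = False

    def step(self):
        for _ in range(self._batch):
            try:
                item = next(self._items)
            except StopIteration:
                self.done = True
                return
            self._work(item)
```

A synchronous caller runs `while not task.done: task.step()`; a Twisted wrapper could instead schedule each step with `reactor.callLater(0, ...)`, and twisted.internet.task offers comparable primitives for driving iterators cooperatively. Tuning `batch` is exactly the "let the application decide how much work is done at a time" knob.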
Thanks in advance for any answers, Jürgen
Jean-Paul
Jean-Paul Calderone wrote:
On Thu, 18 Oct 2007 14:41:38 +0200, Jürgen Strass <jrg718@gmx.net> wrote:
To give some example, I'd like to develop my own simplified document format in XML and a corresponding parser. [...]
Have you seen Lore?
Not yet. I'll have a look at it, though. I guess to some degree I'm reinventing the wheel, but I rather see this as an exercise.
The question is this: since parsing and generating large documents could block the reactor in a twisted app, should I use any of twisted's asynchronous programming features in this module (for better integration with twisted) or should I rather develop it in a traditional way and run it in a thread?

Incremental parsing is often useful and simpler than the alternative. If you are accepting a document over the network, why buffer it yourself and then parse it when you could just be giving each piece directly to the parser? Done this way, it often is the case that even large documents can be parsed without blocking for an unreasonable amount of time.

Okay, I agree. To get away somewhat from parsers, the question arises for other programming tasks as well, of course.
So I already asked myself how one would translate the example of a factorial function in Twisted's core documentation to use the reactor's scheduling mechanism instead of running it in a thread. I think an example of how to divide it into chunks and how to use the reactor would be great. What I tried at first was programming a simple counter this way. It would look much like the code I presented in reply to Itamar Shtull-Trauring's answer. What I'm not sure about is whether this is the correct way to go.

Many thanks for all the other points you've answered; they made a lot of things much clearer to me.

Jürgen
Jürgen Strass wrote:
[..]
So I already asked myself how one would translate the example of a factorial function in twisted's core documentation to use the reactor's scheduling mechanism instead of running it in a thread. I think an example of how to divide it into chunks and how to use the reactor would be great.
Hi,

OK, as I understand it, you are *not* talking about "accepting documents over the network" or similar, but rather about independent, long-running, CPU-bound tasks. While you could split them up into chunks, allowing the reactor to do its work, there are a number of arguments against doing so.

If you don't need intermediate results and don't feed your engine incrementally with more data, the way to go is neither 'chunking' nor using threads, but rather worker processes, because Python's threads are only well suited for I/O-bound stuff that cannot be done asynchronously (e.g. if you have to use some blocking DB interface because there's no alternative asynchronous implementation), and function calls in Python are expensive. If you split up the work to do it in the reactor thread and it requires heavy processing, there will be many unnecessary calls switching between the reactor and your engine. If you instead have a tight loop (or whatever) in a different process, those calls would be saved, and on a multi-core or multi-processor system you could use its additional processing power. So use IPC to communicate with the worker(s), and let them spit out the result as fast as possible, instead of unnecessarily slowing down both your calculation and the networking part without the option to parallelize processing.

The decisive aspect is interactivity: if you need it for your processing, 'chunking' it up is the way to go; if not, use another process. If you don't really need to process events, but still want to do some kind of streaming, the decision is not that easy. If you need parallel processing anyway, you have no choice but to use some kind of IPC. The safe bet is probably designing your code to be able to be processed in chunks, and then to run it in a separate process.

I think the main misunderstanding is "[..]to use the reactor's scheduling mechanism instead of running it in a thread."
Twisted's reactor is not a superior multi-purpose scheduler (as JP mentioned), but a domain-specific event handler for networking. While your use case might (that's my guess) profit from choosing 'chunking' over Python's threading, it still wouldn't profit from choosing it over the scheduling of your OS.

Hm, did I get you right there?

Johann
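The worker-process approach suggested above could be sketched with the standard library's multiprocessing module. The factorial call is only a placeholder for a long CPU-bound job; the pool size and argument are arbitrary.

```python
import math
from multiprocessing import Pool

def heavy_task(n):
    # Stand-in for a long-running, CPU-bound computation.
    return math.factorial(n)

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # apply_async returns immediately, so the main (reactor) process
        # stays responsive while the worker computes; .get() collects
        # the result when it is actually needed.
        result = pool.apply_async(heavy_task, (100,))
        print(result.get())
```

In a Twisted program one would avoid the blocking get() call, e.g. by spawning the worker with reactor.spawnProcess and receiving its output as an event; this sketch only shows the offloading idea, including the free parallelism on multi-core machines.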
What I tried at first was programming a simple counter this way. It would look much similar to the code I presented in reply to Itamar Shtull-Trauring's answer. What I'm not sure about is if this is the correct way to go for.
Many thanks for all the other points you've answered, it made a lot of things much clearer to me.
Jürgen
Johann Borck wrote:
[...] I think the main misunderstanding is "[..]to use the reactor's scheduling mechanism instead of running it in a thread." Twisted's reactor is not a superior multi-purpose scheduler (as JP mentioned), but a domain-specific event handler for networking. While your use case might (that's my guess) profit from choosing 'chunking' over Python's threading, it still wouldn't from choosing it over the scheduling of your OS.
hm, did I get you right there?
Oh yes, you're right. The whole time I thought of the reactor as a multi-purpose scheduler and didn't get JP's answer right. The misunderstanding partly results from the core documentation. In chapter 1.3 it is said: "This document will give you a high level overview of concurrent programming (interleaving several tasks) and of Twisted's concurrency model: non-blocking code or asynchronous code." Then two examples follow: (1) CPU-bound tasks and (2) tasks that wait for data. Because I thought the introduction applied to both types of examples, I also assumed that Twisted's concurrency model would apply to both types of tasks. Of course, I already wondered about not being able to find any examples for (1), while the whole rest of the docs deals with (2). ;-)

Moreover, I once read Douglas Schmidt's book "Pattern-Oriented Software Architecture (2)", which describes several patterns - including the reactor pattern - for middleware-oriented applications. The book led to some confusion on my part about when to use which pattern and what concurrency mechanism would be best for a particular situation. The - maybe wrong - conclusion I've drawn from that book is that context-switching overhead (be it threads or processes) isn't only bad for I/O-bound tasks, but also for most other concurrent tasks. As you said, function calls in Python are expensive; nevertheless, I thought they were less expensive than the overhead caused by context switching between threads and processes, at least on a single-processor system. Or have I made a mistake here?

Moreover, couldn't the creation of whole new processes be even more expensive? I mean, with "long"-running algorithms I really meant tasks that could take some minutes. Processes would be very fine here. But what about - for example - CPU-bound tasks that only take some hundred milliseconds, but nevertheless would block the reactor? Would you use processes in this case, too? Maybe prespawned processes?
Or should I rather use threads in such a case?

Many thanks for your enlightening reply,
Jürgen
participants (6)
- Christopher Armstrong
- Itamar Shtull-Trauring
- Jean-Paul Calderone
- Johann Borck
- Jürgen Strass
- Toby Dickenson