[Twisted-Python] integrating CompStrm//adding background processing

I've been working on integrating CompStrm ( http://compstrm.sourceforge.net ) and, while integration was pretty easy, it got harder when I wanted to speed things up. Basically, CompStrm uses yields to implement a kind of light-weight threads, so I needed to add background processing to the main reactor loop. Here's the code I finally came up with:

    from twisted.internet import reactor

    def _runUntilCurrentNew():
        if reactor.poll:
            p = reactor.poll
            reactor.poll = None
            p()
        _runUntilCurrentOld()

    _runUntilCurrentOld = reactor.runUntilCurrent
    reactor.runUntilCurrent = _runUntilCurrentNew
    reactor.poll = None

    def _timeoutNew():
        if reactor.poll:
            return 0
        return _timeoutOld()  # delegate to the original timeout method

    _timeoutOld = reactor.timeout
    reactor.timeout = _timeoutNew

Just using reactor.callLater, I could only get a speed of 90, in contrast to the asyncore integration, which was doing better than 12,000. By replacing runUntilCurrent and timeout, I managed to bump my speed up to better than 8,000, which seems reasonable, as Twisted is a bit more "heavy weight" than asyncore. ;-)

While I'm at it, here's my revised takedown code:

    class whenNoDelayedCalls:
        "I check for when there are no delayed calls."
        def __init__(self, granularity=1.0, func=reactor.stop):
            self.func = func
            self.granularity = granularity
            reactor.callLater(granularity, self)
        def __call__(self):
            c = len(reactor.getDelayedCalls())
            if c or reactor.poll:
                reactor.callLater(self.granularity, self)
            else:
                self.func()

    def pollLoop(granularity=1.0, func=reactor.stop):
        "I run the reactor until there are no more delayed calls."
        whenNoDelayedCalls(granularity, func)
        reactor.run()

This gives me an approximate equivalent to the asyncore poll loop, at least when there are no threads or sockets running. ;-)

Bill la Forge
http://www.geocities.com/laforge49/
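The yield-based light-weight threads that motivate the hook above can be shown in a standalone sketch (this is my own illustration, not CompStrm's actual API — the scheduler and names are invented): a round-robin loop resumes each generator once per pass, and two "virtual processes" pass items over a shared pipe.

```python
from collections import deque

def run(tasks):
    """Round-robin scheduler: resume each generator once per pass.

    Each yield returns control, so no task blocks the loop for long --
    the same idea as hooking one background call into each reactor cycle.
    """
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run one small chunk of the task
            ready.append(task)    # still alive; reschedule it
        except StopIteration:
            pass                  # task finished; drop it

received = []

def producer(pipe, n):
    for i in range(n):
        pipe.append(i)  # "write" one item to the async pipe
        yield           # give other tasks a chance to run

def consumer(pipe, n):
    while len(received) < n:
        while pipe:
            received.append(pipe.popleft())
        yield

pipe = deque()
run([producer(pipe, 5), consumer(pipe, 5)])
print(received)  # -> [0, 1, 2, 3, 4]
```

Bill's timing test counts how many such items cross the pipe per second; the hook into runUntilCurrent exists precisely so a scheduler like this gets resumed on every reactor cycle instead of waiting on callLater's clock granularity.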

Hi Bill, Bill la Forge wrote:
I appreciate reports of performance issues, and I'm sure the reactor could be sped up a bit - but this report seemed a bit vague, and as far as a few helpful folks on #twisted could tell, wrong.
90 whats per what on what? How exactly did you measure this? We assumed you meant "90 trivial calls per second" on relatively recent hardware. The only test we did that made this radical kind of difference was inserting a 'print' statement into the "trivial" callback. The numbers for callLaters-per-second on various hardware we had lying around, mostly ~2GHz Athlons, were in the 15,000-25,000 range.
Also I'm not sure why your code would have sped up the reactor. Would you mind sending your code in the form of two complete Python programs that will demonstrate the difference in speed between your newer callLater code and the existing reactor?

Sorry for the confusion. In the CompStrm project, I'm developing an alternative async programming style, in an attempt to make such code easier to read/maintain. Like flow, it is based on the yield statement. Unlike flow, it requires a mechanism for executing code in the background of the main thread. So I'm not speeding up Twisted in any way. Rather, I'm adding a feature, albeit one that could be easily abused, by allowing for the inclusion of an additional function call in the reactor main loop. My initial approach, of using callLater for successive invocations, was flawed because of the limited granularity of time available on a PC. This meant that I was only able to capture control for a very small number of main loop cycles. But by inserting an additional method call, I am now able to perform a little background processing with every cycle. As for the timing, I'm counting the number of items I can pass on an async pipe between two virtual processes running in the background of the main thread. Details for running over asyncore are available here: http://compstrm.sourceforge.net/timing.html

Bill

On Sun, 2004-05-16 at 22:43, Bill la Forge wrote:
The suggested way for doing highly frequent calls to scheduled events in Twisted is twisted.internet.task.LoopingCall. It does not have this flaw. -- Itamar Shtull-Trauring http://itamarst.org

It isn't that I want to call a function with a particular frequency. Rather, I want to call a function once per main loop. Further, when the function is available to be called, I want to run with a timeout of 0 to further increase the frequency of calls. LoopingCall still uses callLater, which simply can not deliver this type of service. My view on things is that, while blocking I/O is best put in another thread, computations that can be broken into small pieces (putting yield statements in every loop, for example) can run very nicely in the background. Between each chunk of processing, you check for any timer events and for I/O completions, of course. But you want to execute these small chunks AS FREQUENTLY AS POSSIBLE. But like I said, I'm developing a new style of async programming, so it's bound to be something of a heresy. ;-)

Bill
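The "small pieces" idea can be shown without Twisted at all. In this hypothetical sketch (`chunked_sum` and the event check are my own illustration, not CompStrm code), a long computation yields every few iterations, and the driving loop services other work between chunks:

```python
def chunked_sum(data, chunk=3):
    """Sum a sequence, yielding control every `chunk` items so the
    main loop can service timers and I/O between pieces of work."""
    total = 0
    for i, x in enumerate(data, 1):
        total += x
        if i % chunk == 0:
            yield None       # not done yet; let the main loop run
    yield total              # final value

events_serviced = 0

def main_loop(task):
    """Drive one background task, 'checking for events' between chunks."""
    global events_serviced
    for result in task:
        if result is not None:
            return result
        events_serviced += 1  # stand-in for timer/I/O event processing

total = main_loop(chunked_sum(range(10)))
print(total, events_serviced)  # -> 45 3
```

The point of contention in the thread is how often the loop gets back to the task: driven from callLater, resumption is limited by clock granularity; driven once per reactor iteration, the chunks run as frequently as possible.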

Bill la Forge wrote:
Indeed, it sounds like what you're doing doesn't make any sense. Why do you need to call something every mainloop iteration? That's exactly the worst way to write asynchronous code. Polling sucks. :) Maybe you can explain more about what you're doing? What *applications* do you have in mind that this thing would be good for? (I notice that your web site's first sentence mentions how great it would be to create applications from components, but the web site doesn't mention anything at all about existing or even theoretical applications). -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://radix.twistedmatrix.com/

If it looks like polling and acts like polling, when is it not polling? --When it's incremental computing! I've updated the project web pages to cover the Twisted integration: http://compstrm.sourceforge.net But it really does not address this issue. Polling, or rather, unnecessary polling, is EVIL, especially in a framework like Twisted. It's an even greater sin than using BLOCKING I/O for reading email files that could be very, very large. ;-) But polling is when you are checking something, like I/O completion. That's NOT what is happening in CompStrm. Instead, CompStrm is executing (hopefully!) useful application code, in small chunks. When is this useful? Well, when you have some long-running application code, the alternative is to execute it on a separate thread. Indeed, this is a good solution, except that not everyone can debug some of the nasties that occur. And then you've got all that overhead and the delays that seem to be part of inter-task communication. So you start optimizing to pass large chunks between threads and, whoops, there goes your response time. (The old fast-or-cheap choice.) CompStrm also integrates well with async I/O, allowing you to untwist your logic a bit. Have you read anything about Stackless Python? This is similar, but builds on Python generators instead. Does any of this help? And as for applications: indeed, CompStrm developed from the latest requests my client has given me. I've got a client/server app with an asynchronous interface used to compute just-in-time displays of multiple streams. Now he wants the existing operations pushed into scripts. With CompStrm, I can have light-weight child processes sharing the same I/O streams. I want to port all this to Twisted, but need several additions: CompStrm and a good bsddb integration. Now that I've completed the Twisted/CompStrm integration, I'll turn to Twisted/bsddb integration. Be assured, I will not include CompStrm in that integration!!! (It should be generally useful, and provide a means of reading potentially large data using server threads, while minimizing inter-thread overheads.)

Bill

On Sun, 2004-05-16 at 22:43, Bill la Forge wrote:
That code doesn't run - for starters, there is no variable 'd' in WriteMany.cs, endWrite does not appear to take an argument, I think readCount() is supposed to be ReadCount() in test(); it also doesn't appear to test the twcs module, but rather only the asyncore loop. Can you package some easy-to-run tests with your next release?

On Mon, 2004-05-17 at 08:08, Glyph Lefkowitz wrote:
Well, this was an intriguing performance problem, and one that likely impacts my work, so I went ahead and fixed the tests. Attached is a modified copy of twcs.py from the May 16th distribution of compstrm on sf.net, a twcsperf.py that tests it, and a patch to Twisted that may be good to consider including. This patch special-cases a 0 argument to callLater to bypass the incredibly expensive gettimeofday syscall that we end up making as a result. I _think_ this is safe, but I haven't run the test suite on it yet. On my machine, running the tests twice in a row:

    glyph@kazekage:~/Desktop% python twcsperf.py
    Pristine Twisted
    7.85587286949e-05 12729.3302299 per second
    callLater-Patched Twisted
    8.00807499886e-05 12487.3955369 per second
    glyph@kazekage:~/Desktop% python twcsperf.py
    Pristine Twisted
    7.8611869812e-05 12720.7252848 per second
    callLater-Patched Twisted
    7.80673313141e-05 12809.4554171 per second

YMMV, but I believe this effectively eliminates any performance difference for your use case. (Without the patch, I was getting more like "8000 per second".)
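The shape of the optimization is easy to sketch outside Twisted. This is a toy scheduler of my own, not the actual callLater implementation: a zero delay skips the clock read entirely and goes straight onto a "due now" queue, so tight reschedule-immediately loops never pay for gettimeofday.

```python
import heapq
import time

class ToyScheduler:
    """Illustrates special-casing delay == 0 to avoid a clock read."""

    def __init__(self):
        self._heap = []     # (due_time, seq, fn) for genuinely delayed calls
        self._now = []      # fast path: calls due immediately
        self._seq = 0       # tie-breaker so heap never compares functions

    def call_later(self, delay, fn):
        if delay == 0:
            self._now.append(fn)   # no time.time() needed at all
            return
        self._seq += 1
        heapq.heappush(self._heap, (time.time() + delay, self._seq, fn))

    def run_until_current(self):
        while self._now:
            self._now.pop(0)()     # run all immediately-due calls
        now = time.time()          # one clock read for the delayed ones
        while self._heap and self._heap[0][0] <= now:
            heapq.heappop(self._heap)[2]()

log = []
s = ToyScheduler()
s.call_later(0, lambda: log.append('immediate'))
s.call_later(0.01, lambda: log.append('delayed'))
s.run_until_current()      # only the 0-delay call is due yet
time.sleep(0.02)
s.run_until_current()      # now the delayed call fires
print(log)  # -> ['immediate', 'delayed']
```

A real patch also has to preserve callLater's ordering and cancellation semantics, which is presumably why Glyph hedges about running the test suite.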

Many thanks! I'll dig into this tomorrow. (I'm in India, so your day is my night. ;-)

Bill

participants (4)
- Bill la Forge
- Christopher Armstrong
- Glyph Lefkowitz
- Itamar Shtull-Trauring