[Twisted-Python] profiling twisted
Hi, first of all, thank you very much for this great piece of software. I'm using Twisted for many different projects and really enjoy its clean and simple design. Now, I'm wondering if there's a profiling tool which can quickly tell me what events are blocking my server. I have even deferred the file I/O to separate threads, as I suspected it of making the server behave poorly under load. But there still seems to be something blocking... What do you use to find the callbacks which block (for too long)? Regards Markus
Markus Schiltknecht wrote: ...
HTH, Mike -- ________________________________________________ Mike C. Fletcher Designer, VR Plumber, Coder http://www.vrplumber.com http://blog.vrplumber.com
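The kind of snippet Mike's article points at can be sketched as a watchdog that schedules a periodic call and reports how late it actually fires; anything that blocks the reactor shows up as lag. This is a hypothetical helper (names like `LagMonitor` are mine, not the article's):

```python
import time

class LagMonitor:
    """Schedule a check every `interval` seconds and report how late it
    actually fires.  A blocked reactor shows up as lag."""

    def __init__(self, interval=1.0, threshold=0.1):
        self.interval = interval        # seconds between checks
        self.threshold = threshold      # report lag above this (seconds)
        self._expected = 0.0            # when the next check should fire

    def lag(self, now):
        # How much later than scheduled the callback actually ran.
        return max(0.0, now - self._expected)

    def start(self, reactor):
        self._expected = time.monotonic() + self.interval
        reactor.callLater(self.interval, self._tick, reactor)

    def _tick(self, reactor):
        lag = self.lag(time.monotonic())
        if lag > self.threshold:
            print("reactor was blocked for ~%.3f s" % lag)
        self.start(reactor)             # reschedule the next check
```

Started with `LagMonitor().start(reactor)`, it prints a line whenever a callback held the loop for longer than the threshold.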
Hi, Mike C. Fletcher wrote:
Thanks, nice article. And I've tried that code snippet, but it didn't help much in my case, as response times are below 1 second.

Please see these measurements, taken with apache bench. I'm testing response times on a simple status resource, which doesn't take long to calculate (i.e. doesn't even return a deferred, but returns a response immediately):

```
Document Path:          /status
Document Length:        202 bytes
Concurrency Level:      10
Time taken for tests:   0.202266 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      38400 bytes
HTML transferred:       20200 bytes
Requests per second:    494.40 [#/sec] (mean)
Time per request:       20.227 [ms] (mean)
Time per request:       2.023 [ms] (mean, across all concurrent requests)
Transfer rate:          182.93 [Kbytes/sec] received
```

Now, measured while the server is under very small load:

```
Document Path:          /status
Document Length:        202 bytes
Concurrency Level:      10
Time taken for tests:   4.103465 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      38400 bytes
HTML transferred:       20200 bytes
Requests per second:    24.37 [#/sec] (mean)
Time per request:       410.347 [ms] (mean)
Time per request:       41.035 [ms] (mean, across all concurrent requests)
Transfer rate:          9.02 [Kbytes/sec] received
```

When putting the server under real load, those response times climb up to two seconds, so there must be something wrong.

Can I somehow get the reactor's state, i.e. how many deferreds are waiting in the queue, how many threads are running concurrently, etc.?

How good is the idea of deferring file I/O to threads, i.e. threads.deferToThread(self.filehandle.write, data)?

Another possibly blocking module might be the database API, but I'm using Twisted's enterprise adbapi, which should be async, AFAICT.

Maybe copying data around takes time. I'm sending around chunks of 64k size (streaming from the database to an external program). Reducing the chunk size to 1k helps somewhat (i.e. response time is seldom over 150 ms, but can still climb up to > 0.5 seconds).

Hum... external program... maybe it's the self.transport.write() call which blocks for several 100 ms? Is it safe to write: d = threads.deferToThread(self.transport.write, dataChunk) (i.e. call transport.write from a thread)?

How many resources do these deferToThread() deferreds eat? AFAICT, the reactor prepares a thread pool, which leads me to believe it's a well-optimized implementation... Regards Markus
On 02:32 pm, markus@bluegap.ch wrote:
> Requests per second: 494.40 [#/sec] (mean)
>
> Now, measured while the server is under very small load:
>
> Requests per second: 24.37 [#/sec] (mean)
>
> When putting the server under real load, those response times climb up to
What is 'real load'? Are you talking about things in process with, but not related to, the web server?
> two seconds, so there must be something wrong.
Maybe your server is slow? :)
> Can I somehow get the reactor's state, i.e. how many deferreds are waiting in the queue, how many threads are running concurrently, etc?
There is no queue of Deferreds. They don't actually have anything to do with the reactor.
> How good is the idea of deferring File I/O to threads, i.e. threads.deferToThread(self.filehandle.write, data)?
If you do indeed discover that you are waiting a long time to write your particular stuff to files, then that might help. It might also randomly interleave the data and corrupt your files, if 'data' is big enough.
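One stdlib-only sketch of avoiding that interleaving is to serialize writes to each file handle behind a lock (the `SerializedWriter` name is mine; in Twisted you would hand the locked write to `threads.deferToThread` rather than call it directly):

```python
import threading

class SerializedWriter:
    """Per-file lock so writes from different worker threads
    cannot interleave and corrupt the file."""

    def __init__(self, filehandle):
        self._fh = filehandle
        self._lock = threading.Lock()

    def write(self, data):
        # Only one thread may be inside the write at a time.
        with self._lock:
            self._fh.write(data)
```

Twisted's own `DeferredLock` gives the same serialization without tying a pool thread up waiting on the lock.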
> Another possibly blocking module might be the database api, but I'm using twisted's enterprise adbapi, which should be async, AFAICT.
It does the operations in threads, yes. However, the threadpool will eventually fill up; the concurrency is fairly limited. (The default is 10 or so workers, I think).
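Both limits are tunable; a configuration sketch (the sizes and connection arguments are illustrative, not recommendations):

```python
from twisted.internet import reactor
from twisted.enterprise import adbapi

# Raise the reactor's general thread pool (the default maximum is
# around 10 workers).
reactor.suggestThreadPoolSize(50)

# adbapi keeps its own pool; cp_min/cp_max bound its size.
dbpool = adbapi.ConnectionPool(
    "psycopg2",            # any DB-API 2.0 module name
    database="mydb",       # hypothetical connection arguments
    cp_min=3, cp_max=25,
)
```

Note these numbers only raise the concurrency ceiling; they don't make any individual blocking operation faster.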
> Maybe copying data around takes time.

That's a possibility the "--profile" option to twistd which JP suggested might help you with. You'll see the functions copying data taking a lot of CPU time in that case.
> Is it safe to write: d = threads.deferToThread(self.transport.write, dataChunk)?

No. _None_ of Twisted's APIs are thread safe. This call does not block, though, and it is extremely, vanishingly unlikely that it is causing your problems. It just sticks some data into the outgoing queue and returns immediately.
It's not particularly "optimized", in the sense that we haven't measured the performance or improved it much, but it's also not doing very much: put a value into a queue, get it out in a thread, do it. That's about all. It would certainly surprise me if that operation were taking 100 ms.

One quick measurement you can do to determine what might be causing this performance loss is to examine the server in 'top' while it is allegedly under load. Is it taking up 100% CPU? If not, then it's probably blocked on some kind of I/O in your application, or perhaps writing the log. If so, then there's some inefficient application code (or Twisted code) that you need to profile and optimize.

The output of "strace -T" on your Twisted server *might* be useful if you discover that you're blocking on some kind of I/O.
Hi, glyph@divmod.com wrote:
> What is 'real load'? Are you talking about things in process with, but not related to, the web server?
I should have been more specific, but OTOH it really didn't matter. Anyway, what I meant was: 1 concurrent request => very small load, 8 concurrent requests => real load.
> Maybe your server is slow? :)
;-) Probably not that slow...
Hm.. good to know.
> It might also randomly interleave the data and corrupt your files, if 'data' is big enough.

Uh... I'm taking care that only one thread writes to a file at any time.
> However, the threadpool will eventually fill up; the concurrency is fairly limited.

Yeah, I've enlarged that to 50 threads, which should be enough.
Didn't help me much, this time...
Okay, I've removed that deferToThread(). Didn't solve my problem anyway.
Yup, I did that, but it's hard to tell what's wrong from these things. Anyway, with very simple timing measurements within the Twisted server itself, I've figured out what was causing the delays: reactor.spawnProcess() takes more than a second. I knew that fork() was expensive, but that expensive?

What I'm doing now feels very dirty: I'm calling reactor.spawnProcess() from a thread. (Yes, I'm taking care that only one thread can spawn a process at any time.) At least on my Linux dev box, that seems to work - and resolves my issue. But... calling fork() from a thread??? Are there other ways to start and control external processes? Preferably even compatible with Windoze?

Thanks to everybody for your help... Regards Markus
On 11:14 am, exarkun@divmod.com wrote:
Markus, can you instrument spawnProcess and figure out which *part* of it is taking so long? Like exarkun says: this is not normal. (I wonder if we are unwittingly calling some blocking API in there? I don't think so, but...)
In other words, this may appear to work, but if you suddenly burst into flame unexpectedly, don't blame *us*. And you probably will, eventually, if you keep doing that.
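One hypothetical way to do that instrumentation without editing Twisted's source is a generic timing wrapper around the call (the `timed` helper below is mine, not a Twisted API):

```python
import functools
import time

def timed(label, func):
    """Wrap any callable and print how long each call takes.
    Example (monkey-patch, for diagnosis only):
        reactor.spawnProcess = timed("spawnProcess", reactor.spawnProcess)
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            print("%s took %.3f s" % (label, time.monotonic() - start))
    return wrapper
```

Wrapping progressively smaller pieces (the fork, the pipe setup, the registration with the reactor) would narrow down which part eats the second.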
Hi, glyph@divmod.com wrote:
> Markus, can you instrument spawnProcess and figure out which *part* of it is taking so long?
I'll give it a try, especially because I need to fiddle with process.Process anyway; see the mail I'm going to send in a minute or two...
I'll try to do some more benchmarking.
Yup, I'm aware of that. It's what I meant when calling it ugly or dirty or whatever I said. Regards Markus
On Wednesday 27 June 2007, Markus Schiltknecht wrote:
Does your server process use a lot of memory? I had problems with a random simple request taking a long time every now and then, and it turned out to be the Python garbage collector. The collection itself can take some time if you have a lot of objects. If some of the Python process ends up in swap instead of main memory, garbage collection will block until it is swapped back in, which can slow it down dramatically.

Using multiple threads does not give you any performance advantage in this case: during garbage collection the entire Python VM is frozen.

One thing you could try is:

```python
import gc
gc.disable()
```

in the startup code of your server. Note that this does not turn off all memory recycling; it only turns off the mark-and-sweep part of the garbage collector. Objects which have a zero reference count are still collected.

Bye, Maarten
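A variant of Maarten's suggestion (a sketch; the `collect_when_idle` helper and the idea of driving it from a timer are mine) is to disable the automatic collector but still run collections at moments you choose, e.g. from a Twisted `LoopingCall` during quiet periods:

```python
import gc

# Turn off the cyclic collector to avoid unpredictable pauses; reference
# counting still reclaims most garbage immediately.
gc.disable()

def collect_when_idle():
    # Run a full collection at a chosen quiet moment instead of at
    # arbitrary times; returns the number of unreachable objects found.
    return gc.collect()
```

This keeps the pause but moves it to a point where no request is waiting on it.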
participants (5)

- glyph@divmod.com
- Jean-Paul Calderone
- Maarten ter Huurne
- Markus Schiltknecht
- Mike C. Fletcher