[Twisted-Python] profiling twisted
Hi, first of all, thank you very much for this great piece of software. I'm using Twisted for many different projects and really enjoy its clean and simple design. Now, I'm wondering if there's a profiling tool which can quickly tell me what events are blocking my server. I have even deferred the file I/O to separate threads, as I suspected it of making the server behave poorly under load. But there still seems to be something blocking... What do you use to find the callbacks which block (for too long)? Regards Markus
Markus Schiltknecht wrote: ...
HTH, Mike -- ________________________________________________ Mike C. Fletcher Designer, VR Plumber, Coder http://www.vrplumber.com http://blog.vrplumber.com
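The kind of snippet Mike's article points at can be sketched as a watchdog that schedules a periodic call and reports how late it actually fires; anything that blocks the reactor shows up as lag. This is a hypothetical helper (names like `LagMonitor` are mine, not the article's):

```python
import time

class LagMonitor:
    """Schedule a check every `interval` seconds and report how late it
    actually fires.  A blocked reactor shows up as lag."""

    def __init__(self, interval=1.0, threshold=0.1):
        self.interval = interval        # seconds between checks
        self.threshold = threshold      # report lag above this (seconds)
        self._expected = 0.0            # when the next check should fire

    def lag(self, now):
        # How much later than scheduled the callback actually ran.
        return max(0.0, now - self._expected)

    def start(self, reactor):
        self._expected = time.monotonic() + self.interval
        reactor.callLater(self.interval, self._tick, reactor)

    def _tick(self, reactor):
        lag = self.lag(time.monotonic())
        if lag > self.threshold:
            print("reactor was blocked for ~%.3f s" % lag)
        self.start(reactor)             # reschedule the next check
```

Started with `LagMonitor().start(reactor)`, it prints a line whenever a callback held the loop for longer than the threshold.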
Hi, Mike C. Fletcher wrote:
Thanks, nice article. And I've tried that code snippet, but it didn't help much in my case, as response times are below 1 second.

Please see these measurements, taken with apache bench. I'm testing response times on a simple status resource, which doesn't take long to calculate (i.e. doesn't even return a deferred, but returns a response immediately):

```
Document Path:          /status
Document Length:        202 bytes
Concurrency Level:      10
Time taken for tests:   0.202266 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      38400 bytes
HTML transferred:       20200 bytes
Requests per second:    494.40 [#/sec] (mean)
Time per request:       20.227 [ms] (mean)
Time per request:       2.023 [ms] (mean, across all concurrent requests)
Transfer rate:          182.93 [Kbytes/sec] received
```

Now, measured while the server is under very small load:

```
Document Path:          /status
Document Length:        202 bytes
Concurrency Level:      10
Time taken for tests:   4.103465 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      38400 bytes
HTML transferred:       20200 bytes
Requests per second:    24.37 [#/sec] (mean)
Time per request:       410.347 [ms] (mean)
Time per request:       41.035 [ms] (mean, across all concurrent requests)
Transfer rate:          9.02 [Kbytes/sec] received
```

When putting the server under real load, those response times climb up to two seconds, so there must be something wrong.

Can I somehow get the reactor's state, i.e. how many deferreds are waiting in the queue, how many threads are running concurrently, etc.?

How good is the idea of deferring file I/O to threads, i.e. threads.deferToThread(self.filehandle.write, data)?

Another possibly blocking module might be the database API, but I'm using Twisted's enterprise adbapi, which should be async, AFAICT.

Maybe copying data around takes time. I'm sending around chunks of 64k size (streaming from the database to an external program). Reducing the chunk size to 1k helps somewhat (i.e. response time is seldom over 150 ms, but can still climb up to > 0.5 seconds).

Hum... external program... maybe it's the self.transport.write() call which blocks for several 100 ms? Is it safe to write: d = threads.deferToThread(self.transport.write, dataChunk) (i.e. call transport.write from a thread)?

How many resources do these deferToThread() deferreds eat? AFAICT, the reactor prepares a thread pool, which leads me to believe it's a well-optimized implementation... Regards Markus
On 02:32 pm, markus@bluegap.ch wrote:
> Requests per second: 494.40 [#/sec] (mean)
>
> Now, measured while the server is under very small load:
>
> Requests per second: 24.37 [#/sec] (mean)
>
> When putting the server under real load, those response times climb up to
What is 'real load'? Are you talking about things in process with, but not related to, the web server?
> two seconds, so there must be something wrong.
Maybe your server is slow? :)
> Can I somehow get the reactor's state, i.e. how many deferreds are waiting in the queue, how many threads are running concurrently, etc?
There is no queue of Deferreds. They don't actually have anything to do with the reactor.
> How good is the idea of deferring File I/O to threads, i.e. threads.deferToThread(self.filehandle.write, data)?
If you do indeed discover that you are waiting a long time to write your particular stuff to files, then that might help. It might also randomly interleave the data and corrupt your files, if 'data' is big enough.
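One stdlib-only sketch of avoiding that interleaving is to serialize writes to each file handle behind a lock (the `SerializedWriter` name is mine; in Twisted you would hand the locked write to `threads.deferToThread` rather than call it directly):

```python
import threading

class SerializedWriter:
    """Per-file lock so writes from different worker threads
    cannot interleave and corrupt the file."""

    def __init__(self, filehandle):
        self._fh = filehandle
        self._lock = threading.Lock()

    def write(self, data):
        # Only one thread may be inside the write at a time.
        with self._lock:
            self._fh.write(data)
```

Twisted's own `DeferredLock` gives the same serialization without tying a pool thread up waiting on the lock.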
> Another possibly blocking module might be the database api, but I'm using twisted's enterprise adbapi, which should be async, AFAICT.
It does the operations in threads, yes. However, the threadpool will eventually fill up; the concurrency is fairly limited. (The default is 10 or so workers, I think).
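Both limits are tunable; a configuration sketch (the sizes and connection arguments are illustrative, not recommendations):

```python
from twisted.internet import reactor
from twisted.enterprise import adbapi

# Raise the reactor's general thread pool (the default maximum is
# around 10 workers).
reactor.suggestThreadPoolSize(50)

# adbapi keeps its own pool; cp_min/cp_max bound its size.
dbpool = adbapi.ConnectionPool(
    "psycopg2",            # any DB-API 2.0 module name
    database="mydb",       # hypothetical connection arguments
    cp_min=3, cp_max=25,
)
```

Note these numbers only raise the concurrency ceiling; they don't make any individual blocking operation faster.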
> Maybe copying data around takes time.

That's a possibility the "--profile" option to twistd which JP suggested might help you with. You'll see the functions copying data taking a lot of CPU time in that case.
> Is it safe to write: d = threads.deferToThread(self.transport.write, dataChunk)?

No. _None_ of Twisted's APIs are thread safe. This call does not block, though, and it is extremely, vanishingly unlikely that it is causing your problems. It just sticks some data into the outgoing queue and returns immediately.
It's not particularly "optimized", in the sense that we haven't measured the performance or improved it much, but it's also not doing very much: put a value into a queue, get it out in a thread, do it. That's about all. It would certainly surprise me if that operation were taking 100 ms.

One quick measurement you can do to determine what might be causing this performance loss is to examine the server in 'top' while it is allegedly under load. Is it taking up 100% CPU? If not, then it's probably blocked on some kind of I/O in your application, or perhaps writing the log. If so, then there's some inefficient application code (or Twisted code) that you need to profile and optimize.

The output of "strace -T" on your Twisted server *might* be useful if you discover that you're blocking on some kind of I/O.
Hi, glyph@divmod.com wrote:
> What is 'real load'? Are you talking about things in process with, but not related to, the web server?
I should have been more specific, but OTOH it really didn't matter. Anyway, what I meant was: 1 concurrent request => very small load, 8 concurrent requests => real load.
> Maybe your server is slow? :)
;-) Probably not that slow...
Hm.. good to know.
> It might also randomly interleave the data and corrupt your files, if 'data' is big enough.

Uh... I'm taking care that only one thread writes to a file at any time.
> However, the threadpool will eventually fill up; the concurrency is fairly limited.

Yeah, I've enlarged that to 50 threads, which should be enough.
Didn't help me much, this time...
Okay, I've removed that deferToThread(). Didn't solve my problem anyway.
Yup, I did that, but it's hard to tell what's wrong from these things. Anyway, with very simple timing measurements within the Twisted server itself, I've figured out what was causing the delays: reactor.spawnProcess() takes more than a second. I knew that fork() was expensive, but that expensive?

What I'm doing now feels very dirty: I'm calling reactor.spawnProcess() from a thread. (Yes, I'm taking care that only one thread can spawn a process at any time.) At least on my Linux dev box, that seems to work - and resolves my issue. But... calling fork() from a thread??? Are there other ways to start and control external processes? Preferably even compatible with Windoze?

Thanks to everybody for your help... Regards Markus
On 11:14 am, exarkun@divmod.com wrote:
Markus, can you instrument spawnProcess and figure out which *part* of it is taking so long? Like exarkun says: this is not normal. (I wonder if we are unwittingly calling some blocking API in there? I don't think so, but...)
In other words, this may appear to work, but if you suddenly burst into flame unexpectedly, don't blame *us*. And you probably will, eventually, if you keep doing that.
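One hypothetical way to do that instrumentation without editing Twisted's source is a generic timing wrapper around the call (the `timed` helper below is mine, not a Twisted API):

```python
import functools
import time

def timed(label, func):
    """Wrap any callable and print how long each call takes.
    Example (monkey-patch, for diagnosis only):
        reactor.spawnProcess = timed("spawnProcess", reactor.spawnProcess)
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            print("%s took %.3f s" % (label, time.monotonic() - start))
    return wrapper
```

Wrapping progressively smaller pieces (the fork, the pipe setup, the registration with the reactor) would narrow down which part eats the second.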
Hi, glyph@divmod.com wrote:
> Markus, can you instrument spawnProcess and figure out which *part* of it is taking so long?
I'll give it a try, especially because I need to fiddle with process.Process anyway; see the mail I'm going to send in a minute or two...
I'll try to do some more benchmarking.
Yup, I'm aware of that. It's what I meant when calling it ugly or dirty or whatever I said. Regards Markus
On Wednesday 27 June 2007, Markus Schiltknecht wrote:
Does your server process use a lot of memory? I had problems with a random simple request taking a long time every now and then, and it turned out to be the Python garbage collector. The collection itself can take some time if you have a lot of objects. If some of the Python process ends up in swap instead of main memory, garbage collection will block until it is swapped back in, which can slow it down dramatically.

Using multiple threads does not give you any performance advantage in this case: during garbage collection the entire Python VM is frozen.

One thing you could try is:

```python
import gc
gc.disable()
```

in the startup code of your server. Note that this does not turn off all memory recycling; it only turns off the mark-and-sweep part of the garbage collector. Objects which have a zero reference count are still collected.

Bye, Maarten
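A variant of Maarten's suggestion (a sketch; the `collect_when_idle` helper and the idea of driving it from a timer are mine) is to disable the automatic collector but still run collections at moments you choose, e.g. from a Twisted `LoopingCall` during quiet periods:

```python
import gc

# Turn off the cyclic collector to avoid unpredictable pauses; reference
# counting still reclaims most garbage immediately.
gc.disable()

def collect_when_idle():
    # Run a full collection at a chosen quiet moment instead of at
    # arbitrary times; returns the number of unreachable objects found.
    return gc.collect()
```

This keeps the pause but moves it to a point where no request is waiting on it.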
participants (5)

- glyph@divmod.com
- Jean-Paul Calderone
- Maarten ter Huurne
- Markus Schiltknecht
- Mike C. Fletcher