Re: [Twisted-web] Performance of twisted web with HTTP/1.1 vs. HTTP/1.0

On Mon, Apr 05, 2004 at 12:32:37PM -0500, Jason E. Sibre wrote:
ACT is part of the Microsoft Visual Studio .NET. It stands for Application Center Test. The reason I tried it is that it allows the person setting up the test to control which headers are sent to the server. In fact, the test is actually controlled programmatically via VBScript (it hooks into Internet Explorer to 'record' the test, if you don't want to create it from scratch, which would be very tedious).
Ah, ok. I've played with an older version of it a couple of years ago I think, back when I was still working for a company that used windows :) I think it was called "WAST" (Web Application Stress Tool) or something at the time, I guess ACT is probably the new version of that :)
Anyway, that explains why I couldn't find it on google :)
If there's an open source tester that you'd rather I use for this discussion, please let me know about it, and I'll chase it down.
Well, any open source tester, even a simple dodgy one you write yourself -- I don't have any strong preference, but it would be nice to be able to reproduce your results :)
In the meantime, I'm attaching the test script that is being used (test.vbs). It's written in VBScript, but it can't be run standalone, as the "Test" object won't exist unless it's being run from the ACT. It is easy to read however, so you can see what's going on.
Yeah, nothing unusual, certainly nothing that looks problematic...
I've modified a portion of the script that ties Quixote to Twisted, so that I can 'peek' at the HTTP headers going into/coming out of Quixote. Here are the headers when things are running fast:
INBOUND header GATEWAY_INTERFACE: CGI/1.1
[..some headers..]
INBOUND header SERVER_PROTOCOL: HTTP/1.0
INBOUND header SERVER_SOFTWARE: TwistedWeb/1.2.0
And here they are when things are running slow:
INBOUND header GATEWAY_INTERFACE: CGI/1.1
[..identical headers..]
INBOUND header SERVER_PROTOCOL: HTTP/1.1
INBOUND header SERVER_SOFTWARE: TwistedWeb/1.2.0
I only note two differences, the HTTP/1.0 vs. HTTP/1.1 and the REMOTE_PORT 1308 vs. 1445, so, one significant difference.
Not much help, I guess, but does it give you any clues?
Not really. I've read through the code in twisted/protocols/http.py, and I simply don't see any significant differences in the code paths for serving HTTP/1.0 and HTTP/1.1.
(I've joined the Twisted-web list for the duration of this conversation, at least)
I've CC'd the list.
I can't think of what else to suggest, except for running the server in the Python profiler (the -p=profile.log switch to twistd, if you're using it), and seeing if that reveals where the extra time is being spent.
Maybe you could insert a "print repr(line)" into HTTPChannel's lineReceived handler (see twisted/protocols/http.py) to get a more raw dump of the request, and double-check that there are no significant differences in the HTTP/1.0 vs. HTTP/1.1 requests, but I'm doubtful...
Also, to eliminate another difference between your environment and mine, is it possible to run your tester against a Twisted server on a linux box, and verify that it still gives the same behaviour? If for some bizarre reason it turns out to be a windows-only issue, it'd be good to know before I waste too much time trying to reproduce it on Linux :)
Also, I'm losing track of what you have and haven't tried... it might be time we opened a bug report for this, so we can track this properly.
-Andrew.

*** Probably not a problem with Twisted... see my last comments ***
Well, any open source tester, even a simple dodgy one you write yourself -- I don't have any strong preference, but it would be nice to be able to reproduce your results :)
Well, I've attempted to write a script in Python to do this, but I'm not seeing the performance difference when I run it (it performs well whether it's 1.0 or 1.1. Actually, it performs slightly BETTER if it's 1.1, but only by about 10%)! After rubbing my eyes, and trying again (and again, and again....), I've concluded that I'm NOT losing my mind (the perf degradation is very real and noticeable when loading/refreshing a page with lots of small images), but for some reason the way httplib constructs an HTTP/1.1 request doesn't trigger it (while ACT, Internet Explorer, and Mozilla 1.7b do trigger it).
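For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a homemade tester. It is not the script described above (that one used httplib and isn't shown); it assumes modern Python 3 and only the standard library, and it speaks raw HTTP over a socket so the protocol version and the Connection header are under explicit control. The names `build_request` and `timed_fetch` are made up for illustration.

```python
import socket
import time


def build_request(path, host, version="1.1", keep_alive=False):
    """Return the raw bytes of a GET request for the given HTTP version."""
    lines = [f"GET {path} HTTP/{version}", f"Host: {host}"]
    # Be explicit either way: HTTP/1.0 defaults to close, HTTP/1.1 to keep-alive.
    lines.append("Connection: keep-alive" if keep_alive else "Connection: close")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def timed_fetch(host, port, path, version="1.1", count=10):
    """Fetch `path` `count` times on fresh connections; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(count):
        with socket.create_connection((host, port), timeout=10) as sock:
            sock.sendall(build_request(path, host, version))
            # With Connection: close we can simply read until the server hangs up.
            while sock.recv(65536):
                pass
    return time.perf_counter() - start
```

Calling `timed_fetch` once with `version="1.0"` and once with `version="1.1"` against the same URL gives a rough comparison. Note that each request here opens a new connection, so this sketch would not exercise persistent connections, which a browser loading many small images does use.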
I can't think of what else to suggest, except for running the server in the Python profiler (the -p=profile.log switch to twistd, if you're using it), and seeing if that reveals where the extra time is being spent.
Hmm... I'm not running twistd, but maybe I can do something along those lines in my script that kicks Twisted off... It already has some profiling bootstrap code in it. I hate wading through all the profiler output, but at least this may be easy to spot, since it's so dramatic.
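A rough sketch of what profiling from a startup script could look like, assuming modern Python 3 and the standard cProfile and pstats modules (not necessarily what the bootstrap code mentioned above does). `profile_call` and `busy` are hypothetical names; `busy` is only a stand-in for whatever call starts the server.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under the profiler; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def busy(n):
    """Stand-in workload; replace with the call that runs the server."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(busy, 100000)
    print(report)
```

Limiting the report to the top few entries by cumulative time avoids wading through the full profiler output when the slowdown is dramatic.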
Maybe you could insert a "print repr(line)" into HTTPChannel's lineReceived handler (see twisted/protocols/http.py) to get a more raw dump of the request, and double-check that there are no significant differences in the HTTP/1.0 vs. HTTP/1.1 requests, but I'm doubtful...
Another good suggestion. I don't know where that is at the moment, but I'm sure it'll be easy to find once I dig/grep for it. I assume that's a pretty low level spot in twisted's comm stack?
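Once two raw request dumps have been captured, comparing them by eye is error-prone. The following is a small sketch, using only the Python 3 standard library, that reports the differences between two raw requests. `parse_request` and `diff_requests` are hypothetical helper names, and it assumes the dumps are complete request heads with CRLF line endings.

```python
def parse_request(raw):
    """Split raw request bytes into (request_line, {header_name: value})."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise before comparing.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


def diff_requests(raw_a, raw_b):
    """Return a list of human-readable differences between two requests."""
    line_a, headers_a = parse_request(raw_a)
    line_b, headers_b = parse_request(raw_b)
    diffs = []
    if line_a != line_b:
        diffs.append(f"request line: {line_a!r} != {line_b!r}")
    for name in sorted(set(headers_a) | set(headers_b)):
        a, b = headers_a.get(name), headers_b.get(name)
        if a != b:
            diffs.append(f"{name}: {a!r} != {b!r}")
    return diffs
```

A header present in only one request shows up with `None` on the other side, which makes a missing or extra Connection header easy to spot.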
Also, to eliminate another difference between your environment and mine, is it possible to run your tester against a Twisted server on a linux box, and verify that it still gives the same behaviour? If for some bizarre reason it turns out to be a windows-only issue, it'd be good to know before I waste too much time trying to reproduce it on Linux :)
Actually, I am running it on a Linux box. Of course I ran ACT from a Windows box, but they're sitting next to each other on a 100 Mbps LAN. The data transfer rate was not a factor (Apache on Linux can hand about 500 images / second to the Windows box). I'm using Python 2.3 on SuSE 8.1 with a 2.4.24 kernel (I compiled it myself, but I haven't had any troubles with it -- that I know of ;) )
Also, I'm losing track of what you have and haven't tried... it might be time we opened a bug report for this, so we can track this properly.
I'd be up for that, but I'd really like to see at least one other person duplicate it before we go that far. I may just have something stupid going on over here.
----------------------------------------------------------------
Ok, you know what? I'm starting to think it's not a problem intrinsic to Twisted. I just put together an HTML page with 100 small images and served it through the Qx / Twisted combo, and it was slow. Through Qx / Medusa, it was fast. Through Twisted by itself, it was fast (using Moz 1.7b). Here's the script I used:
#!/usr/bin/python
from twisted.internet import reactor
from twisted.web import static, server

root = static.File(
    "/home/jsibre/programs/python/sitehostApp/sitehost/web/static/test")
reactor.listenTCP(8082, server.Site(root))
reactor.run()
It ran really fast, all images loaded in about a second (through the Qx / Twisted combo, you can watch them load one after the other... Takes about 8 or 9 seconds).
I was afraid of this... It's probably something in the glue that ties the Qx to Twisted. I'll work it from that angle for a while and see if I find a problem that is fixable, and provide the fix to the Qx folks when I figure it out.
Thanks for your help, Andrew.
Jason

Hi folks,
I previously wrote to this list about a performance problem I was having with Twisted, Quixote, and (I thought) HTTP/1.1, which I erroneously thought was a problem in Twisted's ability to deal with HTTP/1.1...
I've since spent a lot of time digging. I first figured out that the problem wasn't really in Twisted, and that it didn't really have anything to do with HTTP/1.1 either, though persistent connections did contribute (more accurately, the lack of persistent connections would mask the problem). Eventually I figured out what the problem REALLY was.
It was an odd little thing that had to do with Linux, Windows, network stacks, slow ACKs, and sending more packets than were needed. Well, I don't want to go into much more detail, because your time is valuable.
First, for those who haven't heard of it, Quixote is a Python-based web publishing framework that doesn't include a web server. Instead, it can be published through a number of mechanisms: CGI, FastCGI, SCGI, or mod_python, plus it has interfaces for Twisted and Medusa. I think I may be missing one, but I'm not sure. Its home page is at http://www.mems-exchange.org/software/quixote/
We (the quixote-users folks) seem to have a lack of expertise in Twisted :)
The interface between twisted and quixote: A twisted request object is used to create a quixote request object, quixote is called to publish the request, and then the output of quixote is wrapped into a producer which twisted then finishes handling. Actually, that's how it has been for quite some time, except for the producer bit. My modifications revolved around creating the producer class that (I think/hope) works well in the Twisted framework, and lets twisted publish it when it's ready (i.e., in its event loop). Formerly, quixote's output was just pushed out through the twisted request object's write() method, which could cause REALLY bad performance (the bug I was chasing), although in many cases it did just fine. It was also just a generally bad idea, because, for instance, publishing a large file could consume large amounts of RAM until it was done being pushed over the wire.
It's also worth mentioning that a quixote Stream object (noticeable in the source) is a producer, but it uses the iterator protocol instead of .more() or resumeProducing().
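To make the producer idea concrete, here is a minimal sketch, and explicitly not the code from the interface module. To stay self-contained it does not import Twisted. The method names (registerProducer, unregisterProducer, write, resumeProducing, stopProducing) mirror Twisted's producer/consumer interfaces, but `IteratorProducer` and `FakeConsumer` are hypothetical stand-ins. It shows an iterator of chunks, such as a Stream-like object, being handed out one chunk at a time whenever the consumer asks for more.

```python
class IteratorProducer:
    """Pull producer: writes one chunk per resumeProducing() call."""

    def __init__(self, chunks, consumer):
        self._chunks = iter(chunks)
        self._consumer = consumer
        # streaming=False means the consumer calls resumeProducing() each
        # time it is ready for more data (a "pull" producer).
        consumer.registerProducer(self, False)

    def resumeProducing(self):
        if self._consumer is None:
            return  # already finished or stopped
        try:
            chunk = next(self._chunks)
        except StopIteration:
            consumer, self._consumer = self._consumer, None
            consumer.unregisterProducer()
            return
        self._consumer.write(chunk)

    def stopProducing(self):
        # Called when the connection goes away; drop references so
        # nothing further is written.
        self._consumer = None
        self._chunks = iter(())


class FakeConsumer:
    """Stand-in for a transport or request; drains the producer in a loop."""

    def __init__(self):
        self.written = []
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.written.append(data)

    def drain(self):
        # A real event loop would call resumeProducing() whenever its send
        # buffer empties; here we just loop until the producer unregisters.
        while self.producer is not None:
            self.producer.resumeProducing()
```

Only one chunk is held in memory at a time, which is the point of the change: a large file no longer has to be buffered in full before it goes over the wire. In a real Twisted resource the request would presumably also need to be finished once the producer is done; that part is left out here.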
I'm hoping that someone can take a look at the finished product (just the interface module) and say something like, "you're nuts! you're doing this all wrong!", or "yeah, this looks like the right general idea, except maybe this bit here...".
Also, if anyone can share a brief one-liner or two about whether or not I should leave in the hooks for pb and threadable, I'd appreciate it (quixote is almost always run single threaded... Maybe just always...). I also changed the demo/test code at the bottom of the module from using the Application object to using the reactor. I'd appreciate any feedback on that and the SSL code (it's also new...) as well.
If anyone should want to actually run this, it'll work with Quixote-1.0b1, and the previous 'stable' (I say that because it was the latest version for several months...) version 0.7a3. I wrote the interface against twisted 1.2.0, but I think it'll work with older versions. I just don't know how old. Oh, and if you wanna drop it in a quixote install, it lives as quixote.server.twisted_http
Thanks in advance for any help,
Jason Sibre
Participants (2): Andrew Bennetts, Jason E. Sibre