A few days ago, JP showed me some benchmarks of web2 vs. CherryPy hosting a trivial WSGI application. It turns out it's 4x slower, clocking in at only ~250 req/sec on a middle-spec Opteron, as opposed to CherryPy's ~1000. From previous benchmarks, I happen to know that Apache can do ~3000 req/sec on that same hardware, making web2 well over 10x slower.

I didn't look at the profile too closely, but preliminarily it seemed clear that the major bottleneck at this point was the creation and invocation of tons of Deferreds in _NotifyingProducerStream while generating the response.

Considering that streams aren't used at all in the actual HTTP protocol implementation (and thank goodness for that, given this discovery), it strikes me that the use of them in the response-generating API ought to be more optional, for cases where efficiency is important. Like, for example, DAV, which also seems to be making embarrassingly heavy use of the streams API - I haven't benchmarked that yet, but I'm sure it will be an adventure if I do :).

I'm not sure I totally understand the use of streams here, anyway. I notice that Request purports to implement IConsumer, but I don't see any of IConsumer's methods actually present on Request. I'd prefer it if it actually did implement IConsumer and I could just write to that as an API.

Finally, a minor nit: can we rename the various "http.py"s? I'd rather have a naming convention than subdirectories anyway - http_channel.py rather than channel/http. Right now we have web2/http.py, web2/channel/http.py, and web2/dav/http.py. I can only see this problem getting worse.
On Feb 9, 2006, at 6:21 AM, glyph@divmod.com wrote:
A few days ago, JP showed me some benchmarks of web2 vs. CherryPy hosting a trivial WSGI application.
It turns out it's 4x slower, clocking in only ~250 req/sec on a middle-spec Opteron, as opposed to CherryPy's ~1000. From previous benchmarks, I happen to know that Apache can do ~3000 req/sec on that same hardware, making web2 well over 10x slower.
I didn't look at the profile too closely, but preliminarily it seemed clear that the major bottleneck at this point was the creation and invocation of tons of Deferreds in _NotifyingProducerStream, while generating the response.
Considering that streams aren't used at all in the actual HTTP protocol implementation (and thank goodness for that, given this discovery) it strikes me that the use of them in the response-generating API ought to be more optional, for cases where efficiency is important. Like, for example, DAV, which also seems to be making embarrassingly heavy use of the streams API - I haven't benchmarked that yet, but I'm sure it will be an adventure if I do :).
Benchmarking is a dangerous activity. I only get 131 req/sec from ab on my mac from the following IResource resource, vs 114 from the wsgi resource, vs 706 from apache. Vs 74 from cherrypy's tut01_helloworld.py. So, shrug, I'm sure my benchmark sucks, and must be measuring something completely different than yours, given the wide variance in relative numbers. Anyhow, I'm happy that someone is benchmarking web2, because I haven't done so any time recently, but what you write above comes quite close to random flaming.

Here's the simple resources:

    def simple_wsgi_app(environ, start_response):
        start_response("200 OK", [])
        return ['Hello world!']

    class SimpleResource(object):
        implements(iweb.IResource)

        def renderHTTP(self, req):
            return http.Response(200, None, "Hello world!")

        def locateChild(self, req, seg):
            return self, ()

Here's all the construction of Deferreds from the native response. Only one is from the streams module. Looks like some of the others could be removed without changing any APIs, if that actually helps things. Going through the WSGI interface adds one additional deferred construction in addition to these 4.
(Pdb) break twisted.internet.defer.Deferred.__init__
Breakpoint 1 at /Users/jknight/Tw/trunk/twisted/internet/defer.py:163
(Pdb) bt
  /Users/jknight/Tw/trunk/twisted/internet/selectreactor.py(139)_doReadOrWrite()
-> why = getattr(selectable, method)()
  /Users/jknight/Tw/trunk/twisted/internet/tcp.py(350)doRead()
-> return self.protocol.dataReceived(data)
  /Users/jknight/Tw/trunk/twisted/protocols/basic.py(232)dataReceived()
-> why = self.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(664)lineReceived()
-> self.chanRequest.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(142)lineReceived()
-> self.processRequest()
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(384)processRequest()
-> self.request.process()
  /Users/jknight/Tw/trunk/twisted/web2/server.py(256)process()
-> d = self._getChild(self.site.resource, self.postpath)
  /Users/jknight/Tw/trunk/twisted/web2/server.py(283)_getChild()
-> return defer.maybeDeferred(
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(116)maybeDeferred()
-> return succeed(result)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(49)succeed()
-> d = Deferred()
> /Users/jknight/Tw/trunk/twisted/internet/defer.py(163)__init__()
-> self.callbacks = []
(Pdb) bt
  /Users/jknight/Tw/trunk/twisted/internet/selectreactor.py(139)_doReadOrWrite()
-> why = getattr(selectable, method)()
  /Users/jknight/Tw/trunk/twisted/internet/tcp.py(350)doRead()
-> return self.protocol.dataReceived(data)
  /Users/jknight/Tw/trunk/twisted/protocols/basic.py(232)dataReceived()
-> why = self.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(664)lineReceived()
-> self.chanRequest.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(142)lineReceived()
-> self.processRequest()
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(384)processRequest()
-> self.request.process()
  /Users/jknight/Tw/trunk/twisted/web2/server.py(256)process()
-> d = self._getChild(self.site.resource, self.postpath)
  /Users/jknight/Tw/trunk/twisted/web2/server.py(283)_getChild()
-> return defer.maybeDeferred(
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(191)addCallback()
-> callbackKeywords=kw)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(182)addCallbacks()
-> self._runCallbacks()
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(307)_runCallbacks()
-> self.result = callback(self.result, *args, **kw)
  /Users/jknight/Tw/trunk/twisted/web2/server.py(312)_handleSegment()
-> return self._getChild(newres, newpath)
  /Users/jknight/Tw/trunk/twisted/web2/server.py(281)_getChild()
-> return defer.succeed(res)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(49)succeed()
-> d = Deferred()
> /Users/jknight/Tw/trunk/twisted/internet/defer.py(163)__init__()
-> self.callbacks = []
(Pdb) bt
  /Users/jknight/Tw/trunk/twisted/internet/selectreactor.py(139)_doReadOrWrite()
-> why = getattr(selectable, method)()
  /Users/jknight/Tw/trunk/twisted/internet/tcp.py(350)doRead()
-> return self.protocol.dataReceived(data)
  /Users/jknight/Tw/trunk/twisted/protocols/basic.py(232)dataReceived()
-> why = self.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(664)lineReceived()
-> self.chanRequest.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(142)lineReceived()
-> self.processRequest()
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(384)processRequest()
-> self.request.process()
  /Users/jknight/Tw/trunk/twisted/web2/server.py(258)process()
-> d.addCallback(self._cbFinishRender)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(191)addCallback()
-> callbackKeywords=kw)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(182)addCallbacks()
-> self._runCallbacks()
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(307)_runCallbacks()
-> self.result = callback(self.result, *args, **kw)
  /Users/jknight/Tw/trunk/twisted/web2/server.py(355)_cbFinishRender()
-> d = defer.Deferred()
> /Users/jknight/Tw/trunk/twisted/internet/defer.py(163)__init__()
-> self.callbacks = []
(Pdb) bt
  /Users/jknight/Tw/trunk/twisted/internet/selectreactor.py(139)_doReadOrWrite()
-> why = getattr(selectable, method)()
  /Users/jknight/Tw/trunk/twisted/internet/tcp.py(350)doRead()
-> return self.protocol.dataReceived(data)
  /Users/jknight/Tw/trunk/twisted/protocols/basic.py(232)dataReceived()
-> why = self.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(664)lineReceived()
-> self.chanRequest.lineReceived(line)
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(142)lineReceived()
-> self.processRequest()
  /Users/jknight/Tw/trunk/twisted/web2/channel/http.py(384)processRequest()
-> self.request.process()
  /Users/jknight/Tw/trunk/twisted/web2/server.py(258)process()
-> d.addCallback(self._cbFinishRender)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(191)addCallback()
-> callbackKeywords=kw)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(182)addCallbacks()
-> self._runCallbacks()
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(307)_runCallbacks()
-> self.result = callback(self.result, *args, **kw)
  /Users/jknight/Tw/trunk/twisted/web2/server.py(359)_cbFinishRender()
-> d.callback(response)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(229)callback()
-> self._startRunCallbacks(result)
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(294)_startRunCallbacks()
-> self._runCallbacks()
  /Users/jknight/Tw/trunk/twisted/internet/defer.py(307)_runCallbacks()
-> self.result = callback(self.result, *args, **kw)
  /Users/jknight/Tw/trunk/twisted/web2/http.py(419)writeResponse()
-> d = stream.StreamProducer(response.stream).beginProducing(self.chanRequest)
  /Users/jknight/Tw/trunk/twisted/web2/stream.py(725)beginProducing()
-> finishedCallback = self.finishedCallback = Deferred()
> /Users/jknight/Tw/trunk/twisted/internet/defer.py(163)__init__()
-> self.callbacks = []
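For reference, here is a minimal driver for the simple_wsgi_app above. This harness is a hypothetical sketch for illustration, not web2's actual WSGI glue; it just shows the calling convention a server follows and why the app's return value gets iterated as a sequence of body chunks:

```python
# Hypothetical minimal WSGI driver: collects the status, headers, and
# body chunks produced by an app. Not web2's real implementation.
def run_wsgi_app(app, environ=None):
    captured = {}

    def start_response(status, headers):
        # The app calls this before (or while) producing its body.
        captured['status'] = status
        captured['headers'] = headers

    result = app(environ or {}, start_response)
    # The return value is iterated chunk by chunk, whatever it is.
    body = [chunk for chunk in result]
    return captured['status'], captured['headers'], body

def simple_wsgi_app(environ, start_response):
    start_response("200 OK", [])
    return ['Hello world!']

status, headers, body = run_wsgi_app(simple_wsgi_app)
print(status)  # 200 OK
print(body)    # ['Hello world!']
```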
James
On Thu, 2006-02-09 at 13:40 -0500, James Y Knight wrote:
Benchmarking is a dangerous activity.
Truly.
I only get 131req/sec from ab on my mac from the following IResource resource, vs 114 from the wsgi resource, vs 706 from apache. Vs 74 from cherrypy's tut01_helloworld.py.
This is encouraging, at least with respect to the relative speed of CherryPy. I'd be quite happy to be wrong about needing API changes to improve performance; these should be independent considerations.
So, shrug, I'm sure my benchmark sucks, and must be measuring something completely different than yours, given the wide variance in relative numbers. Anyhow, I'm happy that someone is benchmarking web2, because I haven't done so any time recently, but what you write above comes quite close to random flaming.
I was hoping JP would publish his actual results before I wrote what I did, but I didn't want the ideas to get too stale. Maybe this will prompt him to - I hope said results are not too different from what I remember. At any rate, don't take it to heart; I'm glad just to get some discussion started about this, and to have everybody following it on the list publish their own trivial benchmarks, to see if we can achieve some consensus as to how much load web2 can actually handle these days. Perhaps I should have been more direct in the first email, though.

Full disclosure: I never liked streams as an API, and I have practically been looking for reasons to complain about them. Look for more soon; this benchmark was only the first ;-).

Also, performance and aesthetic considerations aside, if a thing says it implements IConsumer, it should *actually* implement IConsumer. Is the mistake in the declaration or the implementation here? I hope it's the implementation, because IConsumer is the API I'd prefer to use, streams or no. Or is there some funny magic behind the scenes where a Response actually does provide IConsumer somewhere? Even if so, a directlyProvides makes more sense for funny magic than an implements().
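For reference, IConsumer is a three-method interface: write, registerProducer, and unregisterProducer. A sketch of the shape a Request would need to genuinely provide it might look like the following. This is a plain-class stand-in for illustration (the real declaration would go through zope.interface's implements(IConsumer)); the class name and buffering behavior are assumptions, not web2's actual code:

```python
# Hypothetical sketch of a Request that really provides IConsumer:
# something you can simply write() response body bytes to.
class ConsumerRequest:
    def __init__(self):
        self._buffer = []
        self._producer = None

    def write(self, data):
        # Accept a chunk of response body.
        self._buffer.append(data)

    def registerProducer(self, producer, streaming):
        # A streaming (push) producer writes until told to pause;
        # a non-streaming (pull) producer waits for resumeProducing().
        self._producer = producer
        if not streaming:
            producer.resumeProducing()

    def unregisterProducer(self):
        self._producer = None

req = ConsumerRequest()
req.write(b"Hello ")
req.write(b"world!")
print(b"".join(req._buffer))  # b'Hello world!'
```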
Here's all the construction of Deferreds from the native response.
Hmm. It is quite possible that the profile was not accurate - there were some odd things about it, although it's also possible that the WSGI test in question was doing something pathological that yours was not, like generating tons of tiny little write()s.
Only one is from the streams module. Looks like some of the others could be removed without changing any APIs, if that actually helps things. Going through the WSGI interface adds one additional deferred construction in addition to these 4.
Deferreds do have an embarrassingly high overhead. The next step here (in figuring out if those extra deferreds should be eliminated) is probably for me to construct some more meaningful benchmark to measure the performance impact in req/sec of additional deferreds in the response pipeline.
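As a rough illustration of the kind of measurement involved, the sketch below times a direct synchronous return against an already-fired Deferred-style object. The ToyDeferred class here is a stand-in reduced to the allocation-plus-callback-list pattern; the interesting numbers would of course come from running the same comparison against the real twisted.internet.defer.Deferred:

```python
import timeit

# Toy stand-in for a Deferred, reduced to the per-request cost pattern:
# allocate an object, keep a callback list, run callbacks on firing.
class ToyDeferred:
    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def callback(self, result):
        self.called = True
        self.result = result
        for cb in self.callbacks:
            self.result = cb(self.result)

    def addCallback(self, cb):
        if self.called:
            self.result = cb(self.result)
        else:
            self.callbacks.append(cb)
        return self

def direct():
    # The synchronous path: just return the value.
    return 42

def via_deferred():
    # The deferred path: wrap the same value in an already-fired object.
    d = ToyDeferred()
    d.callback(42)
    return d

plain = timeit.timeit(direct, number=100000)
wrapped = timeit.timeit(via_deferred, number=100000)
print("direct: %.3fs  deferred-style: %.3fs" % (plain, wrapped))
```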
On Thu, 09 Feb 2006 14:20:05 -0500, Glyph Lefkowitz <glyph@divmod.com> wrote:
On Thu, 2006-02-09 at 13:40 -0500, James Y Knight wrote:
Benchmarking is a dangerous activity.
Truly.
Here's the ab2 output for a twisted.web2 server:

Server Software:        Twisted/SVN-Trunk
Server Hostname:        kunai.lan
Server Port:            8080

Document Path:          /
Document Length:        26 bytes

Concurrency Level:      500
Time taken for tests:   36.267711 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1751750 bytes
HTML transferred:       260260 bytes
Requests per second:    275.73 [#/sec] (mean)
Time per request:       1813.386 [ms] (mean)
Time per request:       3.627 [ms] (mean, across all concurrent requests)
Transfer rate:          47.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   87  718.4      1    9000
Processing:   108  461  122.7    474    1944
Waiting:       40  459  122.7    473    1942
Total:        110  548  766.0    481   10912

Percentage of the requests served within a certain time (ms)
  50%    481
  66%    505
  75%    511
  80%    515
  90%    546
  95%    585
  98%    733
  99%   3575
 100%  10912 (longest request)

And here's the output of a run against CherryPy:

Server Software:        CherryPy/2.2.0beta
Server Hostname:        kunai.lan
Server Port:            8080

Document Path:          /
Document Length:        26 bytes

Concurrency Level:      500
Time taken for tests:   7.855088 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1360136 bytes
HTML transferred:       260026 bytes
Requests per second:    1273.06 [#/sec] (mean)
Time per request:       392.754 [ms] (mean)
Time per request:       0.786 [ms] (mean, across all concurrent requests)
Transfer rate:          169.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  102  545.4      0    3003
Processing:    16  229  169.3    305    3419
Waiting:       15  228  169.4    304    3418
Total:         33  332  623.2    306    6420

Percentage of the requests served within a certain time (ms)
  50%    306
  66%    329
  75%    333
  80%    334
  90%    340
  95%    343
  98%   3202
  99%   3460
 100%   6420 (longest request)

The CherryPy numbers might be artificially depressed; from a quick check, it seems the bottleneck may be on the client side.
This is the wsgi app being served:

    def app(environ, start_response):
        start_response('200 Ok', [('Content-Type', 'text/plain')])
        return 'WSGI is a wonderful thing.'

Jean-Paul
On Feb 9, 2006, at 3:08 PM, Jean-Paul Calderone wrote:
    def app(environ, start_response):
        start_response('200 Ok', [('Content-Type', 'text/plain')])
        return 'WSGI is a wonderful thing.'
There's a problem there: the return value of a wsgi resource is a sequence, so you're returning a sequence of single character strings, thus causing excessively high overhead. Perhaps that ought to be special cased, cause I'm sure more people than you will make that mistake.

James
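The chunking difference is easy to see with plain Python (a small stand-alone demonstration, not web2 code): iterating a bare string yields one-character chunks, while a one-element list yields the whole body at once.

```python
def app_string(environ, start_response):
    start_response('200 Ok', [('Content-Type', 'text/plain')])
    return 'WSGI is a wonderful thing.'    # iterated per character

def app_list(environ, start_response):
    start_response('200 Ok', [('Content-Type', 'text/plain')])
    return ['WSGI is a wonderful thing.']  # iterated as one chunk

def count_chunks(app):
    # Count the body chunks a server would see when iterating the result.
    body = app({}, lambda status, headers: None)
    return len(list(body))

print(count_chunks(app_string))  # 26 -- one write per character
print(count_chunks(app_list))    # 1  -- a single write
```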
On Thu, 9 Feb 2006 15:24:37 -0500, James Y Knight <foom@fuhm.net> wrote:
On Feb 9, 2006, at 3:08 PM, Jean-Paul Calderone wrote:
    def app(environ, start_response):
        start_response('200 Ok', [('Content-Type', 'text/plain')])
        return 'WSGI is a wonderful thing.'
There's a problem there: the return value of a wsgi resource is a sequence, so you're returning a sequence of single character strings, thus causing excessively high overhead. Perhaps that ought to be special cased, cause I'm sure more people than you will make that mistake.
Yes, this is true. It was accidental at first, but I intentionally left it in this form even after I realized what was going on. Performance is roughly equivalent using either ['WSGI ...'] or other similar constructs.

Jean-Paul
participants (4)
-
Glyph Lefkowitz
-
glyph@divmod.com
-
James Y Knight
-
Jean-Paul Calderone