[Web-SIG] wsgi and generators (was Re: WSGI and start_response)
P.J. Eby
pje at telecommunity.com
Sat Apr 10 19:52:00 CEST 2010
At 02:04 PM 4/10/2010 +0100, Chris Dent wrote:
>I realize I'm able to build up a complete string or yield via a
>generator, or a whole bunch of various ways to accomplish things
>(which is part of why I like WSGI: that content is just an iterator,
>that's a good thing) so I'm not looking for a statement of what is or
>isn't possible, but rather opinions. Why is yielding lots of moderately
>sized strings *very bad*? Why is it _not_ very bad (as presumably
>others think)?
How bad it is depends a lot on the specific middleware, server
architecture, OS, and what else is running on the machine. The more
layers of architecture you have, the worse the overhead is going to be.
The main reason, though, is that alternating control between your app
and the server means increased request lifetime and worsened average
request completion latency.
Imagine that I have five tasks to work on right now. Let us say each
takes five units of time to complete. If I have five units of time
right now, I can either finish one task now, or partially finish
five. If I work on them in an interleaved way, *none* of the tasks
will be done until twenty-five units have elapsed, and so all tasks
will have a completion latency of 25 units.
If I work on them one at a time, however, then one task will be done
in 5 units, the next in 10, and so on -- for an average latency of
only 15 units. And that is *not* counting any task switching overhead.
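(Just to make the arithmetic concrete, here's a back-of-the-envelope
sketch in Python, using the made-up numbers from the analogy above:

    tasks, cost = 5, 5   # five tasks, five units of work each

    # Round-robin interleaving: nothing finishes until the very end.
    interleaved = [tasks * cost] * tasks    # [25, 25, 25, 25, 25]

    # One at a time: completions at 5, 10, 15, 20, 25.
    sequential = [cost * (i + 1) for i in range(tasks)]

    print(sum(interleaved) / float(tasks))  # 25.0
    print(sum(sequential) / float(tasks))   # 15.0

Same total work either way; only the completion latency changes.)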
But it's *worse* than that, because by multitasking, my task queue
has five things in it the whole time... so I am using more memory
and have more management overhead, as well as task switching overhead.
If you translate this to the architecture of a web application, where
the "work" is the server serving up bytes produced by the
application, then you will see that if the application serves up
small chunks, the web server is effectively forced to multitask, and
keep more application instances simultaneously running, with worsened
latency, increased memory usage, etc.
However, if the application hands its entire output to the
server, then the "task" is already *done* -- the server doesn't need
the thread or child process for that app anymore, and can have it do
something else while the I/O is happening. The OS is in a better
position to interleave its own I/O with the app's computation, and
the overall request latency is reduced.
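(To illustrate -- this isn't anybody's real code, just a minimal
sketch of the two styles, with made-up chunk counts and byte-string
bodies:

    # Chunky: the server gets control back after every little yield,
    # and has to keep this app instance (and its middleware stack)
    # alive until the final chunk is produced.
    def chunky_app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        for i in range(1000):
            yield b'a moderately sized string\n'

    # Buffered: the app's "task" is finished the moment it returns;
    # the server can ship the one big string and move on.
    def buffered_app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        body = b''.join([b'a moderately sized string\n'] * 1000)
        return [body]

Both are perfectly legal WSGI; they just put very different loads on
the server.)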
Is this a big emergency if your server's mostly idle? Nope. Is it a
problem if you're writing a CGI program or some other direct API that
doesn't automatically flush I/O? Not at all. I/O buffering works
just fine for making sure that the tasks are handed off in bigger chunks.
But if you're coding up a WSGI framework, you don't really want to
have it sending tiny chunks of data up a stack of middleware, because
WSGI doesn't *have* any buffering, and each chunk is supposed to be
sent *immediately*.
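(Again just a sketch, with hypothetical names, of why the stack
matters -- even a do-nothing middleware has to touch every chunk, and
it isn't allowed to hold any of them back:

    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return (b'small chunk\n' for i in range(1000))

    class Passthrough(object):
        # A do-nothing middleware: it still handles every chunk.
        def __init__(self, app):
            self.app = app
        def __call__(self, environ, start_response):
            for chunk in self.app(environ, start_response):
                # No buffering here -- each chunk has to be passed
                # along as soon as the application yields it.
                yield chunk

    # Ten layers deep: each of the 1000 chunks crosses ten generator
    # frames before the server ever sees it.
    for i in range(10):
        app = Passthrough(app)

The smaller the chunks, the more times that whole round trip happens.)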
Well-written web frameworks usually do some degree of buffering
already, for API and performance reasons, so for simplicity's sake,
WSGI was spec'd assuming that applications would send data in
already-buffered chunks.
(Specifically, the simplicity of not needing to have an explicit
flushing API, which would otherwise have been necessary if middleware
and servers were allowed to buffer the data, too.)