[Twisted-web] Re: [Web-SIG] WSGI woes
Phillip J. Eby
pje at telecommunity.com
Fri Sep 17 00:37:39 CEST 2004
At 10:41 PM 9/16/04 +0100, Alan Kennedy wrote:
>In this way, there could be a middleware component below the
>rot13_streamer in the stack that, say, does chunked_transfer encoding and
>decoding. It would be the same in form as the above, except that it would
FYI, middleware and apps are now banned from dealing in any kind of
transfer-encodings, per James' very valuable input on that subject. Like
connection properties, these should be the exclusive province of the actual
web server.
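For illustration only, here is a minimal sketch in modern Python 3 (not code from this thread) of what keeping transfer-encodings in the server looks like. The server decodes a chunked request body itself before the application ever sees wsgi.input. It assumes a well-formed body, buffers the whole body in memory, and discards trailers. install_dechunked_input and dechunk are hypothetical names.

```python
import io

def dechunk(raw):
    """Decode an HTTP/1.1 chunked body from the binary stream `raw`.

    Sketch only: assumes well-formed input and discards trailers.
    """
    body = bytearray()
    while True:
        size_line = raw.readline()
        if not size_line:
            raise ValueError("connection closed before final chunk")
        # The chunk size is hex; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        body += raw.read(size)
        raw.read(2)  # CRLF that terminates each chunk's data
    # Skip any trailer lines up to the terminating blank line.
    while raw.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)

def install_dechunked_input(environ, raw):
    """Server-side step: the app only ever sees a plain, sized body."""
    body = dechunk(raw)
    environ["wsgi.input"] = io.BytesIO(body)
    environ["CONTENT_LENGTH"] = str(len(body))
    # Design choice in this sketch: hide the header from the application.
    environ.pop("HTTP_TRANSFER_ENCODING", None)
```

Middleware and applications then just call read() on wsgi.input and never see chunk framing.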
>1. Change the environ entry for 'wsgi.async_input_handler' to be its own
>callable that records the callback for the next layer up in the stack, the
>rot13_streamer.input_handler.
This would lead to the unacceptable situation of every middleware component
having to know, in principle, about extensions. The "Server Extension APIs"
section of the PEP demands that any "bypass" API verify that the objects it
bypasses have not been replaced by middleware, for this very reason.
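A minimal sketch of the kind of check meant here, assuming a hypothetical extension key 'x-example.set_input_handler' that is not part of any spec. The server remembers the input object it created, and the bypass only proceeds if the environ it is handed still carries that exact object. If any middleware has substituted its own wsgi.input, the extension declines and the caller falls back to ordinary read() calls, so middleware never needs to know the extension exists.

```python
import io

def make_environ(body):
    """Build a request environ the way a (hypothetical) server might."""
    original_input = io.BytesIO(body)
    environ = {"wsgi.input": original_input}

    def set_input_handler(environ, callback):
        # Bypass API: deliver raw input straight to `callback`, skipping any
        # middleware in between. Only safe if nobody replaced wsgi.input.
        if environ.get("wsgi.input") is not original_input:
            return False  # input is being filtered; caller must use read()
        callback(original_input.read())
        return True

    environ["x-example.set_input_handler"] = set_input_handler
    return environ
```

An application would look the key up with environ.get(), call it with the environ it actually received, and fall back to wsgi.input.read() if the key is missing or the call returns False.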
>I think that this proposed approach is clean, and not overly complex for
>async or blocking programmers to handle.
Unless of course they're writing middleware that does something with the input.
>But I think we do have to cleanly separate the two. I think there are
>problems associated with trying to run *all* components seamlessly across
>async or blocking servers. I think that middleware components that are
>always going to behave correctly in an async situation will have to be
>designed like that from the ground up. It's dangerous to take components
>written in a blocking environment and run them in an async environment.
It is a non-goal for WSGI to support running multiple requests
simultaneously in a single-threaded asynchronous server, so the issue
doesn't really come up. A WSGI server *must* allow for the fact that WSGI
apps use up a thread while they're running or producing a value: that's the
price of being able to run "traditional" web applications under WSGI.
>And lastly, if it is desired to spin jobs into a different thread, e.g.
>the rot-13 job above, then that should be a middleware concern, not the
>WSGI server's.
I agree with you -- for *asynchronous* applications. Synchronous web
applications are the default case in WSGI and in the world at large, so
asynchronous servers *must* use a thread pool to start applications and to
run 'next()' calls. But asynchronous applications want to yield control,
to avoid hogging threads in that pool, so they need to delegate the work to
their own I/O thread and then yield an empty string to pause output,
freeing the pool thread for another iterable's next() call or another
application start.
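As a rough sketch of the server side, assume Python 3's asyncio stands in for the asynchronous server. run_wsgi and demo_app are hypothetical names. Both the application call and every next() call are handed to the loop's default thread pool via run_in_executor, so a blocking application only ties up a pool thread, and an empty chunk is treated as "nothing to send yet".

```python
import asyncio

async def run_wsgi(app, environ):
    """Drive a synchronous WSGI app without blocking the event loop."""
    loop = asyncio.get_running_loop()
    response = {}
    body = []

    def start_response(status, headers, exc_info=None):
        response["status"], response["headers"] = status, headers
        return body.append  # the write() callable

    # Starting the application may block, so it runs in the thread pool.
    iterable = await loop.run_in_executor(None, app, environ, start_response)
    it = iter(iterable)
    done = object()  # sentinel next() returns once the iterator is exhausted
    try:
        while True:
            # Each next() may block too, so it also runs in the pool.
            chunk = await loop.run_in_executor(None, next, it, done)
            if chunk is done:
                break
            if chunk:  # an empty chunk means "paused": nothing to send
                body.append(chunk)
            # A real server would wait for the app's resume signal after an
            # empty chunk instead of calling next() again immediately.
    finally:
        close = getattr(iterable, "close", None)
        if close is not None:
            close()
    return response["status"], b"".join(body)

def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"hello "
    yield b""  # pause output
    yield b"world"
```

Running asyncio.run(run_wsgi(demo_app, {})) collects the status and the non-empty chunks.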
Notice, however, that if the server is *synchronous* (e.g. CGI,
single-threaded FastCGI containers, mod_python under Apache 1.x, etc.),
then this is a complete waste of time, because you'll only be running one
simultaneous request in this process anyway, so you're spinning off a
second thread to keep from tying up the first thread, but all the first
thread is doing is waiting for the second thread to finish! This is
wasteful, to say the least.
The only case where pausing output (whether for unrelated network I/O, or
because of a need to read from the input stream) is actually useful is when
the server is *also* asynchronous -- hence the value of making such pausing
an optional extension API. The application can then detect when it's
*useful* to pause, and synchronous applications needn't worry about it.
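To make "detect when it's useful to pause" concrete, here is a sketch assuming a hypothetical optional key 'x-example.pause_output'. When called, it pauses output and returns a resume() callable. No such key exists in any spec. The contract assumed here is that the server won't call next() again until resume() has been called.

```python
import threading

def slow_lookup():
    # Stand-in for some blocking network call.
    return b"result"

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    # Hypothetical extension key; absent on servers that don't offer it.
    pause_output = environ.get("x-example.pause_output")
    if pause_output is None:
        # Synchronous server: a second thread would gain nothing, so block.
        yield slow_lookup()
        return
    result = []
    resume = pause_output()  # obtain resume() before starting the worker

    def worker():
        result.append(slow_lookup())
        resume()  # tell the server it may call next() again

    threading.Thread(target=worker, daemon=True).start()
    yield b""  # hand the pool thread back while the worker runs
    yield result[0]
```

The synchronous path is just a normal WSGI app, so it costs nothing on CGI-style servers.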
Of course, even if the server and application are *both* asynchronous,
that's no guarantee that they're using compatible event loops! If you try
to run a Twisted app under asyncore or vice versa, you're going to be
spinning off an extra thread to run a second event loop, so there's a bit
of a trade-off to determining whether your asynchrony is going to actually
*gain* anything. But that's a separate question. WSGI will allow you to
be asynchronous if you really want to, no matter how bad an idea it might
be in some cases. :)
>The twisted rot-13 component would then have very thin methods (run from
>the server's main thread) which interact with the twisted space i.e.
>transferring data and receiving data back through queues, and layer WSGI
>semantics on those interactions, i.e. pause_output, yield result, yield
>empty_string, etc.
You're pretty much describing what I suggested earlier: that async app
frameworks like Twisted may want to have a model whereby a generic "thin
wrapper" WSGI application object is used to communicate with an application
that's written using the underlying framework's async idioms. So, for
example, one might perhaps design a Twisted "Transport" that was
implemented as a WSGI application. (I don't know if "Transport" is really
the correct abstraction to use, I'm just giving an example here.)
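A rough sketch of such a thin wrapper, with Python's asyncio standing in for Twisted's reactor. This is not Twisted's API, and no "Transport" abstraction is modeled. The async framework's loop runs in its own I/O thread, and the WSGI callable hands the request body across with asyncio.run_coroutine_threadsafe, then waits for the answer on the pool thread it was given. rot13_service and wsgi_wrapper are hypothetical names.

```python
import asyncio
import codecs
import threading

# The async framework's event loop, running in its own I/O thread.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def rot13_service(data: bytes) -> bytes:
    """An 'application' written in the async framework's own idiom."""
    await asyncio.sleep(0)  # pretend to do non-blocking I/O
    return codecs.encode(data.decode("ascii"), "rot13").encode("ascii")

def wsgi_wrapper(environ, start_response):
    """Thin WSGI adapter: passes the body to the async side and blocks
    this (pool) thread until the answer comes back."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    future = asyncio.run_coroutine_threadsafe(rot13_service(body), loop)
    result = future.result(timeout=10)
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(result)))])
    return [result]
```

All the async logic stays in framework-native code. The WSGI side only moves bytes across the thread boundary.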
Anyway, for such a thing to really work, I think you might need
server-specific reactor plugins, to integrate Twisted's event loop with
that of the server.
>When I described your approach as "pulling data up the stack", I saw a
>bigger difference between the two approaches. I'm thinking now that there
>is little difference between our proposals, except that in mine it's the
>bottom component that gets notified of the input by the server, and in
>yours it's the top component. Though I suppose having the top component
>pulling input from an iterator chain mirrors nicely the situation where
>the server pulls output from an iterator chain.
Actually, I'm saying you pull data *down* the stack. The bottom-most
application iterator calls 'read()' on an input stream provided by its parent
middleware component, which in turn calls 'read()' on the stream provided by
the next component up, and so on.
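A minimal sketch of pulling data down the stack. It assumes each middleware wraps the wsgi.input it received in a filtering stream. Rot13Input and rot13_middleware are hypothetical, and only read() is implemented. When the application at the bottom calls read(), each wrapper calls read() on the stream above it, transforming the bytes on the way down.

```python
import codecs
import io

class Rot13Input:
    """Filtering input stream: each read() pulls from the parent stream."""
    def __init__(self, parent):
        self.parent = parent

    def read(self, size=-1):
        data = self.parent.read(size)
        # latin-1 maps bytes 1:1 to characters, so the length is preserved.
        return codecs.encode(data.decode("latin-1"), "rot13").encode("latin-1")

def rot13_middleware(app):
    def wrapped(environ, start_response):
        environ = dict(environ)
        environ["wsgi.input"] = Rot13Input(environ["wsgi.input"])
        return app(environ, start_response)
    return wrapped

def echo_app(environ, start_response):
    body = environ["wsgi.input"].read()  # pulls through every wrapper above
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

Stacking two rot13_middleware layers gives back the original bytes, since each read() passes through both filters.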
>And my approach basically entails a bunch of nested calls, which might be
>less efficient or elegant than if, say, generators were used in an input
>processing chain.
>
>You're right again Phillip :-)
Not entirely, actually. For my approach to really work, the middleware
would have to be guaranteed to return something from read(), as long as the
parent's read() returns something. Otherwise, the resumption would block,
unless the middleware were much smarter. I've got to think about it some
more, because right now I'm still not happy with the specifics of any of
the proposals for pausing and resuming output.
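Here is a sketch of the "much smarter" middleware this would require, using a hypothetical hex-decoding input filter. Two hex digits make one output byte, so a parent read() that returns a single digit produces no output on its own. To keep the guarantee of a non-empty result while the parent still has data, read() keeps pulling from the parent until it can return a whole byte or the parent is exhausted. HexDecodingInput is an illustrative name, not part of any library.

```python
import binascii
import io

class HexDecodingInput:
    """Input filter whose read() never returns b"" before end of input."""
    def __init__(self, parent):
        self.parent = parent
        self.pending = b""  # leftover odd hex digit from a previous read

    def read(self, size=-1):
        while True:
            data = self.parent.read(size)
            self.pending += data
            usable = len(self.pending) - len(self.pending) % 2
            if usable:
                out = binascii.unhexlify(self.pending[:usable])
                self.pending = self.pending[usable:]
                return out
            if not data:  # parent exhausted
                if self.pending:
                    raise ValueError("odd number of hex digits")
                return b""
            # Parent returned data but not a whole byte yet: read again.
```

A naive filter would return b"" after the first one-digit parent read, which a caller would mistake for end of input.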