[Twisted-web] Re: [Web-SIG] WSGI woes

Thu Sep 16 07:13:52 CEST 2004

On Sep 15, 2004, at 7:12 PM, Phillip J. Eby wrote:

> At 06:48 PM 9/15/04 -0400, Peter Hunt wrote:
>> It looks like WSGI is not well received over at twisted.web.
>>
>> http://twistedmatrix.com/pipermail/twisted-web/2004-September/000644.html
>
> Excerpting from that post:
>
> """The WSGI spec is unsuitable for use with asynchronous servers and
> applications. Basically, once the application callable returns, the
> server (or "gateway" as wsgi calls it) must consider the page finished
> rendering."""
>
> This is incorrect.

As I said in my original post, I hadn't mentioned anything about this  
yet because I didn't have a solution or proposal to fix the problem,  
which I maintain is real. I will attempt to suggest solutions, but I am  
unsure whether they will work or make sense in all environments. Allow  
me to explain:

>   Here is a simple WSGI application that demonstrates yielding 50 data  
> blocks for transmission *after* the "application callable returns".
>
>     def an_application(environ, start_response):
>         start_response("200 OK", [('Content-Type','text/plain')])
>         for i in range(1,51):
>             yield "Block %d" % i
>
> This has been a valid WSGI application since the August 8th posting of  
> the WSGI pre-PEP.

According to the spec, """The application object must return an  
iterable yielding strings.""" Whether the application callable calls  
write before returning or yields strings to generate content, the  
effect is the same -- there is no way for the application callable to  
say "Wait, hang on a second, I'm not ready to generate more content  
yet. I'll tell you when I am." This means the only way the application  
can pause for network activity is by blocking. For example, a page  
which performed an XML-RPC call and transformed the output into HTML  
would be required to perform the XML-RPC call synchronously. Or a page  
which initiated a telnet session and streamed the results into a web  
page would be required to perform reads on the socket synchronously.  
The server or gateway, by calling next(), is assuming that the call  
will yield a string value, and only a string value.

Of course, Twisted has a canonical way of indicating that a result is  
not yet ready, the Deferred. An asynchronous application could yield a  
Deferred and an asynchronous server would attach a callback to this  
Deferred which invokes the next() method upon resolution. This is how  
Nevow handles Deferreds (in Nevow SVN head at  
nevow.flat.twist.deferflatten).

However, the WSGI spec says nothing about Deferred and indeed, Deferred  
would be useless in the case of another asynchronous server such as  
Medusa. I would suggest that WSGI include a simple Deferred  
implementation, but WSGI is simply a spec which is not intended to have  
any actual code. Thus, one solution would be for the WSGI spec to be  
amended to state:

"""The application object must return an iterable yielding strings or  
objects implementing the following interface:

def addCallback(callable):
	'''Add 'callable' to the list of callables to be invoked when a string
	is available. Callable should take a single argument, which will be a
	string.'''

The application object must invoke the callable passed to addCallback,  
passing a string which will be written to the request.
"""
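
As an illustration of the amendment above, here is a minimal sketch of an object satisfying the proposed interface, together with a non-blocking server loop that stops iterating when it meets such an object and resumes from the callback. The names StringPromise, resolve and drive are hypothetical and are not part of WSGI or Twisted. The sketch uses Python 3 syntax with plain str values, as in the examples in this thread, and it resumes recursively, so it is not meant for long chains of already-resolved promises.

```python
class StringPromise:
    """Hypothetical object implementing the proposed addCallback interface."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._value = None

    def addCallback(self, callable):
        # If the string has already arrived, deliver it immediately.
        if self._fired:
            callable(self._value)
        else:
            self._callbacks.append(callable)

    def resolve(self, string):
        # Called by the application once its string is available.
        self._fired = True
        self._value = string
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(string)


def drive(iterable, write, close):
    """Write strings as they are yielded; on a promise, return at once
    and resume the iteration when the promise delivers its string."""
    iterator = iter(iterable)

    def resume(string=None):
        if string is not None:
            write(string)
        for val in iterator:
            if isinstance(val, str):
                write(val)
            else:
                # Not ready yet: continue from the callback, without blocking.
                val.addCallback(resume)
                return
        close()

    resume()


promise = StringPromise()


def an_application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield "before "
    yield promise      # "I'm not ready yet. I'll tell you when I am."
    yield "after"
```

Calling drive(an_application(environ, start_response), write, close) returns as soon as the promise is reached; the remaining output is written when the application later calls promise.resolve(...).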

This places additional burdens upon implementors of WSGI servers or  
gateways. In the case of a threaded HTTP server which uses blocking  
writes, implementing support for these promises would have to look  
something like this:

import Queue  # Python 2 module name; it is 'queue' in Python 3

def handle_request(inSocket, outSocket):
     # ... read inSocket, parse the request and dispatch ...

     iterable = application(environ, start_response)

     for val in iterable:
         if isinstance(val, str):
             outSocket.write(val)
         else:
             # A promise: block this thread until its string arrives.
             result = Queue.Queue()
             val.addCallback(result.put)
             outSocket.write(result.get())
     # The iterable is exhausted, so the response is complete.
     outSocket.close()

> It may be, however, that Mr. Preston means that applications which  
> want to use 'write()' or a similar push-oriented approach to produce  
> data cannot do so after the application returns.  If so, we should  
> discuss that use case further, preferably on the Web-SIG.

And now we come to my other half-baked proposal.

Instead of merely returning a write callable, start_response could  
return a tuple of (write, finish) callables. The application would be  
free to call write at any time until it calls finish, at which point  
calling either callable becomes illegal. Again, a synchronous server  
supporting this would have to block the request thread until finish is  
called, in a fashion such as this:

import threading

def handle_request(inSocket, outSocket):
     # ... read request, dispatch ...
     # Start at 0 so that acquire() blocks until finish (release) is called.
     finished = threading.Semaphore(0)

     def start_response(status, headers):
         # ... write headers ...
         return outSocket.write, finished.release

     iterable = application(environ, start_response)
     if iterable is None:
         finished.acquire()
         # Once we get here, the application is done with the request.
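
To make the application side of this concrete, here is a self-contained sketch under a few assumptions: the output goes to a list instead of a socket, a threading.Timer stands in for network activity that completes later, and a threading.Event is used in place of the semaphore above. The names make_start_response, delayed_application and serve are hypothetical. It uses Python 3 syntax.

```python
import threading


def make_start_response(sink, done):
    """Build a start_response that returns (write, finish) callables."""
    def start_response(status, headers):
        sink.append("%s\n" % status)    # stand-in for writing the headers
        return sink.append, done.set
    return start_response


def delayed_application(environ, start_response):
    write, finish = start_response("200 OK", [("Content-Type", "text/plain")])

    def produce():
        write("hello")
        finish()                        # no write or finish calls after this

    # Hypothetical stand-in for network activity that completes later.
    threading.Timer(0.05, produce).start()
    return None                         # nothing to iterate over


def serve(application):
    sink, done = [], threading.Event()
    result = application({}, make_start_response(sink, done))
    if result is None:
        done.wait(timeout=5)            # block this thread until finish()
    return sink
```

Here serve(delayed_application) returns only after the timer has fired and finish has been called, which is the blocking behaviour a threaded server would need.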

Finally, we come to the task of implementing a server or gateway which  
can asynchronously support either asynchronous or blocking  
applications. Since there is no way for the server or gateway to know  
whether the application object it is about to invoke will block,  
starving the main loop and preventing network activity from being  
serviced, it must invoke all applications in a new thread or process. A  
solution to this would be to require application callables to provide  
additional metadata, perhaps via function or object attributes, which  
indicate whether they are capable of running in asynchronous, threaded,  
or multiprocess environments. Since it's getting late and this message  
is getting long I will leave this discussion for another day.
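
One possible shape for such metadata, purely as a sketch: the application opts in through a function attribute, and the server runs it inline only if the attribute is present. The attribute name wsgi_nonblocking and the helper run_application are hypothetical and appear in no spec. It uses Python 3 syntax.

```python
import threading


def run_application(application, call):
    """Run call(application) inline if the application declares itself
    non-blocking; otherwise run it in a new thread so that the main loop
    is not starved. Returns the worker thread, or None if run inline."""
    if getattr(application, "wsgi_nonblocking", False):  # hypothetical attribute
        call(application)
        return None
    worker = threading.Thread(target=call, args=(application,))
    worker.start()
    return worker


def async_app(environ, start_response):
    return []

async_app.wsgi_nonblocking = True   # opt in via a function attribute


def blocking_app(environ, start_response):
    return []                       # no attribute: assumed to block
```

Defaulting to the threaded path keeps unmarked applications safe, at the cost of a thread per request for applications that never declare themselves.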

dp