[Web-SIG] Proposal for asynchronous WSGI variant

Christopher Stawarz cstawarz at csail.mit.edu
Tue May 6 03:30:27 CEST 2008


(I'm new to the list, so please forgive me for making my first post a
specification proposal :)

Browsing through the list archives, I see there have been some
inconclusive discussions on adding better support for asynchronous web
servers to the WSGI spec.  Since such support would be very useful for
some upcoming projects of mine, I decided to take a shot at speccing
out and implementing it.  I'd be grateful for any feedback you have.
If this seems like something worth pursuing, I would also welcome
collaborators to help develop the spec further.

The name for this proposed specification is the Asynchronous Web
Server Gateway Interface (AWSGI).  As the name suggests, the spec is
closely related to WSGI and is most easily described in terms of how
it differs from WSGI.  AWSGI eliminates the following parts of WSGI:

   - the environment variables wsgi.version and wsgi.input

   - the write() callable returned by start_response()

AWSGI adds the following environment variables:

   - awsgi.version
   - awsgi.input
   - awsgi.readable
   - awsgi.writable
   - awsgi.timeout

In addition, AWSGI allows the application iterable to yield two types
of data:

   - byte strings, handled as in WSGI

   - the result of calling awsgi.readable or awsgi.writable, which
     indicates that the application should be paused and then resumed
     when the specified file descriptor is ready for reading or
     writing (or when the wait times out)

Because of AWSGI's similarity to WSGI, a simple wrapper can be used to
run unmodified AWSGI applications on WSGI servers.
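
To give a feel for how small such a wrapper can be, here is a rough
sketch.  It is not the code from awsgiref; the names awsgi_to_wsgi,
_Wait and _Input are made up for illustration, and it assumes Python 3
with byte-string bodies.  Every I/O wait is simply skipped, since the
underlying wsgi.input blocks anyway:

```python
class _Wait:
    """Hypothetical marker returned by awsgi.readable / awsgi.writable."""
    def __init__(self, fd, timeout, mode):
        self.fd, self.timeout, self.mode = fd, timeout, mode


class _Input:
    """Hypothetical awsgi.input: recv(bufsize) backed by blocking wsgi.input."""
    def __init__(self, stream):
        self._stream = stream

    def recv(self, bufsize):
        return self._stream.read(bufsize)


def awsgi_to_wsgi(awsgi_app):
    """Wrap an AWSGI application so a plain WSGI server can call it."""
    def wsgi_app(environ, start_response):
        env = dict(environ)
        # AWSGI forbids wsgi.version and wsgi.input in the environ.
        env.pop('wsgi.version', None)
        env['awsgi.input'] = _Input(env.pop('wsgi.input'))
        env['awsgi.version'] = (1, 0)
        env['awsgi.readable'] = lambda fd, timeout=None: _Wait(fd, timeout, 'r')
        env['awsgi.writable'] = lambda fd, timeout=None: _Wait(fd, timeout, 'w')
        env['awsgi.timeout'] = False  # waits return immediately, never time out

        def awsgi_start_response(status, headers, exc_info=None):
            if exc_info is None:
                start_response(status, headers)
            else:
                start_response(status, headers, exc_info)
            # Deliberately returns None: AWSGI has no write() callable.

        for item in awsgi_app(env, awsgi_start_response):
            if isinstance(item, _Wait):
                continue  # blocking I/O underneath, so resume straight away
            yield item
    return wsgi_app
```

One caveat with this sketch: wsgi.input.read(n) may block until n bytes
arrive, whereas a real recv() can return fewer, so the wrapped
application gives up its asynchronous behaviour entirely.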

The following example application demonstrates typical usage of AWSGI.
This application simply reads the request body and sends it back to
the client.  Each time it wants to receive data from the client, it
first tests awsgi.input for readability and then calls its recv()
method.  If awsgi.input is not readable after one second, the
application sends a "408 Request Timeout" response to the client and
terminates:


   def echo_request_body(environ, start_response):
       input = environ['awsgi.input']
       readable = environ['awsgi.readable']

       # A missing or empty CONTENT_LENGTH means there is no body to read
       nbytes = int(environ.get('CONTENT_LENGTH') or 0)
       output = ''
       while nbytes:
           # Pause here until input is readable or the wait times out
           yield readable(input, 1.0)  # Time out after 1 second

           if environ['awsgi.timeout']:
               msg = 'The request timed out.'
               start_response('408 Request Timeout',
                              [('Content-Type', 'text/plain'),
                               ('Content-Length', str(len(msg)))])
               yield msg
               return

           data = input.recv(nbytes)
           if not data:
               break  # Client closed the connection before sending it all
           output += data
           nbytes -= len(data)

       start_response('200 OK', [('Content-Type', 'text/plain'),
                                 ('Content-Length', str(len(output)))])
       yield output
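
For anyone wondering what the server side of this exchange looks like,
the following is a minimal sketch of driving one such application
against a connected socket.  It is an illustration only, not the
asyncore-based server: the names Wait and run_one are hypothetical, it
assumes Python 3 with byte-string bodies, and it handles a single
application with a blocking select() call, where a real server would
multiplex many of them:

```python
import select


class Wait:
    """Hypothetical value returned by awsgi.readable / awsgi.writable."""
    def __init__(self, fd, timeout, for_write):
        # fd may be an integer or any object with a fileno() method.
        self.fd = fd if isinstance(fd, int) else fd.fileno()
        self.timeout = timeout
        self.for_write = for_write


def run_one(app, sock, environ=None):
    """Drive a single AWSGI app; return (status, headers, body)."""
    environ = dict(environ or {})
    environ.update({
        'awsgi.version': (1, 0),
        'awsgi.input': sock,  # a real socket already has recv() and fileno()
        'awsgi.readable': lambda fd, timeout=None: Wait(fd, timeout, False),
        'awsgi.writable': lambda fd, timeout=None: Wait(fd, timeout, True),
        'awsgi.timeout': False,  # false until a wait actually times out
    })
    response = {}

    def start_response(status, headers, exc_info=None):
        response['status'], response['headers'] = status, headers
        # No write() callable is returned.

    chunks = []
    for item in app(environ, start_response):
        if isinstance(item, Wait):
            rlist = [] if item.for_write else [item.fd]
            wlist = [item.fd] if item.for_write else []
            # The third list watches for "exceptional" conditions.
            r, w, x = select.select(rlist, wlist, [item.fd], item.timeout)
            # The app sees this flag when the loop resumes it.
            environ['awsgi.timeout'] = not (r or w or x)
        else:
            chunks.append(item)  # body data destined for the client
    return response['status'], response['headers'], b''.join(chunks)
```

Here the body chunks are collected instead of being written to the
client, which keeps the sketch short and easy to test with
socket.socketpair().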


I have rough but functional implementations of a number of AWSGI
components available in a Bazaar branch at
http://pseudogreen.org/bzr/awsgiref/.  The package includes an
asyncore-based AWSGI server and an AWSGI-to-WSGI application wrapper.
In addition, the file spec.txt contains a more detailed description of
the specification (which is also appended below).

Again, I'd very much appreciate comments and criticism.


Thanks,
Chris




Detailed AWSGI Specification
----------------------------

- Required AWSGI environ variables:

   * All variables required by WSGI, except for wsgi.version and
     wsgi.input, which must *not* be present

   * awsgi.version => the tuple (1, 0)

   * awsgi.input

     This is an object with one method, recv(bufsize), which behaves
     like the socket method of the same name (although it doesn't
     support the optional flags parameter).  Before each call to
     recv(), the application must test awsgi.input for readability via
     awsgi.readable.  The result of calling recv() without doing so is
     undefined.

     (XXX: Should recv() handle EINTR for the application?)

   * awsgi.readable
   * awsgi.writable

     These are callables with the signature f(fd, timeout=None).  fd is
     either a file descriptor (i.e. int or long) or an object with a
     fileno() method that returns a file descriptor.

     timeout has the same semantics as the timeout parameter to
     select.select().  If the operation times out, awsgi.timeout will
     be true when the application resumes.

     In addition to checking readiness for reading or writing, servers
     should also monitor file descriptors for "exceptional" conditions
     (e.g. out-of-band data) and resume the application if they occur.

   * awsgi.timeout => boolean indicating whether the most recent read
     or write wait timed out (false if there have been no waits)

- start_response() must *not* return a write() callable, as this
   method of providing application output to the server is incompatible
   with asynchronous execution.

- The server must accept awsgi.input as input to awsgi.readable,
   either by providing an actual socket object or by special-case
   handling (i.e. awsgi.input needn't have a fileno() method, as long
   as the server handles it as if it did).

- Applications return iterators, which can yield:

   * a string => sent to client, just as in standard WSGI

   * the result of a call to awsgi.readable or awsgi.writable =>
     application is resumed when either the file descriptor is ready
     for reading/writing or the wait times out (in which case,
     awsgi.timeout will be true)

- Although AWSGI applications will *not* be directly compatible with
   WSGI servers, middleware will allow them to run as standard WSGI
   apps (with all I/O waits returning immediately).

- AWSGI servers will not support unmodified WSGI applications.  There
   are several reasons for this:

   * If the app does blocking I/O, it will block the entire server.

   * Calls to the read() method of wsgi.input may fail with
     EWOULDBLOCK, which an app expecting synchronous I/O probably won't
     be prepared to deal with.

   * The readline(), readlines(), and __iter__() methods of wsgi.input
     can require multiple network I/O operations, which is incompatible
     with asynchronous execution.

   * The write() callable returned by start_response() is inherently
     incompatible with asynchronous execution.

   Because of these issues, this specification aims for one-way
   compatibility between AWSGI and WSGI (i.e. the ability to run AWSGI
   apps on WSGI servers via middleware, but not vice versa).
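
As an illustration of the rules above for awsgi.input and for the fd
argument of awsgi.readable/awsgi.writable, here is one way a server
might resolve that argument to an integer file descriptor.  This is a
sketch under assumptions, not part of the spec: the names Input and
resolve_fd are hypothetical, and the code assumes Python 3.  The input
object exposes only recv(bufsize), with no flags parameter and no
fileno(), so the server special-cases it:

```python
class Input:
    """Hypothetical awsgi.input: recv(bufsize) only, no flags, no fileno()."""
    def __init__(self, sock):
        self._recv = sock.recv

    def recv(self, bufsize):
        return self._recv(bufsize)


def resolve_fd(target, awsgi_input, conn_sock):
    """Map the first argument of awsgi.readable/awsgi.writable to an int fd."""
    if target is awsgi_input:
        # Special case: treat awsgi.input as the client connection itself.
        return conn_sock.fileno()
    if isinstance(target, int):
        return target  # already a file descriptor
    return target.fileno()  # any object with a fileno() method
```

A server taking the other route, handing out the actual socket object
as awsgi.input, would not need the first branch at all.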


