[Web-SIG] Server-side async API implementation sketches

Sun Jan 9 18:03:38 CET 2011

At 06:06 AM 1/9/2011 +0200, Alex Grönholm wrote:
>A new feature here is that the application itself yields a (status, 
>headers) tuple and then chunks of the body (or futures).

Hm.  I'm not sure if I like that.  The typical app developer really 
shouldn't be yielding multiple body strings in the first place.  I 
much prefer that the canonical example of a WSGI app just return a 
list with a single bytestring -- preferably in a single statement for 
the entire return operation, whether it's a yield or a return.

IOW, I want it to look like the normal way to do things is to just 
return the whole response at once, and use the additional difficulty 
of creating a second iterator to discourage people from writing iterated 
bodies when they should just write everything to a BytesIO and be done with it.
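
To make that concrete, here is a minimal sketch of the "single 
statement" style, assuming the yield-a-tuple calling convention 
discussed in this thread.  The name hello_app and the use of 
PATH_INFO are purely illustrative:

```python
from io import BytesIO


def hello_app(environ):
    # Accumulate the whole body in memory instead of yielding pieces.
    buf = BytesIO()
    buf.write(b"Hello, ")
    buf.write(environ.get("PATH_INFO", "/").encode("utf-8"))
    buf.write(b"\n")
    body = buf.getvalue()

    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # One statement hands back the entire response:
    # status, headers, and a list holding a single bytestring.
    yield "200 OK", headers, [body]
```

The body is a one-item list, so a server or middleware never has to 
drive a second iterator just to get the response out.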

Also, it makes middleware simpler: the last line can just yield the 
result of calling the app, or a modified version, i.e.:

     yield app(environ)

or:

     s, h, b = app(environ)
     # ... modify or replace s, h, b
     yield s, h, b

In your approach, the above samples have to be rewritten as:

     return app(environ)

or:

     result = app(environ)
     s, h = yield result
     # ... modify or replace s, h
     yield s, h

     for data in result:
          # modify data as we go
          yield data

Only that last bit doesn't actually work, because you have to be able 
to send future results back *into* the result.  Try actually making 
some code that runs on this protocol and yields to futures during the 
body iteration.
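
The underlying problem can be shown in plain Python, independent of 
any particular protocol.  In this sketch the string "FUTURE" is only a 
stand-in for a real future object, and body() is a hypothetical body 
generator that needs the resolved value sent back in:

```python
def body():
    # Yield a placeholder "future" and expect its resolved value
    # to be delivered back through send().
    data = yield "FUTURE"
    yield b"chunk: " + (data or b"<nothing sent>")


# A plain for-loop (or list()) only ever calls next(), which is
# equivalent to send(None), so the generator never sees a result.
seen_by_for = list(body())

# A driver that uses send() explicitly can deliver the value.
gen = body()
placeholder = next(gen)          # the generator asks to wait
chunk = gen.send(b"resolved")    # the driver sends the result back in
```

Here seen_by_for ends up containing the placeholder itself plus a 
chunk built from None, while the send()-based driver gets the chunk 
the generator actually intended to produce.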

Really, this modified protocol can't work with a full async API the 
way my coroutine-based version does, AND the middleware is much more 
complicated.  In my version, your do-nothing middleware looks like this:

class NullMiddleware(object):
     def __init__(self, app):
         self.app = app

     def __call__(self, environ):
         # ACTION: pre-application environ mangling

         s, h, body = yield self.app(environ)

         # modify or replace s, h, body here

         yield s, h, body

If you want to actually process the body in some way, it looks like:

class NullMiddleware(object):

     def __init__(self, app):
         self.app = app

     def __call__(self, environ):
         # ACTION: pre-application environ mangling

         s, h, body = yield self.app(environ)

         # modify or replace s, h, body here

         yield s, h, self.process(body)

     def process(self, body_iter):
         while True:
             chunk = yield body_iter
             if chunk is None:
                 break
             # process/modify chunk here
             yield chunk

And that's still a lot simpler than your sketch.

Personally, I would write both of the above as:

     def null_middleware(app):

         def wrapped(environ):
             # ACTION: pre-application environ mangling
             s, h, body = yield app(environ)

             # modify or replace s, h, body here
             yield s, h, process(body)

         def process(body_iter):
             while True:
                 chunk = yield body_iter
                 if chunk is None:
                     break
                 # process/modify chunk here
                 yield chunk

         return wrapped

But that's just personal taste.  Even as a class, it's much easier to 
write.  The above middleware pattern works with the sketches I gave 
on the PEAK wiki, and I've now updated the wiki to include an example 
app and middleware for clarity.
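
For readers without the wiki page at hand, the following is a rough 
sketch of how a server might drive this style of middleware.  It is 
not the code from the wiki: run(), app() and upper_middleware() are 
hypothetical names, and the convention assumed here is simply that 
yielding a generator means "call this and send me its result", while 
yielding anything else is that frame's result.  Futures and the 
body-iteration protocol are left out entirely:

```python
import types


def run(gen):
    """Drive nested generator-based apps and return the final result."""
    stack = [gen]
    value = None
    while True:
        yielded = stack[-1].send(value)
        if isinstance(yielded, types.GeneratorType):
            # A sub-call: run the yielded generator next.
            stack.append(yielded)
            value = None
        else:
            # A result: finish this frame and pass the value to its caller.
            stack.pop().close()
            if not stack:
                return yielded
            value = yielded


def app(environ):
    yield "200 OK", [("Content-Type", "text/plain")], [b"hello"]


def upper_middleware(app):
    def wrapped(environ):
        s, h, body = yield app(environ)
        # modify or replace s, h, body here
        yield s, h, [chunk.upper() for chunk in body]
    return wrapped


status, headers, body = run(upper_middleware(app)({}))
```

The middleware never touches the trampoline directly; it only yields 
the inner app's generator and receives the (status, headers, body) 
tuple back.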

Really, the only hole in this approach is dealing with applications 
that block.  The elephant in the room here is that while it's easy to 
write these example applications so they don't block, in practice 
people read files and do database queries and whatnot in their 
requests, and those APIs are generally synchronous.  So, unless they 
somehow fold their entire application into a future, it doesn't work.
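
One way to fold a blocking call into a future is to push it onto a 
thread pool.  This is only a sketch under assumptions: it uses the 
standard concurrent.futures module, blocking_query() is a made-up 
stand-in for a synchronous database call, and the two-line driver at 
the end takes the place of a real server loop:

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def blocking_query(user_id):
    # Stand-in for a synchronous database query or file read.
    time.sleep(0.05)
    return "user-%d" % user_id


def app(environ):
    # Hand the blocking work to a worker thread and yield the future;
    # the server is expected to send the result back in when it is ready.
    name = yield executor.submit(blocking_query, environ["user_id"])
    body = ("Hello, %s\n" % name).encode("utf-8")
    yield "200 OK", [("Content-Type", "text/plain")], [body]


# Toy driver: a real server would register a callback on the future
# rather than block on result().
gen = app({"user_id": 7})
future = next(gen)
status, headers, body = gen.send(future.result())
executor.shutdown()
```

This keeps the event loop free, but every blocking call in the 
application has to be wrapped this way, which is exactly the burden 
described above.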

>I liked the idea of having a separate async_read() method in 
>wsgi.input, which would set the underlying socket in nonblocking 
>mode and return a future. The event loop would watch the socket and 
>read data into a buffer and trigger the callback when the given 
>amount of data has been read. Conversely, .read() would set the 
>socket in blocking mode. What kinds of problems would this cause?

That you could never *call* the .read() method outside of a future, 
or else you would block the server, thereby obliterating the point of 
having the async API in the first place.