[Web-SIG] PEP 444 / WSGI 2 Async
Alice Bevan–McGregor
alice at gothcandy.com
Thu Jan 6 05:01:47 CET 2011
[Apologies if this is a double- or triple-post; I seem to be having a
stupid number of connectivity problems today.]
Howdy!
Apologies for the delay in responding; it’s been a hectic start to the
new year. :)
On 2011-01-03, at 6:22 AM, Timothy Farrell wrote:
> You don't know me but I'm the author of the Rocket Web Server
> (http://pypi.python.org/pypi/rocket) and have, in the past, been
> involved in the web2py community. Like you, I'm interested in seeing
> web development come to Python3. I'm glad you're taking up WSGI2. I
> have a feature-request for it that perhaps we could work in.
Of course; in fact, I hope you don’t mind that I’ve re-posted this
response to the web-sig mailing list. Async needs significantly
broader discussion. I would appreciate it if you could reply to the
mailing list thread.
> I would like to see futures added as a server option. This way,
> controllers could dispatch emails (or run some other blocking or
> long-running task) that would not block the web-response. WSGI2
> Servers could provide a futures executor as environ['wsgi.executor']
> that the app could use to offload processes that need not complete
> before the web-request is served to the client.
E-mail dispatch is one of the things I solved a long time ago with
TurboMail; it uses a dedicated thread pool and can deliver > 100 unique
messages per second (more if you use BCC) in the default configuration,
so I don’t really see that particular use case as one that benefits
from the futures module. Updating TurboMail to use futures would be an
interesting exercise, though. ;)
I was thinking of exposing the executor as
environ['wsgi.async.executor'], with 'wsgi.async' being a boolean value
indicating support.
> What should the server do with the future instances?
The executor returns future instances when running executor.submit/map;
the application never generates its own Future instances. The
application may, however, use whatever executor it sees fit; it can,
for example, have one thread pool executor and one process pool, used
for different tasks.
The server itself can utilize any combination of single-threaded
IO-based async (see further on in this message), and multi-threaded or
multi-process management of WSGI requests. Resuming suspended
applications (ones pending future results) is an implementation detail
of the server.
> Should future.add_done_callback() be allowed? I'm not sure how
> practical/reliable this would be. (By the time the callback is called,
> the calling environment could be gone. Is this undefined behavior?)
If you wrap your callback in a partial(my_callback, environ) the
environ will survive the end of the request/response cycle (due to the
incremented reference count), and should be allowed to enable
intelligent behaviour in the callbacks. (Obviously the callbacks will
not be able to deliver a response to the client at the time they are
called; the body iterator can, however, wait for the future instance to
complete and/or timeout.)
A little bit later in this message I describe a better solution than
the application registering its own callbacks.
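The partial() pattern above looks roughly like this (a minimal,
self-contained sketch; my_callback and the environ contents are made
up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Binding environ into the done-callback with partial() keeps the dict
# alive (via its reference count) past the end of the request cycle.
results = []

def my_callback(environ, future):
    # Runs after the response may already have been sent; it can't
    # reach the client, but it can log, clean up, and so on.
    results.append((environ['PATH_INFO'], future.result()))

executor = ThreadPoolExecutor(max_workers=2)
environ = {'PATH_INFO': '/send-mail'}

future = executor.submit(lambda: 'delivered')
future.add_done_callback(partial(my_callback, environ))
executor.shutdown(wait=True)  # ensure the worker (and callback) finished

assert results == [('/send-mail', 'delivered')]
```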
> Do we need to also specify what type of executor is provided (threaded
> vs. separate process)?
I think that’s an application-specific configuration issue, not really
the concern of the PEP.
> Do you have any thoughts about this?
I believe that intelligent servers need some way to ‘pause’ a WSGI
worker rather than relying on the worker executing in a thread and
blocking while waiting for the return value of a future. Using
generator syntax (yield) with the following rules is my initial idea:
* The application may yield None. This is a polite way to have the
async reactor (in the WSGI server/gateway) reschedule the worker for
the next reactor cycle. Useful as a hint that “I’m about to do
something that may take a moment”, allowing other workers to get a
chance to perform work. (Cooperative multi-tasking on single-threaded
async servers.)
* The application must yield one 3-tuple WSGI response, and must not
yield additional data afterward. This is usually the last thing the
WSGI application would do, with possible cleanup code afterward
(before falling off the bottom / raising StopIteration / returning
None).
* The application may yield Future instances returned by
environ['wsgi.executor'].submit/map; the worker will then be paused
pending execution of the future; the return value of the future will be
returned from the yield statement. Exceptions raised by the future
will be re-raised from the yield statement and can thus be captured in
a natural way. E.g.:

    try:
        complex_value = yield environ['wsgi.executor'].submit(long_running)
    except Exception:
        pass  # handle exceptions generated from within long_running
Similar rules apply to the response body iterator: it yields
bytestrings, may yield unicode strings where native strings are unicode
strings, and may yield Future instances which will pause the body
iterator as per the application callable.
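To make the yield rules concrete, here is a toy single-threaded
trampoline driving such an application. This is only a sketch of how a
gateway *might* implement the rules above; a real reactor would never
block on future.result(), and all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, Future

executor = ThreadPoolExecutor(max_workers=2)

def application(environ):
    yield None  # polite "reschedule me" hint to the reactor
    try:
        value = yield executor.submit(lambda: 21 * 2)
    except Exception:
        value = 'error'
    yield ('200 OK', [('Content-Type', 'text/plain')], [str(value).encode()])

def run(app, environ):
    gen = app(environ)
    send, throw = None, None
    while True:
        try:
            if throw is not None:
                yielded, throw = gen.throw(throw), None
            else:
                yielded = gen.send(send)
            send = None
        except StopIteration:
            raise RuntimeError('application ended without yielding a response')
        if yielded is None:
            continue  # rule 1: reschedule on the next reactor cycle
        if isinstance(yielded, Future):
            try:
                send = yielded.result()  # a real reactor would not block here
            except Exception as exc:
                throw = exc  # rule 3: re-raise inside the generator
            continue
        gen.close()
        return yielded  # rule 2: the 3-tuple response

status, headers, body = run(application, {})
assert status == '200 OK'
assert body == [b'42']
```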
Servers must:
* Allow configuration of the future implementation for options like
threading / processes.
* Allow developers to override the executor completely.
* Provide additional attributes on wsgi.input: async_ prefixed versions
of the read methods, which are factories returning server-specific
Future instances. (Allowing a single-threaded async server to handle
socket IO intelligently with select/epoll/etc.)
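The async_ factory idea might look like the following. This sketch
uses a thread pool as the fallback strategy; a single-threaded event
loop server would instead resolve the future from its select/epoll
loop. The class and key names are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
import io

class AsyncInput:
    """Hypothetical wsgi.input wrapper providing async_ read factories."""

    def __init__(self, stream, executor):
        self._stream = stream
        self._executor = executor

    def read(self, size=-1):
        # The plain, blocking read required of wsgi.input.
        return self._stream.read(size)

    def async_read(self, size=-1):
        # Factory returning a server-specific Future instead of blocking
        # the worker; here it simply offloads the read to a thread pool.
        return self._executor.submit(self._stream.read, size)

pool = ThreadPoolExecutor(max_workers=1)
wsgi_input = AsyncInput(io.BytesIO(b'hello world'), pool)

future = wsgi_input.async_read(5)
assert future.result() == b'hello'
```

An application generator could then `yield wsgi_input.async_read(n)`
and receive the bytes back from the yield expression.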
To the libraries you use, futures make async pretty much transparent.
For example, libraries (such as a DB layer) must not create their own
Future objects, but must instead utilize an executor passed to them
explicitly by the application.
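A minimal sketch of that injection rule, with a hypothetical UserStore
standing in for a DB layer (all names invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

class UserStore:
    """Toy DB layer: never constructs Futures itself, only submits
    work to whatever executor the application hands it."""

    def __init__(self, executor):
        self._executor = executor  # injected by the application

    def fetch(self, user_id):
        # Returns a Future produced by the application's executor.
        return self._executor.submit(self._query, user_id)

    def _query(self, user_id):
        # Stand-in for a real (blocking) database round trip.
        return {'id': user_id, 'name': 'alice'}

executor = ThreadPoolExecutor(max_workers=2)
store = UserStore(executor)
assert store.fetch(1).result() == {'id': 1, 'name': 'alice'}
```

An application generator could yield store.fetch(uid) directly and get
the row back from the yield expression, per the rules above.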
My ideas thus far,
— Alice.
P.s. a number of these ideas (wsgi.executor, wsgi.async, some of the
yield syntax described above) have been soundly argued against by a
co-conspirator over IRC. I’ll re-read my IRC logs and reply with those
considerations in mind (and transcribed logs) shortly.
P.p.s. my kernel panicked while I was translating my rewrite into ReST;
I'll re-do the conversion tonight or tomorrow morning and submit it
downstream ASAP.