[Twisted-web] Re: [Web-SIG] WSGI woes
Alan Kennedy
py-web-sig at xhaus.com
Thu Sep 16 16:59:15 CEST 2004
[Phillip J. Eby]
> However, an asynchronous server isn't going to sit there in a loop
> calling next()! Presumably, it's going to wait until the previous
> string gets sent to the client, before calling next() again. And,
> it's presumably going to round-robin the active iterables through the
> threadpool, so that it doesn't keep blocking on iterables that aren't
> likely to have any data to produce as yet.
>
> Yes, this arrangement can still block threads sometimes, if there are
> only a few iterables active and they are waiting for some very slow
> async I/O. But the frequency of such blockages can be further reduced
> with a couple of extensions. Suppose there was an
> 'environ["async.sleep"]' and 'environ["async.wake"]'. A call to
> 'sleep' would mean, "don't bother iterating over me again until you
> get a 'wake' call".
and
> Anyway, my point here is that it's possible to get a pretty decent
> setup for async applications, without any need to actually modify the
> base WSGI spec. And, if you add some optional extensions, you can get
> an even smoother setup for async I/O.
>
> Finally, I'm open to trying to define the 'sleep/wake' facilities as
> "standard options" in WSGI, as well as clarifying the middleware
> control flow to support this better.
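(To make that concrete, here is roughly how I read the proposed
extension. The 'async.sleep'/'async.wake' keys are as named above;
everything else, the non-blocking query helpers in particular, is pure
guesswork on my part:)

    def application(environ, start_response):
        sleep = environ.get("async.sleep")   # proposed extension key
        wake = environ.get("async.wake")     # proposed extension key
        start_response("200 OK", [("Content-Type", "text/plain")])

        def body():
            # start_slow_query/fetch_result are made-up stand-ins for some
            # non-blocking operation that calls wake() when it completes.
            start_slow_query(on_complete=wake)
            if sleep is not None:
                sleep()    # "don't call next() on me again until wake()"
            yield ""       # nothing to send yet
            yield fetch_result()

        return body()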
What would be really nice would be if there were some way for the
application to return, to event-based servers or gateways, an object
that could be included in the server's event loop, e.g. its select/poll
loop.
For example, if an application were waiting on return data from a
database, through a network socket, it could return that
database-connection-socket descriptor to the server. The server would
then check for activity on the database socket in its event loop, i.e.
watch for select.POLLIN on that descriptor. When that event fires, i.e.
database data has arrived, the server can have *reasonable* confidence
that a call to the application's iterator will then yield data. Of
course, it is not guaranteed that the
application will have data available (e.g. the database socket contains
half the data required by the app, or the database connection is shared
between multiple apps). But it's better than the application blocking.
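One possible convention (purely hypothetical on my part) would be for the
returned iterable to expose the descriptor it is waiting on via a fileno()
method, which a select/poll based server could register in its loop,
something like:

    class DBWaitingBody:
        """An iterable that is waiting on a database socket for its data."""

        def __init__(self, db_socket):
            self.db_socket = db_socket

        def fileno(self):
            # The descriptor the server can register, e.g. for select.POLLIN
            return self.db_socket.fileno()

        def __iter__(self):
            return self

        def next(self):
            # read_available_rows/format_rows are made-up helpers; None means
            # "nothing yet", an empty list means "the query is finished".
            rows = read_available_rows(self.db_socket)
            if rows is None:
                return ""               # readiness was a false alarm
            if not rows:
                raise StopIteration
            return format_rows(rows)

The server would register iterable.fileno() in its poll loop and only call
next() when that descriptor is reported readable.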
But I can't think of any unified way to generalise this solution to
non-descriptor based event loops or applications. For example, what if
the application is waiting for data on a Queue.Queue? Or a
threading.Event? How could the application enable the server to check
for the Queue.Queue or threading.Event it awaits?
Perhaps the server could maintain an extra event loop for checking such
threaded event notification mechanisms? Or it could associate an "app
ready" flag with each client connection? It could go something like this:-
1. The application returns to the server an instance of a class that
indicates it will only generate content when a thread notification
primitive is set. Or perhaps the thread notification primitive could be an
optional attribute of the returned iterable, e.g. if hasattr(iterable,
'ready_to_go'): etc. (a sketch follows this list).
2. The server adds this thread notification primitive to its
lists/"event loop", or associates the notification primitive with the
descriptor for the incoming/outgoing client socket.
3. When the client socket becomes ready for output, the server checks
the ready_to_go flag on the application's iterable. If the flag is not
set, it simply skips that socket and moves on to the next.
4. When the client socket is ready to consume output *and* the
application is ready to produce output, i.e. its ready flag is set, the
server gets the data from the app's iterator and transmits it down the
client socket. The server could conceivably loop until either the client
socket is full or the application iterator is empty, and then just
suspend that client/application pair. Or it could spin that app->client
transfer off into a separate dedicated thread.
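In code, the convention sketched in steps 1-4 might look roughly like this
(all names invented; a real version would need proper locking between the
worker thread and the event loop):

    import threading

    class NotifyingBody:
        """Step 1: an iterable whose ready_to_go flag is a thread primitive."""

        def __init__(self):
            self.ready_to_go = threading.Event()
            self.chunks = []          # filled in by some worker thread
            self.finished = False

        def push(self, chunk, last=False):
            # Called from the worker thread that actually produces the data.
            self.chunks.append(chunk)
            self.finished = last
            self.ready_to_go.set()

        def __iter__(self):
            return self

        def next(self):
            if not self.chunks:
                if self.finished:
                    raise StopIteration
                self.ready_to_go.clear()   # nothing left, wait for next push
                return ""
            return self.chunks.pop(0)

    # Steps 3 and 4, on the server side of the same convention:
    def maybe_send(client_socket, iterable):
        if hasattr(iterable, "ready_to_go") and not iterable.ready_to_go.isSet():
            return                        # step 3: skip this connection for now
        chunk = iterable.next()           # step 4: both sides are ready
        if chunk:
            client_socket.send(chunk)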
I don't like the idea of adding callbacks to WSGI: that's too
Twisted-specific. I can picture, for example, a very simple
coroutine-based async server that would not need callbacks at all.
Instead, applications would simply yield a NO-OP value to the
server/scheduler/dispatcher, indicating that they have no data ready
right now.
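For instance, the NO-OP could simply be an empty string, which a
round-robin scheduler can recognise and skip (again, everything here is
invented purely for illustration):

    def application(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        while not result_is_ready():    # made-up predicate for pending I/O
            yield ""                    # NO-OP: no data ready right now
        yield the_result()              # made-up accessor

    def scheduler(active):
        # 'active' maps client sockets to application iterators.
        while active:
            for sock, it in active.items():
                try:
                    chunk = it.next()
                except StopIteration:
                    del active[sock]
                    continue
                if chunk:               # empty string is the NO-OP, move on
                    sock.send(chunk)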
And, of course, that's what we're really discussing here: server
scheduling, and how servers ensure that application output gets
transmitted to clients with maximum efficiency and timeliness. IMHO,
asynchronous server scheduling algorithms and concerns have no place in
core WSGI, although a well-designed optional extension to support
efficiency might have a nice unifying effect on Python asynchronous
server architectures.
Just my €0,02
Regards,
Alan.