[Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?)

Brian Smith brian at briansmith.org
Fri Mar 7 01:29:07 CET 2008


Manlio Perillo wrote:
> Brian Smith wrote:
> > We already have standard mechanisms for doing something
> > similar in WSGI: multi-threaded and multi-process WSGI
> > gateways that let applications block indefinitely while
> > letting other applications run.
> 
> Ok, but this is not the best solution to the problem!

Why not?

> > I think it is safe to say that multi-threaded or multi-process 
> > execution is something that is virtually required for WSGI.
> 
> but only if the application is synchronous and heavily I/O-bound.

Isn't that almost every WSGI application?

> Note that Nginx is multi-process, but it only executes a 
> fixed number of worker processes, so if an I/O request can 
> block for a significant amount of time, you cannot afford 
> to let it block.

Can't you just increase the number of processes?

> Moreover with an asynchronous gateway it is possible to 
> implement a "middleware" that can execute an application 
> inside a thread.
> 
> This is possible by creating a pipe, starting a new thread, 
> having the main thread poll the pipe, and having the worker 
> thread write some data to the pipe to "wake" the main thread 
> when it finishes its job.

Right. This is exactly what I was saying. By using
multiprocessing/multithreading, each application can block as much as it
wants. 
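For anyone following along, here is a rough sketch of that pipe trick.
The names are illustrative only (not any particular gateway's API), and
select() here stands in for whatever event loop the asynchronous
gateway actually uses:

    import os
    import select
    import threading

    def run_blocking(app, environ):
        # Run a blocking application in a worker thread; the worker
        # writes one byte to a pipe to wake the (otherwise
        # non-blocking) main thread when the response is ready.
        read_fd, write_fd = os.pipe()
        result = []

        def worker():
            result.append(app(environ))
            os.write(write_fd, "x")   # wake the main thread

        threading.Thread(target=worker).start()

        # A real asynchronous gateway would register read_fd with its
        # event loop instead of blocking in select() here.
        select.select([read_fd], [], [])
        os.read(read_fd, 1)
        os.close(read_fd)
        os.close(write_fd)
        return result[0]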

> > Again, I like the simplification that WSGI 2.0 applications 
> > are always functions or function-like callables, and never
> > iterables. 
> 
> Where is the simplification?

My understanding is that the application callable never returns an
iterator (it never yields, it only returns). This is simpler to explain
to people who are new to WSGI. It also simplifies the language in the
specification. The difference is basically immaterial to WSGI gateway
implementers, but that is because the WSGI specification is biased
towards making gateways simple to implement.
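For comparison, a WSGI-2.0-style application in the shape that has been
discussed on this list might look like the following (a sketch only;
the exact 2.0 signature is still being debated):

    def application(environ):
        # A plain function: it returns its whole response; it never
        # yields.
        body = "Hello, world!"
        headers = [("Content-Type", "text/plain"),
                   ("Content-Length", str(len(body)))]
        return ("200 OK", headers, [body])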

> Unfortunately right now Nginx does not support trailing 
> headers, and I don't know if common browsers support them.

Right, trailers are not really that useful right now. Too many
applications expect to get all header fields first, and most people
don't even know about trailers in the first place.

> > We can just say that WSGI-2.0-style applications must 
> > support chunked request bodies, but gateways are not
> > required to support them.
> > WSGI-2.0-style applications would have to check for 
> > CONTENT_LENGTH, and if that is missing, check to see if 
> > environ['HTTP_TRANSFER_ENCODING'] includes the "chunked"
> > token. wsgi.input.read() would have to stop at the end
> > of the request; applications would not be restricted from
> > attempting to read more than CONTENT_LENGTH bytes.
> > 
> > WSGI gateways would have to support an additional 
> > (keyword?) argument to wsgi.input.read() that
> > controls whether it is blocking or non-blocking.
> > It seems pretty simple.
> 
> How should an application be written to use this feature?

For chunked request bodies: instead of reading until exactly
CONTENT_LENGTH bytes have been read, keep reading until
environ["wsgi.input"].read(chunk_size) returns "".

For "non-blocking reads", given environ["wsgi.input"].read(64000,
min=8000):

1. If more than 64000 bytes are available without blocking, 8192 bytes
are returned.
2. If less than 8000 bytes are available without blocking, then the
gateway blocks until at least 1024 bytes are available.
3. When 8000-63999 bytes are available, then all those bytes are
returned.
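A gateway might implement those rules along these lines. This is a
sketch only; the buffer and socket plumbing (_fill_buffer and friends)
is hypothetical, not any real gateway's internals:

    class InputWrapper(object):
        # Hypothetical wsgi.input wrapper illustrating the proposed
        # read(size, min=...) semantics.
        def __init__(self, sock):
            self._sock = sock
            self._buffer = ""
            self._eof = False

        def _fill_buffer(self):
            # One blocking read from the client socket.
            data = self._sock.recv(65536)
            if data == "":
                self._eof = True
            self._buffer += data

        def read(self, size, min=1):
            # Block only until at least `min` bytes are buffered (or
            # the request body has ended), then return at most `size`
            # bytes.
            while len(self._buffer) < min and not self._eof:
                self._fill_buffer()
            data = self._buffer[:size]
            self._buffer = self._buffer[size:]
            return data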

The non-blocking behavior is useful when the application can process
arbitrary chunks of input without having all the input available. For
example, if you are transcoding a POSTed video, you probably can
transcode the video with arbitrarily-sized chunks of input. If you
already have 32K of input available, you don't really need to wait
around for 32K more input before you start processing. But, if you have
64K of input ready to process, then you might as well process all of it
at once.
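So an application doing that kind of incremental processing could be
written like this (the min= keyword is, of course, the proposed
extension, not standard WSGI, and process_chunk is a stand-in for the
transcoder):

    def transcode_upload(environ, process_chunk):
        # Consume whatever is available (at least 8000 bytes at a
        # time, at most 64000) and hand each piece to the transcoder
        # as it arrives, instead of buffering the whole request body.
        while True:
            data = environ["wsgi.input"].read(64000, min=8000)
            if data == "":
                break
            process_chunk(data)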

My understanding is that nginx completely buffers all input, so that all
reads from wsgi.input are basically non-blocking.

- Brian


