[Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?)

Manlio Perillo manlio_perillo at libero.it
Fri Mar 7 11:30:29 CET 2008


Brian Smith ha scritto:
> Manlio Perillo wrote:
>> Brian Smith ha scritto:
>>> We already have standard mechanisms for doing something
>>> similar in WSGI: multi-threaded and multi-process WSGI
>>> gateways that let applications block indefinitely while
>>> letting other applications run.
>> Ok, but this is not the best solution to the problem!
> 
> Why not?
> 
>>> I think it is safe to say that multi-threaded or multi-process 
>>> execution is something that is virtually required for WSGI.
>> but only if the application is synchronous and heavy I/O bound.
> 
> Isn't that almost every WSGI application?
> 

I'm not sure that a generic application that uses a database can be 
considered *heavy* I/O bound.

Compare, as an example, a query to a database that can take up to 0.2 
seconds with an HTTP request to a web service that can take up to 2 seconds.


>> Note that Nginx is multi-process, but it only executes a 
>> fixed number of worker processes, so if an I/O request can 
>> block for a significant amount of time, you can not afford 
>> to let it block.
> 
> Can't you just increase the number of processes?
> 

Yes, but you should agree with me that the asynchronous solution is 
more efficient.

Moreover, my application needs to run on shared hosting, where there is 
a limit on the number of processes a user can execute.

I can not run too many worker processes.


>> Moreover with an asynchronous gateway it is possible to 
>> implement a "middleware" that can execute an application 
>> inside a thread.
>>
>> This is possible by creating a pipe, starting a new thread, 
>> having the main thread polling the pipe, and having the 
>> thread write some data in the pipe to "wake" the main thread 
>> when it finishes its job.
> 
> Right. This is exactly what I was saying. By using
> multiprocessing/multithreading, each application can block as much as it
> wants. 
> 

Ok, but the middleware *needs* the poll extension :).

So the best solution, IMHO, is to implement the WSGI 1.0 spec for Nginx, 
and then implement a pure Python middleware/adapter that will execute a 
WSGI 2.0 application in a thread.
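The pipe trick described earlier can be sketched in a few lines. This is only an illustration (the helper name `run_in_thread` and the use of `select.select` are my assumptions; a real nginx middleware would register the pipe's read end with the gateway's poll extension instead of blocking on `select` itself):

```python
import os
import select
import threading

def run_in_thread(app_func):
    """Run a blocking job in a worker thread; the main loop learns of
    its completion by polling a pipe that the worker writes to."""
    r, w = os.pipe()
    result = {}

    def worker():
        result["value"] = app_func()
        os.write(w, b"x")   # "wake" the main thread

    threading.Thread(target=worker).start()
    # An asynchronous main loop would register `r` with its poller;
    # here we just block on select() to keep the sketch self-contained.
    select.select([r], [], [])
    os.read(r, 1)
    os.close(r)
    os.close(w)
    return result["value"]
```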

However, if some corrections are going to be made in WSGI 2.0, I 
would like to have them "backported" to, say, a WSGI 1.1.

>>> Again, I like the simplification that WSGI 2.0 applications 
>>> are always functions or function-like callables, and never
>>> iterables. 
>> Where is the simplification?
> 
> My understanding is that the application callable never returns an
> iterator (it never yields, it only returns). This is simpler to explain
> to people that are new to WSGI. 

That is true: I too had some trouble when I first read the WSGI 
specification.

*However*, it now seems to me the most natural API; it only takes some 
practice.

> It also simplifies the language in the
> specification. The difference is basically immaterial to WSGI gateway
> implementers, but that is because the WSGI specification is biased
> towards making gateways simple to implement.
> 

No, it also makes the gateway simpler to implement.

> [...]
>>> We can just say that WSGI-2.0-style applications must 
>>> support chunked request bodies, but gateways are not
>>> required to support them.
>>> WSGi-2.0-style applications would have to check for 
>>> CONTENT_LENGTH, and if that is missing, check to see if 
>>> environ['HTTP_TRANSFER_ENCODING'] includes the "chunked"
>>> token. wsgi_input.read() would have to stop at the end
>>> of the request; applications would not be restricted from
>>> attempting to read more than CONTENT_LENGTH bytes.
>>>
>>> WSGI gateways would have to support an additional 
>>> (keyword?) argument to wsgi.input.read() that
>>> controls whether it is blocking or non-blocking.
>>> It seems pretty simple.
>> How should an application be written to use this feature?
> 
> For chunked request bodies: instead of reading until exactly
> CONTENT_LENGTH bytes have been read, keep reading until
> environ["wsgi.input"].read(chunk_size) returns "".
> 
> For "non-blocking reads", given environ["wsgi.input"].read(64000,
> min=8000):
> 
> 1. If more than 64000 bytes are available without blocking, 64000 bytes
> are returned.
> 2. If less than 8000 bytes are available without blocking, then the
> gateway blocks until at least 8000 bytes are available.
> 3. When 8000-63999 bytes are available, then all those bytes are
> returned.
> 

Ok.
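Those semantics can be sketched with a small buffered-input class. The names `BufferedInput` and `feed()` are my own, and the `min` keyword is the proposed API from this thread, not part of any WSGI spec:

```python
import threading

class BufferedInput:
    """Sketch of a wsgi.input whose read() takes a `min` keyword:
    block only until `min` bytes are buffered, then return up to
    `size` bytes without blocking further."""

    def __init__(self):
        self._buf = b""
        self._cond = threading.Condition()

    def feed(self, data):
        # Called by the gateway as request body data arrives.
        with self._cond:
            self._buf += data
            self._cond.notify_all()

    def read(self, size, min=1):
        with self._cond:
            # Block until at least `min` bytes are available ...
            while len(self._buf) < min:
                self._cond.wait()
            # ... then return at most `size` bytes, or everything
            # available if fewer than `size` are buffered.
            n = size if len(self._buf) >= size else len(self._buf)
            data, self._buf = self._buf[:n], self._buf[n:]
            return data
```

With 100 bytes buffered, `read(64, min=8)` returns 64 bytes at once; a second call returns the remaining 36; with an empty buffer it blocks until at least 8 bytes arrive.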

> [...]
>

> My understanding is that nginx completely buffers all input, so that all
> reads from wsgi.input are basically non-blocking.
> 

Right.
This makes my life easier, since I can just use a cStringIO or File 
object :).

However, in the future the Nginx author plans to add support for input 
filters and chunked request bodies.

At that point, I will implement an extension that allows non-blocking 
(asynchronous) reading from wsgi.input.
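An application written against that model would read until read() returns an empty string rather than trusting CONTENT_LENGTH, as Brian describes above. A minimal sketch (`read_body` is a hypothetical helper):

```python
from io import BytesIO

def read_body(input_file, chunk_size=8192):
    # Read until read() returns b"" (end of request), so the loop
    # also works when no Content-Length is available (e.g. chunked
    # request bodies).
    parts = []
    while True:
        chunk = input_file.read(chunk_size)
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)
```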


> - Brian
> 


Manlio Perillo

