[Web-SIG] ngx.poll extension (was Re: Are you going to convert Pylons code into Python 3000?)
Manlio Perillo
manlio_perillo at libero.it
Fri Mar 7 11:30:29 CET 2008
Brian Smith wrote:
> Manlio Perillo wrote:
>> Brian Smith wrote:
>>> We already have standard mechanisms for doing something
>>> similar in WSGI: multi-threaded and multi-process WSGI
>>> gateways that let applications block indefinitely while
>>> letting other applications run.
>> Ok, but this is not the best solution to the problem!
>
> Why not?
>
>>> I think it is safe to say that multi-threaded or multi-process
>>> execution is something that is virtually required for WSGI.
>> but only if the application is synchronous and heavily I/O bound.
>
> Isn't that almost every WSGI application?
>
I'm not sure that a generic application that uses a database can be
considered *heavily* I/O bound.
Compare, for example, a database query that can take up to 0.2
seconds with an HTTP request to a web service that can take up to 2 seconds.
>> Note that Nginx is multi-process, but it only executes a
>> fixed number of worker processes, so if an I/O request can
>> block for a significant amount of time, you cannot afford
>> to let it block.
>
> Can't you just increase the number of processes?
>
Yes, but you should agree with me that the asynchronous solution is
more efficient.
Moreover, my application needs to run in a shared hosting environment,
where there is a limit on the number of processes a user can execute.
I cannot run too many worker processes.
>> Moreover with an asynchronous gateway it is possible to
>> implement a "middleware" that can execute an application
>> inside a thread.
>>
>> This is possible by creating a pipe, starting a new thread,
>> having the main thread polling the pipe, and having the
>> thread write some data in the pipe to "wake" the main thread
>> when it finishes its job.
>
> Right. This is exactly what I was saying. By using
> multiprocessing/multithreading, each application can block as much as it
> wants.
>
Ok, but the middleware *needs* the poll extension :).
So the best solution, IMHO, is to implement the WSGI 1.0 spec for Nginx,
and then implement a pure Python middleware/adapter that will execute a
WSGI 2.0 application in a thread.
However, if some corrections are going to be made in WSGI 2.0, I
would like to have them "backported" to, for example, a WSGI 1.1.
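
A minimal sketch of the pipe-and-thread wake-up described above, assuming a
POSIX system. The names run_in_thread and wait_for are hypothetical, and
select.select only stands in for whatever polling facility the gateway
provides; an asynchronous gateway would register the read end of the pipe
with its own event loop instead of blocking on it.

```python
import os
import select
import threading


def run_in_thread(func, *args):
    """Start func(*args) in a worker thread that signals completion on a pipe."""
    read_fd, write_fd = os.pipe()
    result = {}

    def worker():
        try:
            result["value"] = func(*args)
        except Exception as exc:
            # Hand the error back to the main thread instead of losing it.
            result["error"] = exc
        finally:
            # One byte is enough to make the read end readable.
            os.write(write_fd, b"x")

    threading.Thread(target=worker, daemon=True).start()
    return read_fd, write_fd, result


def wait_for(read_fd, write_fd, result, timeout=None):
    """Stand-in for the gateway's poll: wait until the worker wakes us."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    if not ready:
        # The descriptors stay open because the worker may still write.
        raise TimeoutError("worker did not finish in time")
    os.read(read_fd, 1)
    os.close(read_fd)
    os.close(write_fd)
    if "error" in result:
        raise result["error"]
    return result["value"]
```

With these helpers, wait_for(*run_in_thread(blocking_call, arg)) returns
whatever blocking_call(arg) returned, while the main thread only ever waits
on a file descriptor.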
>>> Again, I like the simplification that WSGI 2.0 applications
>>> are always functions or function-like callables, and never
>>> iterables.
>> Where is the simplification?
>
> My understanding is that the application callable never returns an
> iterator (it never yields, it only returns). This is simpler to explain
> to people that are new to WSGI.
This is indeed true.
I too had some problems when I first read the WSGI specification.
*However*, it now seems to me the most natural API.
It only takes some practice.
> It also simplifies the language in the
> specification. The difference is basically immaterial to WSGI gateway
> implementers, but that is because the WSGI specification is biased
> towards making gateways simple to implement.
>
No, it also makes it simpler to implement.
> [...]
>>> We can just say that WSGI-2.0-style applications must
>>> support chunked request bodies, but gateways are not
>>> required to support them.
>>> WSGI-2.0-style applications would have to check for
>>> CONTENT_LENGTH, and if that is missing, check to see if
>>> environ['HTTP_TRANSFER_ENCODING'] includes the "chunked"
>>> token. wsgi.input.read() would have to stop at the end
>>> of the request; applications would not be restricted from
>>> attempting to read more than CONTENT_LENGTH bytes.
>>>
>>> WSGI gateways would have to support an additional
>>> (keyword?) argument to wsgi.input.read() that
>>> controls whether it is blocking or non-blocking.
>>> It seems pretty simple.
>> How should an application be written to use this feature?
>
> For chunked request bodies: instead of reading until exactly
> CONTENT_LENGTH bytes have been read, keep reading until
> environ["wsgi.input"].read(chunk_size) returns "".
>
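
As a rough illustration of the read loop just described, assuming the gateway
removes the chunked framing itself and makes read() return an empty string at
the end of the request (this is the proposal under discussion, not current
WSGI behaviour). The helper name read_request_body is hypothetical.

```python
import io


def read_request_body(environ, chunk_size=8192):
    """Read the whole request body, with or without CONTENT_LENGTH."""
    stream = environ["wsgi.input"]
    length = environ.get("CONTENT_LENGTH")
    if length:
        # Classic case: the body size is known in advance.
        return stream.read(int(length))

    # No CONTENT_LENGTH: only read if the request declares "chunked".
    encoding = environ.get("HTTP_TRANSFER_ENCODING", "")
    tokens = [token.strip().lower() for token in encoding.split(",")]
    if "chunked" not in tokens:
        return b""

    parts = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # An empty read marks the end of the request body.
            break
        parts.append(chunk)
    return b"".join(parts)


# An in-memory stream stands in for the gateway's input object.
demo_environ = {
    "wsgi.input": io.BytesIO(b"hello world"),
    "HTTP_TRANSFER_ENCODING": "chunked",
}
demo_body = read_request_body(demo_environ, chunk_size=4)
```
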
> For "non-blocking reads", given environ["wsgi.input"].read(64000,
> min=8000):
>
> 1. If more than 64000 bytes are available without blocking, 64000 bytes
> are returned.
> 2. If less than 8000 bytes are available without blocking, then the
> gateway blocks until at least 8000 bytes are available.
> 3. When 8000-63999 bytes are available, then all those bytes are
> returned.
>
Ok.
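
To check my reading of the three cases, here is a small sketch that takes
them to mean "never return more than the requested size, and block while
fewer than min bytes are available". The function bytes_returned is purely
illustrative and not part of any gateway; it does not cover the case where
the request body ends before min bytes have arrived.

```python
def bytes_returned(available, size, minimum):
    """How many bytes read(size, min=minimum) would return right now.

    Returns None when the gateway would have to block first.
    """
    if available < minimum:
        # Case 2: too little data, the gateway blocks until `minimum` arrives.
        return None
    # Cases 1 and 3: return what is available, capped at the requested size.
    return min(available, size)


# read(64000, min=8000) with different amounts of buffered data.
plenty = bytes_returned(100000, 64000, 8000)
too_little = bytes_returned(5000, 64000, 8000)
in_between = bytes_returned(20000, 64000, 8000)
```
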
> [...]
>
> My understanding is that nginx completely buffers all input, so that all
> reads from wsgi.input are basically non-blocking.
>
Right.
This makes my life easier, since I can just use a cStringIO or File
object :).
However, in the future the Nginx author is planning to add support for
input filters and chunked request bodies.
At that time, I will implement an extension that will allow
non-blocking (asynchronous) reading from wsgi.input.
> - Brian
>
Manlio Perillo