[Web-SIG] Proposed specification: waiting for file descriptor events
Manlio Perillo
manlio_perillo at libero.it
Fri May 23 00:21:13 CEST 2008
Christopher Stawarz wrote:
> On May 21, 2008, at 1:34 PM, Manlio Perillo wrote:
>
>>> Instead, the spec recommends that async servers pre-read the request
>>> body
>>> before invoking the app (either by default or as a configurable
>>> option).
>>
>> This is the best solution most of the time (but not for all of the
>> time), especially if the "server" can do some "pre-parsing" of
>> multipart/form-data request body.
>>
>> In fact I plan to write a custom function (in C for Nginx) that will
>> "reduce", as an example:
>>
>> Content-Type: multipart/form-data; boundary=AaB03x
>>
>> --AaB03x
>> Content-Disposition: form-data; name="submit-name"
>>
>> Larry
>> --AaB03x
>> Content-Disposition: form-data; name="files"; filename="file1.txt"
>> Content-Type: text/plain
>>
>> ... contents of file1.txt ...
>> --AaB03x--
>>
>> to (not properly escaped):
>>
>> Content-Type: application/x-www-form-urlencoded
>>
>> submit-name=Larry&files.filename=file1.txt&files.ctype=text/plain&files.path=xxx
>>
>>
>>
>> and the contents of file1.txt will be saved to a temporary file 'xxx'.
>
> It seems like you're making this more complicated than it needs to be.
> Why not just store the entire request body in a temporary file, and then
> pass an open handle to it as wsgi.input?
Because if you have a big file (like a video of > 100 MB), your
application will block everything while parsing the request body.
Parsing the body incrementally is far more efficient (although it is
harder to implement).
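As a rough illustration of what "incrementally" means here, the sketch below feeds the body to a small state object one chunk at a time, so no single call has to look at the whole upload. It only counts occurrences of the delimiter bytes and is not a real multipart parser; BoundaryCounter and count_in_chunks are hypothetical names, and the code is written for Python 3.

```python
class BoundaryCounter:
    """Toy incremental scanner: counts occurrences of the delimiter bytes."""

    def __init__(self, boundary):
        self.delim = b"--" + boundary
        self.tail = b""   # last bytes of the previous chunk
        self.count = 0

    def feed(self, chunk):
        # Prepend the saved tail so a delimiter split across two
        # chunks is still seen as one contiguous byte string.
        data = self.tail + chunk
        self.count += data.count(self.delim)
        # Keep one byte less than the delimiter length: enough to
        # complete a split delimiter, too short to recount a whole one.
        self.tail = data[-(len(self.delim) - 1):]


def count_in_chunks(body, boundary, size):
    """Feed body to a BoundaryCounter in pieces of `size` bytes."""
    counter = BoundaryCounter(boundary)
    for start in range(0, len(body), size):
        counter.feed(body[start:start + size])
    return counter.count


BODY = (
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="submit-name"\r\n\r\n'
    b"Larry\r\n"
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="files"; filename="file1.txt"\r\n'
    b"Content-Type: text/plain\r\n\r\n"
    b"... contents of file1.txt ...\r\n"
    b"--AaB03x--\r\n"
)
```

Each feed() call does work proportional to one chunk, so a server could call it whenever more data arrives instead of parsing a 100 MB file in one blocking pass.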
> That way, the server doesn't
> have to rewrite the request, and the application doesn't need to know
> how to interpret the files.* parameters.
>
How to interpret the files.* parameters is not really a problem.
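A minimal sketch of how an application might interpret the reduced body, assuming the name.filename / name.ctype / name.path convention from the example above (that convention is only a plan, and split_uploads is a hypothetical helper). It uses the standard urllib.parse.parse_qs from Python 3.

```python
from urllib.parse import parse_qs

# Suffixes the server is assumed to append for each uploaded file.
UPLOAD_ATTRS = ("filename", "ctype", "path")


def split_uploads(body):
    """Split a urlencoded body into plain fields and upload descriptions."""
    fields, uploads = {}, {}
    for key, values in parse_qs(body).items():
        value = values[0]  # keep only the first value of each key
        name, dot, attr = key.rpartition(".")
        if dot and attr in UPLOAD_ATTRS:
            uploads.setdefault(name, {})[attr] = value
        else:
            fields[key] = value
    return fields, uploads


fields, uploads = split_uploads(
    "submit-name=Larry&files.filename=file1.txt"
    "&files.ctype=text/plain&files.path=xxx"
)
```

Note that an ordinary form field whose name happens to end in ".path" would be misread as an upload attribute, so a real implementation would need some way to tell the two apart.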
>> 1) Why not add a more generic poll like interface?
>
> Because such an interface would be more complicated than what I've
> proposed and harder for server authors to implement. Also, I'm not sure
> that it gains you much.
>
Well, I have modelled my extension so that it has a "well-known"
interface and is not hard to implement.
But I have to say that I'm not sure whether one wants to poll multiple sockets.
Moreover, in my implementation ngx.poll only returns one "ready" socket
at a time.
By the way: I see a problem with your API.
What happens if an application does:

    read, write, exc = m.fdset()
    environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
    environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)
    yield ''

There is no way to know, when the application is resumed, whether the
socket is ready for reading or for writing.
This probably is not a problem, but I'm not sure.
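One possible workaround, sketched here as an assumption and not as part of either proposal: once resumed, the application can ask the operating system directly with a zero-timeout select.select call, which never blocks. ready_state is a hypothetical helper, and socket.socketpair stands in for a real connection.

```python
import select
import socket


def ready_state(sock):
    """Return (readable, writable) for sock without blocking."""
    # A timeout of 0 makes select() a pure poll of the current state.
    readable, writable, _ = select.select([sock], [sock], [], 0)
    return bool(readable), bool(writable)


a, b = socket.socketpair()
try:
    before = ready_state(a)   # nothing has been sent to `a` yet
    b.sendall(b"ping")
    after = ready_state(a)    # `a` now has pending data
finally:
    a.close()
    b.close()
```

The answer only reflects the state at the moment of the call, so the application still has to handle a read or write that would block.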
> Note that I'm not 100% sure on this, as I tried to indicate in the "Open
> Issues" section of my proposal. The approach I'd like to take is to try
> writing apps with my interface for a while, and if real-world usage
> shows that a poll-like interface would be very useful (or necessary),
> then the spec could be extended to add one. I think this is a safe
> route, since the readable/writable functions could easily be implemented
> in terms of a more generic poll-like interface, so existing apps that
> use the fdevent extensions would continue to work.
>
>> Moreover IMHO storing a timeout variable in the environ to check if
>> the previous call timed out is not the best solution.
>
> I think it's a simple and effective solution. Server authors don't need
> to implement any new functions or data types. They just create and hold
> on to a mutable object instance (the simplest being a list instance) for
> each app instance and toggle its truth value as required.
>
>> In my implementation I return a function, but with generators in
>> Python 2.5 this can be done in a better way.
>
> What advantage does this have over what I've proposed?
>
You don't need to store a mutable variable in the environ.
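A minimal sketch of the generator-based alternative, assuming the server resumes the application with send() (available on generators since Python 2.5) and passes the timeout status as the value of the yield expression. Plain WSGI servers only iterate over the result and never call send(), so this would be an extension; app_steps, drive, and the ("readable", 1.0) request tuple are all hypothetical.

```python
def app_steps():
    """Application side: the timeout status arrives as the yield's value."""
    # Hypothetical request: "wake me when readable, or after 1.0 s".
    timed_out = yield ("readable", 1.0)
    if timed_out:
        yield "timeout"
    else:
        yield "data ready"


def drive(gen, timed_out):
    """Server side: run to the first yield, then resume with the status."""
    request = next(gen)            # the app's wait request
    reply = gen.send(timed_out)    # resume, telling the app what happened
    return request, reply
```

The status lives in a local variable of the generator, so nothing mutable has to be stored in the environ.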
>> 2) In Nginx it is not possible to simply handle "plain" file
>> descriptors, since these are wrapped in a connection structure.
>>
>> This is the reason why I had to add a connection_wrapper function in
>> my WSGI module for Nginx.
>
> But the connection structure just wraps an integer file descriptor,
> right? So the readable/writable functions can create the required
> wrapper to register with nginx. There's no reason to make the
> application author do it.
>
The "problem" is that Nginx keeps a list of preallocated connection
objects (the size of the list being controlled by worker_connections).
This means that a newly constructed connection *must* be freed as soon
as it is no longer used, otherwise it can limit the number of concurrent
connections that can be handled by Nginx.
Since with my API (register/unregister) a connection should be kept
alive until it is unregistered, I have chosen to create a wrapper for
the Nginx connection object.
Probably with your API it would be possible to create temporary wrappers.
But I don't know if this is a good idea.
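To make the trade-off concrete, here is a toy Python model of a fixed-size connection pool with temporary wrappers. It is not Nginx code: ConnectionPool and temporary are hypothetical names, and the dict stands in for the real connection structure.

```python
from contextlib import contextmanager


class ConnectionPool:
    """Toy model of a preallocated list of connection slots."""

    def __init__(self, size):
        self.free = size  # plays the role of worker_connections

    @contextmanager
    def temporary(self, fd):
        """Borrow a slot for one wait on fd and always give it back."""
        if self.free == 0:
            raise RuntimeError("no free connection slots")
        self.free -= 1
        try:
            yield {"fd": fd}  # stand-in for the wrapped connection
        finally:
            self.free += 1


pool = ConnectionPool(2)
with pool.temporary(7) as conn:
    inside = pool.free   # one slot is in use during the wait
outside = pool.free      # the slot is returned afterwards
```

A temporary wrapper occupies a slot only for the duration of a single wait, whereas with register/unregister the slot stays taken until the application explicitly unregisters.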
> [...]
> Chris
>
Manlio Perillo