[Web-SIG] Proposed specification: waiting for file descriptor events

Manlio Perillo manlio_perillo at libero.it
Fri May 23 00:21:13 CEST 2008


Christopher Stawarz wrote:
> On May 21, 2008, at 1:34 PM, Manlio Perillo wrote:
> 
>>>  Instead, the spec recommends that async servers pre-read the request
>>>  body before invoking the app (either by default or as a configurable
>>>  option).
>>
>> This is the best solution most of the time (but not all of the
>> time), especially if the "server" can do some "pre-parsing" of a
>> multipart/form-data request body.
>>
>> In fact I plan to write a custom function (in C for Nginx) that will 
>> "reduce", as an example:
>>
>>   Content-Type: multipart/form-data; boundary=AaB03x
>>
>>   --AaB03x
>>   Content-Disposition: form-data; name="submit-name"
>>
>>   Larry
>>   --AaB03x
>>   Content-Disposition: form-data; name="files"; filename="file1.txt"
>>   Content-Type: text/plain
>>
>>   ... contents of file1.txt ...
>>   --AaB03x--
>>
>> to (not properly escaped):
>>
>> Content-Type: application/x-www-form-urlencoded
>>
>> submit-name=Larry&files.filename=file1.txt&files.ctype=text/plain&files.path=xxx 
>>
>>
>>
>> and the contents of file1.txt will be saved to a temporary file 'xxx'.
> 
> It seems like you're making this more complicated than it needs to be.  
> Why not just store the entire request body in a temporary file, and then 
> pass an open handle to it as wsgi.input?  

Because if you have a big file (for example a video larger than 100 MB),
your application will block everything else while it parses the request
body.

Parsing the body incrementally, as it arrives, is far more efficient
(although it is harder to implement).


> That way, the server doesn't 
> have to rewrite the request, and the application doesn't need to know 
> how to interpret the files.* parameters.
> 

How to interpret the files.* parameters is not really a problem.
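
A minimal sketch of how an application might interpret the reduced body,
assuming the files.filename / files.ctype / files.path naming from the
example above. The helper name split_uploads is hypothetical and is not part
of any proposal; the code targets Python 3 and uses only the standard
library:

```python
from urllib.parse import parse_qs

# Suffixes the server is assumed to append for each uploaded file.
UPLOAD_ATTRS = ("filename", "ctype", "path")

def split_uploads(body):
    """Split a reduced urlencoded body into plain fields and upload records."""
    fields, uploads = {}, {}
    for key, values in parse_qs(body, keep_blank_values=True).items():
        name, dot, attr = key.partition(".")
        if dot and attr in UPLOAD_ATTRS:
            # e.g. "files.path" -> uploads["files"]["path"]
            uploads.setdefault(name, {})[attr] = values[0]
        else:
            fields[key] = values[0]
    return fields, uploads

body = ("submit-name=Larry&files.filename=file1.txt"
        "&files.ctype=text/plain&files.path=xxx")
fields, uploads = split_uploads(body)
```

One caveat with this naming scheme: an ordinary form field whose own name
ends in ".path" would be mistaken for an upload attribute, so a real
implementation would need some way to tell the two apart.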

>> 1) Why not add a more generic poll like interface?
> 
> Because such an interface would be more complicated than what I've 
> proposed and harder for server authors to implement.  Also, I'm not sure 
> that it gains you much.
> 

Well, I have modelled my extension so that it has a "well-known" 
interface and is not hard to implement.

But I have to say that I'm not sure whether one wants to poll multiple
sockets.

Moreover, in my implementation ngx.poll only returns one "ready" socket 
at a time.


By the way: I see a problem with your API.
What happens if an application does:

     read, write, exc = m.fdset()

     environ['x-wsgiorg.fdevent.readable'](read[0], 1.0)
     environ['x-wsgiorg.fdevent.writable'](write[0], 1.0)

     yield ''


There is no way to know, when the application is resumed, whether the
socket is ready for read or for write.

This probably should not be a problem, but I'm not sure.
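
One possible workaround, sketched here under the assumption that the
application still holds the socket objects: after being resumed, it can ask
the operating system directly with a zero-timeout select() call. The helper
name ready_state is hypothetical; the code targets Python 3 and uses only the
standard library:

```python
import select
import socket

def ready_state(sock):
    """Return (readable, writable) for sock without blocking."""
    # A timeout of 0 makes select() report the current state and return.
    r, w, _ = select.select([sock], [sock], [], 0)
    return bool(r), bool(w)

# Demonstration with a connected pair of local sockets.
a, b = socket.socketpair()
b.sendall(b"ping")              # make `a` readable
readable, writable = ready_state(a)
a.close()
b.close()
```

This costs one extra system call per resume, which is part of why it may be
cleaner for the server to report the event itself.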

> Note that I'm not 100% sure on this, as I tried to indicate in the "Open 
> Issues" section of my proposal.  The approach I'd like to take is to try 
> writing apps with my interface for a while, and if real-world usage 
> shows that a poll-like interface would be very useful (or necessary), 
> then the spec could be extended to add one.  I think this is a safe 
> route, since the readable/writable functions could easily be implemented 
> in terms of a more generic poll-like interface, so existing apps that 
> use the fdevent extensions would continue to work.
> 
>>   Moreover IMHO storing a timeout variable in the environ to check if
>>   the previous call timed out is not the best solution.
> 
> I think it's a simple and effective solution.  Server authors don't need 
> to implement any new functions or data types.  They just create and hold 
> on to a mutable object instance (the simplest being a list instance) for 
> each app instance and toggle its truth value as required.
> 
>>   In my implementation I return a function, but with generators in
>>   Python 2.5 this can be done in a better way.
> 
> What advantage does this have over what I've proposed?
> 

You don't need to store a mutable variable in the environ.
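
As an illustration only, here is one way the Python 2.5 generator
enhancements (send(), from PEP 342) could deliver the timeout result directly
to the application. The wait-request tuple and the drive function are
hypothetical; they are not taken from either proposal, and the sketch is
written for Python 3:

```python
def app():
    # Hypothetical protocol: the app yields a wait request and the server
    # sends back True if the wait timed out, False if the fd became ready.
    timed_out = yield ("readable", 7, 1.0)   # (event, fd, timeout)
    if timed_out:
        yield "timeout"
    else:
        yield "ready"

def drive(gen, timed_out):
    """Toy stand-in for a server handling a single wait request."""
    request = next(gen)                  # run the app up to its wait request
    output = gen.send(timed_out)         # resume it with the wait result
    return request, output

request, output = drive(app(), timed_out=True)
```

The result of the wait arrives as the value of the yield expression, so
nothing has to be stored in the environ.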

>> 2) In Nginx it is not possible to simply handle "plain" file
>>   descriptors, since these are wrapped in a connection structure.
>>
>>   This is the reason why I had to add a connection_wrapper function in
>>   my WSGI module for Nginx.
> 
> But the connection structure just wraps an integer file descriptor, 
> right?  So the readable/writable functions can create the required 
> wrapper to register with nginx. There's no reason to make the 
> application author do it.
> 

The "problem" is that Nginx keeps a list of preallocated connection 
objects (the size of the list being controlled by worker_connections).

This means that a newly constructed connection *must* be freed as soon 
as it is no longer in use, otherwise it can limit the number of concurrent 
connections that can be handled by Nginx.

Since with my API (register/unregister) a connection should be kept 
alive until it is unregistered, I have chosen to create a wrapper for 
the Nginx connection object.
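
A toy model of that constraint, written in Python 3 purely for illustration.
The class names are hypothetical and this is not Nginx's actual API; it only
shows why a wrapper that is never unregistered keeps a slot from the
fixed-size pool:

```python
class ConnectionPool:
    """Toy model of a fixed number of preallocated connection slots."""

    def __init__(self, size):
        self.free = size

    def acquire(self):
        if self.free == 0:
            raise RuntimeError("no free connection slots")
        self.free -= 1

    def release(self):
        self.free += 1


class ConnectionWrapper:
    """Holds one slot from creation (register) until unregister()."""

    def __init__(self, pool, fd):
        pool.acquire()
        self.pool, self.fd, self.registered = pool, fd, True

    def unregister(self):
        if self.registered:          # releasing twice would corrupt the count
            self.pool.release()
            self.registered = False


pool = ConnectionPool(2)
w1 = ConnectionWrapper(pool, 10)
w2 = ConnectionWrapper(pool, 11)
try:
    ConnectionWrapper(pool, 12)      # pool exhausted by the two live wrappers
    exhausted = False
except RuntimeError:
    exhausted = True
w1.unregister()                      # frees a slot for another connection
w3 = ConnectionWrapper(pool, 12)
```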


Probably with your API it would be possible to create temporary wrappers.
But I don't know if this is a good idea.

> [...]


> Chris
> 


Manlio Perillo

