[Web-SIG] Implementing File Upload Size Limits

Robert Brewer fumanchu at aminus.org
Fri Nov 28 06:58:25 CET 2008


Graham Dumpleton wrote:
> 2008/11/28 Robert Brewer <fumanchu at aminus.org>:
> > CherryPy's wsgiserver will read any remaining request body (which the
> > application hasn't read) before sending response headers.
> 
> A WSGI application could technically want to send response headers and
> only then read remaining request content. I don't believe there is
> anything in the WSGI specification which prevents that. If you are
> discarding the request content as soon as response headers are
> generated, that could technically be a problem for some use cases,
> even if they may be obscure.

I'll look into that further.
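
To make the use case concrete, here is a minimal sketch of an application
that forces response headers out and only then reads the request body. The
names echo_app and the b"echo: " prefix are hypothetical, and bytes are
used where a 2008-era server would have used str. A server that discards
unread input as soon as headers are generated would presumably hand such
an application an empty body.

```python
import io


def echo_app(environ, start_response):
    """Hypothetical app: emit headers and a first chunk, then read the body."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    # Yielding a non-empty chunk obliges the server to transmit the headers
    # before the application has touched wsgi.input at all.
    yield b"echo: "
    body = environ["wsgi.input"]
    remaining = int(environ.get("CONTENT_LENGTH") or 0)
    while remaining > 0:
        chunk = body.read(min(4096, remaining))
        if not chunk:  # client went away, or the server already drained input
            break
        remaining -= len(chunk)
        yield chunk


# Stand-in for a server, for illustration only.
statuses = []
environ = {"wsgi.input": io.BytesIO(b"abcdef"), "CONTENT_LENGTH": "6"}
output = b"".join(echo_app(environ, lambda status, headers: statuses.append(status)))
```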

> I can't tell from looking at the latest CherryPy WSGI server code, as it
> has changed since I last looked at it and I haven't yet had time to
> grok it and run some tests, but previously, with respect to where the
> WSGI specification says:
> 
> """The server is not required to read past the client's specified
> Content-Length, and is allowed to simulate an end-of-file condition if
> the application attempts to read past that point."""
> 
> the CherryPy WSGI server code chose NOT to simulate an end-of-file
> condition. This was the case because the amount of data read from
> wsgi.input was never tracked. This meant that if the application did
> try to read more content than was available while request pipelining
> was occurring, the read would hang, as it would not get an empty
> string returned as would be normal for an end-of-file condition on a
> file-like object.
> 
> If the code is still behaving this way, then it wouldn't be possible
> for it to discard the remaining input, since how much was read wasn't
> tracked.
> 
> Looking at the latest code I do note the presence of a wrapper around
> the socket used for wsgi.input, but haven't been able to work out yet
> whether it returns a traditional empty string as an end-of-file
> condition, or whether it is going to instead raise your
> MaxSizeExceeded exception and thus not be file-like in its behaviour.

It still raises MaxSizeExceeded.
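
For comparison, a minimal sketch of the end-of-file behaviour being asked
for: a wrapper that tracks how much of the declared Content-Length has been
read and returns an empty result past that point instead of blocking or
raising. This is not the wsgiserver code; LimitedInput is a hypothetical
name, io.BytesIO stands in for the socket file, and bytes are used where
the thread says "empty string".

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper that simulates EOF at Content-Length."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = max(0, content_length)

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # end-of-file: never touch the socket again
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining  # clamp so a pipelined request is not consumed
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


# A 5-byte body followed by the start of a pipelined request.
sock = io.BytesIO(b"hello" + b"GET /next HTTP/1.1\r\n")
body = LimitedInput(sock, 5)
first = body.read(3)
rest = body.read(100)   # asks for more than is left; gets only the body
eof = body.read(100)    # past Content-Length; empty result, no hang
leftover = sock.read()  # the next request is still intact
```

Because the wrapper knows how many bytes remain, a server built this way
could also discard unread input itself once the application is done.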

> Can you perhaps explain what is going to happen when an attempt is
> made to read more content than was available, and whether it is
> actually going to raise an exception rather than just return an empty
> string as file-like objects would?
> 
> Personally I think that part of the WSGI specification should be
> amended such that it is required that an end-of-file condition MUST be
> indicated using an empty string, just like with normal file-like
> objects. Just this one change would mean that one could call read()
> with no arguments and have it return all input, whereas at the moment
> the WSGI specification does not allow the argument to read() to be
> optional.
> 
> This would actually negate the whole need for applications to even
> check/use CONTENT_LENGTH except for situations where it mattered, such
> as a 413 response or where how the application decided to process the
> content depended on its size. That is, to get all request content you
> would just call read() with no argument. If you wanted to process it
> in chunks, then you would just loop reading a set chunk size until an
> empty string was returned, and you wouldn't need to track how much had
> been read and short read the last chunk. If applications worked this
> way then one could handle mutating input filters that changed the
> amount of request content, i.e., decompression of data, plus could
> handle chunked transfer encoding on request content in a reasonable
> way without having to read it all in and buffer it just to work out
> CONTENT_LENGTH.
> 
> Up till now, the only major WSGI server (ignoring wsgiref perhaps) I
> knew of which didn't allow read() with no argument or which didn't
> simulate end-of-file through an empty string being returned was the
> CherryPy WSGI server. Now its code has been changed, but I'm not sure
> if it still does that or whether it has done something totally
> different to everything else by raising an exception instead.

I'd be open to changing it to EOF instead of error; amending the WSGI
spec would be nice too.
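
If read() did signal EOF that way, the application side could look
something like the sketch below: CONTENT_LENGTH is consulted only for an
early 413, and the body is consumed by looping until an empty result comes
back. Everything here is illustrative; MAX_BODY, iter_body, app and call
are hypothetical names, and io.BytesIO stands in for a server-provided
wsgi.input that honours EOF.

```python
import io

MAX_BODY = 16  # hypothetical limit in bytes, for illustration only


def iter_body(wsgi_input, chunk_size=4):
    """Yield body chunks until read() returns an empty result (EOF)."""
    while True:
        chunk = wsgi_input.read(chunk_size)
        if not chunk:
            return
        yield chunk


def app(environ, start_response):
    # CONTENT_LENGTH is consulted only where it matters: an early 413.
    declared = environ.get("CONTENT_LENGTH")
    if declared and int(declared) > MAX_BODY:
        start_response("413 Request Entity Too Large",
                       [("Content-Type", "text/plain")])
        return [b"too large\n"]
    # No byte counting is needed to find the end of the body.
    body = b"".join(iter_body(environ["wsgi.input"]))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [str(len(body)).encode("ascii")]


def call(application, body, declare_length=True):
    """Tiny harness (not a real server): returns (status, payload)."""
    seen = []
    environ = {"wsgi.input": io.BytesIO(body)}
    if declare_length:
        environ["CONTENT_LENGTH"] = str(len(body))
    payload = b"".join(
        application(environ, lambda status, headers: seen.append(status)))
    return seen[0], payload
```

For example, call(app, b"0123456789") goes through the read loop, while a
17-byte body with a declared length is rejected before anything is read.
A body that arrives without CONTENT_LENGTH (chunked, say) skips the early
check in this sketch, so a real limit would also need to count bytes
inside the loop.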


Robert Brewer
fumanchu at aminus.org


