[Web-SIG] WSGI input filter that changes content length.

Mon Jan 15 11:56:37 CET 2007

[Graham Dumpleton]
> How does one implement in WSGI an input filter that manipulates the request
> body in such a way that the effective content length would be changed?

> The problem I am trying to address here is how one might implement using WSGI a
> decompression filter for the body of a request. Ie., where "Content-Encoding:
> gzip" has been specified.

> So, how is one meant to deal with this in WSGI?

The usual approach to modifying something in the WSGI
environment, in this case the wsgi.input file-like object, is to wrap
it or replace it with an object that behaves as desired.

In this case, the approach I would take would be to wrap the
wsgi.input object with a gzip.GzipFile object, which should only read
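the input stream data on demand. The code would look like this:

import gzip
# Pass the stream as fileobj; GzipFile's first positional argument is a filename.
wsgi_env['wsgi.input'] = gzip.GzipFile(fileobj=wsgi_env['wsgi.input'], mode='rb')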
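To make the wrapping concrete, here is a minimal sketch that can be run on its own. It simulates a request with an in-memory stream, and passes the stream through the fileobj keyword because GzipFile's first positional parameter is a filename. The helper name decompress_input, the check on HTTP_CONTENT_ENCODING, and the decision to drop the now-stale CONTENT_LENGTH are assumptions made for illustration, not something PEP 333 prescribes.

```python
import gzip
import io


def decompress_input(environ):
    """Hypothetical helper: swap wsgi.input for a lazily decompressing wrapper."""
    encoding = environ.get('HTTP_CONTENT_ENCODING', '').strip().lower()
    if encoding == 'gzip':
        # GzipFile only reads from the wrapped stream when it is read from.
        environ['wsgi.input'] = gzip.GzipFile(
            fileobj=environ['wsgi.input'], mode='rb')
        # Assumption: the compressed length no longer describes what the
        # application will read, so remove it rather than leave a stale value.
        environ.pop('CONTENT_LENGTH', None)
        del environ['HTTP_CONTENT_ENCODING']
    return environ


# Simulated request: an in-memory stream stands in for the client connection.
body = b'hello world' * 100
compressed = gzip.compress(body)
environ = {
    'wsgi.input': io.BytesIO(compressed),
    'CONTENT_LENGTH': str(len(compressed)),
    'HTTP_CONTENT_ENCODING': 'gzip',
}

decompress_input(environ)
data = environ['wsgi.input'].read()  # the application just reads as usual
```

The application side is unchanged: it calls read() on wsgi.input and receives the decompressed bytes.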

Notes.

1. The application should be completely unaware that it is dealing
with a compressed stream: it simply reads from wsgi.input, unaware
that reading from what it thinks is the input stream is actually
causing cascading reads down a series of file-like objects.

2. The GzipFile object will decompress on the fly, meaning that it
will only read from the wrapped input stream when it needs input.
This means that if the application does not read data from
wsgi.input, then no data will be read from the client connection.

3. The GzipFile should not be responsible for enforcement of the
incoming Content-Length boundary. Instead, this should be enforced by
the original server-provided file-like input stream that it wraps. So
if the application attempts to read past Content-Length bytes, the
server-provided input stream "is allowed to simulate an end-of-file
condition". That would cause the GzipFile to return an EOF to the
application, or possibly to raise an exception.

4. Because of the on-the-fly nature of the GzipFile decompression, it
would not be possible to provide a meaningful Content-Length value to
the application. To do so would require buffering and decompressing
the entire input data stream. But the application should still be able
to operate without knowing Content-Length.

5. The wrapping can NOT be done in middleware. PEP 333, Section "Other
HTTP Features" has this to say: "WSGI applications must not generate
any "hop-by-hop" headers [4], attempt to use HTTP features that would
require them to generate such headers, or rely on the content of any
incoming "hop-by-hop" headers in the environ dictionary. WSGI servers
must handle any supported inbound "hop-by-hop" headers on their own,
such as by decoding any inbound Transfer-Encoding, including chunked
encoding if applicable." So the wrapping and replacement of wsgi.input
should happen in the server or gateway, NOT in middleware.

6. Exactly the same principles should apply to decoding incoming
Transfer-Encoding: chunked.
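The division of labour in note 3 can be sketched as follows. This is only an illustration under stated assumptions: LimitedInput is a hypothetical stand-in for whatever length-enforcing stream a real server provides, and the in-memory connection holds a compressed body followed by the start of a second, pipelined request.

```python
import gzip
import io


class LimitedInput:
    """Hypothetical server-side stream that simulates EOF after `length` bytes."""

    def __init__(self, raw, length):
        self.raw = raw
        self.remaining = length

    def read(self, size=-1):
        # Never hand out more than the Content-Length boundary allows.
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.raw.read(size)
        self.remaining -= len(data)
        return data


body = b'x=1&y=2' * 50
compressed = gzip.compress(body)

# The connection carries the compressed body, then bytes of the next request.
connection = io.BytesIO(compressed + b'GET /next HTTP/1.1\r\n')

# The server enforces Content-Length; GzipFile merely decompresses on demand.
limited = LimitedInput(connection, len(compressed))
stream = gzip.GzipFile(fileobj=limited, mode='rb')

result = stream.read()        # decompressed request body
leftover = connection.read()  # bytes belonging to the next request, untouched
```

Because the boundary is enforced underneath it, the GzipFile never consumes bytes that belong to the following request, and it needs no knowledge of Content-Length itself.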

HTH,

Alan.

P.S. Thanks for all your great work on mod_python Graham!