[Web-SIG] WSGI & transfer-encodings
James Y Knight
foom at fuhm.net
Thu Sep 16 21:03:53 CEST 2004
On Sep 16, 2004, at 2:30 PM, Phillip J. Eby wrote:
> Hm. An interesting conundrum. Do any Python servers or applications
> exist today that *work* when there's no content-length?
Unknown.
> Personally, I'm thinking that WSGI should follow CGI here, and decode
> incoming transfer encodings. If this means HTTP/1.1 servers have to
> dump the incoming data to a file first, so be it.
Following CGI means: do not allow requests without a Content-Length. No
servers I know of will dump the data to a file to determine the length
first before sending to a CGI. I would not ask them to either: that's
like saying "Pleeease denial of service me!". And, really, the only
place I've seen incoming chunked requests used is for streaming data --
and that will "never" finish.
>> The only way to tell if there's incoming data is therefore to attempt
>> to read() the input stream. read() will either immediately return an
>> EOF condition (returning '') or else read the data. Also, it seems
>> that read() with no args isn't allowed? Perhaps it should be.
>
> A no-argument read would be problematic in some environments -- CGI
> for example.
No -- CGI requires CONTENT_LENGTH, so in the CGI environment it is
perfectly possible to simulate EOF at the end of the data. read could
look something like this:
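import os
import sys

class CGIReq:
    def __init__(self):
        # CGI supplies the body length; treat a missing or empty value as 0.
        self.maxlength = int(os.environ.get('CONTENT_LENGTH') or 0)

    def read(self, length=None):
        # Never read past CONTENT_LENGTH, so a no-argument read() stops at
        # the end of the body instead of blocking on stdin.
        if length is None:
            length = self.maxlength
        else:
            length = min(self.maxlength, length)
        data = sys.stdin.read(length)
        self.maxlength -= len(data)
        return data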
>> - Wouldn't providing pre-encoded data screw up middleware that is
>> expecting to do something useful with the data going through it?
>
> Yes, it would. There are at least two ways to handle it, though:
>
> 1. Don't use middleware that's not smart enough to handle your app's
> output
>
> 2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other
> parameters on the way in to the application, so that the application
> (if written correctly) won't send data the server or middleware can't
> handle.
You've confused Content-Encoding with Transfer-Encoding. TE is the
request header that goes with the Transfer-Encoding response header. And
according to HTTP 1.1, chunked is always acceptable, so no amount of
header munging can change that. So under the "WSGI application is an
HTTP origin server" interpretation, all pieces of middleware must be
prepared to deal with chunked output. I think that's silly -- there is
no reason for a WSGI application to produce chunked-encoded strings, as
it already has a way to produce chunks via the iterator.
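
To illustrate the point about the iterator, here is a minimal sketch. It assumes a hypothetical application and a hypothetical server-side helper, chunk_frame; neither name comes from the WSGI draft. The application yields plain body blocks, and only the server wraps them in HTTP/1.1 chunk framing on the wire.

```python
def application(environ, start_response):
    """Hypothetical WSGI app: yields plain body blocks, no chunk framing."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield b'first block\n'
    yield b'second block\n'


def chunk_frame(blocks):
    """Hypothetical server-side helper: frame each block as an HTTP/1.1 chunk."""
    for block in blocks:
        if block:  # skip empty blocks; a zero-length chunk would end the body
            yield b'%x\r\n' % len(block) + block + b'\r\n'
    yield b'0\r\n\r\n'  # last-chunk followed by an empty trailer
```

Middleware sitting between the two sees only the unframed blocks, so it never has to parse chunked data.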
>> I would suggest that the correct answer is: the application
>> should have nothing to do with any connection oriented behavior. It
>> should not send a Connection or Transfer-Encoding header and should
>> not expect to receive the Connection, Keep-Alive, TE, Trailers,
>> Transfer-Encoding, or Upgrade headers, although it is optional for
>> the server to strip them. The application should not apply a
>> transfer-encoding to its output and the server should not give it a
>> transfer-encoded input.
>
> I like most of this, *except* that I'd like to leave open the option
> of an application providing transfer-encoding on its output. I'd
> rather have servers and middleware set HTTP_ACCEPT_ENCODING to
> "identity;q=1.0, *;q=0" (or an empty string, or delete the entry), if
> they interpret content, and have applications be required to respect
> this. Specifically, an application can only apply a content-encoding
> if it matches a non-zero quality in HTTP_ACCEPT_ENCODING.
Again: I'm talking only about Transfer-Encoding, not Content-Encoding.
Content-Encoding is an end-to-end function and thus properly belongs to
the application. Transfer-Encoding is a hop-by-hop header, and properly
belongs to the server. If you want a transfer-encoded output, you can
always request it via a server-specific extension or configuration
mechanism.
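
As a rough sketch of the "server owns the hop-by-hop headers" idea, a server could drop those headers from the environ before calling the application. The helper name strip_hop_by_hop is hypothetical, and the header list is simply the one named earlier in this thread (Connection, Keep-Alive, TE, Trailers, Transfer-Encoding, Upgrade), mapped to CGI-style HTTP_* keys.

```python
# Header names as listed in this thread; assumed to arrive as HTTP_* keys.
HOP_BY_HOP = ('Connection', 'Keep-Alive', 'TE', 'Trailers',
              'Transfer-Encoding', 'Upgrade')


def strip_hop_by_hop(environ):
    """Return a copy of environ without the hop-by-hop request headers."""
    blocked = {'HTTP_' + name.upper().replace('-', '_') for name in HOP_BY_HOP}
    return {key: value for key, value in environ.items()
            if key not in blocked}
```

End-to-end headers such as HTTP_ACCEPT_ENCODING pass through untouched, since those belong to the application.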
Both Transfer-Encoding and Content-Encoding accept a gzip value, but
the two mean significantly different things. The first is connection
compression, the second is transferring a compressed file over an
uncompressed connection.
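
A minimal sketch of the end-to-end case, assuming a hypothetical application named gzip_app: the application compresses the entity itself and labels it with Content-Encoding, and never touches Transfer-Encoding. The substring check on HTTP_ACCEPT_ENCODING is a deliberate simplification; a real application would parse q-values and honor "gzip;q=0".

```python
import gzip


def gzip_app(environ, start_response):
    """Hypothetical WSGI app applying Content-Encoding: gzip to its body."""
    body = b'hello world\n' * 100
    headers = [('Content-Type', 'text/plain')]
    # Simplified check (assumption): ignores q-values in Accept-Encoding.
    if 'gzip' in environ.get('HTTP_ACCEPT_ENCODING', ''):
        body = gzip.compress(body)
        headers.append(('Content-Encoding', 'gzip'))
    # Content-Length describes the entity as sent, i.e. after compression.
    headers.append(('Content-Length', str(len(body))))
    start_response('200 OK', headers)
    return [body]
```

Any chunking or connection-level compression of that already-compressed entity would still be the server's decision.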
James