[Web-SIG] WSGI & transfer-encodings

Thu Sep 16 22:22:04 CEST 2004

At 03:03 PM 9/16/04 -0400, James Y Knight wrote:

>On Sep 16, 2004, at 2:30 PM, Phillip J. Eby wrote:
>
>>Hm.  An interesting conundrum.  Do any Python servers or applications 
>>exist today that *work* when there's no content-length?
>
>Unknown.
>
>>Personally, I'm thinking that WSGI should follow CGI here, and decode 
>>incoming transfer encodings.  If this means HTTP/1.1 servers have to dump 
>>the incoming data to a file first, so be it.
>
>Following CGI means: do not allow requests without a Content-Length. No 
>servers I know of will dump the data to a file to determine the length 
>first before sending to a CGI. I would not ask them to either: that's like 
>saying "Pleeease denial of service me!". And, really, the only place I've 
>seen incoming chunked requests used is for streaming data -- and that will 
>"never" finish.

Hm.  I suppose it's in theory possible that one could write some kind of 
streaming-over-HTTP application with WSGI.  So I guess we should consider 
allowing it.

>>>The only way to tell if there's incoming data is therefore to attempt to 
>>>read() the input stream. read() will either immediately return an EOF 
>>>condition (returning '') or else read the data. Also, it seems that 
>>>read() with no args isn't allowed? Perhaps it should be.
>>
>>A no-argument read would be problematic in some environments -- CGI for 
>>example.
>
>No -- CGI requires CONTENT_LENGTH, so in the CGI environment it is 
>perfectly possible to simulate EOF at the end of the data.

I mainly meant that environments like CGI already have a suitable file-like 
object for use as 'wsgi.input', and that supporting 'read()' with no 
arguments requires implementing a replacement 'wsgi.input'.
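
As a rough sketch of what such a replacement 'wsgi.input' might look like, 
assuming the server supplies CONTENT_LENGTH as CGI does: the class and 
function names below are hypothetical and not part of any spec, and the 
sketch uses bytes as current Python does.

```python
import io


class LengthLimitedInput:
    """Hypothetical 'wsgi.input' replacement that simulates EOF at CONTENT_LENGTH."""

    def __init__(self, stream, content_length):
        self._stream = stream
        # A missing or empty CONTENT_LENGTH is treated as "no body".
        self._remaining = max(int(content_length or 0), 0)

    def read(self, size=-1):
        # read() with no argument means "everything up to the declared length".
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        if size == 0:
            return b""
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


def wrap_input(environ):
    """Swap in the length-limited wrapper (illustrative helper, not a WSGI API)."""
    length = environ.get("CONTENT_LENGTH") or 0
    environ["wsgi.input"] = LengthLimitedInput(environ["wsgi.input"], length)
    return environ
```

The wrapper never reads past the declared length, so a no-argument read() 
cannot block waiting for data that will never arrive.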

>>>- Wouldn't providing pre-encoded data screw up middleware that is 
>>>expecting to do something useful with the data going through it?
>>
>>Yes, it would.  There are at least two ways to handle it, though:
>>
>>1. Don't use middleware that's not smart enough to handle your app's output
>>
>>2. Have the server or middleware munge HTTP_ACCEPT_ENCODING or other 
>>parameters on the way in to the application, so that the application (if 
>>written correctly) won't send data the server or middleware can't handle.
>
>You've confused Content-Encoding with Transfer-Encoding. TE is the request 
>header that goes with the Transfer-Encoding response header. And according 
>to HTTP 1.1, chunked is always acceptable, so no amount of header munging 
>can change that. So under the "WSGI application is an HTTP origin server" 
>interpretation, all pieces of middleware must be prepared to deal with 
>chunked output. I think that's silly -- there is no reason for a WSGI 
>application to produce chunked-encoded strings, as it already has a way to 
>produce chunks via the iterator.

Fair enough; the only part that has any business reading or writing 
chunked encoding is the "real" server; I'll update the PEP 333 "Other HTTP 
Features" section accordingly.
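
To illustrate that division of labor, here is a minimal sketch of how a 
"real" server might frame an application's iterator as HTTP/1.1 chunked 
output. The function name is hypothetical; the application only yields 
plain blocks and never sees the framing.

```python
def frame_chunked(blocks):
    """Wrap each block from a WSGI app's iterable in HTTP/1.1 chunk framing."""
    for block in blocks:
        if not block:
            # A zero-length chunk would end the body early, so skip empty blocks.
            continue
        size_line = format(len(block), "x").encode("ascii")
        yield size_line + b"\r\n" + block + b"\r\n"
    # The terminating zero-length chunk, with no trailer headers.
    yield b"0\r\n\r\n"


def app_iterable():
    """Stand-in for what an application returns: plain, unencoded blocks."""
    yield b"hello"
    yield b""
    yield b"world!"
```

Middleware sitting between the two sees only the unencoded blocks, which is 
why it never needs to understand chunked encoding.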

>>I like most of this, *except* that I'd like to leave open the option of 
>>an application providing transfer-encoding on its output.  I'd rather 
>>have servers and middleware set HTTP_ACCEPT_ENCODING to "identity;q=1.0, 
>>*;q=0" (or an empty string, or delete the entry), if they interpret 
>>content, and have applications be required to respect 
>>this.  Specifically, an application can only apply a content-encoding if 
>>it matches a non-zero quality in HTTP_ACCEPT_ENCODING.
>
>Again: I'm talking only about Transfer-Encoding, not Content-Encoding. 
>Content-Encoding is an end-to-end function and thus properly belongs to 
>the application. Transfer-Encoding is a hop-by-hop header, and properly 
>belongs to the server. If you want a transfer-encoded output, you can 
>always request it via a server-specific extension or configuration mechanism.
>
>Both Transfer-Encoding and Content-Encoding have a gzip argument, but 
>these mean significantly different things. The first is connection 
>compression, the second is transferring a compressed file over an 
>uncompressed connection.

Thanks for clearing up my confusion; between your explanation and RFC 2616 
I think I can now see how to clarify this.  In effect, WSGI applications 
*must not* send hop-by-hop headers or interpret them, and servers *should 
not* provide them to applications.  And WSGI middleware *must* follow RFC 
2616, section 13.5, regarding which headers may be changed in transit, and 
when.

One way of looking at it is that WSGI servers and middleware are like HTTP 
proxy servers, but using a private inter-server transport mechanism that 
effectively replaces any normal HTTP hop-by-hop control mechanisms.
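
A minimal sketch of that proxy-like behavior, written as middleware and 
assuming the standard library's wsgiref.util.is_hop_by_hop() as the test 
for RFC 2616 hop-by-hop headers. The wrapper name is hypothetical, and a 
real server would do this itself rather than rely on middleware.

```python
from wsgiref.util import is_hop_by_hop


def strip_hop_by_hop(app):
    """Hypothetical middleware: keep hop-by-hop headers away from the app and its output."""

    def middleware(environ, start_response):
        # Drop request-side hop-by-hop headers (e.g. HTTP_TE, HTTP_CONNECTION).
        clean_environ = {
            key: value
            for key, value in environ.items()
            if not (key.startswith("HTTP_")
                    and is_hop_by_hop(key[5:].replace("_", "-")))
        }

        def filtered_start_response(status, headers, exc_info=None):
            # Drop response-side hop-by-hop headers the application should not send.
            kept = [(name, value) for name, value in headers
                    if not is_hop_by_hop(name)]
            return start_response(status, kept, exc_info)

        return app(clean_environ, filtered_start_response)

    return middleware
```

End-to-end headers such as Content-Encoding pass through untouched, since 
those belong to the application.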