[Python-Dev] CGIHTTPServer interactions with Internet Explorer

Thu, 15 Aug 2002 22:36:01 -0400

I'm currently researching some changes needed to solve a couple of bugs
(430160 and 428345) where Internet Explorer (ironically in the name of
Netscape compatibility, though as far as I can see Netscape stopped doing
this at about release 2) will send an extra CRLF over and above the
advertised Content-Length in a POST method input stream.

If the server closes the socket before removing this input, IE somehow gets
confused, and will (usually) send a second POST request, (most often)
followed by a GET request. [This had me tearing my hair out for three days
when writing PWP].

With Kevin Altis' help I have what appears to be a basic fix for
CGIHTTPServer, but there are a couple of points I'd appreciate some advice
on.

1) Although the basic code can use select() to ensure the input stream is no
longer readable (and therefore presumably flushed), I'm not confident enough
about the modifications to assert that they'll work when assembled with
Forking or Threading mixins. If anyone knows the code well enough to offer
an opinion it would be helpful.

2) I understand that the appropriate RFC mandates that SCRIPTS must not read
more than Content-Length bytes and believe this is the relevant quote:

> > When a CGI gets a POSTed request, the "message-body" appears on standard
> > input:
> >
> >   6.2. Request Message-Bodies
> >
> >    As there may be a data entity attached to the request, there MUST be a
> >    system defined method for the script to read these data. Unless defined
> >    otherwise, this will be via the 'standard input' file descriptor.
> >
> >    If the CONTENT_LENGTH value (see section 6.1.2) is non-NULL, the server
> >    MUST supply at least that many bytes to scripts on the standard input
> >    stream. Scripts are not obliged to read the data. Servers MAY signal
> >    an EOF condition after CONTENT_LENGTH bytes have been read, but are
> >    not obligated to do so. Therefore, scripts MUST NOT attempt to read
> >    more than CONTENT_LENGTH bytes, even if more data are available.
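
For concreteness, the script-side rule in that quote could be sketched
roughly as below. This is only an illustration in present-day Python, not
code from the patch; the helper name read_request_body is made up for the
example.

```python
import io


def read_request_body(environ, stdin):
    """Read the POST body from stdin, never more than CONTENT_LENGTH bytes.

    `environ` is a mapping of CGI variables and `stdin` is a binary stream.
    Both are passed in explicitly so the sketch can be exercised without a
    real server (hypothetical helper, not part of any library).
    """
    try:
        length = int(environ.get("CONTENT_LENGTH", ""))
    except ValueError:
        length = 0  # missing or malformed header: treat as "no body"
    if length <= 0:
        return b""
    # Ask for exactly `length` bytes; anything the client sent beyond that
    # (such as a stray CRLF) is left unread on the stream.
    return stdin.read(length)


# The trailing CRLF is beyond the advertised length and is not consumed.
body = read_request_body({"CONTENT_LENGTH": "7"}, io.BytesIO(b"a=1&b=2\r\n"))
```

A real CGI script would pass os.environ and sys.stdin.buffer; the point is
only that the script stops at CONTENT_LENGTH and leaves any surplus for the
server to deal with.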

Clearly this matters for HTTP/1.1. Technically, with this change it is the
*server* that reads the extra bytes, not the *script*. Under HTTP/1.0
I suspect I can assume nothing will break. I'm less happy if a persistent
connection is in use, since I'm just sucking on the socket until it comes
up empty, which could clearly swallow the start of the next request on a
connection with a "Connection: Keep-Alive" header. Does anyone know whether
IE sends this header when it's indulging in the error behavior?
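
One more cautious variant of the server-side flush, sketched here purely as
an assumption and NOT what the first-round patch does, would be to peek at
the socket and discard only a bare CRLF, so that a following request on a
persistent connection is left alone. The function name drain_stray_crlf and
its timeout parameter are invented for the example, and it is written in
present-day Python.

```python
import select
import socket


def drain_stray_crlf(sock, timeout=0.0):
    """Discard a pending CRLF on `sock`; leave any other input untouched.

    Returns the bytes that were discarded (b"\r\n" or b"").
    Hypothetical helper for illustration only.
    """
    # select() reports whether a read would succeed without blocking.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return b""
    # MSG_PEEK inspects the pending bytes without removing them.
    pending = sock.recv(2, socket.MSG_PEEK)
    if pending == b"\r\n":
        return sock.recv(2)  # now actually consume the stray CRLF
    return b""  # something else (perhaps a new request): do not touch it


# Quick check over a local socket pair standing in for client and server.
client, server = socket.socketpair()
client.sendall(b"\r\n")
first = drain_stray_crlf(server, timeout=1.0)
client.sendall(b"GET / HTTP/1.0\r\n")
second = drain_stray_crlf(server, timeout=1.0)
remaining = server.recv(3)
client.close()
server.close()
```

This does not handle a CRLF that arrives after the check, or one split
across two packets, so it narrows the Keep-Alive problem rather than
removing it.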

The current first-round patch is available under

https://sourceforge.net/tracker/?func=detail&aid=430160&group_id=5470&atid=105470

if anyone wants to test it and let me know of any problems or suggestions
for improvement.

regards
-----------------------------------------------------------------------
Steve Holden                                 http://www.holdenweb.com/
Python Web Programming                http://pydish.holdenweb.com/pwp/
-----------------------------------------------------------------------