Simple Python web proxy stalls for some web sites

bryanjugglercryptographer at yahoo.com
Thu Oct 7 23:10:35 CEST 2004


Carl Waldbieser wrote:
[...]
> I have written a simple web proxy using the Python standard library
> BaseHTTPRequestHandler.
[...]
> Some web sites work fine (e.g. www.python.org).  However, some web
> sites simply seem to stall indefinitely (e.g. www.google.com).  If I
> set the same browser to connect directly to the Internet, the site
> comes up close to immediately.
>
> If anybody has any ideas about why this happens, or any coding
> mistakes I may have made, I would appreciate the feedback.

[...]
>             f = urllib2.urlopen(request)
[...]
>             print "Reading..."
>             s = f.read()

This tries to read until the connection closes, but it's an
HTTP/1.1 connection (and Google usually even sends "Connection:
Keep-Alive"), so the server won't close it after responding to
this one request.  In this case the "Content-Length" header tells
you how much to read.
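
As a rough sketch of reading by Content-Length instead of reading to
end-of-stream: the helper names below (read_exact, read_body_by_length)
are made up for illustration, the headers are assumed to be a plain
dict with lower-case keys, and the code is written for Python 3 rather
than the Python 2 of the original proxy.

```python
import io


def read_exact(stream, length):
    """Read exactly `length` bytes from a binary file-like object."""
    chunks = []
    remaining = length
    while remaining > 0:
        data = stream.read(remaining)
        if not data:
            # The peer closed the connection before sending everything.
            raise EOFError("stream ended with %d bytes still expected" % remaining)
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


def read_body_by_length(stream, headers):
    """Read a response body whose size is given by Content-Length.

    `headers` is assumed to be a dict with lower-case header names.
    """
    length = int(headers["content-length"])
    return read_exact(stream, length)


# Stand-in for a kept-alive connection: the body followed by more data.
fake_connection = io.BytesIO(b"hello worldEXTRA")
body = read_body_by_length(fake_connection, {"content-length": "11"})
leftover = fake_connection.read()
```

The point is that the read stops after 11 bytes and leaves the rest of
the stream alone, so it does not wait for a close that never comes.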

Google HTTP/1.1 query results are even trickier; they typically
come back with "Transfer-Encoding: chunked", and, if you sent
the right Accept-Encoding header, will also usually have
"Content-Encoding: gzip".
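
To give an idea of what handling those two headers involves, here is
a minimal sketch, again in Python 3 and with hypothetical helper names
(read_chunked, decode_body).  It assumes a buffered binary stream, it
ignores chunk extensions, and it skips any trailer headers; a real
proxy would need more care than this.

```python
import gzip
import io


def read_chunked(stream):
    """Decode a 'Transfer-Encoding: chunked' body from a binary stream."""
    parts = []
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("stream ended before the terminating chunk")
        # The chunk size is hexadecimal; anything after ';' is an extension.
        size_text = size_line.split(b";", 1)[0].strip().decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            break
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("stream ended in the middle of a chunk")
        parts.append(data)
        stream.readline()  # consume the CRLF that follows the chunk data
    # Skip optional trailer headers up to the final blank line.
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    return b"".join(parts)


def decode_body(raw, content_encoding):
    """Undo 'Content-Encoding: gzip'; return other bodies unchanged."""
    if content_encoding.lower() == "gzip":
        return gzip.decompress(raw)
    return raw


# Build a fake chunked, gzip-encoded response body in two chunks.
payload = gzip.compress(b"hello proxy")
first, rest = payload[:5], payload[5:]
wire = (b"5\r\n" + first + b"\r\n"
        + ("%x" % len(rest)).encode("ascii") + b"\r\n" + rest + b"\r\n"
        + b"0\r\n\r\n"
        + b"NEXT")
stream = io.BytesIO(wire)
decoded = decode_body(read_chunked(stream), "gzip")
after_body = stream.read()
```

Note the order: the chunking is removed first, and only then is the
reassembled body decompressed.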

See RFC 2616 for the requirements to be a true HTTP/1.1 proxy.
BaseHTTPRequestHandler and urllib2 are not really up to the job.
-- 
--Bryan



