[issue8035] urllib.request.urlretrieve hangs

Charles-Francois Natali report at bugs.python.org
Sun Apr 4 23:01:25 CEST 2010


Charles-Francois Natali <neologix at free.fr> added the comment:

Alright, what happens is the following:
- the file you're trying to retrieve is actually redirected, so the server sends an HTTP/1.x 302 Moved Temporarily response
- in urllib, when we get a redirection, we call redirect_internal:
    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        void = fp.read()
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = basejoin(self.type + ":" + url, newurl)
        return self.open(newurl)

The fp.read() is there to wait for the remote end to close the connection.
The problem in this case is that in Python 3.1 httplib (now http.client) uses HTTP/1.1, whereas version 2.6 used HTTP/1.0, and with HTTP/1.1 the server doesn't close the connection after sending the redirect (as shown by tcpdump).
So the process remains stuck in fp.read().
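To illustrate why a read-until-EOF blocks here, the following is a minimal sketch using a local socket pair rather than a real HTTP server. The names read_until_eof and demo are made up for this example, and the timeout exists only so the demonstration terminates; it is a simplified model of the situation, not what http.client does internally.

```python
import socket


def read_until_eof(sock, timeout):
    """Read until the peer closes the connection or the timeout expires.

    Returns (data, saw_eof). saw_eof is False when we gave up waiting,
    which is where an untimed read would simply keep blocking.
    """
    sock.settimeout(timeout)
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:  # empty result means the peer closed the connection
                return b"".join(chunks), True
            chunks.append(data)
    except socket.timeout:
        return b"".join(chunks), False


def demo(peer_closes, timeout=0.2):
    """Simulate a server that sends a redirect and then closes, or does not."""
    client, server = socket.socketpair()
    try:
        server.sendall(
            b"HTTP/1.1 302 Moved Temporarily\r\n"
            b"Location: /new\r\n"
            b"\r\n"
        )
        if peer_closes:
            server.close()  # HTTP/1.0-like behaviour: close after the response
        _, saw_eof = read_until_eof(client, timeout)
        return saw_eof
    finally:
        client.close()
        server.close()  # closing an already closed socket is harmless
```

With peer_closes=True the read ends as soon as the peer closes; with peer_closes=False it only returns because of the artificial timeout.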
Now, in version 3.1, if we simply change Lib/http/client.py:628
from 
class HTTPConnection:

    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.1'

to
class HTTPConnection:

    _http_vsn = 11
    _http_vsn_str = 'HTTP/1.0'

to use HTTP/1.0 instead, the retrieval works fine.

Obviously, this is not a good solution. Since the RFC doesn't seem to require the server to close the connection after sending a redirect, we should probably close the connection ourselves.

That's what the attached patch does: it simply removes the call to fp.read() before closing the connection. It also removes the call from http_error_default, since if an error occurs we probably want to close the connection as soon as possible instead of waiting for the server to do so.
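For readers without the attachment, here is a rough sketch of what redirect_internal looks like with the fp.read() call dropped. It is not the attached diff itself: RedirectingOpener and StubResponse are hypothetical stand-ins so the snippet runs on its own, headers is a plain dict with lowercase keys, and urljoin is used in place of basejoin.

```python
from urllib.parse import urljoin


class StubResponse:
    """Hypothetical response whose peer never closes the connection."""

    def __init__(self):
        self.closed = False

    def read(self):
        # In the real scenario this call would block forever.
        raise AssertionError("read() should not be called on a redirect")

    def close(self):
        self.closed = True


class RedirectingOpener:
    """Hypothetical minimal opener; only records the URLs it is asked to open."""

    def __init__(self, scheme="http"):
        self.type = scheme
        self.opened = []

    def open(self, url):
        self.opened.append(url)
        return url

    def redirect_internal(self, url, fp, errcode, errmsg, headers, data):
        if 'location' in headers:
            newurl = headers['location']
        elif 'uri' in headers:
            newurl = headers['uri']
        else:
            return
        # Close right away instead of waiting for the server to do so.
        fp.close()
        # In case the server sent a relative URL, join with original:
        newurl = urljoin(self.type + ":" + url, newurl)
        return self.open(newurl)
```

Calling redirect_internal with a StubResponse shows that the redirect is followed without ever touching read(), so a server that keeps the connection open can no longer stall the client at this point.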

----------
keywords: +patch
nosy: +neologix
Added file: http://bugs.python.org/file16758/urllib_redirect.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8035>
_______________________________________

