Regarding socket timeouts in httplib

Consider the following code for retrieving a web page using httplib:

    import httplib

    def get_url(hostname, port, url, timeout=5):
        con = httplib.HTTPConnection(hostname, port, timeout)
        con.request("GET", url)
        res = con.getresponse()
        data = res.read()
        return res, data

As expected, this will raise a socket.error if the client is unable to connect before the timeout has expired. However, once the connection has been made, the timeout parameter no longer has any effect, and con.getresponse() will block forever if the server does not send any data.

I think the reason for this is that the socket object created in HTTPConnection.connect() keeps the default timeout, None (i.e. it is never set explicitly):

    def connect(self):
        """Connect to the host and port specified in __init__."""
        self.sock = socket.create_connection((self.host,self.port),
                                             self.timeout)

For now I have been able to work around this by manually setting the timeout of the (private) socket object after calling con.request(), like this:

    ...
    con.request("GET", url)
    con.sock.settimeout(timeout)
    res = con.getresponse()
    ...

However, I don't think this is a very good solution, as it relies on knowledge of the inner workings of httplib (and I had to read the library source code to come up with it).

Off the top of my head, I can come up with three (or four) ways of properly solving the issue:

1) Documenting the timeout behavior and describing the above hack in the httplib documentation.
2) Modifying HTTPConnection.connect() to set the timeout on the socket object after it has been created (using the same timeout as given to the HTTPConnection constructor).
3) Adding (optional) timeout parameters to HTTPConnection.getresponse() and HTTPResponse.read() (and possibly other functions with the same problem).
4) A combination of 2) and 3).

Any thoughts on this?

BTW: Once I figure out how, I wouldn't mind submitting a patch for either 2), 3) or 4), but personally I don't like 1).

Anders
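For readers who want to reproduce the behaviour, here is a minimal Python 3 sketch of get_url() (httplib was renamed http.client in Python 3). One detail worth checking first: in Python 2.6/2.7 the constructor is HTTPConnection(host, port=None, strict=None, timeout=...), so a third positional argument binds to strict, not timeout. The sketch therefore passes timeout by keyword. start_silent_server() is hypothetical test scaffolding, not a library function: it accepts connections but never replies, so with the timeout actually applied, getresponse() should raise socket.timeout instead of blocking.

```python
import http.client
import socket
import threading


def get_url(hostname, port, url, timeout=5):
    """Fetch url; `timeout` bounds the connect and every later socket read.

    `timeout` is passed by keyword so it cannot land on another positional
    parameter (in Python 2's httplib the third positional one is `strict`).
    """
    con = http.client.HTTPConnection(hostname, port, timeout=timeout)
    try:
        con.request("GET", url)
        res = con.getresponse()
        data = res.read()
        return res, data
    finally:
        con.close()


def start_silent_server():
    """Hypothetical test helper: accept connections but never send a byte."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(5)
    clients = []  # keep accepted sockets open so the client just waits

    def accept_loop():
        while True:
            conn, _ = srv.accept()
            clients.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv.getsockname()[1]


if __name__ == "__main__":
    port = start_silent_server()
    try:
        get_url("127.0.0.1", port, "/", timeout=0.5)
    except socket.timeout:
        print("getresponse() gave up after 0.5 s instead of blocking")
```

If the same call is made with the timeout left at its default, getresponse() simply waits on the silent server, which matches the hang described above.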

On Thu, Jul 1, 2010 at 10:33 AM, Anders Sandvig <anders.sandvig@gmail.com> wrote:
> 2) Modifying HTTPConnection.connect() to set the timeout on the socket object after it has been created (using the same timeout as given to the HTTPConnection constructor).

It looks like urllib2 in trunk and urllib.request in py3k are also affected by this oddity. My vote is for option 2, i.e. consider it a bug that the timeout wasn't consistently applied.

Schiavo
Simon
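Option 2 can be sketched as a subclass that re-applies the constructor's timeout right after connecting. TimeoutHTTPConnection is a hypothetical name, not a standard-library class, and the sketch assumes Python 3's http.client. Note that socket.create_connection(), which connect() uses, already calls settimeout() on the new socket whenever it is given an explicit timeout, so on a stock interpreter this override mostly makes that behaviour explicit rather than changing it.

```python
import http.client
import socket


class TimeoutHTTPConnection(http.client.HTTPConnection):
    """Hypothetical subclass sketching option 2 from the thread.

    After the normal connect(), the timeout given to the constructor is
    applied explicitly to the connected socket, so getresponse() and
    HTTPResponse.read() are bounded by the same limit as the connect.
    """

    def connect(self):
        super().connect()
        # self.timeout is a number, None (explicitly blocking), or the
        # "use the global default" sentinel. Only a number needs to be
        # re-applied; in the other cases the socket already behaves as asked.
        if isinstance(self.timeout, (int, float)):
            self.sock.settimeout(self.timeout)


if __name__ == "__main__":
    # Local listener so connect() succeeds without a real web server.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    con = TimeoutHTTPConnection("127.0.0.1", listener.getsockname()[1], timeout=5)
    con.connect()
    print(con.sock.gettimeout())  # 5.0
    con.close()
    listener.close()
```

Callers would then use TimeoutHTTPConnection(host, port, timeout=5) wherever they used HTTPConnection; nothing else changes.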

On Thu, 01 Jul 2010 10:33:30 +0200, Anders Sandvig <anders.sandvig@gmail.com> wrote:
> Off the top of my head, I can come up with three (or four) ways of properly solving the issue:
> 1) Documenting the timeout behavior and describing the above hack in the httplib documentation.
> 2) Modifying HTTPConnection.connect() to set the timeout on the socket object after it has been created (using the same timeout as given to the HTTPConnection constructor).
> 3) Adding (optional) timeout parameters to HTTPConnection.getresponse() and HTTPResponse.read() (and possibly other functions with the same problem).
> 4) A combination of 2) and 3).
> Any thoughts on this?
> BTW: Once I figure out how, I wouldn't mind submitting a patch for either 2), 3) or 4), but personally I don't like 1).
FYI there's an open bug about this (or at least related to it): http://bugs.python.org/issue8595

--
R. David Murray      www.bitdance.com
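Option 3 (a per-call timeout) can be approximated from outside the library by wrapping the workaround from the original post. getresponse_with_timeout() is a hypothetical helper, not part of http.client, and the sketch assumes Python 3. It still touches the undocumented con.sock attribute, which is exactly what a real getresponse(timeout=...) parameter would hide from callers.

```python
import http.client
import socket


def getresponse_with_timeout(con, timeout):
    """Hypothetical helper sketching option 3: a per-call response timeout.

    Temporarily sets `timeout` on the connection's socket, waits for the
    response headers, then restores the socket's previous timeout.
    """
    if con.sock is None:
        raise ValueError("send a request before waiting for the response")
    old = con.sock.gettimeout()
    con.sock.settimeout(timeout)
    try:
        return con.getresponse()
    finally:
        # getresponse() may close the connection (e.g. "Connection: close"),
        # in which case con.sock has been reset to None.
        if con.sock is not None:
            con.sock.settimeout(old)


if __name__ == "__main__":
    # A local listener that never sends anything back.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    con = http.client.HTTPConnection("127.0.0.1", listener.getsockname()[1])
    con.request("GET", "/")
    try:
        getresponse_with_timeout(con, 0.5)
    except socket.timeout:
        print("no response within 0.5 s")
    finally:
        con.close()
        listener.close()
```

Restoring the previous timeout keeps the change scoped to the one call, in the spirit of an optional parameter; a later res.read() therefore runs with the connection's original timeout, so a matching read() wrapper would be needed to cover the rest of option 3.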
Participants (3):
- Anders Sandvig
- R. David Murray
- Simon Cross