[issue8595] Unexpected default timeout in http-client-related libraries
New submission from Julian <python_org@somethinkodd.com>: Since Python 2.6, httplib has offered a timeout parameter for fetches. As the documentation explains, if this parameter is not provided, it uses the global default. What the document doesn't explain is httplib builds on top of the socket library. The socket library has a default timeout of None (i.e. forever). This may be an appropriate default for general sockets, but it is a poor default for httplib; typical http clients would use a timeout in the 2-10 second range. This problem is propagated up to urllib2, which sits on httplib, and further obscures that the default might be unsuitable.
From an inspection of the manuals, Python 3.0.1 suffers from the same problem except, the names have changed. urllib.response sits on http.client.
I, for one, made a brutal mistake of assuming that the "global default" would be some reasonable default for fetching web pages; I didn't have any specific timeout in mind, and was happy for the library to take care of it. Several million successful http downloads later, my server application thread froze waiting forever when talking to a recalcitrant web-server. I imagine others have fallen for the same trap. While an ideal solution would be for httplib and http.client to use a more generally acceptable default, I can see it might be far too late to make such a change without breaking existing applications. Failing that, I would recommend that the documentation for httplib, urllib, urllib2, http.client and urllib.request (+ any other similar libraries sitting on socket? FTP, SMTP?) be changed to highlight that the default global timeout, sans deliberate override, is to wait a surprisingly long time. ---------- assignee: docs@python components: Documentation, Library (Lib) messages: 104763 nosy: docs@python, oddthinking priority: normal severity: normal status: open title: Unexpected default timeout in http-client-related libraries type: behavior versions: Python 2.6, Python 2.7, Python 3.1 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Senthil Kumaran <orsenthil@gmail.com> added the comment: I am not sure, there can be a default timeout value for client libraries like httplib and urllib2. Socket connection do have timeout and as you may have figured out already, the option in httplib and urllib methods is to set/override the socket._GLOBAL_DEFALT_TIMEOUT which is None by default (Wait indefinitely). Since client libraries are using a global, setting it at once place (say at httplib) has same timeout applicable for other modules within the same process. I see docs can highlight it more or perhaps link to sockets timeout information. ---------- nosy: +orsenthil title: Unexpected default timeout in http-client-related libraries -> Explain the default timeout in http-client-related libraries _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Julian <python_org@somethinkodd.com> added the comment: @orsenthil: Consider the definition of httplib.HTTPConnection.__init__(), in Python 2.6. def __init__(self, host, port=None, strict=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT): This could be replaced with: def __init__(self, host, port=None, strict=None, timeout=10): or, perhaps better, def __init__(self, host, port=None, strict=None, timeout=httplib._HTTP_DEFAULT_TIMEOUT): This timeout value is passed to the call in socket.create_connection, so I believe if it is overriden, it only applies to the relevant sockets and not to all sockets globally. Note: I am not arguing here that this SHOULD be done - it would break existing applications, especially those that were written before Python 2.6 - merely that it COULD be done. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Senthil Kumaran <orsenthil@gmail.com> added the comment: On Sun, May 02, 2010 at 03:45:09AM +0000, Julian wrote:
Note: I am not arguing here that this SHOULD be done - it would break existing applications, especially those that were written before Python 2.6 - merely that it COULD be done.
I get your point, Julian. What I was worried about is, is it the "correct thing" to do? Which I am not sure and I believe it is not as httplib and urllib are not client themselves but are libraries to build clients. httplib and urllib can be considered as convenient interfaces over underlying sockets. Breaking of existing apps is the next question, which should definitely be avoided. And an explanation in the docs certainly seems to be the way to go. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Eric Smith <eric@trueblade.com> added the comment: I think you could preserve backward compatibility by doing something like the following (in httplib): _sentinel = object() __HTTP_DEFAULT_TIMEOUT = _sentinel In httplib.HTTPConnection.__init__(), in Python 2.6. def __init__(self, host, port=None, strict=None, timeout=None): if timeout is None: if _HTTP_DEFAULT_TIMEOUT is _sentinel: timeout = socket._GLOBAL_DEFAULT_TIMEOUT else: timeout = _HTTP_DEFAULT_TIMEOUT That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the the socket timeout. Admittedly I'd rather see all uses of module globals go away, but I think this would be a good compromise. ---------- nosy: +eric.smith _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Changes by Anders Sandvig <anders.sandvig@gmail.com>: ---------- nosy: +anders.sandvig _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Antoine Pitrou <pitrou@free.fr> added the comment:
That way, if _HTTP_DEFAULT_TIMEOUT is never set, it will use the the socket timeout. Admittedly I'd rather see all uses of module globals go away, but I think this would be a good compromise.
Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead? ---------- nosy: +pitrou _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Eric Smith <eric@trueblade.com> added the comment: On 8/19/2010 9:14 AM, Antoine Pitrou wrote:
Why not provide {httplib,urllib}.{set,get}defaulttimeout() instead?
Yes, I'm assuming that's how _HTTP_DEFAULT_TIMEOUT would be set and queried. ---------- _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8595> _______________________________________
Change by Senthil Kumaran <senthil@uthcode.com>: ---------- assignee: docs@python -> orsenthil versions: +Python 3.10 -Python 2.6, Python 2.7, Python 3.1 _______________________________________ Python tracker <report@bugs.python.org> <https://bugs.python.org/issue8595> _______________________________________
participants (5)
-
Anders Sandvig
-
Antoine Pitrou
-
Eric Smith
-
Julian
-
Senthil Kumaran