[ python-Bugs-1601399 ] urllib2 does not close sockets properly

SourceForge.net noreply at sourceforge.net
Thu Jan 4 00:54:26 CET 2007


Bugs item #1601399, was opened at 2006-11-22 21:04
Message generated for change (Comment added) made by jjlee
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1601399&group_id=5470

Please note that this message contains a full copy of the comment thread
for this request, including the initial issue submission,
not just the latest update.
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Brendan Jurd (direvus)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2 does not close sockets properly

Initial Comment:
Python 2.5 (release25-maint, Oct 29 2006, 12:44:11)
[GCC 4.1.2 20061026 (prerelease) (Debian 4.1.1-18)] on linux2

I first noticed this when a program of mine (which makes a brief HTTPS connection every 20 seconds) started having some weird crashes.  It turned out that the process had a massive number of file descriptors open.  I did some debugging, and it became clear that the program was opening two file descriptors for every HTTPS connection it made with urllib2, and it wasn't closing them, even though I was reading all data from the response objects and then explicitly calling close() on them.

I found I could easily reproduce the behaviour using the interactive console.  Try this while keeping an eye on the file descriptors held open by the python process:

To begin with, the process will have the usual FDs 0, 1 and 2 open for std(in|out|err), plus one other.
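One easy way to keep an eye on the descriptors (a sketch that assumes
Linux and its /proc filesystem, which matches the linux2 build above) is
to count the entries under /proc/self/fd from the same interpreter:

    import os

    def fdcount():
        # Each entry in /proc/self/fd is one open descriptor in this
        # process.  The listing itself briefly adds one for the
        # directory, so treat the count as approximate.
        return len(os.listdir("/proc/self/fd"))

On other platforms, an external tool such as lsof -p <pid> serves the
same purpose.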

>>> import urllib2
>>> f = urllib2.urlopen("http://www.google.com")

At this point the process has opened two more sockets.

>>> f.read()
[... HTML ensues ...]
>>> f.close()

The two extra sockets are still open.

>>> del f

The two extra sockets are STILL open.

>>> f = urllib2.urlopen("http://www.python.org")
>>> f.read()
[...]
>>> f.close()

And now we have a total of four abandoned sockets open.

It's not until you terminate the process entirely, or the OS (eventually) closes the socket on idle timeout, that they are closed.
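To watch the leak grow without an external tool, a minimal loop works
(again a sketch assuming Linux and the /proc trick above; www.python.org
is just an example host):

    import os
    import urllib2

    for i in range(5):
        f = urllib2.urlopen("http://www.python.org")
        f.read()
        f.close()
        # On a fixed urllib2 this count stays flat; on an affected
        # build it climbs by two with every pass.
        print i, len(os.listdir("/proc/self/fd"))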

Note that if you do the same thing with httplib, the sockets are properly closed:

>>> import httplib
>>> c = httplib.HTTPConnection("www.google.com", 80)
>>> c.connect()

A socket has been opened.

>>> c.putrequest("GET", "/")
>>> c.endheaders()
>>> r = c.getresponse()
>>> r.read()
[...]
>>> r.close()

And the socket has been closed.
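Since httplib releases the socket correctly, one possible interim
workaround (a sketch, not part of the original report, assuming plain
HTTP on port 80) is to drop down to httplib for such periodic requests
and close the connection explicitly, so the socket is released even if
read() raises:

    import httplib

    def fetch(host, path="/"):
        # Plain HTTP GET that always releases the underlying socket.
        c = httplib.HTTPConnection(host, 80)
        try:
            c.connect()
            c.putrequest("GET", path)
            c.endheaders()
            r = c.getresponse()
            return r.read()
        finally:
            c.close()  # closes the socket even on error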

----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2007-01-03 23:54

Message:
Logged In: YES 
user_id=261020
Originator: NO

Confirmed.  The cause is the (ab)use of socket._fileobject by
urllib2.AbstractHTTPHandler to provide .readline() and .readlines()
methods.  _fileobject simply does not close the socket on
_fileobject.close() (since in the original intended use of _fileobject,
_socketobject "owns" the socket, and _fileobject only has a reference to
it).  The bug was introduced with the upgrade to HTTP/1.1 in revision
36871.
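The ownership model is easy to demonstrate with a plain socket (a
minimal sketch of the *intended* use of _fileobject, not of urllib2
itself): makefile() hands back a _fileobject that merely borrows the
socket, so closing it leaves the descriptor open until the owning
socket object is also closed.

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("www.python.org", 80))
    f = s.makefile()   # f is a socket._fileobject borrowing s's socket
    f.close()          # closes only the file wrapper; the FD stays open
    s.close()          # the owner must close it to release the descriptor

urllib2 wraps an HTTPResponse in a _fileobject with no owning
_socketobject in the picture, so nothing ever performs that final
close.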

The patch here fixes it:

http://python.org/sf/1627441


----------------------------------------------------------------------
