[ python-Bugs-1208304 ] urllib2's urlopen() method causes a memory leak

SourceForge.net noreply at sourceforge.net
Wed Jun 29 05:52:17 CEST 2005


Bugs item #1208304, was opened at 2005-05-25 09:20
Message generated for change (Comment added) made by jafo
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1208304&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Extension Modules
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Petr Toman (manekcz)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib2's urlopen() method causes a memory leak

Initial Comment:
It seems that the urlopen(url) method of the urllib2 module
leaves some undestroyable objects in memory.

Please try the following code:
==========================
if __name__ == '__main__':
  import urllib2
  a = urllib2.urlopen('http://www.google.com')
  del a # or a = None or del(a)
  
  # check memory for leaks
  import gc
  gc.set_debug(gc.DEBUG_SAVEALL)
  gc.collect()
  for it in gc.garbage:
    print it
==========================

In our code, we're using lots of urlopens in a loop and 
the number of unreachable objects grows beyond all 
limits :) We also tried a.close() but it didn't help.

You can also try the following:
==========================
def print_unreachable_len():
  # check memory for leaks
  import gc
  gc.set_debug(gc.DEBUG_SAVEALL)
  gc.collect()
  unreachableL = []
  for it in gc.garbage:
    unreachableL.append(it)
  return len(str(unreachableL))
  
if __name__ == '__main__':
  print "at the beginning", print_unreachable_len()

  import urllib2
  print "after import of urllib2", print_unreachable_len()

  a = urllib2.urlopen('http://www.google.com')
  print 'after urllib2.urlopen', print_unreachable_len()

  del a
  print 'after del', print_unreachable_len()
==========================
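
Note that print_unreachable_len() above returns the length of the string
representation of the garbage list, not the number of objects. As a rough
sketch only (count_unreachable is a hypothetical helper name, not part of
the original report), the objects could be counted directly like this:

```python
import gc


def count_unreachable():
    """Run a collection and return how many unreachable objects it found.

    Hypothetical helper for illustration; not from the original report.
    """
    old_flags = gc.get_debug()
    # DEBUG_SAVEALL makes the collector append every unreachable object
    # to gc.garbage instead of freeing it.
    gc.set_debug(gc.DEBUG_SAVEALL)
    try:
        gc.collect()
        found = len(gc.garbage)
        # Empty the list so the next call counts only new garbage.
        del gc.garbage[:]
    finally:
        gc.set_debug(old_flags)
    return found
```

Calling it once before the urlopen() call clears any earlier garbage, so a
later call reports only the objects left behind since then.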

We're using Windows XP with the latest patches, Python 2.4
(ActivePython 2.4 Build 243 (ActiveState Corp.) based on
Python 2.4 (#60, Nov 30 2004, 09:34:21) [MSC v.1310 
32 bit (Intel)] on win32).

----------------------------------------------------------------------

>Comment By: Sean Reifschneider (jafo)
Date: 2005-06-29 03:52

Message:
Logged In: YES 
user_id=81797

I give up; this code is kind of a maze of twisty little
passages.  I tried calling "a.fp.close()", but that didn't
seem to help at all, and I couldn't make any further progress
from there.  I also tried "if a.headers.fp:
a.headers.fp.close()", which didn't do anything noticeable.

----------------------------------------------------------------------

Comment By: Sean Reifschneider (jafo)
Date: 2005-06-29 03:27

Message:
Logged In: YES 
user_id=81797

I can reproduce this in both the python.org 2.4 RPM and in a
freshly built copy from CVS.  If I run a few thousand
urlopen()s, I get:

Traceback (most recent call last):
  File "/tmp/mt", line 26, in ?
  File "/tmp/python/dist/src/Lib/urllib2.py", line 130, in urlopen
  File "/tmp/python/dist/src/Lib/urllib2.py", line 361, in open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 379, in _open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 340, in _call_chain
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1026, in http_open
  File "/tmp/python/dist/src/Lib/urllib2.py", line 1001, in do_open
urllib2.URLError: <urlopen error (24, 'Too many open files')>

This happens even if I call a.close().  I'll investigate a bit further.

Sean

----------------------------------------------------------------------

Comment By: A.M. Kuchling (akuchling)
Date: 2005-06-01 23:13

Message:
Logged In: YES 
user_id=11375

Confirmed.  The objects involved seem to be an HTTPResponse and the 
socket._fileobject wrapper; the assignment 'r.recv=r.read' around line 1013 
of urllib2.py seems to be critical to creating the cycle.
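
To illustrate the kind of cycle described above, here is a minimal sketch.
It is not urllib2's actual code: FakeResponse and survives_del are
hypothetical names, and the behaviour shown assumes CPython's reference
counting. Storing a bound method on its own instance makes the object
reference itself, so 'del' alone no longer frees it.

```python
import gc
import weakref


class FakeResponse(object):
    """Hypothetical stand-in for the response object; not urllib2 code."""

    def read(self):
        return ""


def survives_del(break_cycle=False):
    """Return (alive after del, alive after gc.collect()) for one object."""
    gc.disable()  # keep the automatic collector out of the experiment
    try:
        r = FakeResponse()
        # The bound method r.read refers back to r, and r now stores it:
        # r -> bound method -> r is a reference cycle.
        r.recv = r.read
        probe = weakref.ref(r)
        if break_cycle:
            del r.recv  # removing the attribute breaks the cycle by hand
        del r
        alive_after_del = probe() is not None
        gc.collect()
        alive_after_collect = probe() is not None
    finally:
        gc.enable()
    return alive_after_del, alive_after_collect
```

With the cycle in place the object outlives 'del' and is only reclaimed by
the cyclic collector; with break_cycle=True it is freed as soon as the last
name goes away.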


----------------------------------------------------------------------
