[issue1208304] urllib2's urlopen() method causes a memory leak

BULOT report at bugs.python.org
Wed Jun 3 15:31:31 CEST 2009


BULOT <stephane at bulot.org> added the comment:

Hello, 

I'm facing a urllib2 memory leak in one of my scripts, which is not
threaded. I ran a few tests to check what was going on and found this
existing (though old) bug thread.

I haven't been able to identify the cause yet, but here is some
information:
Platform: Debian
Python version 2.5.4

I wrote a script (2 attached files) that accesses a web page
(http://www.google.com) every second and monitors the number of open
file descriptors and the memory footprint.
I also used the gc module (garbage collector) to retrieve the number of
objects that are not freed (as already proposed in this thread, but
focused on the gc.DEBUG_LEAK flag).
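
For readers without the attachment, the measuring loop can be sketched
roughly as follows. This is not the attached urllib2leak.py: it is a
minimal sketch in Python 3 syntax, `count_unreachable` and `make_cycle`
are hypothetical names, and `make_cycle` stands in for the
urllib2.urlopen() call so that it runs without network access. It relies
on gc.collect() returning the number of unreachable objects it found.
Since gc.DEBUG_LEAK includes gc.DEBUG_SAVEALL, which keeps every
unreachable object in gc.garbage instead of freeing it, the sketch leaves
the debug flags off and counts per call.

```python
import gc


def make_cycle():
    """Stand-in for the urlopen() call: builds one two-dict reference cycle."""
    a = {}
    b = {"peer": a}
    a["peer"] = b


def count_unreachable(action, repeats):
    """Run action() `repeats` times; return the unreachable count after each run."""
    was_enabled = gc.isenabled()
    gc.disable()              # no automatic collections between our measurements
    try:
        gc.collect()          # start from a clean state
        counts = []
        for _ in range(repeats):
            action()
            # gc.collect() returns the number of unreachable objects found
            counts.append(gc.collect())
        return counts
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(count_unreachable(make_cycle, 3))
```

A constant value per run means the cycles are being collected each time;
it is the memory figure that then has to be watched separately.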

Here are my results:
First access output:
gc: collectable <dict 0xb793c604>
gc: collectable <HTTPResponse instance at 0xb7938f6c>
gc: collectable <dict 0xb793c4f4>
gc: collectable <HTTPMessage instance at 0xb793d0ec>
gc: collectable <dict 0xb793c02c>
gc: collectable <list 0xb7938e8c>
gc: collectable <list 0xb7938ecc>
gc: collectable <instancemethod 0xb79cf824>
gc: collectable <dict 0xb793c79c>
gc: collectable <HTTPResponse instance at 0xb793d2cc>
gc: collectable <instancemethod 0xb79cf874>
unreachable objects:  11
File descriptors number: 32
Memory: 4612

Tenth access:
gc: collectable <dict 0xb78f14f4>
gc: collectable <HTTPResponse instance at 0xb78f404c>
gc: collectable <dict 0xb78f13e4>
gc: collectable <HTTPMessage instance at 0xb78f462c>
gc: collectable <dict 0xb78e5f0c>
gc: collectable <list 0xb78eeb4c>
gc: collectable <list 0xb78ee2ac>
gc: collectable <instancemethod 0xb797b7fc>
gc: collectable <dict 0xb78f168c>
gc: collectable <HTTPResponse instance at 0xb78f442c>
gc: collectable <instancemethod 0xb78eaa7c>
unreachable objects:  110
File descriptors number: 32
Memory: 4680

After one hundred accesses:
gc: collectable <dict 0x89e2e84>
gc: collectable <HTTPResponse instance at 0x89e3e2c>
gc: collectable <dict 0x89e2d74>
gc: collectable <HTTPMessage instance at 0x89e3ccc>
gc: collectable <dict 0x89db0b4>
gc: collectable <list 0x89e3cac>
gc: collectable <list 0x89e32ec>
gc: collectable <instancemethod 0x89d8964>
gc: collectable <dict 0x89e60b4>
gc: collectable <HTTPResponse instance at 0x89e50ac>
gc: collectable <instancemethod 0x89ddb1c>
unreachable objects:  1100
File descriptors number: 32
Memory: 5284

Each call to urllib2.urlopen() produces 11 new unreachable objects and
increases the memory footprint without opening any new files.
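
The figures above are consistent with a constant per-call cost, which a
few lines of arithmetic make explicit. This sketch assumes the Memory
value is the RSS column of `ps aux` (field 6, in kilobytes) and that the
three samples were taken after calls 1, 10 and 100; the variable names
are only illustrative.

```python
# (call number, unreachable objects, memory in KB) as reported above
samples = [(1, 11, 4612), (10, 110, 4680), (100, 1100, 5284)]

# Unreachable objects per call at each sample point
per_call = [objs / calls for calls, objs, _ in samples]

# Average memory growth per call between consecutive samples
growth = [
    (m2 - m1) / (c2 - c1)
    for (c1, _, m1), (c2, _, m2) in zip(samples, samples[1:])
]

print(per_call)
print(growth)
```

Every sample gives 11 objects per call, and both intervals come out at
roughly 7 KB of additional memory per call.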

Do you have any idea?
With the hack proposed in message
http://bugs.python.org/issue1208304#msg60751, the number of unreachable
objects drops to 8, but memory still increases.

Regards.

stephbul

PS
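My urllib2leak.py test calls a monitor script (I was not able to attach it):
#! /bin/sh

# Name of the process to monitor
PROCS='urllib2leak.py'

# PID of the monitored process (assumes exactly one match)
RUNPID=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{print $2}'`
# Number of lines in the lsof listing for that PID
FDESC=`lsof -p "$RUNPID" | wc -l`
# Resident set size in KB (field 6 of ps aux)
MEM=`ps aux | grep "$PROCS" | grep -v "grep" | awk '{print $6}'`

echo "File descriptors number: $FDESC"
echo "Memory: $MEM"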
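
The same two figures could also be read from inside the Python process,
which avoids the `ps | grep` lookup. This is a Linux-only sketch that
assumes /proc is mounted; `fd_count` and `rss_kb` are hypothetical helper
names, not part of the attached script, and both return None where /proc
is unavailable. The descriptor count will differ from `lsof | wc -l`,
whose output also contains a header line and non-descriptor entries such
as mapped files.

```python
import os


def fd_count():
    """Open file descriptors of this process, or None without /proc.

    The result includes the descriptor used to list the directory itself.
    """
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def rss_kb():
    """Resident set size in kB from /proc/self/status, or None if unavailable."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # Line looks like "VmRSS:     4612 kB"
                    return int(line.split()[1])
    except OSError:
        pass
    return None


if __name__ == "__main__":
    print("File descriptors number:", fd_count())
    print("Memory:", rss_kb())
```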

----------
nosy: +stephbul
versions:  -Python 2.6, Python 2.7
Added file: http://bugs.python.org/file14171/urllib2leak.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue1208304>
_______________________________________

