[ python-Bugs-1294453 ] email.Parser.FeedParser leak
SourceForge.net
noreply at sourceforge.net
Sun Sep 18 23:20:10 CEST 2005
Bugs item #1294453, was opened at 2005-09-18 04:46
Message generated for change (Comment added) made by montanaro
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1294453&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: George Giannakopoulos (pckid)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Parser.FeedParser leak
Initial Comment:
It seems there is a reference cycle within the
FeedParser class.
I discovered it while implementing a mail
categorization app. It seems that the problem lies in
the line:
self._parse = self._parsegen().next
of the FeedParser __init__ method.
The object cannot be deleted and I was forced to add
the line:
self._parse = None
in the close() method of the class, just before the
return statement.
It seems it actually corrects the situation, BUT the
_parse method is no longer valid, and the object should
no longer be used.
If it makes any difference, the FeedParser was called
by a use of the Parser class:
pParser = email.Parser.Parser()
mMessage = pParser.parsestr(sMessageString)
del pParser
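
The cycle described above can be sketched outside the email package. This is a minimal illustration, not the FeedParser source: CyclicParser and survives_del are hypothetical names, and the code is written for Python 3 (so __next__ replaces the .next used in the report). It assumes CPython's reference counting and disables the cyclic collector so that only reference counts decide when the object is freed.

```python
import gc
import weakref


class CyclicParser:
    """Hypothetical stand-in for FeedParser; not the email package's code."""

    def __init__(self):
        # The generator's frame holds `self`, and `self` holds the
        # generator's bound __next__, so the two form a reference cycle.
        self._parse = self._parsegen().__next__

    def _parsegen(self):
        while True:
            yield self

    def close(self):
        # The reporter's workaround: drop the reference to break the cycle.
        self._parse = None


def survives_del(call_close):
    """Return True if the object is still alive after its last name is deleted."""
    parser = CyclicParser()
    ref = weakref.ref(parser)
    if call_close:
        parser.close()
    del parser
    return ref() is not None


gc.disable()  # rely on reference counting alone
try:
    leaked_without_close = survives_del(call_close=False)
    leaked_with_close = survives_del(call_close=True)
finally:
    gc.enable()
    gc.collect()  # clean up the cycle left behind by the first call
```

Under these assumptions the object outlives del unless close() clears _parse first. With the cyclic collector enabled, such a cycle would normally be reclaimed at the next collection rather than immediately.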
----------------------------------------------------------------------
>Comment By: Skip Montanaro (montanaro)
Date: 2005-09-18 16:20
Message:
Try running top as the loop executes. Let it run for a couple minutes...
----------------------------------------------------------------------
Comment By: Barry A. Warsaw (bwarsaw)
Date: 2005-09-18 16:08
Message:
Hmm, in Python 2.4 CVS, this always returns 0:
import gc
import email.Parser
s = open('/tmp/msg.txt').read()
try:
    while True:
        parser = email.Parser.Parser()
        msg = parser.parsestr(s)
        del parser
except KeyboardInterrupt:
    print len(gc.garbage)
Same thing in Python 2.5 CVS. So where's the leak?
Note that it's undefined what the FeedParser does after you
call its close(). It doesn't seem like a problem to set
self._parse = None in close(), if that fixes a problem,
but it's a little odd that the above program doesn't
reproduce the reported bug.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2005-09-18 14:13
Message:
Using Python built from CVS and from the 2.4
maintenance branch I executed:
s = open("... some file containing a message ...").read()
while True:
    parser = email.Parser.Parser()
    msg = parser.parsestr(s)
and let it accumulate a couple minutes of CPU time. It leaks
in the 2.4 version, but not the head (2.5) version. The two
versions of the email package appear identical (diff -ru). I
made a slightly different change. Instead of using
self._parse at all, I just replaced it with self._parsegen().next.
Memory consumption continues to grow for me.
Oddly enough, if I break out of the above loop, do a gc.collect()
and then check gc.garbage, the CVS HEAD version shows a
list of one element containing a generator. In the 2.4 release
branch version gc.garbage is empty.
Assigning to Barry as the email wiz...
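
A note on reading the numbers in this thread: gc.garbage lists only objects the collector found unreachable but could not free, so an empty gc.garbage does not by itself show that no cycles were created. The return value of gc.collect() is the number of unreachable objects it found. A minimal sketch of the difference, written for Python 3 with a hypothetical Node class and helper (neither is part of the email package):

```python
import gc


class Node:
    """Hypothetical object that refers to itself, forming a cycle."""

    def __init__(self):
        self.me = self


def count_cyclic_garbage(n):
    """Create n unreachable cycles; report what the collector sees."""
    gc.collect()  # start from a clean state
    garbage_before = len(gc.garbage)
    gc.disable()  # keep automatic collections from running mid-loop
    try:
        for _ in range(n):
            Node()  # unreachable immediately, but kept alive by its cycle
        found = gc.collect()  # number of unreachable objects found
    finally:
        gc.enable()
    new_uncollectable = len(gc.garbage) - garbage_before
    return found, new_uncollectable


found, new_uncollectable = count_cyclic_garbage(10)
```

Here the collector finds and frees the cycles, so found is nonzero while nothing is added to gc.garbage.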
----------------------------------------------------------------------