[ python-Bugs-1294453 ] email.Parser.FeedParser leak

SourceForge.net noreply at sourceforge.net
Sun Sep 18 23:20:10 CEST 2005


Bugs item #1294453, was opened at 2005-09-18 04:46
Message generated for change (Comment added) made by montanaro
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1294453&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: George Giannakopoulos (pckid)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Parser.FeedParser leak

Initial Comment:
It seems there is a reference cycle within the
FeedParser class.
I discovered it while implementing a mail
categorization app. It seems that the problem lies in
the line:
        self._parse = self._parsegen().next
of the FeedParser __init__ method.

The object is never freed, so I had to add
the line:
    self._parse = None
to the close() method of the class, just before the
return statement.

This seems to fix the problem, BUT the
_parse method is no longer valid afterwards, so the object
must not be used again once close() has been called.
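The cycle the reporter describes can be reproduced without the email package. This is a minimal sketch that assumes Python 3 on CPython: generators expose `__next__` rather than `next`, and objects are freed as soon as their reference count drops to zero. `CyclicParser` and `freed_without_gc` are hypothetical names that only mimic FeedParser's `self._parse = self._parsegen().next` pattern. The generator's frame holds `self`, and `self` holds the generator's bound method, so deleting the last outside reference does not free the object. Clearing `_parse` in `close()` breaks the cycle.

```python
import gc
import weakref


class CyclicParser:
    """Hypothetical stand-in for FeedParser's generator-driven design."""

    def __init__(self):
        # The generator's frame holds a reference to self, and self holds
        # the generator's bound __next__ method: a reference cycle.
        self._parse = self._parsegen().__next__

    def _parsegen(self):
        while True:
            yield self  # 'self' stays alive as long as this frame does

    def feed(self):
        return self._parse()

    def close(self):
        # The reporter's workaround: drop the generator to break the cycle.
        # After this, feed() no longer works and the object must be discarded.
        self._parse = None


def freed_without_gc(call_close):
    """Return True if a CyclicParser is freed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from quietly cleaning up
    try:
        p = CyclicParser()
        p.feed()  # start the generator, as a real feed() call would
        if call_close:
            p.close()
        ref = weakref.ref(p)
        del p
        return ref() is None
    finally:
        if was_enabled:
            gc.enable()


print(freed_without_gc(call_close=False))  # False on CPython: the cycle keeps it alive
print(freed_without_gc(call_close=True))   # True on CPython: refcounting frees it
```

With the collector enabled, a modern CPython would still reclaim the cycle eventually. The workaround only makes the release immediate.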

In case it makes any difference, the FeedParser was invoked
through the Parser class:
    import email.Parser
    pParser = email.Parser.Parser()
    mMessage = pParser.parsestr(sMessageString)
    del pParser
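For comparison, here is roughly what the same parse looks like when FeedParser is driven directly. This sketch assumes Python 3, where the module is spelled `email.parser` and `FeedParser` can be imported from it. The sample message and the `parse_incrementally` helper are made up for illustration. `close()` flushes any buffered input and returns the root message, and the parser is dropped afterwards.

```python
from email.parser import FeedParser

# Hypothetical sample message; any RFC 822 text would do.
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: test\n"
    "\n"
    "Hello, Bob.\n"
)


def parse_incrementally(text, chunk_size=16):
    """Feed text to a FeedParser in small chunks and return the Message."""
    parser = FeedParser()
    for start in range(0, len(text), chunk_size):
        # Chunks may end mid-line; FeedParser buffers partial lines.
        parser.feed(text[start:start + chunk_size])
    # close() returns the root message; the parser is not reused afterwards.
    msg = parser.close()
    del parser
    return msg


msg = parse_incrementally(SAMPLE)
print(msg["Subject"])
```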


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2005-09-18 16:20

Message:
Logged In: YES 
user_id=44345

Try running top while the loop executes.  Let it run for a couple of minutes...
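As an alternative to watching top, growth can be measured from inside the interpreter with tracemalloc. This sketch assumes Python 3.4 or later, since tracemalloc does not exist in the 2.4/2.5 builds discussed here. The sample message, the warm-up and iteration counts, and the `run`/`growth_bytes` helpers are arbitrary choices. The sketch compares traced memory after a warm-up with traced memory after many more parses. It forces a collection before each reading so that cycles the collector can free are not counted.

```python
import gc
import tracemalloc
from email.parser import Parser

# Arbitrary sample message used only for this measurement.
SAMPLE = "Subject: growth check\n\nbody line\n"


def run(iterations):
    """Parse SAMPLE repeatedly, discarding each parser and message."""
    for _ in range(iterations):
        parser = Parser()
        parser.parsestr(SAMPLE)
        del parser


def growth_bytes(warmup=200, iterations=2000):
    """Return traced bytes gained across `iterations` parses after a warm-up."""
    tracemalloc.start()
    try:
        run(warmup)  # populate caches before taking the baseline
        gc.collect()
        before, _peak = tracemalloc.get_traced_memory()
        run(iterations)
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


print("bytes gained:", growth_bytes())
```

A small positive number is normal noise. A leak would show up as growth roughly proportional to `iterations`.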

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2005-09-18 16:08

Message:
Logged In: YES 
user_id=12800

Hmm, in Python 2.4 CVS, this always prints 0:

import gc
import email.Parser

s = open('/tmp/msg.txt').read()
try:
    while True:
        parser = email.Parser.Parser()
        msg = parser.parsestr(s)
        del parser
except KeyboardInterrupt:
    print len(gc.garbage)

Same thing in Python 2.5 CVS.  So where's the leak?

Note that it's undefined what the FeedParser does after you
call its close() method.  It doesn't seem like a problem to set
self._parse = None in close(), if that fixes a problem,
but it's a little odd that the above program doesn't
reproduce the reported bug.
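gc.garbage only lists objects that the collector found unreachable but could not free. A cycle that is eventually collected never appears there, even if such cycles pile up between collections. A more direct check holds a weak reference to every FeedParser it creates and counts how many are still alive after a forced gc.collect(). The version below assumes Python 3, with `email.parser` in place of `email.Parser`. The sample message, the iteration count and the `surviving_parsers` helper are arbitrary.

```python
import gc
import weakref
from email.parser import FeedParser

# Arbitrary sample message used only for this check.
SAMPLE = "Subject: leak check\n\nbody\n"


def surviving_parsers(iterations=1000):
    """Parse SAMPLE repeatedly and count FeedParser objects still alive."""
    refs = []
    for _ in range(iterations):
        parser = FeedParser()
        parser.feed(SAMPLE)
        parser.close()
        refs.append(weakref.ref(parser))
        del parser
    # Force a full collection so that objects kept alive only by
    # reference cycles are reclaimed before counting.
    gc.collect()
    return sum(1 for r in refs if r() is not None)


print("parsers still alive:", surviving_parsers())
print("uncollectable objects:", len(gc.garbage))
```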

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2005-09-18 14:13

Message:
Logged In: YES 
user_id=44345

Using Python built from CVS and from the 2.4
maintenance branch, I executed:
  import email.Parser
  s = open("... some file containing a message ...").read()
  while True:
    parser = email.Parser.Parser()
    msg = parser.parsestr(s)

and let it accumulate a couple of minutes of CPU time.  It leaks
in the 2.4 version, but not in the head (2.5) version, even though
the two versions of the email package appear identical (diff -ru).  I
made a slightly different change: instead of using
self._parse at all, I just replaced it with self._parsegen().next.
Memory consumption continues to grow for me.

Oddly enough, if I break out of the above loop, do a gc.collect()
and then check gc.garbage, the CVS HEAD version shows a
list of one element containing a generator.  In the 2.4 release
branch version gc.garbage is empty.

Assigning to Barry as the email wiz...


----------------------------------------------------------------------
