[ python-Bugs-1294453 ] email.Parser.FeedParser leak

SourceForge.net noreply at sourceforge.net
Mon Sep 19 04:57:54 CEST 2005


Bugs item #1294453, was opened at 2005-09-18 04:46
Message generated for change (Comment added) made by montanaro
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1294453&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: George Giannakopoulos (pckid)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Parser.FeedParser leak

Initial Comment:
It seems there is a reference cycle within the
FeedParser class.
I discovered it while implementing a mail
categorization app. It seems that the problem lies in
the line:
        self._parse = self._parsegen().next
of the FeedParser __init__ method.

The object cannot be deleted and I was forced to add
the line:
    self._parse = None
in the close() method of the class just before the
return call.

It seems it actually corrects the situation, BUT the
_parse method is no longer valid, and the object should
no longer be used.

If it makes any difference, the FeedParser was called
by a use of the Parser class:
    pParser = email.Parser.Parser()
    mMessage = pParser.parsestr(sMessageString)
    del pParser
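
The cycle described in this report can be sketched in isolation. The
class below is a hypothetical stand-in, not the real FeedParser: it only
copies the pattern of storing a generator's bound next method on the
instance. It uses the modern spelling __next__ where the 2.4 source uses
.next, and it checks, with the cyclic collector switched off, whether
reference counting alone frees the object.

```python
import gc
import weakref


class CycleDemo(object):
    """Hypothetical stand-in for the pattern in the report (not FeedParser)."""

    def __init__(self):
        # The generator's frame refers to `self`, and `self` stores the
        # generator's bound next method, which refers to the generator:
        # instance -> method -> generator -> frame -> instance.
        self._parse = self._parsegen().__next__

    def _parsegen(self):
        while True:
            yield self

    def close(self):
        # The reporter's workaround: drop the stored method to break the cycle.
        self._parse = None


def freed_by_refcount(break_cycle):
    """Return True if the object dies on `del` with the cyclic GC disabled."""
    gc.disable()
    try:
        obj = CycleDemo()
        ref = weakref.ref(obj)
        if break_cycle:
            obj.close()
        del obj
        return ref() is None
    finally:
        gc.enable()
        gc.collect()  # clean up whatever the cycle left behind
```

Under these assumptions, freed_by_refcount(False) reports that the object
survives the del, and freed_by_refcount(True) reports that it is freed at
once. An object kept alive only by such a cycle should still be reclaimed
the next time the cyclic collector runs.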


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2005-09-18 21:57

Message:
Logged In: YES 
user_id=44345

Here's what I see on my Mac laptop (10.3.9) with Python 2.4.1:

montanaro:skip% ps auxww | egrep python2.4
skip    10914  97.1  0.6    37980   6692  p8  R+    9:54PM   0:15.50 python2.4
skip    10926   0.0  0.0    18644    268 std  U+    9:55PM   0:00.00 egrep python2.4
montanaro:skip% ps auxww | egrep python2.4
skip    10914  94.7  0.6    37980   6724  p8  R+    9:54PM   0:20.75 python2.4
skip    10928   0.0  0.0    18644     92 std  R+    9:55PM   0:00.00 egrep python2.4
montanaro:skip% ps auxww | egrep python2.4
skip    10914  91.7  0.6    37980   6748  p8  R+    9:54PM   0:24.36 python2.4
skip    10930   0.0  0.0    18644     92 std  R+    9:55PM   0:00.00 egrep python2.4
montanaro:skip% ps auxww | egrep python2.4
skip    10914  90.4  0.6    37980   6780  p8  R+    9:54PM   0:29.36 python2.4
skip    10932   0.0  0.0    18644     92 std  R+    9:55PM   0:00.00 egrep python2.4
montanaro:skip% ps auxww | egrep python2.4
skip    10914  75.6  0.6    37980   6808  p8  R+    9:54PM   0:33.21 python2.4
skip    10934   0.0  0.0    18644     92 std  R+    9:55PM   0:00.00 egrep python2.4
montanaro:skip% ps auxww | egrep python2.4
skip    10914  91.9  0.7    37980   6848  p8  R+    9:54PM   0:36.86 python2.4
skip    10939   0.0  0.0    18644     92 std  R+    9:55PM   0:00.00 egrep python2.4
montanaro:skip% ps auxww | egrep python2.4
skip    10914  90.0  0.7    37980   6928  p8  R+    9:54PM   1:34.41 python2.4
skip    10998   0.0  0.0    18644     92 std  R+    9:57PM   0:00.01 egrep python2.4
montanaro:skip% ps auxww | egrep python2.4
skip    10914  95.3  0.7    37980   6952  p8  R+    9:54PM   1:46.65 python2.4
skip    11000   0.0  0.0    18644     92 std  R+    9:57PM   0:00.00 egrep python2.4


----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2005-09-18 21:32

Message:
Logged In: YES 
user_id=12800

Done.  I never see memory usage get above 0.8% (py2.4) or
0.7% (py2.5) after running for several minutes.  It
certainly doesn't appear to be leaking memory of any
detectable amount.

If it matters, I tested on Linux (Gentoo) 2.6.12 kernel.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2005-09-18 16:20

Message:
Logged In: YES 
user_id=44345

Try running top as the loop executes.  Let it run for a couple minutes...

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2005-09-18 16:08

Message:
Logged In: YES 
user_id=12800

Hmm, in Python 2.4 CVS, this always returns 0:

import gc
import email.Parser

s = open('/tmp/msg.txt').read()
try:
    while True:
        parser = email.Parser.Parser()
        msg = parser.parsestr(s)
        del parser
except KeyboardInterrupt:
    print len(gc.garbage)

Same thing in Python 2.5 CVS.  So where's the leak?

Note that it's undefined what the FeedParser does after you
call its close().  It doesn't seem like a problem to set
self._parse = None in close(), if that fixes a problem,
but it's a little odd that the above program doesn't
reproduce the reported bug.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2005-09-18 14:13

Message:
Logged In: YES 
user_id=44345

Using Python built from CVS and from the 2.4
maintenance branch I executed:
  import email.Parser
  s = open("... some file containing a message ...").read()
  while True:
    parser = email.Parser.Parser()
    msg = parser.parsestr(s)

and let it accumulate a couple minutes of CPU time.  It leaks
in the 2.4 version, but not the head (2.5) version.  The two
versions of the email package appear identical (diff -ru).  I
made a slightly different change.  Instead of using
self._parse at all, I just replaced it with self._parsegen().next.
Memory consumption continues to grow for me.

Oddly enough, if I break out of the above loop, do a gc.collect()
and then check gc.garbage, the CVS HEAD version shows a
list of one element containing a generator.  In the 2.4 release
branch version gc.garbage is empty.

Assigning to Barry as the email wiz...
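
The gc.collect() and gc.garbage check described above can be turned into
a small measurement. gc.garbage lists only objects the collector found
but could not free, so an empty list does not show that no cycles were
created; the return value of gc.collect() (the number of unreachable
objects it found) is the more direct signal. The sketch below is a
hypothetical helper, not part of the email package, and the two workloads
are illustrative stand-ins for the parsing loop.

```python
import gc


def count_unreachable(workload, iterations=10):
    """Run `workload` with automatic GC off, then count what one
    explicit collection finds unreachable."""
    gc.collect()      # start from a clean slate
    gc.disable()      # stop automatic collections from hiding the cycles
    try:
        for _ in range(iterations):
            workload()
        return gc.collect()   # number of unreachable objects found
    finally:
        gc.enable()


def make_cycle():
    # Hypothetical workload: a list that refers to itself.
    cell = []
    cell.append(cell)


def make_plain():
    # Hypothetical workload with no cycle; reference counting frees it.
    cell = [1, 2, 3]
    return len(cell)
```

To apply it to the loop in this thread, pass a function that builds a
Parser and calls parsestr() on a sample message (on a modern Python the
module is spelled email.parser). A nonzero count would mean each pass
leaves cyclic garbage for the collector, which is a separate question
from whether the process keeps growing.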


----------------------------------------------------------------------
