[issue2246] itertools.groupby() leaks memory with circular reference

Jeroen Ruigrok van der Werven report at bugs.python.org
Thu Mar 6 20:39:29 CET 2008


New submission from Jeroen Ruigrok van der Werven:

Quoting from my email to Raymond:

In the Trac/Genshi community we've been tracking down a somewhat obscure
memory leak that causes us a lot of problems.

Please see http://trac.edgewall.org/ticket/6614 and then
http://genshi.edgewall.org/ticket/190 for background.

We reduced the case to the following Python-only code and believe it is
a bug in itertools' groupby. As Armin Ronacher explains in Genshi
ticket 190:

"Looks like genshi is not to blame. itertools.groupby has a grouper 
with a reference to the groupby type but no traverse func. As soon as a 
circular reference ends up in the groupby (which happens thanks to the 
func_globals in our lambda) genshi leaks."
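The "traverse func" Armin refers to is the hook the cyclic garbage collector uses to discover an object's outgoing references. From Python, gc.get_referents() reports what that hook exposes. For an object whose type has no GC support, it reports nothing. The sketch below checks whether the collector can see the grouper -> groupby edge. It assumes a current CPython 3 interpreter, where the grouper type does implement traversal. The helper name grouper_refers_to_parent is hypothetical and does not come from the report.

```python
import gc
from itertools import groupby


def grouper_refers_to_parent():
    """Return True if gc can see the reference from a grouper to its groupby.

    Hypothetical helper for illustration only.
    """
    parent = groupby([1, 1, 2])
    _, group = next(parent)  # group is the internal grouper object
    # gc.get_referents() walks the object's traverse function. If the
    # grouper type had no GC support, this list would be empty and the
    # edge back to the parent groupby object would be invisible to gc.
    return any(ref is parent for ref in gc.get_referents(group))


print(grouper_refers_to_parent())
```

If the grouper were missing its traverse function, the collector could not account for the reference the grouper holds on its groupby object. Any cycle running through that edge would then look externally referenced and would never be freed.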

This can be demonstrated with the following code (testcase attachment 
present with this issue):

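import gc
from itertools import groupby

def run():
    keyfunc = lambda x: x
    for i, j in groupby(range(100), key=keyfunc):
        # Storing the current group on keyfunc creates a reference cycle:
        # keyfunc -> keyfunc.__dict__ -> grouper -> groupby -> keyfunc
        keyfunc.x = j

for x in range(20):
    gc.collect()
    run()
    # Number of objects currently tracked by the garbage collector
    print(len(gc.get_objects()))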

Executing this prints the number of objects tracked by the garbage
collector, but on every iteration the count is 4 higher than the previous
one. As Armin specifies, the leaked objects are:

  "a frame, a grouper, a keyfunc and a groupby object"
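A more direct check than watching the object count is to hold only a weak reference to keyfunc and ask whether a full collection frees it. The sketch below rebuilds the same cycle as the test case. Once the function returns, only the cyclic collector can reclaim the cycle. It assumes a CPython 3 interpreter. The helper names make_cycle and cycle_is_collected are hypothetical.

```python
import gc
import weakref
from itertools import groupby


def make_cycle():
    """Rebuild the test case's cycle and return a weak reference to keyfunc."""
    keyfunc = lambda x: x
    for _, group in groupby(range(100), key=keyfunc):
        # keyfunc.__dict__ -> grouper -> groupby -> keyfunc
        keyfunc.x = group
    return weakref.ref(keyfunc)


def cycle_is_collected():
    ref = make_cycle()
    # No strong references from outside the cycle remain, so only the
    # cyclic collector can free it, and it can do so only if every object
    # in the loop exposes its references through a traverse function.
    gc.collect()
    return ref() is None


print(cycle_is_collected())
```

On an interpreter affected by this issue, the weak reference would stay alive after gc.collect(), because the collector cannot see the grouper's reference back to the groupby object.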

We have been unable to come up with a decent patch and thus I am 
logging this issue now.

----------
files: testcase.py
messages: 63332
nosy: asmodai, rhettinger
severity: normal
status: open
title: itertools.groupby() leaks memory with circular reference
type: resource usage
versions: Python 2.4, Python 2.5, Python 2.6, Python 3.0
Added file: http://bugs.python.org/file9624/testcase.py

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2246>
__________________________________


More information about the Python-bugs-list mailing list