[issue14762] ElementTree memory leak

Giuseppe Attardi report at bugs.python.org
Wed May 9 11:39:47 CEST 2012


New submission from Giuseppe Attardi <attardi at di.unipi.it>:

I confirm the presence of a serious memory leak in ElementTree when using the iterparse() function.
Memory usage grows to dozens of GB while parsing a single large XML file.

For further information, see discussion in:
  http://www.gossamer-threads.com/lists/python/bugs/912164?do=post_view_threaded#912164
but note that the comments attributing the problem to the OS are quite off the mark.

To replicate the problem, try this on a Wikipedia dump:

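    from xml.etree import ElementTree

    # file: path to (or open file object for) the Wikipedia XML dump
    iterparse = ElementTree.iterparse(file)
    id = None
    for event, elem in iterparse:
        if elem.tag.endswith("title"):
            title = elem.text
        elif elem.tag.endswith("id") and not id:
            id = elem.text
        elif elem.tag.endswith("text"):
            # elem.text is None for an empty <text/> element
            print id, title, (elem.text or "")[:20]
            id = None  # reset so the next page picks up its own id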
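
For comparison, here is a minimal sketch of the commonly used pattern of discarding elements once they have been processed, since iterparse() otherwise keeps building the complete tree in memory while it iterates. It is written for Python 3. The function name iter_pages and the assumption that each record is wrapped in a <page> element (as in Wikipedia dumps) are illustrative and not part of the report. Whether this accounts for the growth reported here is not established by this message.

```python
import io
import xml.etree.ElementTree as ET


def iter_pages(source):
    """Yield (page_id, title, text_prefix) per <page>, freeing each page after use.

    Hypothetical helper: assumes records are wrapped in <page> elements.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    page_id = title = text = None
    for event, elem in context:
        if event != "end":
            continue
        tag = elem.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if tag == "title":
            title = elem.text
        elif tag == "id" and page_id is None:
            page_id = elem.text  # first <id> inside the page is the page id
        elif tag == "text":
            text = (elem.text or "")[:20]
        elif tag == "page":
            yield page_id, title, text
            page_id = title = text = None
            root.clear()  # drop the finished page so the tree does not grow


if __name__ == "__main__":
    sample = b"<mediawiki><page><title>A</title><id>1</id><text>hi</text></page></mediawiki>"
    for record in iter_pages(io.BytesIO(sample)):
        print(record)
```

Clearing the root after each page releases both the page element and the reference the root holds to it; calling clear() only on the child would leave an empty element attached to the root for every page.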

----------
messages: 160266
nosy: Giuseppe.Attardi
priority: normal
severity: normal
status: open
title: ElementTree memory leak
type: resource usage
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14762>
_______________________________________

