[New-bugs-announce] [issue2818] pulldom cannot handle xml file with large external entity properly
Luyang Han
report at bugs.python.org
Sun May 11 15:32:18 CEST 2008
New submission from Luyang Han <luyang.han at gmail.com>:
When the xml.dom.pulldom module is used to parse a large XML file whose
content is all stored in that one file, the module handles it as follows,
without constructing the whole DOM:
import xml.dom.pulldom

events = xml.dom.pulldom.parse('file.xml')
for (event, node) in events:
    process(event, node)  # process() stands for my own event handler
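For reference, here is a minimal self-contained sketch of the streaming pattern described above. It uses xml.dom.pulldom.parseString on a small inline document instead of the hypothetical file.xml. The <item> elements and the item_texts() helper are illustrative assumptions, not part of the report. Only elements explicitly passed to expandNode() get their children attached; everything else is seen as individual events.

```python
import xml.dom.pulldom
from xml.dom.pulldom import START_ELEMENT

# Hypothetical stand-in for the contents of file.xml.
DOC = '<body><item id="1">a</item><item id="2">b</item></body>'


def item_texts(xml_text):
    """Stream pulldom events and expand only <item> elements into subtrees."""
    events = xml.dom.pulldom.parseString(xml_text)
    result = []
    for event, node in events:
        if event == START_ELEMENT and node.tagName == "item":
            # expandNode() consumes events up to the matching end tag and
            # attaches them as children of this node only.
            events.expandNode(node)
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            result.append((node.getAttribute("id"), text))
    return result


if __name__ == "__main__":
    print(item_texts(DOC))  # [('1', 'a'), ('2', 'b')]
```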
But if 'file.xml' references some large external entities, for example:
<!DOCTYPE body [
  <!ENTITY file_external SYSTEM "others.xml">
]>
<body>&file_external;</body>
Then running the same Python snippet leads to enormous memory usage. I
have not done a proper benchmark, but in one case a 3 MB external XML
file consumed about 1 GB of memory. My guess is that in this case the
whole DOM structure is being constructed.
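Since no concrete benchmark was run, the following is a hedged sketch of how the behaviour could be reproduced and measured with tracemalloc. The generated <item> content and both helper functions (write_test_files, peak_memory_while_streaming) are hypothetical. Since Python 3.7.1 the SAX parser no longer expands external general entities by default, so the sketch turns on feature_external_ges explicitly to mirror the setup in the report; only do this with trusted local files.

```python
import os
import tempfile
import tracemalloc
import xml.dom.pulldom
import xml.sax
import xml.sax.handler
from xml.dom.pulldom import START_ELEMENT


def write_test_files(directory, n_items):
    """Write a hypothetical file.xml whose body is a large external entity."""
    external = os.path.join(directory, "others.xml")
    with open(external, "w", encoding="utf-8") as f:
        for i in range(n_items):
            f.write("<item>%d</item>\n" % i)
    main = os.path.join(directory, "file.xml")
    with open(main, "w", encoding="utf-8") as f:
        # An absolute SYSTEM id avoids depending on the current directory.
        f.write("<!DOCTYPE body [\n")
        f.write('  <!ENTITY file_external SYSTEM "%s">\n' % external)
        f.write("]>\n<body>&file_external;</body>\n")
    return main


def peak_memory_while_streaming(path):
    """Stream pulldom events; return (START_ELEMENT count, peak traced bytes)."""
    parser = xml.sax.make_parser()
    # Python 3.7.1+ does not expand external general entities by default,
    # so enable them to reproduce the report (trusted files only).
    parser.setFeature(xml.sax.handler.feature_external_ges, True)
    tracemalloc.start()
    try:
        count = 0
        for event, node in xml.dom.pulldom.parse(path, parser=parser):
            if event == START_ELEMENT:
                count += 1
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return count, peak


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for n in (1000, 20000):
            count, peak = peak_memory_while_streaming(write_test_files(tmp, n))
            print("items=%d start_elements=%d peak_bytes=%d" % (n, count, peak))
```

If the peak grows roughly in proportion to the size of others.xml even though the loop keeps only the current node, that would be consistent with the external entity being processed in one go rather than streamed.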
----------
components: XML
messages: 66628
nosy: hanselda
severity: normal
status: open
title: pulldom cannot handle xml file with large external entity properly
type: resource usage
versions: Python 2.5
__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2818>
__________________________________