elementtree: line numbers and iterparse
Fredrik Lundh
fredrik at pythonware.com
Wed Sep 13 00:24:13 EDT 2006
Stuart McGraw wrote:
> I have a broad (~200K nodes) but shallow xml file
> I want to parse with Elementtree. There are too many
> nodes to read into memory simultaneously so I use
> iterparse() to process each node sequentially.
>
> Now I find I need to get and save the input file line
> number of each node. Googling turned up a way
> to do it by subclassing FancyTreeBuilder,
> (http://groups.google.com/group/comp.lang.python/msg/45f5313409553b4b?hl=en&)
> but that tries to read everything at once.
>
> Is there a way to do something similar with iterparse()?
something like this could work:
    import elementtree.ElementTree as ET
    import StringIO

    data = """\
    <doc>
    <tag>
    <subtag>text</subtag>
    <subtag>text</subtag>
    </tag>
    </doc>
    """

    class FileWrapper:
        # file-like wrapper that feeds the parser one line per read() call
        def __init__(self, source):
            self.source = source
            self.lineno = 0
        def read(self, size):
            # the requested size is ignored; always return a single line
            s = self.source.readline()
            self.lineno += 1
            return s

    # f = FileWrapper(open("source.xml"))
    f = FileWrapper(StringIO.StringIO(data))

    for event, elem in ET.iterparse(f, events=["start", "end"]):
        if event == "start":
            # lineno is the number of lines read so far
            print f.lineno, event, elem
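
for reference, here is a rough sketch of the same idea for Python 3, assuming the standard library's xml.etree.ElementTree in place of the standalone elementtree package. the names LineCountingReader and start_lines are made up for this illustration. it relies on iterparse() delivering the events for one chunk before it asks for the next one, which is an implementation detail rather than a documented guarantee, and the reported number is the line on which the start tag was completed, so a tag that spans several lines gets the number of its last line.

```python
import io
import xml.etree.ElementTree as ET

DATA = b"""\
<doc>
<tag>
<subtag>text</subtag>
<subtag>text</subtag>
</tag>
</doc>
"""


class LineCountingReader:
    """Hypothetical file-like wrapper: one line per read() call."""

    def __init__(self, source):
        self.source = source
        self.lineno = 0  # number of lines handed to the parser so far

    def read(self, size=-1):
        # The requested size is ignored on purpose: returning a single
        # line keeps lineno in step with the events the parser produces.
        line = self.source.readline()
        if line:
            self.lineno += 1
        return line


def start_lines(source):
    """Yield (line number, tag) for every start event in source."""
    reader = LineCountingReader(source)
    for event, elem in ET.iterparse(reader, events=("start", "end")):
        if event == "start":
            yield reader.lineno, elem.tag
        else:
            # Drop the finished element's content to keep memory use low.
            elem.clear()


# For a real file: start_lines(open("source.xml", "rb"))
result = list(start_lines(io.BytesIO(DATA)))
for lineno, tag in result:
    print(lineno, tag)
```

the elem.clear() call on "end" events is an addition for the large-file case from the question; leave it out if the element content is still needed after the loop.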
</F>