[Tutor] parsing xml as lines

Peter Otten __peter__ at web.de
Wed Nov 4 14:41:27 EST 2015


richard kappler wrote:

> I have an xml file that gets written to as events occur. Each event
> writes a new 'line' of xml to the file, in a specific format, e.g.
> something like this:
> 
> <heresmydata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xsi:noNamespaceSchemaLocation="Logging.xsd" version="1.0"><childofheresmydata/><anotherchildofheresmydata/><grandchild>somestuff</grandchild></heresmydata>
> 
> and each 'line' has that same structure or format.
> 
> I've written a script that parses out the needed data and forwards it on
> using regexes, but I think it might be better to use an XML parser. I can
> parse out what I need if I have just one line in the file, but when
> there are a number of lines, as there actually are, I can't figure out how
> to get it to work.
> 
> In other words, with a one line file, this works fine and I understand it:
> 
> import xml.etree.cElementTree as ET
> tree = ET.ElementTree(file='1lineTest.log')
> grandchild = tree.find('grandchild')
> print grandchild.tag, grandchild.text
> 
> and I get the output I desire:
> 
> grandchild Sally
> 
> But if I have several lines in the file and try to run a loop:
> 
> import xml.etree.cElementTree as ET
> f1 = open('5lineTest.log', 'r')
> lineList = f1.readlines()
> Imax = len(lineList)
> 
> i = 0
> while i <= Imax:
>     tree = ET.ElementTree(lineList[i])
>     grandchild = tree.find('grandchild')
>     print grandchild.tag, grandchild.txt
>     i += 1
> 
> Traceback (most recent call last):
>   File "<stdin>", line 4, in <module>
> AttributeError: 'int' object has no attribute 'tag'
> 
> and yet I can do:
> 
> print lineList[0] and it will print out the first line.
> 
> I think I get why; I just can't figure out a way around it.
> 
> Guidance please?
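For the record, the traceback comes from ET.ElementTree(lineList[i]). ElementTree() expects an Element (or a file), not a string, so it stores the raw string as its root, and tree.find() then ends up calling the string's own find() method, which returns an integer index rather than an element. Two more problems are waiting behind that one: `i <= Imax` runs one index past the end of the list, and `grandchild.txt` should be `grandchild.text`. If every line really is one complete XML document, a minimal per-line fix is to parse each line with ET.fromstring(). The sketch below is written for Python 3 (cElementTree no longer exists there, plain ElementTree is used instead), and the function name grandchildren_by_line is made up for illustration:

```python
import xml.etree.ElementTree as ET


def grandchildren_by_line(path):
    """Yield (tag, text) for the <grandchild> of each record in the log.

    Assumes (hypothetically) that every non-blank line of the file is one
    complete, well-formed XML document.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            root = ET.fromstring(line)  # parse this one line into an Element
            grandchild = root.find('grandchild')  # direct child of the record
            if grandchild is not None:
                yield grandchild.tag, grandchild.text
```

You would then loop over grandchildren_by_line('5lineTest.log') and print each (tag, text) pair. Note that this still depends on the writer never splitting a record across lines.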

Ceterum censeo (I keep saying it) ;) Abandon the notion of lines!

To process nodes as they arrive from the parser, have a look at iterparse:

http://effbot.org/zone/element-iterparse.htm
https://docs.python.org/dev/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse
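One wrinkle when pointing iterparse at this particular file: a sequence of top-level records is not a single well-formed XML document, so the parser would raise a ParseError (expat reports "junk after document element") right after the first record. The sketch below is just one way around that. It assumes the records carry no `<?xml ...?>` declaration, and it hides the file behind a minimal file-like object that wraps everything in a made-up `<log>` root. `WrappedLog`, `iter_grandchildren` and the `record_tag` default `'heresmydata'` are hypothetical names taken from the sample above. Each record is removed from the synthetic root once it has been handled, so memory stays flat however long the log grows.

```python
import itertools
import xml.etree.ElementTree as ET


class WrappedLog:
    """Minimal file-like object presenting the log as ONE XML document.

    The log holds many top-level records, which is not well-formed XML on
    its own, so its contents are surrounded by a synthetic <log> root.
    """

    def __init__(self, f):
        self._chunks = itertools.chain(["<log>"], f, ["</log>"])

    def read(self, size=-1):
        # iterparse() only needs read(); returning one line per call is
        # allowed, and "" signals the end of the input.
        return next(self._chunks, "")


def iter_grandchildren(path, record_tag="heresmydata"):
    """Yield (tag, text) of the <grandchild> in every record of the log."""
    with open(path) as f:
        context = ET.iterparse(WrappedLog(f), events=("start", "end"))
        _, root = next(context)  # first event: start of the synthetic <log>
        for event, elem in context:
            if event == "end" and elem.tag == record_tag:
                grandchild = elem.find("grandchild")
                if grandchild is not None:
                    yield grandchild.tag, grandchild.text
                root.remove(elem)  # record handled; drop it to save memory
```

Line breaks between records no longer matter here: a record may span several lines, or several records may share one. If the file is still growing while you read it, ET.XMLPullParser, the non-blocking counterpart of iterparse documented on the same page, lets you feed() data as it arrives.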
