python and parsing an xml file
Matt Funk
mafunk at nmsu.edu
Mon Feb 21 17:07:52 EST 2011
Hi Stefan,
thank you for your advice.
I am running into an issue, though, which is likely a newbie problem.
My XML file, which I got from the internet, looks like this:
<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
</catalog>
Then I try to parse it as follows:
from lxml import etree
from lxml import objectify
inputfile="../input/books.xml"
parser = etree.XMLParser(ns_clean=True, remove_comments=True)
root = objectify.parse(inputfile,parser)
I try to access elements like this, for example:
print root.catalog.book.author.text
But it errors out with:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'catalog'
I don't understand why this happens.
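
A likely explanation, sketched here with the standard library's xml.etree.ElementTree so that it runs without lxml: parse() returns a tree object that wraps the whole document, not the root element, and the root element of this file is <catalog> itself, so it never shows up as a child named "catalog". The BOOKS_XML string is a trimmed stand-in for books.xml, not the real file.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for ../input/books.xml (trimmed copy of the sample above).
BOOKS_XML = b"""<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
</catalog>
"""

# parse() returns an ElementTree (the whole document), not an element.
tree = ET.parse(io.BytesIO(BOOKS_XML))

# getroot() returns the root element, which is <catalog> itself.
root = tree.getroot()
print(root.tag)

# Children are looked up relative to the root, so "catalog" is not in the path.
author = root.find("book/author").text
print(author)
```

With lxml.objectify the analogous calls would presumably be root = objectify.parse(inputfile).getroot() followed by root.book.author.text. As far as I can tell, passing a plain etree.XMLParser bypasses objectify's element classes; objectify.makeparser() builds a customised parser that keeps them.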
Also, how can I print all the attributes of the root, for example, or
obtain its keys?
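
A minimal sketch of inspecting an element, assuming "attributes" may mean either XML attributes (such as id="bk101") or child elements. It uses the standard library's ElementTree; lxml elements offer the same keys(), items(), get() and attrib interface. The inline XML is a shortened stand-in for the file above.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<catalog><book id="bk101"><author>Gambardella, Matthew</author>'
    '<price>44.95</price></book></catalog>'
)
book = root.find("book")

# XML attributes live in a dict-like mapping on each element.
print(root.tag, root.attrib)      # <catalog> has no XML attributes
print(list(book.keys()))          # attribute names of <book>
print(list(book.items()))         # (name, value) pairs of <book>
print(book.get("id"))             # a single attribute value

# Child elements are reached by iterating over the element.
child_tags = [child.tag for child in book]
print(child_tags)
```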
I really appreciate your help.
Thanks,
Matt
On 2/21/2011 10:43 AM, Stefan Behnel wrote:
> Matt Funk, 21.02.2011 18:30:
>> I want to create a set of xml input files to my code that look as
>> follows:
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> <!-- Settings for the algorithm to be performed -->
>> <Algorithm>
>>
>> <!-- The algorithm type. -->
>> <!-- The supported options are: -->
>> <!-- - Alg0 -->
>> <!-- - Alg1 -->
>> <Type>Alg1</Type>
>>
>> <!-- the location/path of the input file for this algorithm -->
>> <path>./Alg1.in</path>
>>
>> </Algorithm>
>>
>>
>> <!-- Relevant information during the processing will be written to a logfile -->
>> <Logfile>
>>
>> <!-- the location/path of the logfile (i.e. where to put the logfile) -->
>> <path>c:\tmp</path>
>>
>> <!-- verbosity level (i.e. how much to print) -->
>> <!-- The supported options are: -->
>> <!-- - 0 (nothing printed) -->
>> <!-- - 1 (print on error) -->
>> <verbosity>1</verbosity>
>>
>> </Logfile>
>
> That's not XML. XML documents have exactly one root element, i.e. you
> need an enclosing element around these two tags.
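
The single-root rule can be checked with a short sketch using the standard library's ElementTree. The element name "Settings" is a hypothetical wrapper chosen for illustration; any one enclosing element would do.

```python
import xml.etree.ElementTree as ET

# Two top-level elements, as in the file above (comments omitted).
TWO_ROOTS = """
<Algorithm><Type>Alg1</Type><path>./Alg1.in</path></Algorithm>
<Logfile><path>c:\\tmp</path><verbosity>1</verbosity></Logfile>
"""

# Parsing fails as soon as the parser meets the second top-level element.
try:
    ET.fromstring(TWO_ROOTS)
    error = None
except ET.ParseError as exc:
    error = str(exc)
print(error)

# Wrapping both elements in one enclosing element makes the document parse.
# "Settings" is a hypothetical name, not taken from the original file.
wrapped = "<Settings>" + TWO_ROOTS + "</Settings>"
root = ET.fromstring(wrapped)
top_level = [child.tag for child in root]
print(top_level)
```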
>
>
>> So there are comments, whitespace etc ... in it.
>> I would like to be able to put everything into some sort of structure
>
> Including the comments or without them? Note that ElementTree will
> ignore comments.
>
>
>> such that I can access it as:
>> structure['Algorithm']['Type'] == 'Alg1'
>
> Have a look at lxml.objectify. It allows you to write
>
> alg_type = root.Algorithm.Type.text
>
> and a couple of other niceties.
>
> http://lxml.de/objectify.html
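
If a plain nested dictionary is still wanted, a minimal sketch with the standard library's ElementTree could look like the following. The helper element_to_dict and the "Settings" wrapper element are hypothetical names introduced for illustration, and the helper assumes every tag appears at most once per parent (a repeated tag would overwrite the earlier entry). Comments are dropped by the default parser.

```python
import xml.etree.ElementTree as ET

# Shortened version of the settings file, wrapped in a single root element.
SETTINGS_XML = """
<Settings>
  <!-- Settings for the algorithm to be performed -->
  <Algorithm>
    <Type>Alg1</Type>
    <path>./Alg1.in</path>
  </Algorithm>
  <Logfile>
    <verbosity>1</verbosity>
  </Logfile>
</Settings>
"""


def element_to_dict(element):
    """Return the text of a leaf element, or a dict of its children by tag."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    # Assumption: tags are unique within a parent; duplicates would be lost.
    return {child.tag: element_to_dict(child) for child in children}


structure = element_to_dict(ET.fromstring(SETTINGS_XML))
print(structure["Algorithm"]["Type"])
```

All values come back as strings, so the verbosity level would still need an explicit int() conversion.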
>
>
>> I was wondering if there is something out there that does this.
>> I found and tried a few things:
>> 1)
>> http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
>> It simply doesn't work. I get the following error:
>> raise exception
>> xml.sax._exceptions.SAXParseException: <unknown>:1:2: not well-formed (invalid token)
>
> "not well formed" == "not XML".
>
>
>> But I removed everything from the file except:
>> <?xml version="1.0" encoding="UTF-8"?>
>> and I still got the error.
>
> That's not XML, either.
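
A short sketch of why a bare declaration is rejected, using the standard library's ElementTree: a well-formed document needs a root element after the declaration. The parser and error text in the quoted message above come from a SAX-based recipe, so the exact message differs here.

```python
import xml.etree.ElementTree as ET

# Only the XML declaration, with no root element following it.
DECLARATION_ONLY = b'<?xml version="1.0" encoding="UTF-8"?>'

try:
    ET.fromstring(DECLARATION_ONLY)
    message = None
except ET.ParseError as exc:
    message = str(exc)
print(message)
```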
>
>
>> Anyway, I looked at ElementTree, but that errors out with:
>> xml.parsers.expat.ExpatError: junk after document element: line 19, column 0
>
> In any case, ElementTree is preferable over a SAX based solution, both
> for performance and maintainability reasons.
>
> Stefan
>