python and parsing an xml file

Mon Feb 21 17:07:52 EST 2011

HI Stefan,
thank you for your advice.
I am running into an issue though (which is likely a newbie problem):

My xml file looks like (which i got from the internet):
<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
</catalog>

Then i try to access it as:

from lxml import etree
from lxml import objectify

inputfile="../input/books.xml"
parser = etree.XMLParser(ns_clean=True)
parser = etree.XMLParser(remove_comments=True)
root = objectify.parse(inputfile,parser)

I try to access elements by (for example):
print root.catalog.book.author.text

But it errors out with:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'catalog'

So i guess i don't understand why this is.
Also, how can i print all the attributes of the root for examples or
obtain keys?

I really appreciate your help
thanks
matt

On 2/21/2011 10:43 AM, Stefan Behnel wrote:
> Matt Funk, 21.02.2011 18:30:
>> I want to create a set of xml input files to my code that look as
>> follows:
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> <!-- Settings for the algorithm to be performed
>>                                           -->
>> <Algorithm>
>>
>>      <!-- The algorithm type.
>>                                      -->
>>      <!-- The supported options are:
>>                                      -->
>>      <!--     - Alg0
>>                                          -->
>>      <!--     - Alg1
>>                                          -->
>>      <Type>Alg1</Type>
>>
>>      <!-- the location/path of the input file for this algorithm
>>                                          -->
>>      <path>./Alg1.in</path>
>>
>> </Algorithm>
>>
>>
>> <!-- Relevant information during the processing will be written to a
>> logfile                                 -->
>> <Logfile>
>>
>>      <!-- the location/path of the logfile (i.e. where to put the
>> logfile)                                    -->
>>      <path>c:\tmp</path>
>>
>>      <!-- verbosity level (i.e. how much to print)
>>                                      -->
>>      <!-- The supported options are:
>>                                      -->
>>      <!--     - 0    (nothing printed)
>>                                          -->
>>      <!--     - 1    (print on error)
>>                                          -->
>>      <verbosity>1</verbosity>
>>
>> </Logfile>
>
> That's not XML. XML documents have exactly one root element, i.e. you
> need an enclosing element around these two tags.
>
>
>> So there are comments, whitespace etc ... in it.
>> I would like to be able to put everything into some sort of structure
>
> Including the comments or without them? Note that ElementTree will
> ignore comments.
>
>
>> such that i can access it as:
>> structure['Algorithm']['Type'] == Alg1
>
> Have a look at lxml.objectify. It allows you to write
>
>     alg_type = root.Algorithm.Type.text
>
> and a couple of other niceties.
>
> http://lxml.de/objectify.html
>
>
>> I was wondering if there is something out there that does this.
>> I found and tried a few things:
>> 1)
>> http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
>> It simply doesn't work. I get the following error:
>> raise exception
>> xml.sax._exceptions.SAXParseException:<unknown>:1:2: not well-formed
>> (invalid token)
>
> "not well formed" == "not XML".
>
>
>> But i removed everything from the file except:<?xml version="1.0"
>> encoding="UTF-8"?>
>> and i still got the error.
>
> That's not XML, either.
>
>
>> Anyway, i looked at ElementTree, but that error out with:
>> xml.parsers.expat.ExpatError: junk after document element: line 19,
>> column 0
>
> In any case, ElementTree is preferable over a SAX based solution, both
> for performance and maintainability reasons.
>
> Stefan
>