python and parsing an xml file
Stefan Behnel
stefan_ml at behnel.de
Mon Feb 21 12:43:16 EST 2011
Matt Funk, 21.02.2011 18:30:
> I want to create a set of xml input files to my code that look as follows:
> <?xml version="1.0" encoding="UTF-8"?>
>
> <!-- Settings for the algorithm to be performed
> -->
> <Algorithm>
>
> <!-- The algorithm type.
> -->
> <!-- The supported options are:
> -->
> <!-- - Alg0
> -->
> <!-- - Alg1
> -->
> <Type>Alg1</Type>
>
> <!-- the location/path of the input file for this algorithm
> -->
> <path>./Alg1.in</path>
>
> </Algorithm>
>
>
> <!-- Relevant information during the processing will be written to a
> logfile -->
> <Logfile>
>
> <!-- the location/path of the logfile (i.e. where to put the
> logfile) -->
> <path>c:\tmp</path>
>
> <!-- verbosity level (i.e. how much to print)
> -->
> <!-- The supported options are:
> -->
> <!-- - 0 (nothing printed)
> -->
> <!-- - 1 (print on error)
> -->
> <verbosity>1</verbosity>
>
> </Logfile>
That's not XML. XML documents have exactly one root element, i.e. you need
an enclosing element around these two tags.
> So there are comments, whitespace etc ... in it.
> I would like to be able to put everything into some sort of structure
Including the comments or without them? Note that ElementTree will ignore
comments.
> such that i can access it as:
> structure['Algorithm']['Type'] == Alg1
Have a look at lxml.objectify. It allows you to write
alg_type = root.Algorithm.Type.text
and a couple of other niceties.
http://lxml.de/objectify.html
> I was wondering if there is something out there that does this.
> I found and tried a few things:
> 1) http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
> It simply doesn't work. I get the following error:
> raise exception
> xml.sax._exceptions.SAXParseException:<unknown>:1:2: not well-formed
> (invalid token)
"not well formed" == "not XML".
> But i removed everything from the file except:<?xml version="1.0"
> encoding="UTF-8"?>
> and i still got the error.
That's not XML, either.
> Anyway, i looked at ElementTree, but that error out with:
> xml.parsers.expat.ExpatError: junk after document element: line 19, column 0
In any case, ElementTree is preferable over a SAX based solution, both for
performance and maintainability reasons.
Stefan
More information about the Python-list
mailing list