[XML-SIG] Recipe 534109: XML to Python data structure

Stefan Behnel stefan_ml at behnel.de
Wed Jan 7 13:42:21 CET 2009


David Shi wrote:
> What I am trying to do is to have a generic script to turn xml to Python
> dataset. Then I can manipulate it as required. Then I can save
> processed data into a .dbf file.

I'd use iterparse() for the parsing; it lets you build the .dbf content on
the fly, one record at a time, instead of loading the whole document first.

http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk

Working with the data elements returned by the iterparse iterator is quite
easy: the .tag and .text properties, plus the .find() method for locating
subelements, should be all you need.

http://codespeak.net/lxml/tutorial.html#the-element-class
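A minimal sketch of that approach, assuming a hypothetical input where each <record> element holds <name> and <value> children (the element names and the iter_rows() helper are illustrative, not from the recipe). It uses only .tag, .text and .find(), and yields one row at a time so each row could be handed straight to whatever .dbf writer you use (no particular dbf library is assumed here). It falls back to the stdlib xml.etree.ElementTree, whose iterparse() behaves the same way for this usage.

```python
import io

try:
    from lxml import etree  # lxml, as suggested above
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback with a compatible iterparse()

# Hypothetical input; the element names are assumptions for illustration.
SAMPLE = b"""<records>
  <record><name>alpha</name><value>1</value></record>
  <record><name>beta</name><value>2</value></record>
</records>"""


def iter_rows(source, record_tag="record"):
    """Yield one dict per <record> element as soon as it has been fully parsed."""
    for event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue  # skip the end events of <name>, <value> and the root
        # .find() returns the first matching child element; .text is its content.
        row = {
            "name": elem.find("name").text,
            "value": int(elem.find("value").text),
        }
        yield row
        # Drop the finished subtree so memory use stays flat on large inputs.
        elem.clear()


rows = list(iter_rows(io.BytesIO(SAMPLE)))
print(rows)  # [{'name': 'alpha', 'value': 1}, {'name': 'beta', 'value': 2}]
```

In a real script you would loop over iter_rows() and write each row out as it arrives rather than collecting them in a list.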

If you can afford to load the entire XML tree into memory, you can also
try lxml.objectify, which will give you a Python-like interface to the
data.

http://codespeak.net/lxml/objectify.html
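For comparison, a rough sketch of the objectify route on the same hypothetical <records>/<record> input (assumes lxml is installed; the element names are illustrative). Child elements become attributes, iterating over root.record visits every <record> sibling, and .pyval gives the value as a Python type.

```python
from lxml import objectify

# Hypothetical input; the element names are assumptions for illustration.
SAMPLE = b"""<records>
  <record><name>alpha</name><value>1</value></record>
  <record><name>beta</name><value>2</value></record>
</records>"""

root = objectify.fromstring(SAMPLE)

# root.record is the first <record>; iterating over it yields all <record> siblings.
names = [rec.name.text for rec in root.record]
# .pyval converts the element's text to a Python value (an int here).
total = sum(rec.value.pyval for rec in root.record)
print(names, total)  # ['alpha', 'beta'] 3
```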

Note that the lxml.objectify in-memory tree is most likely a lot more
memory-friendly (and the parsing is definitely faster) than the data
structure the recipe builds.

Stefan
