[XML-SIG] Recipe 534109: XML to Python data structure

Stefan Behnel stefan_ml at behnel.de
Wed Jan 7 14:48:00 CET 2009

It seems that apart from top-posting, you forgot to reply to the list.

David Shi wrote:
> lxml looks interesting to me as it deals with CDATA.
> Where is the step by step guide to use lxml to do what I need to do, as
> per my previous email.

I do not know of any step-by-step guide that describes how to convert an XML
format to .dbf. I guess you'll have to figure out the mapping code
yourself to a certain extent. I gave you quite a number of references
including some tutorials and a link to a library that handles the dbf
format. If you want someone else to write the program for you for free,
you should say so.
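
As a rough starting point for that mapping code, here is a minimal sketch of
the iterparse() approach. The element names ("record", "name", "city"), the
sample data and the helper iter_rows() are made up for illustration; your XML
format will differ. It falls back to the standard library's ElementTree if
lxml is not installed, since both provide iterparse(), .tag, .text and
.find(). Writing the rows out is left to whichever dbf library you pick.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Standard-library fallback; it offers a similar iterparse() interface.
    import xml.etree.ElementTree as etree

# Hypothetical sample input; real element names depend on your XML format.
SAMPLE = b"""<records>
  <record><name>Alice</name><city>Leeds</city></record>
  <record><name>Bob</name><city><![CDATA[York & Selby]]></city></record>
</records>"""


def iter_rows(source, row_tag, fields):
    """Yield one dict per <row_tag> element, reading the input incrementally."""
    for event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != row_tag:
            continue
        row = {}
        for field in fields:
            child = elem.find(field)  # direct subelement lookup by tag name
            row[field] = child.text if child is not None else None
        yield row
        elem.clear()  # drop the element's content once the row is extracted


rows = list(iter_rows(io.BytesIO(SAMPLE), "record", ["name", "city"]))
print(rows)
```

Each dict corresponds to one record, so you can hand it to the .dbf writer as
soon as it is produced instead of collecting everything in a list first.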


> --- On Wed, 7/1/09, Stefan Behnel wrote:
>
> From: Stefan Behnel <stefan_ml at behnel.de>
> Subject: Re: [XML-SIG] Recipe 534109: XML to Python data structure
> To: "David Shi" <davidgshi at yahoo.co.uk>
> Cc: xml-sig at python.org
> Date: Wednesday, 7 January, 2009, 12:42 PM
>
> David Shi wrote:
>> What I am trying to do is to have a generic script to turn xml to Python
>> dataset. Then I can manipulate it as required. Then I can save
>> processed data into a .dbf file.
>
> I'd use iterparse() for the parsing, that allows you to construct the .dbf
> content on the fly.
>
> http://codespeak.net/lxml/parsing.html#iterparse-and-iterwalk
>
> Working with the data elements returned by the iterparse iterator is quite
> easy, you'll be fine with using the properties .tag and .text, as well as
> the .find() method to find subelements.
>
> http://codespeak.net/lxml/tutorial.html#the-element-class
>
> If you can afford to load the entire XML tree into memory, you can also
> try lxml.objectify, which will give you a Python-like interface to the
> data.
>
> http://codespeak.net/lxml/objectify.html
>
> Note that the lxml.objectify in-memory tree is most likely a lot more
> memory friendly (and the parsing is definitely faster) than what the
> recipe gives you.
>
> Stefan
