Convert XML to SQL

Stefan Behnel stefan_ml at behnel.de
Wed Jan 26 04:29:35 EST 2011


Johann Spies, 26.01.2011 10:07:
> I an not a Python newbie but working with xml is new to me.
>
> I get data through a soap connection, using suds, and want to convert that
> to objects which I can use to populate a rather complex database.

Your problem description is pretty comprehensive in general. It would be 
helpful to see an example snippet of the XML that you parse.


> I have been able to parse the xml using
>
> tree = etree.iterparse(infile,events=("start","end")) but it seems like a
> lot of work to get that to sql-objects.
>
> I have seen references to lxml.objectify and have created a object
> containing the contents of a whole file using
>
> tree = objectify.parse(fileobject)
>
> That object contains for example the data of 605 records and I do not know
> how to use it. I could not figure out from the lxml.objectify documentation
> how to do it.
>
> In the end I want to use data from about 54 fields of each records.  I would
> like to have a list of dictionaries as a result of the parsing.  From there
> it should not be too difficult to create sql.

I think iterparse() is a good way to deal with this, as is objectify. 
iterparse() has the advantage that you can dispose of handled records, thus 
keeping memory usage low (if that's an issue here).

Using objectify, you would usually do something like this:

   tree = objectify.parse(fileobject)
   root = tree.getroot()
   for record in root.xml_record_tag:
      title_name = record.title.text

It really just depends on what your XML looks like. In the above, I assumed 
that each record hangs directly below the root tag and is called 
"xml_record_tag". I also assumed that each record has a "title" tag with 
text content.

With iterparse(), you would intercept on your record elements and then use 
the ElementTree API, commonly the findtext() and findall() methods of the 
root object, to get at the specific record fields. Like this:

     for _, element in ET.iterparse(fileobject):
          if element.tag == 'xml_record_tag':
              title_name = element.findtext('title')

Does this help?

Stefan




More information about the Python-list mailing list