[Tutor] parsing XML into a python dictionary

Alan Gauld alan.gauld at btinternet.com
Sat Nov 14 09:50:19 CET 2009


"Christopher Spears" <cspears2002 at yahoo.com> wrote

> I've been working on a way to parse an XML document and
> convert it into a python dictionary.  I want to maintain the
> hierarchy of the XML.

> Here is the sample XML I have been working on:
>
> <collection>
>  <comic title="Sandman" number='62'>
>    <writer>Neil Gaiman</writer>
>    <penciller pages='1-9,18-24'>Glyn Dillon</penciller>
>    <penciller pages="10-17">Charles Vess</penciller>
>  </comic>
> </collection>
>
> This is my first stab at this:
>
> #!/usr/bin/env python
>
> from lxml import etree
>
> def generateKey(element):
>    if element.attrib:
>        key = (element.tag, element.attrib)
>    else:
>        key = element.tag
>    return key

So how are you handling multiple identical tags? From your code it looks
as though the content of the last tag found will replace the content of
any earlier tag that has the same key. I would expect your keys to include
some reference to either the parse depth or a sequence count. In your
sample XML the problem never arises, and maybe in your real data it never
will either, but in the general case it is quite common for the same tag
and attribute pair to be used multiple times in a document.
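
As a rough sketch of the sequence-count idea, something like the following
would keep repeated tags apart. I'm assuming the standard library's
xml.etree.ElementTree here rather than lxml so it runs anywhere (the calls
used are the same in both), and the function name keyed_children is just
made up for illustration:

```python
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

SAMPLE = """<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller pages="1-9,18-24">Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""


def keyed_children(element):
    """Map (tag, sequence count) -> text for the direct children of element.

    keyed_children is a hypothetical helper, not part of any library.
    """
    counts = {}   # how many times each tag has been seen so far
    result = {}
    for child in element:
        n = counts.get(child.tag, 0)
        counts[child.tag] = n + 1
        # The count makes the key unique, so a second <penciller>
        # no longer overwrites the first one.
        result[(child.tag, n)] = child.text
    return result


root = ET.fromstring(SAMPLE)
comic = root[0]
for key, text in keyed_children(comic).items():
    print(key, text)
```

With a plain tag as the key, only one of the two penciller entries would
survive; with the count in the key both are kept.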


> class parseXML(object):
>    def __init__(self, xmlFile = 'test.xml'):
>        self.xmlFile = xmlFile
>
>    def parse(self):
>        doc = etree.parse(self.xmlFile)
>        root = doc.getroot()
>        key = generateKey(root)
>        dictA = {}
>        for r in root.getchildren():
>            keyR = generateKey(r)
>            if r.text:
>                dictA[keyR] = r.text
>            if r.getchildren():
>                dictA[keyR] = r.getchildren()
>
> The script doesn't descend all of the way down because I'm
> not sure how to handle an XML document that may have multiple layers.
> Advice anyone?  Would this be a job for recursion?

Recursion is the classic way to deal with tree structures, so
yes, you could use it here, provided your tree never exceeds
Python's recursion depth limit (I think it's still 1000 levels by default).
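
A minimal sketch of what the recursive version might look like follows.
Again I'm assuming xml.etree.ElementTree as a stand-in for lxml, and the
function name element_to_dict and the dictionary layout (tag, attrib, text,
children) are my own invention, not anything the library prescribes:

```python
import sys
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

SAMPLE = """<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller pages="1-9,18-24">Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""


def element_to_dict(element):
    """Recursively convert an element into nested dicts and lists.

    element_to_dict is a hypothetical helper; the layout is one choice of many.
    """
    return {
        "tag": element.tag,
        "attrib": dict(element.attrib),
        "text": (element.text or "").strip(),
        # The recursive step: every child is converted the same way.
        # A list keeps repeated tags, in document order.
        "children": [element_to_dict(child) for child in element],
    }


tree = element_to_dict(ET.fromstring(SAMPLE))
comic = tree["children"][0]
print(comic["attrib"]["title"])
print([c["text"] for c in comic["children"]])

# Each level of nesting costs one stack frame, so this is the ceiling
# to keep in mind for very deep documents.
print(sys.getrecursionlimit())
```

The recursion depth equals the nesting depth of the document, not the
number of elements, so only unusually deep XML would get near the limit.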

I'm not sure how converting etree's tree structure into a dictionary
will help you, however. It seems like a lot of work for a small gain.
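
To show what I mean, here is a small sketch that pulls the same data
straight out of the tree with get(), findtext() and findall(), with no
intermediate dictionary. I'm assuming xml.etree.ElementTree again (lxml
provides the same three calls), and the results list is only there for
illustration:

```python
import xml.etree.ElementTree as ET  # assumption: stdlib stand-in for lxml.etree

SAMPLE = """<collection>
  <comic title="Sandman" number="62">
    <writer>Neil Gaiman</writer>
    <penciller pages="1-9,18-24">Glyn Dillon</penciller>
    <penciller pages="10-17">Charles Vess</penciller>
  </comic>
</collection>"""

root = ET.fromstring(SAMPLE)

results = []  # illustrative only: one tuple per comic
for comic in root.findall("comic"):
    title = comic.get("title")            # attribute lookup
    writer = comic.findtext("writer")     # text of the first <writer> child
    # findall returns every matching child, so repeated tags are no problem.
    pencillers = [(p.get("pages"), p.text) for p in comic.findall("penciller")]
    results.append((title, writer, pencillers))

for title, writer, pencillers in results:
    print(title, "by", writer)
    for pages, name in pencillers:
        print("  pages", pages, "pencilled by", name)
```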

hth,

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/ 



