[XML-SIG] python xml (Lukasz Szybalski)

Wed Mar 8 14:03:34 CET 2006

> I am new to xml in python, but not new to python. After searching for 
> quite a bit, I can't seem to find the answer to this easy coding problem.
> I have a xml file that i need to find an element with id = 2 extract 
> second child from it, and pass it on to the rest of my program.
> 
> How can i do it in "<10" lines of code in python?
> 
> --sample-File------default.xml--
> <?xml version="1.0"?>
> <response id="1">
>     <category>out</category>
>         <text>Hello,
>                 I'm out of office...etc..
>         </text>
> </response>
> <response id="2">
>     <category>in</category>
>         <text>Hello,
>                 I'm in the office...etc..
>         </text>
> </response>

Your XML is missing a root element (it has two top-level response elements) 
and is therefore not well-formed; see below.
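
A quick way to see the problem, as a minimal sketch: feed both variants to minidom and check whether expat complains. The helper name is_wellformed and the shortened sample data are only for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Two top-level <response> elements, shortened from default.xml
body = """<response id="1"><category>out</category></response>
<response id="2"><category>in</category></response>
"""

def is_wellformed(text):
    """Return True if minidom (via expat) can parse text, else False."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False

# Without a single root element the parser rejects the document
print(is_wellformed(body))
# Wrapped in one root element, the same content parses
print(is_wellformed("<responses>" + body + "</responses>"))
```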

> ---python --
> # I found most of the info/how to on minidom, so I assume minidom is the 
> best choice here
> 
> from xml.dom import minidom, Node
> #parse a file,
> doc=minidom.parse('default.xml')
> # find attribute
> 
> .....and here I find myself reading tons of how to, but can't seem to 
> find what i'm looking for. .
> 
> #how can i do something like: (please correct me here)
>  for x in  doc.nodeList:
>     if x.attribute.get('id')==2:
>        pass_it_on = ????      #How do I get to <text> and extract it as 
> string?
> 

xml = """<?xml version="1.0"?>
<responses>
     <response id="1">
         <category>out</category>
             <text id="test">Hello,
                     I'm out of office...etc..
             </text>
     </response>
     <response id="2">
         <category>in</category>
             <text>Hello,
                     I'm in the office...etc..
             </text>
     </response>
</responses>
"""

from xml.dom import minidom, Node

def elementWithAttVal(e, a, v):
     children = [x for x in e.childNodes if x.nodeType == e.ELEMENT_NODE]
     for x in children:
         if x.getAttribute(a) == v:
             return x
         if x.childNodes:
             t = elementWithAttVal(x, a, v)
             if t:
                 return t

doc = minidom.parseString(xml)
id2element = elementWithAttVal(doc, "id", "2")

# first <text> element below it (fragile: breaks if the structure changes)
textelement = id2element.getElementsByTagName('text')[0]

# text Node of the element
textelement_textnode = textelement.firstChild

# value of text node
print(textelement_textnode.nodeValue)

Maybe not the easiest implementation, but it works and shows how 
complicated working with the DOM is...
I guess minidom is convenient because it is included in the stdlib, but 
not convenient to use ;)
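
If you really want the second child by position rather than by tag name, something like the following might do. It is only a sketch: the function name second_child_text is made up, the sample data is shortened, and it assumes the elements you are after are always called "response".

```python
from xml.dom import minidom

xml = """<responses>
  <response id="1"><category>out</category><text>I'm out of office</text></response>
  <response id="2"><category>in</category><text>I'm in the office</text></response>
</responses>"""

def second_child_text(doc, wanted_id):
    """Text of the 2nd child element of the <response> with the given id, or None."""
    for resp in doc.getElementsByTagName("response"):
        if resp.getAttribute("id") == wanted_id:
            # childNodes also contains whitespace text nodes; keep elements only
            kids = [n for n in resp.childNodes if n.nodeType == n.ELEMENT_NODE]
            if len(kids) < 2:
                return None
            # join the text nodes directly inside the second child element
            return "".join(t.data for t in kids[1].childNodes
                           if t.nodeType == t.TEXT_NODE)
    return None

doc = minidom.parseString(xml)
print(second_child_text(doc, "2"))
```

Note that getAttribute always returns a string, so the id has to be compared with "2", not with the integer 2.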

The easiest way, IMHO, would be XPath. 4Suite or any other implementation 
would be fine; shown here is 4Suite's XPath command-line helper:

 > 4xpath default.xml "//*[@id=2]/*[2]/text()"
Result (XPath node-set):
========================
<Text at 0x00C4A1C0: u'Hello,\n       ...\x00\u39a0\u1e06'>
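
If installing an XPath library is not an option, ElementTree (xml.etree.ElementTree in the stdlib) is a different technique that gets close. It supports only a small subset of XPath, so the sketch below uses an attribute predicate to find the element and plain indexing for the second child. The sample data is shortened and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

xml = """<responses>
  <response id="1"><category>out</category><text>I'm out of office</text></response>
  <response id="2"><category>in</category><text>I'm in the office</text></response>
</responses>"""

root = ET.fromstring(xml)

# Attribute predicates such as [@id='2'] are part of the supported subset
resp = root.find(".//*[@id='2']")

# An Element behaves like a list of its child elements, so [1] is the
# second child; whitespace between elements is not a child here
text = resp[1].text if resp is not None and len(resp) > 1 else None
print(text)
```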

chris