[XML-SIG] python xml (Lukasz Szybalski)
Christof
csad7 at t-online.de
Wed Mar 8 14:03:34 CET 2006
> I am new to xml in python, but not new to python. After searching for
> quite a bit, I can't seem to find the answer to this easy coding problem.
> I have a xml file that i need to find an element with id = 2 extract
> second child from it, and pass it on to the rest of my program.
>
> How can i do it in "<10" lines of code in python?
>
> --sample-File------default.xml--
> <?xml version="1.0"?>
> <response id="1">
> <category>out</category>
> <text>Hello,
> I'm out of office...etc..
> </text>
> </response>
> <response id="2">
> <category>in</category>
> <text>Hello,
> I'm in the office...etc..
> </text>
> </response>
Your XML is missing a root element (it has two top-level response elements)
and is therefore not well-formed; see below.
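A quick way to see this for yourself is the sketch below. The helper name is_well_formed and the two shortened sample strings are only illustrative; minidom reports the problem as an ExpatError.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Two top-level elements, as in the quoted file: there is no single root.
no_root = '<response id="1"/><response id="2"/>'
# The same elements wrapped in one root element (name chosen for illustration).
with_root = '<responses>' + no_root + '</responses>'

def is_well_formed(xml_string):
    """Hypothetical helper: True if minidom can parse the string."""
    try:
        minidom.parseString(xml_string)
        return True
    except ExpatError:
        return False

print(is_well_formed(no_root))
print(is_well_formed(with_root))
```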
> ---python --
> # I found most of the info/how to on minidom, so I assume minidom is the
> best choice here
>
> from xml.dom import minidom, Node
> #parse a file,
> doc=minidom.parse('default.xml')
> # find attribute
>
> .....and here I find myself reading tons of how to, but can't seem to
> find what i'm looking for. .
>
> #how can i do something like: (please correct me here)
> for x in doc.nodeList:
> if x.attribute.get('id')==2:
> pass_it_on = ???? #How do I get to <text> and extract it as
> string?
>
xml = """<?xml version="1.0"?>
<responses>
<response id="1">
<category>out</category>
<text id="test">Hello,
I'm out of office...etc..
</text>
</response>
<response id="2">
<category>in</category>
<text>Hello,
I'm in the office...etc..
</text>
</response>
</responses>
"""
from xml.dom import minidom, Node

def elementWithAttVal(e, a, v):
    # return the first descendant element of e whose attribute a equals v
    children = [x for x in e.childNodes if x.nodeType == Node.ELEMENT_NODE]
    for x in children:
        if x.getAttribute(a) == v:
            return x
        if x.childNodes:
            t = elementWithAttVal(x, a, v)
            if t is not None:
                return t

doc = minidom.parseString(xml)
id2element = elementWithAttVal(doc, "id", "2")
# first <text> element below it; breaks easily if the structure changes
textelement = id2element.getElementsByTagName('text')[0]
# text node of the element
textelement_textnode = textelement.firstChild
# value of the text node
print(textelement_textnode.nodeValue)
Maybe not the simplest implementation, but it works and shows how
complicated working with the DOM is...
I guess minidom is convenient because it is included in the stdlib, but it
is not convenient to use ;)
The easiest way IMHO would be XPath. 4Suite or any other implementation
would be fine; shown here is 4Suite's XPath command-line helper:
> 4xpath default.xml "//*[@id=2]/*[2]/text()"
Result (XPath node-set):
========================
<Text at 0x00C4A1C0: u'Hello,\n ...\x00\u39a0\u1e06'>
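For comparison, here is a minimal sketch of a similar lookup using the limited XPath subset in xml.etree.ElementTree. This is a swapped-in alternative, not 4Suite: ElementTree only entered the stdlib with Python 2.5, and the attribute predicate needs Python 2.7/3.2 or later. The helper name second_child_text is made up for illustration. The second child is taken by list index rather than with a "*[2]" step, since ElementTree's positional predicates are tied to a tag name.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<responses>
<response id="1">
<category>out</category>
<text id="test">Hello,
I'm out of office...etc..
</text>
</response>
<response id="2">
<category>in</category>
<text>Hello,
I'm in the office...etc..
</text>
</response>
</responses>
"""

def second_child_text(xml_string, id_value):
    """Hypothetical helper: text of the 2nd child of the element with this id."""
    root = ET.fromstring(xml_string)
    # ElementTree's XPath subset: any descendant with a matching id attribute
    element = root.find(".//*[@id='%s']" % id_value)
    if element is None:
        return None
    children = list(element)  # child elements only, in document order
    if len(children) < 2:
        return None
    return children[1].text

print(second_child_text(XML, "2"))
```

The function returns None when no element has the given id or when the element has fewer than two children, so the caller can check the result before passing it on.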
chris