Best way under libxml2?
deckerben
bdeck at lycos.co.uk
Thu Jun 5 18:34:35 EDT 2003
Hello,
Sometimes I need to parse an XML document for a certain attribute value for
a specific element node. Right now I am using the solution:
>>> import sys, os
>>> import libxml2
>>> mydoc = libxml2.parseFile("C:/dosfiles/xml/somefile.xml")
>>> scontext = mydoc.xpathEval("descendant-or-self::dos/script/@shell")[0]
>>> text = scontext.getContent()
>>> text
'python'
As you see, it is working as it should, but I am not very happy about the
assumptions I am making with it. I feel that using the [0] subscript is an
unstable kludge to make libxml2 recognize the single attribute element
rather than returning a list of nodes. I was hoping to avoid time-consuming
looping, btw. But somehow, the whole thing starts looking less like python
and more like XSLT :-(
....
I am also looking for the fastest way to translate a result tree into a
array of strings. Each string is the total text of all Text and CDATA
children found for a single element node of the result tree.
Something like turning
<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>
into:
('123', '123', '123')
Here I have done:
def gettext2(self, nodename):
# get a xpath argument and return a string array from it.
self.current_nodes = self.doc.xpathEval(nodename)
report = []
for system_node in self.current_nodes:
# nodeobject.children always returns a list of nodes
for i in system_node.children:
a = i.get_type()
name = i.nodePath()
if i.get_type() == 'element':
# if a child is an element node, might have text, too
location = i.nodePath()
# send absolute nodepath to gettext2 (sub)subprocess
self.gettext2(location)
# DON'T USE BREAK here - it will kill all parent loops.
if i.get_type() == 'text' or 'cdata':
element_text = i.getContent()
for ii in element_text:
# sometimes text node just an empty carriage return
if ii in ' \n':
pass
else:
header = "SYSTEM2data: " + name
report.append(header)
report.append(element_text)
break # USE BREAK here or process may not end!
# give the new text-array back to calling function
return report
Where 'nodename' is an xpath argument, too.
This method is part of a class that already knows where the xml is :-) But I
don't like all the loops and subloops. If it isn't for the 'beak' near the
end, the routine turns into a harddrive virus !!!!!!
I don't have any specific questions here, but I am posting this junk because
all I know about it I read on websites and I am not sure I am taking the
right approach...
Ben
More information about the Python-list
mailing list