Best way under libxml2?
deckerben
bdeck at lycos.co.uk
Sat Jun 7 15:38:23 EDT 2003
I have changed the second piece of code...
> Sometimes I need to parse an XML document for a certain attribute value
> for a specific element node. Right now I am using this solution:
>
> >>> import sys, os
> >>> import libxml2
> >>> mydoc = libxml2.parseFile("C:/dosfiles/xml/somefile.xml")
> >>> scontext = mydoc.xpathEval("descendant-or-self::dos/script/@shell")[0]
> >>> text = scontext.getContent()
> >>> text
> 'python'
>
No change here (yet).
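For comparison, the same attribute lookup can be sketched with Python's standard-library ElementTree instead of libxml2. This is a swapped-in technique, not libxml2's API. ElementTree supports only a subset of XPath, so the descendant-or-self axis is approximated with `root.iter("dos")`, which yields the root itself as well as every nested `<dos>`. The helper name `script_shell` and the sample document are hypothetical, because the contents of somefile.xml are not shown. Unlike the `[0]` indexing above, which fails when nothing matches, this version returns None.

```python
import xml.etree.ElementTree as ET


def script_shell(xml_text):
    """Return the shell attribute of the first <script> child of a <dos>.

    Hypothetical helper: walks every <dos> element, including the root,
    and returns the first shell attribute found, or None if there is none.
    """
    root = ET.fromstring(xml_text)
    for dos in root.iter("dos"):  # root.iter() includes the root element
        for script in dos.findall("script"):  # direct <script> children
            shell = script.get("shell")
            if shell is not None:
                return shell
    return None


# Hypothetical stand-in for somefile.xml; the real file is not shown.
SAMPLE = '<dos><script shell="python"/></dos>'
print(script_shell(SAMPLE))  # python
```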
> I am also looking for the fastest way to translate a result tree into an
> array of strings. Each string is the total text of all Text and CDATA
> children found for a single element node of the result tree.
>
> Something like turning
>
> <element>
> 1<![CDATA[2]]>3
> </element>
> <element>
> 1<![CDATA[2]]>3
> </element>
> <element>
> 1<![CDATA[2]]>3
> </element>
>
> into:
>
> ('123', '123', '123')
>
> Here I have done:
> def gettext2(self, nodename):
>     # get a xpath argument and return a string array from it.
>     self.current_nodes = self.doc.xpathEval(nodename)
>     report = []
>     for system_node in self.current_nodes:
>         # nodeobject.children always returns a list of nodes
>         for i in system_node.children:
>             a = i.get_type()
>             name = i.nodePath()
>             if i.get_type() == 'element':
>                 # if a child is an element node, might have text, too
>                 location = i.nodePath()
>                 # send absolute nodepath to gettext2 (sub)subprocess
>                 self.gettext2(location)
>                 # DON'T USE BREAK here - it will kill all parent loops.
>             if i.get_type() == 'text' or 'cdata':
>                 element_text = i.getContent()
>                 for ii in element_text:
>                     # sometimes text node just an empty carriage return
>                     if ii in ' \n':
>                         pass
>                     else:
>                         header = "SYSTEM2data: " + name
>                         report.append(header)
>                         report.append(element_text)
>                         break # USE BREAK here or process may not end!
>     # give the new text-array back to calling function
>     return report
>
> Where 'nodename' is an XPath expression, too.
This is now:
def gettext2(self, nodename):
    # take an XPath expression and return a list of strings from it.
    # DOC_MAGIC is a header string defined elsewhere in the program.
    report = []
    current_nodes = self.doc.xpathEval(nodename)
    for system_node in current_nodes:
        # skip matched nodes that have no children at all
        if system_node.children is None:
            continue
        # iterating over node.children also visits the text nodes
        # nested under each child element, not only direct children
        for i in system_node.children:
            # if a child is an element node, might have text, too
            if i.get_type() == 'element':
                # the text of child elements is already visited by the
                # outer loop, so only recurse into grandchild elements
                try:  # Larger result trees will crash without this 'try'
                    for ii in i.children:
                        if ii.get_type() == 'element':
                            location = ii.nodePath()
                            # gettext2 (sub)subprocess needs xpath
                            report.extend(self.gettext2(location))
                except Exception:
                    pass
            elif i.get_type() in ('text', 'cdata'):
                element_text = i.get_content()
                # skip whitespace-only text nodes such as " \n "
                if element_text.strip():
                    report.append(DOC_MAGIC + " " + i.nodePath())
                    report.append(element_text)
    # give the new text-array back to calling function
    return report
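One change between the two versions is easy to miss: the first tested `i.get_type() == 'text' or 'cdata'`, the second `i.get_type() in ('text', 'cdata')`. Python parses the first as `(i.get_type() == 'text') or 'cdata'`, and a non-empty string is always true, so every node, elements included, passed the test. The small sketch below uses plain strings in place of `get_type()` return values (an assumption for illustration; `old_check` and `new_check` are hypothetical helper names) to show the difference.

```python
def old_check(node_type):
    # Parsed as (node_type == 'text') or 'cdata'. When the comparison is
    # False, the expression evaluates to the string 'cdata', which is truthy.
    return bool(node_type == 'text' or 'cdata')


def new_check(node_type):
    # Membership test: true only for the two text-like node types.
    return node_type in ('text', 'cdata')


# Plain strings stand in for the values returned by node.get_type().
for node_type in ('text', 'cdata', 'element', 'comment'):
    print(node_type, old_check(node_type), new_check(node_type))
```

For 'element' and 'comment' the old test prints True and the new one prints False.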
I realized that iterating over node.children also visits the text nodes
under each child element node.
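The original goal, turning each element into the joined text of its Text and CDATA children, can also be sketched with the standard library's `xml.dom.minidom` as a stand-in for libxml2. This is a swapped-in technique, and the names `direct_text`, `descendant_text`, `element_texts` and the `deep` flag are hypothetical. `direct_text` joins only an element's own Text and CDATA children, which is what the ('123', '123', '123') example asks for; `descendant_text` also walks nested child elements depth-first, the behaviour described above where text under child elements gets picked up as well. The sample wraps the three `<element>` nodes in a hypothetical `<root>`, because a well-formed document needs a single root element.

```python
from xml.dom import Node, minidom

# Hypothetical sample: the three <element> nodes from the question,
# wrapped in a <root> so the document is well-formed.
SAMPLE = """<root>
<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>
</root>"""

# CDATASection is a subclass of Text in minidom, so both expose .data.
TEXTLIKE = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def direct_text(element):
    """Join the Text and CDATA children of one element (children only)."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType in TEXTLIKE)


def descendant_text(node):
    """Depth-first walk that also collects text under nested elements."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in TEXTLIKE:
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(descendant_text(child))
    return "".join(parts)


def element_texts(xml_text, tag, deep=False):
    """Return one stripped string per <tag> element, in document order."""
    doc = minidom.parseString(xml_text)
    collect = descendant_text if deep else direct_text
    try:
        return tuple(collect(el).strip()
                     for el in doc.getElementsByTagName(tag))
    finally:
        doc.unlink()  # break minidom's parent/child reference cycles


print(element_texts(SAMPLE, "element"))  # ('123', '123', '123')
```

With nested markup the two modes differ: for `<a>x<b>y</b>z</a>`, the direct mode gives 'xz' and `deep=True` gives 'xyz'.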
Ben
More information about the Python-list mailing list