Best way under libxml2?

Sat Jun 7 15:38:23 EDT 2003

I have changed the second code sample; the new version is below.

> Sometimes I need to parse an XML document for a certain attribute value
> for a specific element node. Right now I am using the solution:
>
> >>> import sys, os
> >>> import libxml2
> >>> mydoc = libxml2.parseFile("C:/dosfiles/xml/somefile.xml")
> >>> scontext = mydoc.xpathEval("descendant-or-self::dos/script/@shell")[0]
> >>> text = scontext.getContent()
> >>> text
> 'python'
>

No change here (yet)

> I am also looking for the fastest way to translate a result tree into an
> array of strings. Each string is the total text of all Text and CDATA
> children found for a single element node of the result tree.
>
> Something like turning
>
> <element>
> 1<![CDATA[2]]>3
> </element>
> <element>
> 1<![CDATA[2]]>3
> </element>
> <element>
> 1<![CDATA[2]]>3
> </element>
>
> into:
>
> ('123', '123', '123')
>
> Here I have done:
>     def gettext2(self, nodename):
>         # get a xpath argument and return a string array from it.
>         self.current_nodes = self.doc.xpathEval(nodename)
>         report = []
>         for system_node in self.current_nodes:
>             # nodeobject.children always returns a list of nodes
>             for i in system_node.children:
>                 a = i.get_type()
>                 name = i.nodePath()
>                 if i.get_type() == 'element':
>                     # if a child is an element node, might have text, too
>                     location = i.nodePath()
>                     # send absolute nodepath to gettext2 (sub)subprocess
>                     self.gettext2(location)
>                     # DON'T USE BREAK here - it will kill all parent
>                     # loops.
>                 if i.get_type() == 'text' or 'cdata':
>                     element_text = i.getContent()
>                     for ii in element_text:
>                         # sometimes text node just an empty carriage
>                         # return
>                         if ii in ' \n':
>                             pass
>                         else:
>                             header = "SYSTEM2data: " + name
>                             report.append(header)
>                             report.append(element_text)
>                             # USE BREAK here or process may not end!
>                             break
>         # give the new text-array back to calling function
>         return report
>
> Where 'nodename' is an xpath argument, too.

This is now:
    def gettext2(self, nodename):
        # take an XPath expression and return a list of strings from it.
        report = []
        current_nodes = self.doc.xpathEval(nodename)
        for system_node in current_nodes:
            # nodeobject.children always returns a list of nodes
            for i in system_node.children:
                # if a child is an element node, might have text, too
                if i.get_type() == 'element':
                    # child element text already a direct system_node child
                    # i.children is None for an empty element (assumed
                    # to be one cause of the crash on larger trees), so
                    # test it instead of using a bare try/except, which
                    # would also hide real errors
                    grandchildren = i.children
                    if grandchildren is not None:
                        for ii in grandchildren:
                            if ii.get_type() == 'element':
                                location = ii.nodePath()
                                # gettext2 (sub)subprocess needs xpath
                                report.extend(self.gettext2(location))
                elif i.get_type() in ('text', 'cdata'):
                    element_text = i.get_content()
                    location = i.nodePath()
                    # skip text nodes holding only spaces and newlines
                    if element_text.strip(' \n'):
                        header = DOC_MAGIC + " " + location
                        report.append(header)
                        report.append(element_text)
        # give the new text-array back to calling function
        return report
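
One difference from the quoted version is the node-type test: `== 'text' or 'cdata'` became `in ('text', 'cdata')`. The short sketch below shows why that matters. The helper names are hypothetical and exist only for this illustration; plain strings stand in for the values returned by get_type().

```python
def is_text_like_buggy(node_type):
    # Parsed as (node_type == 'text') or 'cdata'. The non-empty string
    # 'cdata' is always truthy, so every node type passes this test.
    return bool(node_type == 'text' or 'cdata')


def is_text_like(node_type):
    # Membership test: true only for the two intended node types.
    return node_type in ('text', 'cdata')


for node_type in ('text', 'cdata', 'element', 'comment'):
    print(node_type, is_text_like_buggy(node_type), is_text_like(node_type))
```

With the old test, element and comment nodes were treated as text as well, so their content could end up in the report.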

I realized that node.children includes the text nodes under each child
element node.
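
For comparison, here is a minimal sketch of the same goal (one string per element, Text and CDATA joined) using the standard library's xml.etree.ElementTree instead of libxml2. This is a different technique from the code above, not a drop-in replacement. The wrapping `<root>` element, the `SAMPLE` constant and the `element_texts` name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the three elements from the question, wrapped in
# an assumed <root> element so the document is well-formed.
SAMPLE = """<root>
<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>
</root>"""


def element_texts(xml_source, tag):
    """Return a tuple with the joined, stripped text of every element
    named `tag`, including text found inside nested child elements."""
    root = ET.fromstring(xml_source)
    # itertext() yields all text below an element in document order;
    # the parser delivers CDATA sections as ordinary character data.
    return tuple("".join(el.itertext()).strip() for el in root.iter(tag))


print(element_texts(SAMPLE, "element"))  # ('123', '123', '123')
```

Because itertext() already walks the descendants, no explicit recursion is needed. The trade-off is that the node paths used for the report headers above are not available this way.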

Ben