Best way under libxml2?

deckerben bdeck at lycos.co.uk
Thu Jun 5 18:34:35 EDT 2003


Hello,

Sometimes I need to pull a single attribute value for a specific element
node out of an XML document. Right now I am using this solution:

>>> import sys, os
>>> import libxml2
>>> mydoc = libxml2.parseFile("C:/dosfiles/xml/somefile.xml")
>>> scontext = mydoc.xpathEval("descendant-or-self::dos/script/@shell")[0]
>>> text = scontext.getContent()
>>> text
'python'


As you can see, it works as it should, but I am not very happy about the
assumptions I am making with it. xpathEval returns a list of nodes, and using
the [0] subscript to pick out the single attribute node feels like a fragile
kludge, since it fails with an IndexError if nothing matches. I was hoping to
avoid time-consuming looping, btw. But somehow, the whole thing starts looking
less like python and more like XSLT :-(
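
One way to make the [0] assumption explicit is to wrap it in a small helper
that returns a default when nothing matches. This is only a sketch:
first_content and its default parameter are hypothetical names, and it assumes
that xpathEval returns a (possibly empty) list for a node-set expression and
that each node offers getContent(), as in the session above.

```python
def first_content(doc, xpath, default=None):
    """Return the text of the first node matching xpath, or default.

    doc is assumed to be a parsed libxml2 document (anything with an
    xpathEval method that returns a list of nodes will do).
    """
    nodes = doc.xpathEval(xpath)
    if not nodes:
        # No match: report it through the default instead of an IndexError.
        return default
    return nodes[0].getContent()
```

With the document above, first_content(mydoc,
"descendant-or-self::dos/script/@shell") should give the same 'python' string,
and None when the attribute is missing.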

....

I am also looking for the fastest way to translate a result tree into an
array of strings. Each string is the concatenated text of all Text and CDATA
children found for a single element node of the result tree.

Something like turning

<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>
<element>
1<![CDATA[2]]>3
</element>

into:

('123', '123', '123')

Here is what I have done:
    def gettext2(self, nodename):
        # Take an XPath expression and return a list of strings for it.
        self.current_nodes = self.doc.xpathEval(nodename)
        report = []
        for system_node in self.current_nodes:
            # walk over the child nodes of each matched node
            for i in system_node.children:
                node_type = i.get_type()
                name = i.nodePath()
                if node_type == 'element':
                    # if a child is an element node, it might have text, too:
                    # send its absolute nodepath to gettext2 again
                    # (note that the list it returns is discarded here)
                    self.gettext2(name)
                    # DON'T USE BREAK here - it will kill all parent loops.
                # was: == 'text' or 'cdata', which is always true
                if node_type in ('text', 'cdata'):
                    element_text = i.getContent()
                    for ch in element_text:
                        # sometimes a text node is just an empty carriage return
                        if ch in ' \n':
                            continue
                        report.append("SYSTEM2data: " + name)
                        report.append(element_text)
                        break   # USE BREAK here or process may not end!
        # give the new text-array back to the calling function
        return report

Here 'nodename' is an XPath expression, too.

This method is part of a class that already knows where the xml is :-) But I
don't like all the loops and subloops. If it weren't for the 'break' near the
end, the routine would turn into a harddrive virus !!!!!!
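
For comparison, a loop-free sketch is possible if getContent() on an element
node already returns the concatenated text of all its Text and CDATA
descendants. That is how xmlNodeGetContent behaves in the C library, but it is
worth checking against the installed binding. element_texts is a hypothetical
name, and doc stands for the object kept as self.doc above.

```python
def element_texts(doc, xpath):
    """Return one whitespace-stripped string per element matched by xpath.

    Assumes each matched node's getContent() yields the combined text of
    its Text and CDATA descendants.
    """
    return tuple(node.getContent().strip() for node in doc.xpathEval(xpath))
```

Under that assumption, calling it with an expression that selects the three
<element> nodes above should produce the ('123', '123', '123') tuple, without
any explicit recursion or per-character loop.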

I don't have any specific questions here, but I am posting this junk because
all I know about it I read on websites and I am not sure I am taking the
right approach...


Ben













