[Tutor] pyXML DOM 2.0 Traversal and filters

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Tue Apr 29 15:11:15 2003


On Tue, 29 Apr 2003, Levy Lazarre wrote:

> Given the following sample file ('appliances.xml'), I
> am trying to write a TreeWalker that would print out
> the element
> nodes, while a filter would prevent the nodes with a
> status of "broken" from displaying.
>
> <?xml version="1.0"?>
> <appliances>
>     <clock status = "working">cuckoo</clock>
>     <television status = "broken">black and
> white</television>
> </appliances>
>
> I am getting some exceptions, making me think that I
> am calling the filter the wrong way or I am missing
> something.
> Can somebody please point me to the right direction?
> Here is the sample code:

Hi Levy,


I haven't played with the filtering stuff yet (I'm more into
xml.dom.pulldom), but let's give it a shot!  *grin*


Let's take a look at the code:


> from xml.dom.ext.reader import Sax2
> from xml.dom.NodeFilter import NodeFilter
>
> def filterbroken(thisNode):
>     if (thisNode.nodeType == thisNode.ELEMENT_NODE and
>         thisNode.getAttribute("status") == "broken"):
>             return NodeFilter.FILTER_REJECT
>     return NodeFilter.FILTER_ACCEPT
>
> reader = Sax2.Reader()
>
> input_file = file("appliances.xml")
> doc = reader.fromStream(input_file)
> walker = doc.createTreeWalker(doc.documentElement,
>                               NodeFilter.SHOW_ALL, filterbroken, 0)


Ok, let's stop at this point.


The error message we're getting:

> AttributeError: 'function' object has no attribute
> 'acceptNode'

implies that the walker is treating filterbroken as some kind of class
instance: it may be trying to do something like:

    filterbroken.acceptNode(node)

to call the filter.  But let's double check the documentation on
createTreeWalker()  and see what it expects:

    http://pyxml.sourceforge.net/topics/howto/node22.html

Odd!  According to the docs, it expects a function.  But according to the
error message,



> "C:\Python22\Lib\site-packages\_xmlplus\dom\TreeWalker.py",
> line 168, in __checkFilter
>     return self.__dict__['__filter'].acceptNode(node)
> AttributeError: 'function' object has no attribute
> 'acceptNode'


... it's trying to call an acceptNode() method.  So something here is
definitely wrong.  Either the code is wrong, or the documentation is wrong.
*grin*
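To see the mismatch in isolation, here is a minimal sketch that needs no XML
library at all.  The call_filter() helper and the plain dict standing in for
a DOM element are hypothetical; they only mimic what the traceback suggests
the walker does internally, which is to call acceptNode() on whatever filter
it was handed.

```python
# Constant values as defined by the DOM Level 2 Traversal spec.
FILTER_ACCEPT, FILTER_REJECT = 1, 2

def call_filter(node_filter, node):
    """Hypothetical stand-in for the walker's internal filter check."""
    return node_filter.acceptNode(node)

def filterbroken(node):
    # A bare function: it has no acceptNode attribute.
    if node.get("status") == "broken":
        return FILTER_REJECT
    return FILTER_ACCEPT

class FilterBroken:
    # An object with an acceptNode() method, which is what gets called.
    def acceptNode(self, node):
        return filterbroken(node)

tv = {"status": "broken"}  # plain dict standing in for a DOM element

try:
    call_filter(filterbroken, tv)
except AttributeError as err:
    message = str(err)  # same kind of error as in the traceback

result = call_filter(FilterBroken(), tv)  # works: the instance has acceptNode
```

The class here just delegates to the original function, so the filtering
logic itself doesn't have to change; only the thing we hand to the walker
does.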



And I expect it's the documentation.  Published code that uses
createTreeWalker() does appear to pass in NodeFilter instances, and not
functions:

###
# (part of:
#  http://aspn.activestate.com/ASPN/Mail/Message/XML-checkins/954448)

    def checkWalkerOnlyTextNodesParentNodeFirstChildFilterSkipB(self):
        class SkipBFilter(NodeFilter):
            def acceptNode(self, node):
                if node.nodeValue == 'B':
                    return self.FILTER_SKIP
                else:
                    return self.FILTER_ACCEPT

        walker = self.document.createTreeWalker(self.document,
            NodeFilter.SHOW_TEXT, SkipBFilter(), 0)
###



So you may find that this will work:

###
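# Same imports as in your original script.
from xml.dom.ext.reader import Sax2
from xml.dom.NodeFilter import NodeFilter

class FilterBroken(NodeFilter):
    # The walker calls acceptNode() on the filter object for each node.
    def acceptNode(self, thisNode):
        if (thisNode.nodeType == thisNode.ELEMENT_NODE and
                thisNode.getAttribute("status") == "broken"):
            return NodeFilter.FILTER_REJECT
        return NodeFilter.FILTER_ACCEPT

reader = Sax2.Reader()
input_file = file("appliances.xml")
doc = reader.fromStream(input_file)
# Pass an instance of the filter class, not a bare function.
walker = doc.createTreeWalker(doc.documentElement,
                              NodeFilter.SHOW_ALL,
                              FilterBroken(), 0)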
###


If this does do the trick, let's send a holler to the pyxml documentation
maintainers and get them to fix their documentation.  *grin*
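If you want to check the filter logic without PyXML installed, here is a
rough sketch that uses only the standard library's xml.dom.minidom and
xml.dom.NodeFilter.  minidom has no createTreeWalker(), so the walk()
generator below is a hypothetical stand-in that I wrote for illustration, not
PyXML's TreeWalker.  It follows the DOM Level 2 Traversal rules as I
understand them: FILTER_REJECT drops a node and its whole subtree, while
FILTER_SKIP hides the node but still visits its children.

```python
from xml.dom import Node, minidom
from xml.dom.NodeFilter import NodeFilter

SAMPLE = """<?xml version="1.0"?>
<appliances>
    <clock status = "working">cuckoo</clock>
    <television status = "broken">black and white</television>
</appliances>
"""

class FilterBroken(NodeFilter):
    def acceptNode(self, node):
        if (node.nodeType == Node.ELEMENT_NODE and
                node.getAttribute("status") == "broken"):
            return NodeFilter.FILTER_REJECT
        return NodeFilter.FILTER_ACCEPT

def walk(node, node_filter):
    """Hypothetical TreeWalker stand-in: yield accepted nodes in document order."""
    verdict = node_filter.acceptNode(node)
    if verdict == NodeFilter.FILTER_REJECT:
        return  # drop this node and everything beneath it
    if verdict == NodeFilter.FILTER_ACCEPT:
        yield node
    # On FILTER_SKIP the node itself is hidden, but its children are visited.
    for child in node.childNodes:
        yield from walk(child, node_filter)

doc = minidom.parseString(SAMPLE)
names = [n.tagName
         for n in walk(doc.documentElement, FilterBroken())
         if n.nodeType == Node.ELEMENT_NODE]
print(names)
```

With the sample file, only the appliances and clock elements should come
through; the television element is rejected along with its text.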


Hope this helps!