[XML-SIG] Useful Python 2.2 tools for the DOM hacker

Uche Ogbuji uche.ogbuji@fourthought.com
22 Jun 2002 15:29:59 -0600



I've kicked up a few useful routines today for DOM processing with
Python 2.2.  (BTW, generators kick ass).

I've attached a module of these routines just in case they come in handy
to others.  I must say that the combo of generators/iterators and list
comps makes this code far cleaner and faster than it would have been in,
say, Python 1.5.

BTW, if someone more graphically inclined than I wants to write the
stubbed-out function dom_trace and post it back here, I'd be happy to owe
them a beer.  What would really be cool is a routine that emits
an SVG diagram of a DOM's contents.  Based on stuff I've seen done with
SVG, this should be quite feasible.
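
As a starting point for the stubbed-out routine, here is a minimal sketch of
a plain-text version (not SVG).  It is written for a current Python and uses
the standard library's xml.dom.minidom rather than Domlette; the helper name
dom_trace_lines and the label format are assumptions made for illustration,
not part of the attached module.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def dom_trace_lines(node, depth=0):
    """Yield one indented line of text per node, in document order."""
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        label = "<%s>" % node.nodeName
    elif node.nodeType == Node.TEXT_NODE:
        label = "text %r" % node.data
    else:
        # Other node types, e.g. "#document" or "#comment"
        label = node.nodeName
    yield indent + label
    for child in node.childNodes:
        for line in dom_trace_lines(child, depth + 1):
            yield line


def dom_trace(node):
    """Return a rudimentary diagram of a DOM's contents as one string."""
    return "\n".join(dom_trace_lines(node))


# Same sample document as the attached module's self-test
doc = parseString(
    "<spam xmlns:x='http://spam.com'>eggs<monty>python</monty></spam>")
print(dom_trace(doc))
```

With minidom this prints the document node, then spam, its text, monty, and
monty's text, each indented two spaces deeper than its parent.  An SVG
version could presumably reuse the same depth-tracking traversal and emit
shapes instead of text lines.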


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Track chair, XML/Web Services One (San Jose, Boston):
http://www.xmlconference.com/
DAML Reference - http://www.xml.com/pub/a/2002/05/01/damlref.html
The Languages of the Semantic Web -
http://www.newarchitectmag.com/documents/s=2453/new1020218556549/index.html
XML, The Model Driven Architecture, and RDF @ XML Europe -
http://www.xmleurope.com/2002/kttrack.asp#themodel

Attachment: DomTools.py

########################################################################
#
# File Name:            DomARama.py
#
"""
DOM Processing Utilities: Python 2.2 only
Copyright 2002 Uche Ogbuji http://uche.ogbuji.net
http://4suite.org/
"""

from __future__ import generators
from xml.dom import Node


def in_order_iterator(node):
    """Yield node, then each of its descendants, in document order."""
    yield node
    for child in node.childNodes:
        for cn in in_order_iterator(child):
            yield cn


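def in_order_iterator_filter(node, filter_func):
    """Yield, in document order, each node for which filter_func is true."""
    if filter_func(node):
        yield node
    for child in node.childNodes:
        # The recursive call already applies filter_func, so no re-test here
        for cn in in_order_iterator_filter(child, filter_func):
            yield cn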


def get_elements_by_tag_name_ns(node, ns, local):
    return in_order_iterator_filter(
        node, lambda n: (n.nodeType == Node.ELEMENT_NODE
                         and n.namespaceURI == ns
                         and n.localName == local))


def string_value(node):
    text_nodes = in_order_iterator_filter(
        node, lambda n: n.nodeType == Node.TEXT_NODE)
    return u''.join([ n.data for n in text_nodes ])


#def DomTrace(node):
#    """
#    Display a rudimentary diagram of a DOM's contents
#    """


if __name__ == "__main__":
    DOC = """<spam xmlns:x='http://spam.com'>eggs<monty>python</monty></spam>
    """
    from Ft.Xml.Domlette import NonvalidatingReader
    doc = NonvalidatingReader.parseString(DOC, "http://spam.com/base")
    print "All nodes:"
    for node in in_order_iterator(doc):
        print node
    print "Elements only:"
    for node in in_order_iterator_filter(
        doc, lambda x: x.nodeType == Node.ELEMENT_NODE):
        print node
    print "Get elements by tag name:"
    for node in get_elements_by_tag_name_ns(doc, None, 'monty'):
        print node
    print "String value:"
    print string_value(doc)


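
Because the traversals in the module are generators, a caller that only
wants the first match does not pay for walking the rest of the tree.  The
sketch below illustrates that under a few assumptions: it targets a current
Python, substitutes the standard library's xml.dom.minidom for Domlette,
adds a hypothetical bacon element to the sample document, and instruments
the iterator with a visited list purely for demonstration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

visited = []  # demonstration only: names of nodes the traversal has reached


def in_order_iterator(node):
    """Same traversal as the module's, plus a record of each visit."""
    visited.append(node.nodeName)
    yield node
    for child in node.childNodes:
        for cn in in_order_iterator(child):
            yield cn


# Hypothetical sample: the module's test document with a trailing element
doc = parseString("<spam>eggs<monty>python</monty><bacon/></spam>")

# Nothing is traversed until the first item is requested
matches = (n for n in in_order_iterator(doc)
           if n.nodeType == Node.ELEMENT_NODE and n.localName == "monty")
first = next(matches)

print(first.nodeName)
print(visited)
```

After the single next() call, visited holds only the nodes up to and
including monty; the bacon element is never reached.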