[XML-SIG] Re: dom.minidom getting the text content of a node

Fredrik Lundh fredrik at pythonware.com
Thu Dec 9 20:47:41 CET 2004

Rick Hurst wrote:

> i'm trying the following (i'm a python newbie BTW):-
> from xml.dom.minidom import parse, parseString
> dom1 = parse('foo.xml')
> for node in dom1.getElementsByTagName("blog"):
>    id = node.getAttribute("id")
>    print id
>    for contentNode in node.getElementsByTagName("text"):
>       for titleNode in node.getElementsByTagName("blogtitle"):
>          print titleNode.nodeName  #returns "blogtitle"
>          print titleNode.nodeType   #returns 1
>          #print titleNode.data         #AttributeError: Element instance has no attribute 'data'
>          print titleNode.nodeValue  #returns "None"
> is there a way of doing this with minidom or do I need to be using a
> different parser? Any advice appreciated!

if you add this to the inner loop,

    print titleNode.childNodes
    print titleNode.firstChild.wholeText

you get this output (under 2.3.3):

    [<DOM Text node "\n">, <DOM CDATASection node "Plone: rem...">]

    Plone: remove member self registration

> http://sourceforge.net/tracker/?func=detail&atid=105470&aid=549725&group_id=5470

this bug report complains that the DOM represents the CDATA section as
four text nodes, which is also perfectly valid (see Martin's explanation).  code
that depends on being able to identify a CDATA section in the source file is
broken; character data, character references, entities, and CDATA sections
should all be treated as text.
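
to make the "treat it all as text" point concrete, here's a rough minidom sketch (not from the original thread) that joins the text and CDATA children of an element without caring which is which. the get_text helper and the SAMPLE document are made up for illustration; the real foo.xml isn't shown above, so its layout is only guessed from the childNodes output. written in current python syntax:

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for foo.xml; the real file is not shown in the thread.
SAMPLE = """<blogs>
<blog id="1"><text><blogtitle>
<![CDATA[Plone: remove member self registration]]></blogtitle></text></blog>
</blogs>"""


def get_text(element):
    """Join the data of all direct text and CDATA children of an element."""
    parts = []
    for child in element.childNodes:
        # Plain text nodes and CDATA sections both carry their text in .data
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


dom = parseString(SAMPLE)
titles = [get_text(n).strip() for n in dom.getElementsByTagName("blogtitle")]
print(titles)
```

the strip() call is only there to drop the newline that sits in front of the CDATA section in this sample.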

btw, here's the corresponding ElementTree version:

    from elementtree import ElementTree

    tree = ElementTree.parse("foo.xml")

    for node in tree.findall(".//blog"):
        print node.get("id")
        for content_node in node.findall("text"):
            print content_node.findtext("blogtitle")

or, shorter:

    for node in tree.findall(".//blog"):
        print node.get("id")
        print node.findtext("text/blogtitle")
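
for anyone trying this on a newer python: ElementTree has shipped in the standard library as xml.etree.ElementTree since 2.5, so a self-contained variant of the shorter loop could look like the sketch below. the SAMPLE document and the id value are invented for illustration (foo.xml itself isn't shown in the thread), and the syntax is current python rather than 2.3:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for foo.xml; the real file is not shown in the thread.
SAMPLE = """<blogs>
<blog id="1"><text><blogtitle>
<![CDATA[Plone: remove member self registration]]></blogtitle></text></blog>
</blogs>"""

root = ET.fromstring(SAMPLE)

results = []
for node in root.findall(".//blog"):
    # findtext returns character data and CDATA content as one string
    title = node.findtext("text/blogtitle", default="")
    results.append((node.get("id"), title.strip()))

print(results)
```

note that nothing in the loop has to know a CDATA section was involved; the title just comes back as an ordinary string.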

