[XML-SIG] Re: dom.minidom getting the text content of a node

Rick Hurst rick.hurst at gmail.com
Fri Dec 10 09:40:45 CET 2004

On Thu, 9 Dec 2004 20:47:41 +0100, Fredrik Lundh <fredrik at pythonware.com> wrote:
> if you add this to the inner loop,
>     print titleNode.childNodes
>     print titleNode.firstChild.wholeText
> you get this output (under 2.3.3):
>     [<DOM Text node "\n">, <DOM CDATASection node "Plone: rem...">]

Thanks Fredrik

> > http://sourceforge.net/tracker/?func=detail&atid=105470&aid=549725&group_id=5470
> this bug report complains that the DOM represents the CDATA section as
> four text nodes, which is also perfectly valid (see Martin's explanation).  code
> that depends on being able to identify a CDATA section in the source file is
> broken; character data, character references, entities, and CDATA sections
> should all be treated as text.

that makes sense
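
For reference, a rough sketch of what "treat it all as text" could look like
with minidom: instead of relying on firstChild, join the data of every Text
and CDATASection child. The helper name get_text and the sample document are
made up for illustration (the real archive file is not shown in this thread),
and the code is written for a current Python with print() as a function.

```python
from xml.dom import minidom

# Hypothetical sample: a newline followed by a CDATA section inside
# <blogtitle>, similar in shape to the childNodes output quoted above.
SAMPLE = """<blogs><blog id="1"><text><blogtitle>
<![CDATA[Plone: a title]]></blogtitle></text></blog></blogs>"""


def get_text(node):
    """Concatenate the data of all direct Text and CDATASection children."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


doc = minidom.parseString(SAMPLE)
title_node = doc.getElementsByTagName("blogtitle")[0]

# strip() drops the leading newline that precedes the CDATA section
title = get_text(title_node).strip()
print(title)
```

This way it should not matter whether the parser hands back one node or
several for the same run of text.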

> btw, here's the corresponding ElementTree version:
>     from elementtree import ElementTree
>     tree = ElementTree.parse("foo.xml")
>     for node in tree.findall(".//blog"):
>         print node.get("id")
>         for content_node in node.findall("text"):
>             print content_node.findtext("blogtitle")
> or, shorter:
>     for node in tree.findall(".//blog"):
>         print node.get("id")
>         print node.findtext("text/blogtitle")

wow, that looks like a more concise way to do it - thanks, I'll take a
look at that.
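
As a sketch only: on a current Python, where ElementTree ships in the standard
library as xml.etree.ElementTree, the shorter variant might look like this.
The sample document is invented for illustration; only the element names
(blog, text, blogtitle) and the id attribute come from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one title in a CDATA section, one as plain text.
SAMPLE = """<blogs>
  <blog id="1"><text><blogtitle><![CDATA[First post]]></blogtitle></text></blog>
  <blog id="2"><text><blogtitle>Second post</blogtitle></text></blog>
</blogs>"""

root = ET.fromstring(SAMPLE)

# Collect (id, title) pairs; findtext() returns the element's text whether
# it was written as a CDATA section or as ordinary character data.
titles = [
    (node.get("id"), node.findtext("text/blogtitle"))
    for node in root.findall(".//blog")
]

for blog_id, title in titles:
    print(blog_id, title)
```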

FWIW I had some success using Sax2 last night:

from xml.dom.ext.reader import Sax2

# create Reader object
reader = Sax2.Reader()

# parse the document
dom1 = reader.fromStream('200406archive010.xml')

for node in dom1.getElementsByTagName("blog"):
    blog_id = node.getAttribute("id")
    print int(blog_id)
    for contentNode in node.getElementsByTagName("text"):
        # firstChild is assumed to be the node holding the text
        for titleNode in contentNode.getElementsByTagName("blogtitle"):
            print titleNode.firstChild.data
        for bodyNode in contentNode.getElementsByTagName("blogbody"):
            print bodyNode.firstChild.data

Rick Hurst
