[Python-bugs-list] [ python-Bugs-549725 ] xml.dom.minidom doesn't pass CDATA

noreply@sourceforge.net noreply@sourceforge.net
Sun, 28 Apr 2002 13:04:28 -0700


Bugs item #549725, was opened at 2002-04-28 08:41
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=549725&group_id=5470

Category: Python Library
Group: Python 2.3
>Status: Closed
Resolution: Invalid
Priority: 5
Submitted By: Matthias Urlichs (smurf)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.dom.minidom doesn't pass CDATA

Initial Comment:
>>> import sys
>>> from sxml.xml2py import parseFile
# this is a simple wrapper to xml.dom.minidom.parse()
>>> x=parseFile(sys.stdin)

<?xml version="1.0" ?>           
<foo><![CDATA[dies ist
ein bar
]]></foo>
^D

>>> x
[<DOM Element: foo at 1076384172>]
>>> x.childNodes[0].childNodes
[<DOM Text node "dies ist">, <DOM Text node "\n">, <DOM Text node "ein bar">, <DOM Text node "\n">]
>>> 

I was expecting a CDATASection node here.
(In fact, my code would like to depend on it.)


----------------------------------------------------------------------

Comment By: Matthias Urlichs (smurf)
Date: 2002-04-28 20:03

Message:
Logged In: YES 
user_id=10327

Oh well...

I'm therefore closing this bug.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-28 19:57

Message:
Logged In: YES 
user_id=21627

Support for CDATA sections is optional because the DOM spec
says so, see

http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-E067D597

Notice that the DOM spec itself is silent on the issue of
building DOM trees. In DOM Level 3, there is a feature to
control whether CDATA sections are created or not, but
minidom is not targeted at DOM level 3 (and DOM level 3 is
not completed).

The DOM tree is built from the information that the XML
parser produces; that parser happens to be Expat. Expat, in
turn, does not support reporting CDATA section boundaries.

You could try to use a different XML parser. Notice that the
minidom builder uses the SAX API, which likewise reports
CDATA section boundaries only as an optional feature. So
you'd not only need a different parser, but also a different
DOM builder. If you absolutely need this functionality, you
can use 4DOM with xmlproc, from PyXML.

If you don't like several subsequent Text nodes, you can use
the DOM element .normalize method to merge them. Notice that
.normalize would not merge CDATA sections.
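
As a rough illustration of the .normalize behaviour described above, here
is a minimal sketch. It builds the tree by hand instead of parsing, so it
does not depend on what any particular parser reports; the element name
and strings are taken from the original report, and the trailing
CDATASection node is added purely for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import Document

# Build <foo> by hand with four adjacent Text nodes, as in the report.
doc = Document()
foo = doc.createElement("foo")
doc.appendChild(foo)
for piece in ("dies ist", "\n", "ein bar", "\n"):
    foo.appendChild(doc.createTextNode(piece))

# Illustrative extra: one CDATASection node after the Text nodes.
foo.appendChild(doc.createCDATASection("raw <data>"))

before = len(foo.childNodes)
foo.normalize()  # merges adjacent Text nodes in place
after = len(foo.childNodes)

merged_text = foo.childNodes[0].data
node_types = [child.nodeType for child in foo.childNodes]

print(before, after)
print(repr(merged_text))
print(node_types == [Node.TEXT_NODE, Node.CDATA_SECTION_NODE])
```

The four Text nodes collapse into one, while the CDATASection node is
left as a separate child.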

In any case, this is clearly not a bug in minidom.

----------------------------------------------------------------------

Comment By: Matthias Urlichs (smurf)
Date: 2002-04-28 18:55

Message:
Logged In: YES 
user_id=10327

SXML is a project of mine. As I said, it's just a simple wrapper for minidom.

Why should CDATA handling be optional? It seems that it should be _easier_ to package the string into one CDATASection element. Instead, four Text elements are used -- the first line, the first linefeed, the second line, and the second linefeed. It's additional effort, and I'd like to turn it off if I don't want it.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-04-28 13:59

Message:
Logged In: YES 
user_id=21627

What is sxml? Why is this a bug in Python?

Notice that the use of CDATA in the DOM is completely
optional - the DOM tree represents your document correctly.
Code relying on CDATA is broken.
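
One way to write code that does not rely on CDATA is to collect character
data from Text and CDATASection nodes alike. The following is only a
sketch: element_text is a hypothetical helper, not part of minidom, and
the tree is built by hand for the sake of the example.

```python
from xml.dom import Node
from xml.dom.minidom import Document


def element_text(element):
    """Join the character data of an element's direct children.

    Hypothetical helper: Text and CDATASection children are treated
    the same way, so the result does not depend on which node type
    the DOM builder chose.
    """
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


# Hand-built tree mixing both node types.
doc = Document()
foo = doc.createElement("foo")
doc.appendChild(foo)
foo.appendChild(doc.createTextNode("dies ist\n"))
foo.appendChild(doc.createCDATASection("ein bar\n"))

text = element_text(foo)
print(repr(text))
```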

----------------------------------------------------------------------
