[email@example.com: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought]
Martin v. Loewis
Mon, 2 Oct 2000 01:33:12 +0200
> On further reflection, I can see that my previous concern about two
> original TEXT children of <username> was nonsensical (if they were
> really distinct, they should be elements), but nonetheless, the
> lesson about having to concatenate all TEXT children to get the
> original text value seems to be true.
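Here is a minimal Python sketch of that lesson. It uses the standard library's xml.dom.minidom, not the PyXML 0.5.5.1 DOM from this thread. The helper name element_text and the sample element are hypothetical. The helper joins the data of every Text and CDATA section child, so the result is the same however the text happens to be split:

```python
from xml.dom import minidom, Node

def element_text(element):
    """Concatenate the data of all direct Text and CDATA section children.

    Text inside child elements is not included.
    """
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)

# Build an element whose text is spread over two adjacent Text nodes,
# which a DOM implementation is allowed to produce.
doc = minidom.Document()
username = doc.createElement("username")
doc.appendChild(username)
username.appendChild(doc.createTextNode("jd"))
username.appendChild(doc.createTextNode("oe"))

print(len(username.childNodes))   # 2
print(element_text(username))     # jdoe
```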
I think you have a point about splitting a text fragment into multiple
Text nodes; the DOM spec says this about the Text interface:
# If there is no markup inside an element's content, the text is
# contained in a single object implementing the Text interface that is
# the only child of the element. If there is markup, it is parsed into
# a list of elements and Text nodes that form the list of children of
# the element.
# When a document is first made available via the DOM, there is only
# one Text node for each block of text. Users may create adjacent Text
# nodes that represent the contents of a given element without any
# intervening markup, but should be aware that there is no way to
# represent the separations between these nodes in XML or HTML, so
# they will not (in general) persist between DOM editing sessions. The
# normalize() method on Element [p.38] merges any such adjacent Text
# objects into a single node for each block of text; this is
# recommended before employing operations that depend on a particular
# document structure, such as navigation with XPointers.
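The following rough illustration of the spec passage uses Python's xml.dom.minidom rather than 4DOM. Three adjacent Text nodes created through the DOM serialize to the same markup as a single node, and normalize() merges them. The element name and text are made up for the example.

```python
from xml.dom import minidom

doc = minidom.Document()
root = doc.createElement("note")   # hypothetical element name
doc.appendChild(root)

# Create three adjacent Text nodes with no markup between them.
for piece in ("a", ">", "b"):
    root.appendChild(doc.createTextNode(piece))
before = len(root.childNodes)      # 3

# The serialized form cannot show where one Text node ends and the next begins.
serialized = root.toxml()          # '<note>a&gt;b</note>'

# normalize() merges the adjacent Text nodes into a single one.
root.normalize()
after = len(root.childNodes)       # 1
print(before, serialized, after, root.firstChild.data)
```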
I'm not sure what that means for parsing <hallo>: is a parser
permitted to split this content into three Text nodes, required to
split it, or forbidden from doing so?
Section 2.4 of XML 1.0 [REC-xml-19980210] says that an entity
reference is markup, and section 4.1 says that &gt; is an entity
reference (*not* a character reference), so it appears that creating
multiple Text nodes is permitted.
You *should* be able to merge them by calling normalize() on the tree.
I'm not sure whether that worked in 0.5.5.1, but it does work with
4DOM in PyXML 0.6. Please note that normalize() won't merge CDATA
sections.
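This sketch also uses Python's xml.dom.minidom instead of 4DOM, with a made-up <note> document, and shows both points. After normalize(), text containing an entity reference ends up in a single Text node. A CDATA section, however, still splits the content, so concatenating the Text and CDATA section children remains the safe way to get the full value:

```python
from xml.dom import minidom, Node

# Hypothetical document containing an entity reference. Whether the parser
# returns one Text node or several for "a&gt;b", normalize() leaves one.
doc = minidom.parseString("<note>a&gt;b</note>")
note = doc.documentElement
note.normalize()
parsed_children = len(note.childNodes)      # 1

# Append a CDATA section followed by more text, then normalize again.
note.appendChild(doc.createCDATASection("<c>"))
note.appendChild(doc.createTextNode("d"))
note.normalize()

# normalize() did not merge the Text nodes across the CDATA section.
kinds = [child.nodeType for child in note.childNodes]
# [Node.TEXT_NODE, Node.CDATA_SECTION_NODE, Node.TEXT_NODE]

# Joining Text and CDATA section data still yields the full value.
full_text = "".join(
    child.data for child in note.childNodes
    if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
)
print(full_text)   # a>b<c>d
```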