[martin@loewis.home.cs.tu-berlin.de: Re: [martin@loewis.home.cs.tu-berlin.de: Re: [XML-SIG] Thought it was a bug, maybe XML is weirder than I thought]]

Sun, 1 Oct 2000 19:17:54 -0700

From: "Martin v. Loewis" <martin@loewis.home.cs.tu-berlin.de>
>I think you have a point on splitting a text fragment into multiple
>Text nodes; the DOM spec says about the interface Text:
>
># If there is no markup inside an element's content, the text is
># contained in a single object implementing the Text interface that is
># the only child of the element. If there is markup, it is parsed into
># a list of elements and Text nodes that form the list of children of
># the element.
>
>[from REC-DOM-Level-1-19981001]
>
>I'm not sure what that means for parsing &lt;hallo&gt; - is it
>permitted that these are split into three Text nodes, is it required
>that they are split, or is it disallowed?
>
>Section 2.4 of XML 1.0 [REC-xml-19980210] says that an
>entity reference is markup; 4.1 says that &gt; is an entity reference
>(*not* a character reference) - so it appears permitted that multiple
>Text nodes are created.

Thanks, Martin.  (And please accept my apologies for posting from a
state of abysmal ignorance regarding XML.  Being a person who actually
enjoys reading standards documents, I'm going to read through the
document you referenced.)

>You *should* be able to merge them by calling normalize() on the tree;
>I'm not sure whether that worked in 0.5.5.1, it does work with 4DOM in
>PyXML 0.6. Please note that normalize won't merge CDATA sections.

It does work, at least on my test data.
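
For anyone wanting to try the same thing, here is a minimal sketch of the
merge. It assumes the standard-library xml.dom.minidom as a stand-in; it is
not the PyXML 0.5.5.1 or 4DOM code discussed above, and those
implementations may behave differently. The element names "greeting" and
"mixed" are made up for illustration. The three Text nodes are built by
hand to mimic a parser that splits &lt;hallo&gt; at the entity references.

```python
from xml.dom import minidom

doc = minidom.Document()
root = doc.createElement("greeting")  # hypothetical element name
doc.appendChild(root)

# Mimic a parser that produced three Text nodes for "&lt;hallo&gt;".
for piece in ("<", "hallo", ">"):
    root.appendChild(doc.createTextNode(piece))

count_before = len(root.childNodes)
root.normalize()                      # merges adjacent Text nodes
count_after = len(root.childNodes)
merged = root.firstChild.data

# A CDATA section between two Text nodes is left alone by normalize().
mixed = doc.createElement("mixed")    # hypothetical element name
mixed.appendChild(doc.createTextNode("a"))
mixed.appendChild(doc.createCDATASection("b"))
mixed.appendChild(doc.createTextNode("c"))
mixed.normalize()
cdata_count = len(mixed.childNodes)

print(count_before, count_after, repr(merged), cdata_count)
```

Under minidom, the first element ends up with a single Text child, while
the second keeps all three children because the CDATA section separates
the two Text nodes.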

-- 
Clarence Gardner
Software Engineer
NetLojix Communications
clarence@netlojix.com