[Expat-discuss] CRLF conversion question

Armin Bauer armin.bauer at desscon.com
Wed Sep 8 00:18:39 CEST 2004


On Tue, 2004-09-07 at 16:55, Fred L. Drake, Jr. wrote:
> On Tuesday 07 September 2004 10:09 am, Armin Bauer wrote:
>  > I suspect that this is a bug in the parser which manufactures the xml
>  > tree.
> 
> I'm not convinced.
> 
>  > the wbxml lib goes through the cdata, which consists of a lot of nodes
>  > like this:
>  >
>  > WBXML Encoder> CDATA Begin
>  > WBXML Encoder> Text: <BEGIN:VCARD>
>  > WBXML Encoder> Text: <
>  >
>  > WBXML Encoder> Text: <VERSION:2.1>
>  > WBXML Encoder> Text: <
> 
> That seems like how Expat reports it.  Most tree-builders normalize on input, 
> though, so you would get just one text node.  There may be a knob to twist; 
> that depends on your tree-builder.  If you're getting a DOM, you should be 
> able to call .normalize() on the document to take care of this issue.
> 
>  > But it should be only one node which holds all the cdata. By the way:
>  > the "<\n>" node holds a 0x0a.
>  >
>  > So I guess the parser just creates a node for every 0x0d it encounters,
>  > even if it is in the cdata. That doesn't sound right to me.
>  >
>  > The correct fix to this (if my assumptions are correct) would be to not
>  > parse the cdata into nodes but to leave it as is.
> 
> It's not clear to me what this means.  If CDATA marked sections are 
> represented, it'll contain zero or more text nodes; this is expected of the 
> DOM.  I don't know what API you're using, though (wbxml doesn't tell me
> anything, unfortunately), but the nodes look like what I'd expect as output 
> events from Expat.  Especially the normalized newlines coming through as a 
> separate bit.
> 
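As Fred suggests, a DOM whose text is split over many adjacent text nodes can be collapsed with `normalize()`. A minimal sketch using Python's `xml.dom.minidom`, assuming a DOM-style tree; the `data` element is hypothetical, and the nodes are built by hand to mimic a tree-builder that does not merge Expat's callbacks (minidom's own parser already merges them):

```python
from xml.dom import minidom

# Hypothetical tree: one element holding five adjacent text nodes,
# the way an unmerged tree-builder would store Expat's callbacks.
doc = minidom.Document()
root = doc.createElement("data")
doc.appendChild(root)
for piece in ["BEGIN:VCARD", "\n", "VERSION:2.1", "\n", "END:VCARD"]:
    root.appendChild(doc.createTextNode(piece))

print(len(root.childNodes))  # 5

# normalize() walks the tree and merges adjacent text nodes into one.
doc.normalize()

print(len(root.childNodes))     # 1
print(repr(root.firstChild.data))  # 'BEGIN:VCARD\nVERSION:2.1\nEND:VCARD'
```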

Sorry if it's me being stupid, but why do CDATA sections contain nodes at
all? As far as I understand, the parser should not touch the
CDATA section at all.

So if the parser encounters something like this:

<![CDATA[BEGIN:VCARD
VERSION:2.1
X-EVOLUTION-FILE-AS:Mike, Smith
FN:Smith Mike
N:Mike;Smith
TEL;PREF;WORK:+1 469 43220403
EMAIL;INTERNET:mike.smith at yahoo.com
TITLE:Business Developer
UID:pas-id-413DC011000000B2
END:VCARD]]>

wouldn't it be correct if it created exactly one text node containing all
the text as is? At the moment it creates a separate node for every line
and every line break.
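The splitting can be observed directly. This sketch uses Python's `xml.parsers.expat` binding to Expat (an assumption: wbxml's tree-builder is C code and may handle callbacks differently). It records every character-data callback for a trimmed copy of the CDATA section above, first as Expat reports it and then with pyexpat's `buffer_text` option, which joins consecutive callbacks; the `data` element is hypothetical:

```python
import xml.parsers.expat

# Trimmed copy of the CDATA section above, with CRLF line endings.
DOC = b"<data><![CDATA[BEGIN:VCARD\r\nVERSION:2.1\r\nEND:VCARD]]></data>"


def collect_chunks(buffer_text):
    """Parse DOC and return every string passed to CharacterDataHandler."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # With buffer_text=True, pyexpat joins consecutive character-data
    # callbacks into one call before handing them to Python.
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = chunks.append
    parser.Parse(DOC, True)
    return chunks


raw = collect_chunks(buffer_text=False)
merged = collect_chunks(buffer_text=True)

print(raw)     # several pieces; each line break arrives as its own "\n"
print(merged)  # a single string holding the whole section
```

Each CRLF is normalized to a single `\n`, which is why every line break shows up as its own one-character chunk. A tree-builder gets one node per CDATA section by appending to the current text node when the handler fires again, rather than creating a new node each time.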

wbxml is the library that converts SyncML requests to WAP Binary XML (an
encoding step before the data is sent over GPRS) :)

>  > BTW: the &#13; fix did not work. It was left as is.
> 
> Inside of CDATA marked sections, yes.
> 
> 
>   -Fred
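Fred's point about `&#13;` can be checked the same way. This sketch (again Python's `xml.parsers.expat`, with a hypothetical `data` element) parses the same text once as ordinary element content and once inside a CDATA section:

```python
import xml.parsers.expat


def text_of(doc):
    """Return all character data Expat reports for a document, joined."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True
    parser.CharacterDataHandler = chunks.append
    parser.Parse(doc, True)
    return "".join(chunks)


# In ordinary content, &#13; is a character reference: the parser
# replaces it with a literal carriage return, which is not normalized away.
outside = text_of(b"<data>A&#13;\nB</data>")
# Inside a CDATA section, the same six characters are plain text.
inside = text_of(b"<data><![CDATA[A&#13;\nB]]></data>")

print(repr(outside))  # 'A\r\nB'
print(repr(inside))   # 'A&#13;\nB'
```

So a character reference only preserves a carriage return in ordinary content; inside `<![CDATA[ ... ]]>` it is passed through as literal text.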


