[Expat-bugs] [ expat-Bugs-898906 ] Expat fails to correctly parse content data

SourceForge.net noreply at sourceforge.net
Sun Mar 14 19:25:35 EST 2004


Bugs item #898906, was opened at 2004-02-17 12:33
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=898906&group_id=10127

>Category: None
>Group: Not a Bug
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: Expat fails to correctly parse content data

Initial Comment:
In the following XML file:
<?xml version="1.0" ?>
<document>
	<test1 name="if (x<y)" >
	if (x < y)
	</test1>
</document>

the < in the attribute is correctly returned as "<",
however, in the character block, it appears to be 
returned as a NULL.



I will try to look into it to see if I can figure out
where expat went wrong.


Here is the output from my simple test app using expat:

defaultHandler: '<?xml version="1.0" ?'
defaultHandler: ''
startElement 'document'
.characterData: ''
.characterData: ''
.startElement 'test1'
.Attribute: 'name' = 'if (x<y)'
..characterData: ''
..characterData: '      if (x'
..characterData: ''
..characterData: ' y'
..characterData: ''
..characterData: ''
.endElement 'test1'
.characterData: ''
endElement 'document'


Thanks,
Nicholas



----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2004-03-14 19:25

Message:
Logged In: YES 
user_id=290026

Examined code provided by submitter.
It seems the code assumed that character data is
reported as a null-terminated string, which it is not.
Changing the code accordingly solved the problem.

Closed - not a bug.
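
For readers hitting the same symptom, here is a minimal sketch of a handler
that respects the length argument. Expat passes character data as a pointer
plus a length, and the buffer is not null-terminated. The names text_buf,
on_character_data and format_chunk are hypothetical, and XML_Char is assumed
to be plain char (the default UTF-8 build). In a real program the handler
would be registered with XML_SetCharacterDataHandler.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accumulator for this sketch; not part of Expat. */
struct text_buf {
    char data[256];
    size_t used;
};

/* Same shape as an Expat character data handler:
 *   void handler(void *userData, const XML_Char *s, int len);
 * Assumption: XML_Char is char. The bytes at s are NOT null-terminated,
 * so only the first len bytes may be read. */
static void on_character_data(void *user_data, const char *s, int len)
{
    struct text_buf *buf = user_data;
    size_t n, room;

    assert(len >= 0);
    n = (size_t)len;
    room = sizeof buf->data - 1 - buf->used;
    if (n > room)
        n = room;                      /* truncate rather than overflow */
    memcpy(buf->data + buf->used, s, n);
    buf->used += n;
    buf->data[buf->used] = '\0';       /* add our own terminator */
}

/* Format one chunk for tracing, bounded by len via the "%.*s" precision. */
static int format_chunk(char *out, size_t out_size, const char *s, int len)
{
    return snprintf(out, out_size, "characterData: '%.*s'", len, s);
}
```

Printing a chunk with "%s" or calling strlen on it reads past the reported
data. Using "%.*s" with len, or copying into a buffer that appends its own
terminator, avoids that.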

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-03-11 12:03

Message:
Logged In: YES 
user_id=290026

I just checked your code reference, and in my version of 
xmlparse.c, XML_TOK_DATA_CHARS is nowhere near line 
2334. I also could not find a condition for calling XmlConvert().

So, which version of Expat are you using?


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-03-10 23:13

Message:
Logged In: YES 
user_id=290026

Please e-mail me the xml file that causes your problem.
I will have a look at it.

My address: karl at waclawek.net

----------------------------------------------------------------------

Comment By: Nicholas hendricks (nhendricks)
Date: 2004-03-10 15:06

Message:
Logged In: YES 
user_id=977364

I tried setting the default handler to the Expand default,
but this did not help.

I reviewed the XML spec & according to section 4.4,
character references must be expanded in character data.
Using a different default handler should not have any effect
on how character references are handled...

I went ahead & trapped this in the debugger. It appears the
problem occurs around line 2334 in xmlparse.c.
Look for case XML_TOK_DATA_CHARS:

In this case statement, XmlConvert is conditionally called
before the character callback. In my case, XmlConvert does
not get called for the character data.

The reason it works in the attribute case is that XmlConvert
is always called. 

This really looks like a bug in expat, not in my usage of it.

Thanks for looking into it.
Nicholas Hendricks

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-03-08 21:36

Message:
Logged In: YES 
user_id=290026

Yes, and that behaviour of the characterData handler
would be the effect of setting a default handler with
XML_SetDefaultHandler instead of 
XML_SetDefaultHandlerExpand.
So, what function did you use to set your default handler?

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-03-08 20:24

Message:
Logged In: NO 

It is the characterData handler that is not properly
processing the 'less than' sequence. 

If you notice, the 'less than' sequence was properly handled
in the element handler.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-02-25 11:32

Message:
Logged In: YES 
user_id=290026

One thing that comes to mind:

There are two ways of setting a default handler.
One of them suppresses reporting of internal entities.

Please check the docs and compare with your code.

----------------------------------------------------------------------

Comment By: Nicholas hendricks (nhendricks)
Date: 2004-02-25 11:24

Message:
Logged In: YES 
user_id=977364

Make that a &lt; where the < is.  
I didn't realize SourceForge would parse the comments,
instead of including them as plain text.

----------------------------------------------------------------------

Comment By: Nicholas hendricks (nhendricks)
Date: 2004-02-25 11:20

Message:
Logged In: YES 
user_id=977364

I am a SourceForge NOOB, so I can't figure out how to attach
files.

The < in the above example should be <



----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2004-02-18 10:51

Message:
Logged In: YES 
user_id=290026

The way your document appears here is not well-formed, but
maybe the HTML processing expanded some entities.
Please attach the document as a file.

----------------------------------------------------------------------
