[ expat-Bugs-580503 ] 1.95.2 Reuses Buffer if isFinal != 0

noreply@sourceforge.net noreply@sourceforge.net
Fri Jul 12 23:11:02 2002


Bugs item #580503, was opened at 2002-07-12 04:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=580503&group_id=10127

Category: None
Group: None
Status: Open
Resolution: Rejected
Priority: 5
Submitted By: Ari Johnson (ari_j)
Assigned to: Nobody/Anonymous (nobody)
Summary: 1.95.2 Reuses Buffer if isFinal != 0

Initial Comment:
In expat 1.95.2 (the latest version for which a 
FreeBSD package is available), calling 
XML_ParseBuffer() or XML_Parse() with isFinal != 0 
causes the previously-parsed buffer to be parsed, 
at least partially, again.  For example, say that I'm 
sending it chunks of 1024 bytes at a time, and the 
last root tag of my XML file is <test>.  Let's say 
that the last buffer chunk I sent it was as follows:
--
<nodeN>value</nodeN>
<data/>
</test>
--
The next time I call XML_ParseBuffer() or XML_Parse(), 
I get the full complement of startElement, 
endElement, and characterData events for 
<nodeN>.  This occurs after I've received the 
<test> endElement event.  Moreover, the error 
occurs whether I send len = 0 to XML_Parse*() or 
actually send data to it.

Is this a known bug?  Is it fixed in a newer version?  
I couldn't find any references to it on the bug list, 
open or closed; nor in the CVS log for xmlparser.c.


----------------------------------------------------------------------

>Comment By: Ari Johnson (ari_j)
Date: 2002-07-13 01:10

Message:
Logged In: YES 
user_id=90811

A documentation bug...my favorite kind.  Of course, you 
could always do the more malevolent fix, which would 
be to have XML_GetBuffer() return 0xd00d and 
XML_ParseBuffer() return error with message 'you 
blithering idiot, you can't do that!'.  Of course, 
XML_Parse() would not be changed at all, so people who 
use it would automatically get the core dump for 
accessing memory at 0xd00d.

Then again, that probably only works for my users.  
Some people aren't so tolerant. ;)

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-13 00:09

Message:
Logged In: YES 
user_id=290026

Great!
I think you are right - we should add a note to those
functions about when *not* to call them.

I leave this bug report open, so that we don't forget.
I am sure once Fred gets annoyed by the open bug
report he will read this note. <g>

Karl

----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-13 00:03

Message:
Logged In: YES 
user_id=90811

That worked, thanks!  Maybe you should make a very 
blatant note of that in the reference for all of 
XML_GetBuffer(), XML_ParseBuffer(), and XML_Parse(), as 
well as any other functions that have that requirement.  
Thanks again for the help, and the great library.  It's 
saving me from the bloat. ;)

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 23:19

Message:
Logged In: YES 
user_id=290026

Without running your code, just by looking at it:

You are not supposed to call XML_GetBuffer or 
XML_ParseBuffer from within a handler 
(xml_handle_endelement in your case).

Take it out and then try again.
I have a suspicion that that might help.

Karl



----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:39

Message:
Logged In: YES 
user_id=90811

No, my file size is not a multiple of 1024.  Also, as my 
file's name implies and you probably guessed from the 
jabber.xml file I'm using for testing, I intend to use this 
code over a network to communicate with a Jabber 
server.  So I absolutely can't predict my file size in 
advance.  However, I see the oversight you're pointing 
out and will correct it; that's not where this bug is 
coming from, though.

----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:34

Message:
Logged In: YES 
user_id=90811

Forgot about my platform-specific code...in test.c, 
change 'stdin->_file' to '0' and 'stdout->_file' to '1'.  
And to run it, ./test < jabber.xml > output.xml
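
A portable way to get those descriptors, sketched here as an
alternative to editing the numbers by hand, is fileno() from POSIX (or
the STDIN_FILENO / STDOUT_FILENO constants from <unistd.h>). The helper
names input_fd and output_fd are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes fileno() in strict -std= modes */
#endif
#include <stdio.h>
#include <unistd.h>

/* Descriptor behind stdin, without touching FILE internals such as _file. */
int input_fd(void)
{
    return fileno(stdin);
}

/* Descriptor behind stdout. */
int output_fd(void)
{
    return fileno(stdout);
}
```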


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 16:31

Message:
Logged In: YES 
user_id=290026

Another thought:

If your file size is a multiple of 1024, your code will
call XML_ParseBuffer twice with a buffer size of zero.

This is probably not what it was intended for.
If your file size happens to be like that, maybe you
could try another approach?

----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:28

Message:
Logged In: YES 
user_id=90811

I've narrowed the problem down considerably...

I had XML_ParseBuffer() print out 'start' to stderr after it 
sets start in the first bit of the function.  What I found is 
that the contents are correct: for example, if I set my 
buffer to equal "<!-- this is a comment -->", then that 
string gets appended to the end of the buffer, where it 
should be.  However, start still points to the beginning 
of the previous segment of the buffered text.  This 
apparently happens only on the call to 
XML_ParseBuffer() after the call that doesn't use the full 
size of the buffer.

Attached, please find a .tar.gz (warning, no directory 
structure, but only 4 files so it shouldn't cause a mess 
for you) with the actual code I'm using.  You'll have to 
compile test.c and xmlstream.c manually and then link 
them to form an executable.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 16:15

Message:
Logged In: YES 
user_id=290026

Since I cannot duplicate this, do you have access
to another platform/OS and can you try it there?
Maybe it is related to the fact that you are compiling
and running this on your version of FreeBSD.



----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:04

Message:
Logged In: YES 
user_id=90811

It doesn't matter what size buffer I request, the 
XML_ParseBuffer() call ends up re-parsing the previous 
buffer when I give it a zero size argument.  Also, even if 
I feed it a fictitious XML comment, it still gives me the 
error 9 - junk after document element.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 15:37

Message:
Logged In: YES 
user_id=290026

Try removing the call for getting a zero sized buffer:
  char *buff = XML_GetBuffer(p, 0);
or change the size parameter back to 1024.
Does that have any effect?


----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 15:29

Message:
Logged In: YES 
user_id=90811

Just an update...the same thing happens when I don't 
set isFinal in the error-producing call, or of course if I 
feed it more text (since this is just after the </root> tag 
has been read).

----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 15:10

Message:
Logged In: YES 
user_id=90811

I apologize for a mistake; the error it returns is 9 -> 
junk after document element.  I was calling 
GetErrorCode wrong.  Anyhow...

---
int c = 1024;
while(c == 1024) {
  char *buff = XML_GetBuffer(p, 1024);
  if(!buff)
    ...
  c = read(fd, buff, 1024);
  if(c < 0)
    ...
  if(!XML_ParseBuffer(p, c, 0))
    ...
}

...

void handle_end_element(...) {
  if([it was the root element]) {
    char *buff = XML_GetBuffer(p, 0);
    if(!XML_ParseBuffer(p, 0, 1))
      /* This is where the error occurs. */
  }
}
---


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 14:42

Message:
Logged In: YES 
user_id=290026

In that case please post a small example that can
produce the behaviour you have reported.

Karl

----------------------------------------------------------------------

Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 14:22

Message:
Logged In: YES 
user_id=90811

Same thing occurs with the current CVS.  This even 
occurs when I actually send XML_Parse*() some text to 
process.  And to make things worse, XML_Parse() 
returns false since it's trying to parse text past the close 
tag of my root element, but the error code is set to 0.


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 08:07

Message:
Logged In: YES 
user_id=290026

Please retest with 1.95.3 or the current CVS.
Also, we are planning to release 1.95.4 today,
so if this is a bug, let's find out ASAP.

Karl

----------------------------------------------------------------------

