[ expat-Bugs-580503 ] 1.95.2 Reuses Buffer if isFinal != 0
noreply@sourceforge.net
Fri Jul 12 22:04:02 2002
Bugs item #580503, was opened at 2002-07-12 04:47
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=580503&group_id=10127
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Ari Johnson (ari_j)
Assigned to: Nobody/Anonymous (nobody)
Summary: 1.95.2 Reuses Buffer if isFinal != 0
Initial Comment:
In expat 1.95.2 (the latest version for which a
FreeBSD package is available), calling
XML_ParseBuffer() or XML_Parse() with isFinal != 0
causes the previously-parsed buffer to be parsed,
at least partially, again. For example, say that I'm
sending it chunks of 1024 bytes at a time, and the
root element of my XML file is <test>. Let's say
that the last chunk I sent it was as follows:
--
<nodeN>value</nodeN>
<data/>
</test>
--
The next time I call XML_ParseBuffer() or XML_Parse(),
I get the full complement of startElement,
endElement, and characterData events for
<nodeN> again. This occurs after I've received the
<test> endElement event. Moreover, the error
occurs whether I pass len = 0 to XML_Parse*() or
actually send data to it.
Is this a known bug? Is it fixed in a newer version?
I couldn't find any references to it on the bug list,
open or closed; nor in the CVS log for xmlparser.c.
----------------------------------------------------------------------
>Comment By: Ari Johnson (ari_j)
Date: 2002-07-13 00:03
Message:
Logged In: YES
user_id=90811
That worked, thanks! Maybe you should make a very
blatant note of that in the reference for all of
XML_GetBuffer(), XML_ParseBuffer(), and XML_Parse(), as
well as any other functions that have that requirement.
Thanks again for the help, and the great library. It's
saving me from the bloat. ;)
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 23:19
Message:
Logged In: YES
user_id=290026
Without running your code, just by looking at it:
You are not supposed to call XML_GetBuffer or
XML_ParseBuffer from within a handler
(xml_handle_endelement in your case).
Take it out and then try again.
I have a suspicion that that might help.
Karl
----------------------------------------------------------------------
Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:39
Message:
Logged In: YES
user_id=90811
No, my file size is not a multiple of 1024. Also, as my
file's name implies and you probably guessed from the
jabber.xml file I'm using for testing, I intend to use this
code over a network to communicate with a Jabber
server. So I absolutely can't predict my file size in
advance. However, I see the oversight you're pointing
out and will correct it; that's not where this bug is
coming from, though.
----------------------------------------------------------------------
Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:34
Message:
Logged In: YES
user_id=90811
Forgot about my platform-specific code...in test.c,
change 'stdin->_file' to '0' and 'stdout->_file' to '1'.
And to run it, ./test < jabber.xml > output.xml
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 16:31
Message:
Logged In: YES
user_id=290026
Another thought:
If your file size is a multiple of 1024, your code will
call XML_ParseBuffer twice with a buffer size of zero.
This is probably not what it was intended for.
If your file size happens to be like that, maybe you
could try another approach?
----------------------------------------------------------------------
Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:28
Message:
Logged In: YES
user_id=90811
I've narrowed the problem down considerably...
I had XML_ParseBuffer() print out 'start' to stderr after it
sets start in the first bit of the function. What I found is
that the contents are correct, for example if I set my
buffer to equal "<!-- this is a comment -->", then that
string gets appended to the end of the buffer where it
should be. However, start still points to the beginning
of the previous segment of the buffered text. This
apparently happens only on the call to
XML_ParseBuffer() that follows the call that doesn't use
the full size of the buffer.
Attached, please find a .tar.gz (warning, no directory
structure, but only 4 files so it shouldn't cause a mess
for you) with the actual code I'm using. You'll have to
compile test.c and xmlstream.c manually and then link
them to form an executable.
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 16:15
Message:
Logged In: YES
user_id=290026
Since I cannot duplicate this, do you have access
to another platform/OS and can you try it there?
Maybe it is related to the fact that you are compiling
and running this on your version of FreeBSD.
----------------------------------------------------------------------
Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 16:04
Message:
Logged In: YES
user_id=90811
It doesn't matter what size buffer I request, the
XML_ParseBuffer() call ends up re-parsing the previous
buffer when I give it a zero size argument. Also, even if
I feed it a fictitious XML comment, it still gives me the
error 9 - junk after document element.
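
For reference, a small sketch of how the error code mentioned above can be
read back and printed. The helper name parse_and_report is hypothetical;
it parses a complete in-memory document in a single final call, and it
assumes a default Expat build in which XML_ErrorString() returns a plain
char string.

```c
#include <assert.h>
#include <expat.h>
#include <stdio.h>
#include <string.h>

/* Parse a complete in-memory document in one final call and return the
 * Expat error code (XML_ERROR_NONE on success). */
static enum XML_Error parse_and_report(const char *doc)
{
    enum XML_Error code = XML_ERROR_NONE;
    XML_Parser p;

    assert(doc != NULL);
    p = XML_ParserCreate(NULL);
    if (p == NULL)
        return XML_ERROR_NO_MEMORY;

    if (XML_Parse(p, doc, (int)strlen(doc), 1) == XML_STATUS_ERROR) {
        /* The error code must be read from the same parser object
         * before it is freed. */
        code = XML_GetErrorCode(p);
        fprintf(stderr, "error %d at line %lu: %s\n",
                (int)code,
                (unsigned long)XML_GetCurrentLineNumber(p),
                XML_ErrorString(code));
    }
    XML_ParserFree(p);
    return code;
}
```

Feeding it a document with an element after the root's close tag, such as
"<nodeN>value</nodeN>", should yield
XML_ERROR_JUNK_AFTER_DOC_ELEMENT, which is the numeric code 9 reported in
this thread.
<test>
assert(parse_and_report("<test/>") == XML_ERROR_NONE);
assert(parse_and_report("<test></test><nodeN>value</nodeN>") == XML_ERROR_JUNK_AFTER_DOC_ELEMENT);
assert((int)XML_ERROR_JUNK_AFTER_DOC_ELEMENT == 9);
</test>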
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 15:37
Message:
Logged In: YES
user_id=290026
Try removing the call for getting a zero sized buffer:
char *buff = XML_GetBuffer(p, 0);
or change the size parameter back to 1024.
Does that have any effect?
----------------------------------------------------------------------
Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 15:29
Message:
Logged In: YES
user_id=90811
Just an update...the same thing happens when I don't
set isFinal in the error-producing call, or of course if I
feed it more text (since this is just after the </root> tag
has been read).
----------------------------------------------------------------------
Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 15:10
Message:
Logged In: YES
user_id=90811
I apologize for a mistake; the error it returns is 9 ->
junk after document element. I was calling
GetErrorCode wrong. Anyhow...
---
int c = 1024;
while(c == 1024) {
    char *buff = XML_GetBuffer(p, 1024);
    if(!buff)
        ...
    c = read(fd, buff, 1024);
    if(c < 0)
        ...
    if(!XML_ParseBuffer(p, c, 0))
        ...
}
...
void handle_end_element(...) {
    if([it was the root element]) {
        char *buff = XML_GetBuffer(p, 0);
        if(!XML_ParseBuffer(p, 0, 1))
            /* This is where the error occurs. */
    }
}
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 14:42
Message:
Logged In: YES
user_id=290026
In that case please post a small example that can
produce the behaviour you have reported.
Karl
----------------------------------------------------------------------
Comment By: Ari Johnson (ari_j)
Date: 2002-07-12 14:22
Message:
Logged In: YES
user_id=90811
Same thing occurs with the current CVS. This even
occurs when I actually send XML_Parse*() some text to
process. And to make things worse, XML_Parse()
returns false since it's trying to parse text past the close
tag of my root element, but the error code is set to 0.
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 08:07
Message:
Logged In: YES
user_id=290026
Please retest with 1.95.3 or the current CVS.
Also, we are planning to release 1.95.4 today,
so if this is a bug, let's find out ASAP.
Karl
----------------------------------------------------------------------