[ expat-Bugs-580258 ] Problem with GetBuffer/ParseBuffer

noreply@sourceforge.net
Fri Jul 12 13:40:02 2002


Bugs item #580258, was opened at 2002-07-11 15:40
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=580258&group_id=10127

Category: None
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Christopher M. Woods (cmwoods)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problem with GetBuffer/ParseBuffer

Initial Comment:
This is my first post, so forgive me if I leave anything 
out...
     I've encountered a problem while using the 
XML_GetBuffer/XML_ParseBuffer methods of the Expat 
library.  [Using libexpatw.lib, on Win2K w/ MSVC++ 6.0 
SP5, wide chars (UNICODE), with a UTF-16 encoded 
XML file of roughly 33KB of data.]  When using these 
methods, I've experienced errors from the parser stating 
one of the following: not well-formed XML, illegal token, 
or unclosed token.  Each error appears 
consistently for a given file and a given buffer size.
     I haven't narrowed down the problem yet - and will 
include more information once I get a chance to dig into 
the code further.  I can tell you that I get no errors on the 
file if I read it into my own buffer and use the XML_Parse 
method.  I can also tell you that I DO GET errors if I 
request a buffer large enough for the file, read the file 
into the buffer, and then call XML_ParseBuffer... so the 
problem appears to be [at least on the surface] with 
XML_ParseBuffer.

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 16:39

Message:
Logged In: YES 
user_id=290026

Great!
Bug report closed.

----------------------------------------------------------------------

Comment By: Christopher M. Woods (cmwoods)
Date: 2002-07-12 16:34

Message:
Logged In: YES 
user_id=576763

Karl,
     Your suggestion worked - thank you.  You can pull this 
from the bug list then.

-Chris

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 12:51

Message:
Logged In: YES 
user_id=290026

I could get it to work, but it does not produce any 
output. Anyway, I had a look at your loop.

As written, it is not expected to work.
Expat expects a buffer of bytes, but you are passing 
the buffer chunks as null-terminated wide strings.
You don't need to align chunks on character boundaries:
Expat can even handle buffer boundaries that fall 
*within* a wide character.
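The point about chunk boundaries can be illustrated with a small, self-contained sketch. The helper below (Utf16Feeder, feeder_init and feeder_feed are hypothetical names, not Expat API) accepts UTF-16LE input as raw bytes plus a byte count, the same kind of input XML_Parse takes, and carries a dangling byte over to the next call when a chunk ends halfway through a 16-bit code unit. It only illustrates the idea; it is not how Expat is implemented internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Expat: counts UTF-16LE code units
   delivered in arbitrary byte chunks. A chunk may end in the middle of
   a 2-byte unit; the dangling byte is carried over to the next call. */
typedef struct {
    int has_pending;        /* 1 if a low byte is waiting for its partner */
    unsigned char pending;  /* the carried-over low byte */
    size_t units;           /* complete 16-bit code units seen so far */
    uint16_t last_unit;     /* most recently completed code unit */
} Utf16Feeder;

static void feeder_init(Utf16Feeder *f) {
    f->has_pending = 0;
    f->pending = 0;
    f->units = 0;
    f->last_unit = 0;
}

/* len is a byte count, not a character count and not a string length. */
static void feeder_feed(Utf16Feeder *f, const char *buf, size_t len) {
    const unsigned char *p = (const unsigned char *)buf;
    for (size_t i = 0; i < len; i++) {
        if (f->has_pending) {
            /* Little-endian: the first byte is low, the second is high. */
            f->last_unit = (uint16_t)(f->pending | (p[i] << 8));
            f->units++;
            f->has_pending = 0;
        } else {
            f->pending = p[i];
            f->has_pending = 1;
        }
    }
}
```

Feeding the 8-byte UTF-16LE encoding of `<a/>` in chunks of 3, 3 and 2 bytes still yields 4 complete code units, even though the first chunk ends in the middle of the `a`.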

The following code should work (or come close):

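// bRC and dwSize must both start out non-zero for the loop to run.
while (bRC && dwSize != 0)
{
  // Let Expat hand out the buffer and fill it with raw bytes,
  // not a null-terminated wide string.
  pBuffer = XML_GetBuffer(m_pParser, READ_SIZE);
  if (pBuffer)
  {
    dwSize = fread(pBuffer, 1, READ_SIZE, pInputFile);
    // Pass the byte count; an empty read is the final call.
    bRC = XML_ParseBuffer(m_pParser, dwSize, dwSize == 0);
  } else {
    bRC = false;  // XML_GetBuffer could not allocate a buffer
  }
}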
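The loop above can be exercised without Expat using the sketch below. It is a stand-in built on assumptions: a local array replaces the buffer from XML_GetBuffer, and the hypothetical sink_parse function replaces XML_ParseBuffer and simply records the byte counts and the final flag it receives. What it keeps from the loop is the essential part: raw bytes from fread, a byte count (not a string length), and an empty read signalling the final call.

```c
#include <stdio.h>

#define READ_SIZE 8  /* tiny on purpose; real code would use something like 8192 */

/* Hypothetical stand-in for XML_ParseBuffer: it only records what it
   is given and always reports success (1). */
typedef struct {
    size_t total_bytes;
    int calls;
    int saw_final;
} Sink;

static int sink_parse(Sink *s, const char *buf, size_t len, int is_final) {
    (void)buf;  /* a real parser would consume these bytes */
    s->total_bytes += len;
    s->calls++;
    if (is_final)
        s->saw_final = 1;
    return 1;
}

/* Same shape as the loop above: read raw bytes with fread, pass the
   byte count through, and mark the empty read as the final chunk. */
static int feed_file(FILE *in, Sink *s) {
    char buffer[READ_SIZE];
    size_t n = 1;  /* non-zero so the loop body runs at least once */
    int ok = 1;
    while (ok && n != 0) {
        n = fread(buffer, 1, READ_SIZE, in);
        ok = sink_parse(s, buffer, n, n == 0);
    }
    return ok;
}
```

With READ_SIZE at 8, an 18-byte document is delivered as chunks of 8, 8, 2 and 0 bytes, and only the last, empty call has the final flag set.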

Please test. 
At this point I don't see a bug in Expat.

Karl

----------------------------------------------------------------------

Comment By: Christopher M. Woods (cmwoods)
Date: 2002-07-12 12:02

Message:
Logged In: YES 
user_id=576763

Karl,
   Don't know what to say... I'm using Win2K from the cmd 
line also...

Sample.cpp <D:\Temp\Sample\Sample.xml

-Chris

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 11:39

Message:
Logged In: YES 
user_id=290026

Your sample project expects input redirection,
but that does not seem to work on my system.
Using W2K, cmd.exe.

Karl

----------------------------------------------------------------------

Comment By: Christopher M. Woods (cmwoods)
Date: 2002-07-12 10:59

Message:
Logged In: YES 
user_id=576763

File was too large... I had to remove libexpatw_d.dll... you'll 
want to rebuild or change project settings to link to 
libexpatw.dll (for debug)

----------------------------------------------------------------------

Comment By: Christopher M. Woods (cmwoods)
Date: 2002-07-12 10:56

Message:
Logged In: YES 
user_id=576763

Hmm... I'll try again.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-12 10:09

Message:
Logged In: YES 
user_id=290026

I can't see your zip file.

Karl

----------------------------------------------------------------------

Comment By: Christopher M. Woods (cmwoods)
Date: 2002-07-12 09:41

Message:
Logged In: YES 
user_id=576763

   Hmm... Perhaps I'm missing something in my setup, or the 
Delphi code isn't quite the same?  I'm using wchar_t 
(UNICODE & _UNICODE defines); maybe the issue is 
related to that?  I've included a sample project in the zip file 
that exhibits the behavior I'm seeing.
   The libexpatw.lib is right from the win32bin distribution 
(1.95.3) and the libexpatw_d.lib is a debug build I made from 
the included source code.
   If you look in XMLExtractor.cpp, in the Process() function, 
starting at line 52, you'll see both methodologies are coded 
and one is commented out.  The GetBuffer/ParseBuffer 
method fails for me and the Parse method (the 2nd one) 
seems to work fine.  [Separate question: Why does 
XML_Parse take const char* and not XML_Char*?]
   I'd be perfectly happy if this turns out to be nothing more 
than a simple configuration problem on my part.  But I'm 
curious why it seems to work fine one way and not the other.

Thank you for your time,
-Chris

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-11 21:22

Message:
Logged In: YES 
user_id=290026

I cannot reproduce your problem with the current CVS
and also not with 1.95.3. I tested with the same buffer sizes 
you gave me.

I compiled the Dll with VC++6 (I believe I have SP5 also).
My test program is written in Delphi, but you can probably
still compare my parsing loop with yours:

const
  XML_READBUFSIZE = 65536;
...
IsFinal := False;
while not IsFinal do
  begin
  Buffer := XMLGetBuffer(Parser, XML_READBUFSIZE);
  if Buffer = nil then OutOfMemoryError;
  ReadCount := Stream.Read(Buffer^, XML_READBUFSIZE);
  IsFinal := ReadCount < XML_READBUFSIZE;
  if XMLParseBuffer(Parser, ReadCount, Integer(IsFinal)) = 0 then
    begin
    ErrorCode := Ord(XMLGetErrorCode(Parser));
    ...
    Break;
    end;
  end;
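This Delphi loop ends on a short read rather than an empty one. A C rendering of that termination rule might look like the sketch below; as before, a hypothetical callback (count_chunk) stands in for XMLParseBuffer so that it runs without Expat, and the buffer size is shrunk for illustration.

```c
#include <stdio.h>

#define READBUFSIZE 8  /* shrunk for illustration; the Delphi code uses 65536 */

typedef int (*chunk_fn)(void *ctx, const char *buf, size_t len, int is_final);

/* Hypothetical callback standing in for XMLParseBuffer: it only counts. */
typedef struct { size_t bytes; int calls; int finals; } Counter;

static int count_chunk(void *ctx, const char *buf, size_t len, int is_final) {
    Counter *c = ctx;
    (void)buf;
    c->bytes += len;
    c->calls++;
    c->finals += is_final;
    return 1;
}

/* Termination rule from the Delphi loop: a short read (fewer bytes
   than requested) is treated as the last chunk. */
static int feed_short_read(FILE *in, chunk_fn parse, void *ctx) {
    char buffer[READBUFSIZE];
    int is_final = 0;
    while (!is_final) {
        size_t n = fread(buffer, 1, READBUFSIZE, in);
        is_final = n < READBUFSIZE;
        if (!parse(ctx, buffer, n, is_final))
            return 0;  /* the Delphi code fetches the error code here */
    }
    return 1;
}
```

With an 8-byte buffer, an 18-byte input arrives as 8, 8 and 2 bytes, with the 2-byte read flagged final; a 16-byte input produces an extra empty read that carries the final flag. The rule relies on fread returning a short count only at end of file or on a read error; a source that can return partial reads earlier would need the empty-read rule instead.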

Is that different?

Karl



----------------------------------------------------------------------

Comment By: Christopher M. Woods (cmwoods)
Date: 2002-07-11 17:51

Message:
Logged In: YES 
user_id=576763

Frank,
   I was/am using the 1.95.3 version of the project.  I'll try the 
CVS version when I get a chance (very busy).

Karl,
   I've attached the sample file that I was using/having 
problems with.  I tried several buffer sizes including: 32768, 
65536, 1000000, and (if I remember correctly) 8192.

Thanks,
-Chris

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-11 16:01

Message:
Logged In: YES 
user_id=290026

Would you mind attaching the file and giving me the 
buffer size you used?
I will try to duplicate the problem.

Karl

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-07-11 15:51

Message:
Logged In: YES 
user_id=3066

Are you using Expat 1.95.3 or the CVS version of Expat?

If you're not using the CVS version, can you take a look at
that version and try to reproduce the problem?  You can get
information on getting the CVS version anonymously at:

http://www.libexpat.org/dev/cvs.html

Thanks!
