[Expat-bugs] [ expat-Bugs-600479 ] error decoding UTF-8 triplet
noreply@sourceforge.net
Tue, 27 Aug 2002 10:07:29 -0700
Bugs item #600479, was opened at 2002-08-26 17:34
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=600479&group_id=10127
Category: None
Group: Test Required
>Status: Closed
Resolution: Fixed
Priority: 6
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: error decoding UTF-8 triplet
Initial Comment:
On Windows, when reading the UTF-8 sequence "EF
BA BF", utf8_isInvalid3 returns TRUE when it should
return FALSE. This UTF-8 sequence decodes to "FEBF"
in UCS-2 (Unicode), but because utf8_isInvalid3
returns TRUE, an error results and the character isn't
decoded properly.
This is using expat 1.95.4.
Attached is a simple XML file which illustrates the
problem.
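
For reference, the arithmetic behind the "FEBF" value can be sketched as
below. This is only an illustration: decode_utf8_triplet is a hypothetical
helper, not an expat function, and it assumes the three bytes already form
a well-formed lead byte plus two continuation bytes.

```c
#include <stdint.h>

/* Hypothetical helper (not part of expat): decode a 3-byte UTF-8
   sequence into its code point. Assumes p[0] is a 1110xxxx lead byte
   and p[1], p[2] are 10xxxxxx continuation bytes; no validation is done. */
static uint32_t decode_utf8_triplet(const unsigned char *p)
{
    return ((uint32_t)(p[0] & 0x0F) << 12)  /* low 4 bits of the lead byte */
         | ((uint32_t)(p[1] & 0x3F) << 6)   /* 6 payload bits of byte 2 */
         |  (uint32_t)(p[2] & 0x3F);        /* 6 payload bits of byte 3 */
}
```

For EF BA BF this gives 0xF000 | 0x0E80 | 0x003F = 0xFEBF, which is below
the two code points (FFFE and FFFF) that XML excludes.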
----------------------------------------------------------------------
>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-08-27 13:07
Message:
Logged In: YES
user_id=3066
Ok, I've committed the regression test for this as
tests/runtests.c revision 1.34. The other bugs should be
filed in new reports.
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-08-27 13:04
Message:
Logged In: YES
user_id=290026
Maybe you checked out from CVS in between changes
we made. Could you please re-try with the newest?
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2002-08-27 13:01
Message:
Logged In: NO
Sorry, I should be more specific: the problems I am having
with the source from CVS do not relate specifically to UTF-8
encodings, but lots of different problems like segmentation
violations, etcetera. These problems did not occur with
1.95.4, except the UTF-8 problem. I will isolate the specifics
for you over the next couple of days.
I haven't been using any new functionality. The code I have is
using the 1.95.2 API.
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-08-27 12:29
Message:
Logged In: YES
user_id=3066
I've got a test case ready to check in for the specific
reported character that caused problems in the original
report, but will hold off checking it in until we have the
additional failure information, so I can generalize the test.
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-08-27 12:19
Message:
Logged In: YES
user_id=290026
Yes, please give us test cases that blow everything up.
We can only learn from mistakes ...
----------------------------------------------------------------------
Comment By: Nobody/Anonymous (nobody)
Date: 2002-08-27 12:15
Message:
Logged In: NO
Thanks Karl. I applied your fix to the define UTF8_INVALID3
with the expat 1.95.4 tarball (xmltok.c) and this worked fine,
however, when I tried using what was in CVS, everything blew
up on me.
I can pass you some further test cases and possibly some
patches, if you like.
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2002-08-26 20:30
Message:
Logged In: YES
user_id=290026
Yes, this is a bug.
utf8_isInvalid3 tries to detect the invalid XML sequences
(*not* invalid unicode) EF BF BE and EF BF BF, but
only checks the first and third byte, not the second one.
Fix already checked into CVS (xmltok.c 1.23).
Please check out and test.
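
The difference between the two checks can be sketched roughly as follows.
This is not the actual code from xmltok.c; both function names are
hypothetical, and the sketch covers only the EF BF BE / EF BF BF test
described above, not the other validity checks a real decoder needs.

```c
#include <stdbool.h>

/* Shape of the bug as described: only the first and third bytes are
   examined, so any EF xx BE or EF xx BF sequence is rejected. */
static bool is_invalid3_first_and_third_only(const unsigned char *p)
{
    return p[0] == 0xEF && (p[2] == 0xBE || p[2] == 0xBF);
}

/* Corrected shape: the second byte must also be BF, so only
   EF BF BE (U+FFFE) and EF BF BF (U+FFFF) are rejected. */
static bool is_invalid3_all_bytes(const unsigned char *p)
{
    return p[0] == 0xEF && p[1] == 0xBF
        && (p[2] == 0xBE || p[2] == 0xBF);
}
```

With the reported bytes EF BA BF, the first function returns true (the
false positive from the report) and the second returns false, while both
still reject EF BF BE and EF BF BF.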
----------------------------------------------------------------------