[Expat-bugs] Problem with decoding UTF-8 triplet and expat 1.95.4
Tim Crook
tim.crook@adobe.com
Mon, 26 Aug 2002 17:51:25 -0400
On Windows, when reading the UTF-8 sequence "EF BA BF", utf8_isInvalid3
returns TRUE when it should return FALSE. This UTF-8 sequence decodes to
U+FEBF ("FEBF" as UCS-2), but because utf8_isInvalid3 returns TRUE, the
parser reports an error and the character is not decoded.
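For reference, the arithmetic behind that claim can be sketched in C. This is only an illustration of the standard three-byte UTF-8 rules plus the XML 1.0 exclusion of U+FFFE and U+FFFF; the names triplet_is_valid and decode_triplet are hypothetical and this is not expat's utf8_isInvalid3.

```c
/* Hypothetical helper, NOT expat's utf8_isInvalid3.
   Returns 1 when the three bytes at p form a well-formed UTF-8
   sequence that XML 1.0 allows as a character, 0 otherwise. */
int triplet_is_valid(const unsigned char *p)
{
    if ((p[0] & 0xF0) != 0xE0) return 0;        /* lead byte must be 1110xxxx */
    if ((p[1] & 0xC0) != 0x80) return 0;        /* continuation must be 10xxxxxx */
    if ((p[2] & 0xC0) != 0x80) return 0;
    if (p[0] == 0xE0 && p[1] < 0xA0) return 0;  /* overlong form, below U+0800 */
    if (p[0] == 0xED && p[1] >= 0xA0) return 0; /* surrogates U+D800..U+DFFF */
    if (p[0] == 0xEF && p[1] == 0xBF && p[2] >= 0xBE)
        return 0;                               /* U+FFFE and U+FFFF */
    return 1;
}

/* Hypothetical helper: combine the payload bits of a three-byte
   sequence (4 + 6 + 6 bits) into a code point. */
unsigned decode_triplet(const unsigned char *p)
{
    return ((unsigned)(p[0] & 0x0F) << 12)
         | ((unsigned)(p[1] & 0x3F) << 6)
         |  (unsigned)(p[2] & 0x3F);
}
```

For EF BA BF the payload bits are 0xF, 0x3A and 0x3F, so decode_triplet yields 0xF000 | 0x0E80 | 0x003F = 0xFEBF, and none of the rejection rules in triplet_is_valid apply to it.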
Here is a simple XML file which illustrates the problem:
<?xml version="1.0" encoding="UTF-8" ?>
<test>
<ARABIC_LETTER_DAD_INITIAL_FORM>xxx</ARABIC_LETTER_DAD_INITIAL_FORM>
</test>
To see the problem, replace xxx with the three raw bytes EF BA BF.
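Since the triplet is awkward to type into an editor, one way to build the test document is as a C string with hex escapes. This is a minimal sketch; test_doc and find_triplet are hypothetical names used only for illustration.

```c
#include <string.h>

/* The test document from this report with xxx replaced by the raw
   bytes EF BA BF. The literal is split right after the escapes so the
   last hex escape cannot absorb any following characters. */
static const char test_doc[] =
    "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
    "<test>\n"
    "<ARABIC_LETTER_DAD_INITIAL_FORM>"
    "\xEF\xBA\xBF"
    "</ARABIC_LETTER_DAD_INITIAL_FORM>\n"
    "</test>\n";

/* Hypothetical helper: byte offset of the triplet inside doc,
   or -1 if the triplet is not present. */
long find_triplet(const char *doc)
{
    const char *hit = strstr(doc, "\xEF\xBA\xBF");
    return hit ? (long)(hit - doc) : -1;
}
```

A buffer like this, with length sizeof test_doc - 1, could then be passed to XML_Parse to reproduce the error without depending on an editor's encoding settings.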
_________________________________________
Tim Crook
Computer Scientist
Adobe Systems Canada Inc.