[Expat-bugs] [Bug #123767] UTF-8 BOM triggers a crash
noreply@sourceforge.net
noreply@sourceforge.net
Tue, 28 Nov 2000 19:06:31 -0800
Bug #123767, was updated on 2000-Nov-28 19:06
Here is a current snapshot of the bug.
Project: Expat XML Parser
Category: None
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Summary: UTF-8 BOM triggers a crash
Details: Files that start with a UTF-8 BOM (0xEF 0xBB 0xBF) can trigger a crash;
at least the file I was working with does.
My best guess is that the problem lies in initScan() in xmltok.c.
Specifically, the "case 0xEFBB:" section appears to neglect to set "*nextTokPtr = ptr + 3;".
Setting it would be consistent with the other XML_TOK_BOM cases in this function,
since the UTF-8 BOM is 3 bytes long.
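To illustrate the point, here is a minimal, simplified sketch of the same pattern; it is not Expat's code. Judging by the "case 0xEFBB:" label, the first two bytes are combined into a 16-bit value, and every branch that reports a BOM should also tell the caller where the next token starts. The names scan_bom, TOK_NONE, TOK_PARTIAL and TOK_BOM are hypothetical stand-ins for initScan() and the XML_TOK_* codes, and the partial-input handling is reduced to the bare minimum.

```c
#include <stddef.h>

/* Hypothetical token codes standing in for Expat's XML_TOK_* values. */
enum { TOK_NONE, TOK_PARTIAL, TOK_BOM };

/* Simplified sketch of a BOM check in the style of initScan().
   Invariant: whenever TOK_BOM is returned, *next points just past the
   BOM (2 bytes for UTF-16, 3 bytes for UTF-8). */
static int scan_bom(const char *ptr, const char *end, const char **next)
{
    if (ptr == end)
        return TOK_NONE;
    if (end - ptr < 2)
        return TOK_PARTIAL; /* need two bytes to build the switch value */
    switch (((unsigned char)ptr[0] << 8) | (unsigned char)ptr[1]) {
    case 0xFEFF: /* UTF-16 big-endian BOM */
    case 0xFFFE: /* UTF-16 little-endian BOM */
        *next = ptr + 2;
        return TOK_BOM;
    case 0xEFBB: /* maybe a UTF-8 BOM (EF BB BF) */
        if (ptr + 2 == end)
            return TOK_PARTIAL; /* need the third byte to decide */
        if ((unsigned char)ptr[2] == 0xBF) {
            *next = ptr + 3; /* the assignment the report says is missing */
            return TOK_BOM;
        }
        break;
    }
    return TOK_NONE;
}
```

Called on the bytes EF BB BF followed by "<a/>", scan_bom returns TOK_BOM with next pointing at the "<"; given only EF BB, it returns TOK_PARTIAL.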
I am working with an old version of Expat, but I checked the latest version
of xmltok.c and this code appears not to have changed.
The proposed fix is listed below.
Bruce Kaskel
Adobe SVG Viewer Engineering Lead
Adobe Systems Incorporated
case 0xEFBB:
  /* Maybe a UTF-8 BOM (EF BB BF) */
  /* If there's an explicitly specified (external) encoding
     of ISO-8859-1 or some flavour of UTF-16
     and this is an external text entity,
     don't look for the BOM,
     because it might be legal data. */
  if (state == XML_CONTENT_STATE) {
    int e = INIT_ENC_INDEX(enc);
    if (e == ISO_8859_1_ENC || e == UTF_16BE_ENC
        || e == UTF_16LE_ENC || e == UTF_16_ENC)
      break;
  }
  if (ptr + 2 == end)
    return XML_TOK_PARTIAL;
  if ((unsigned char)ptr[2] == 0xBF) {
    *nextTokPtr = ptr + 3; /* <<-- PROPOSED FIX */
    *encPtr = encodingTable[UTF_8_ENC];
    return XML_TOK_BOM;
  }
  break;
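To show why the missing assignment matters to the caller, here is a hypothetical sketch; the names scan_without_fix, scan_with_fix, content_offset and the TOK_* codes are illustrative, not Expat's. A caller that receives a BOM token trusts the scanner to have set the next-token pointer. If it was never written, the caller continues from whatever value the pointer held, which is one plausible route to the reported crash. Initializing the pointer to NULL, as assumed here, turns the missing assignment into a detectable error instead of a wild pointer.

```c
#include <stddef.h>

/* Hypothetical token codes; not Expat's actual XML_TOK_* values. */
enum { TOK_NONE, TOK_PARTIAL, TOK_BOM };

typedef int (*scan_fn)(const char *ptr, const char *end, const char **next);

/* Classify the input prefix against the UTF-8 BOM EF BB BF. */
static int utf8_bom_prefix(const char *ptr, const char *end)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    ptrdiff_t i, n = end - ptr;
    for (i = 0; i < n && i < 3; i++)
        if ((unsigned char)ptr[i] != bom[i])
            return TOK_NONE;
    return n >= 3 ? TOK_BOM : TOK_PARTIAL;
}

/* Without the proposed fix: reports TOK_BOM but never sets *next. */
static int scan_without_fix(const char *ptr, const char *end, const char **next)
{
    (void)next; /* never written: the reported bug */
    return utf8_bom_prefix(ptr, end);
}

/* With the proposed fix: *next points just past the 3-byte BOM. */
static int scan_with_fix(const char *ptr, const char *end, const char **next)
{
    int tok = utf8_bom_prefix(ptr, end);
    if (tok == TOK_BOM)
        *next = ptr + 3;
    return tok;
}

/* Caller: offset of the content after a BOM, 0 if there is no BOM,
   -1 if more input is needed, -2 if the scanner left *next unset. */
static long content_offset(scan_fn scan, const char *ptr, const char *end)
{
    const char *next = NULL; /* sentinel instead of an uninitialized pointer */
    switch (scan(ptr, end, &next)) {
    case TOK_BOM:
        if (next == NULL)
            return -2; /* without the sentinel this would be a wild pointer */
        return (long)(next - ptr);
    case TOK_PARTIAL:
        return -1;
    default:
        return 0;
    }
}
```

For the bytes EF BB BF followed by "<a/>", content_offset returns 3 with scan_with_fix and -2 with scan_without_fix; with only the first two bytes available it returns -1, matching the XML_TOK_PARTIAL branch above.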
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=123767&group_id=10127