[Expat-bugs] [Bug #123767] UTF-8 BOM triggers a crash

noreply@sourceforge.net
Tue, 28 Nov 2000 19:06:31 -0800


Bug #123767, was updated on 2000-Nov-28 19:06
Here is a current snapshot of the bug.

Project: Expat XML Parser
Category: None
Status: Open
Resolution: None
Bug Group: None
Priority: 5
Summary: UTF-8 BOM triggers a crash

Details: Files (at least the file I was working with) which start with a UTF-8 BOM (0xEF 0xBB 0xBF) 
can trigger a crash. 

My best guess is that the problem lies in initScan() in xmltok.c.
Specifically, it looks like the "case 0xEFBB:" section returns XML_TOK_BOM
without setting "*nextTokPtr", whereas the other XML_TOK_BOM cases in this
function do set it. Since the UTF-8 BOM is 3 bytes long, the consistent
assignment here would be "*nextTokPtr = ptr + 3;".

I am working with an old version of Expat but I checked the latest version
of xmltok.c and this code has apparently not changed.

The proposed fix is listed below.

Bruce Kaskel
Adobe SVG Viewer Engineering Lead
Adobe Systems Incorporated


    case 0xEFBB:
      /* Maybe a UTF-8 BOM (EF BB BF) */
      /* If there's an explicitly specified (external) encoding
         of ISO-8859-1 or some flavour of UTF-16
         and this is an external text entity,
         don't look for the BOM,
         because it might be legal data. */
      if (state == XML_CONTENT_STATE) {
        int e = INIT_ENC_INDEX(enc);
        if (e == ISO_8859_1_ENC || e == UTF_16BE_ENC || e == UTF_16LE_ENC || e == UTF_16_ENC)
          break;
      }
      if (ptr + 2 == end)
        return XML_TOK_PARTIAL;
      if ((unsigned char)ptr[2] == 0xBF) {
        *nextTokPtr = ptr + 3; /* <<---------** PROPOSED FIX ** */
        *encPtr = encodingTable[UTF_8_ENC];
        return XML_TOK_BOM;
      }
      break;
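
To illustrate the contract the fix relies on, here is a minimal standalone
sketch. It is not Expat code: the names scan_utf8_bom, scan_result and the
SCAN_* values are hypothetical, made up for this example. It only shows the
idea that a scanner reporting a BOM should also hand back a pointer just past
the 3 BOM bytes, so the caller never resumes from an unset pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch; not Expat's XML_TOK_* values. */
enum scan_result { SCAN_NONE, SCAN_PARTIAL, SCAN_BOM };

/*
 * Check whether [ptr, end) starts with the UTF-8 BOM (EF BB BF).
 * On SCAN_BOM, *next is set to the first byte after the BOM.
 * On any other result, *next is set to ptr, so the caller never
 * reads an uninitialized pointer.
 */
static enum scan_result scan_utf8_bom(const char *ptr, const char *end,
                                      const char **next)
{
    static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
    size_t avail;
    size_t i;

    assert(ptr != NULL && end != NULL && next != NULL && ptr <= end);

    *next = ptr;
    avail = (size_t)(end - ptr);

    for (i = 0; i < 3; i++) {
        if (i == avail) {
            /* Ran out of input while every byte so far matched the BOM. */
            return avail > 0 ? SCAN_PARTIAL : SCAN_NONE;
        }
        if ((unsigned char)ptr[i] != bom[i])
            return SCAN_NONE;
    }

    /* The step the bug report says is missing: skip all 3 BOM bytes. */
    *next = ptr + 3;
    return SCAN_BOM;
}
```

A caller of this sketch would continue scanning from *next after SCAN_BOM,
and would wait for more input after SCAN_PARTIAL.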


For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=123767&group_id=10127