[ expat-Bugs-434664 ] utf8_toutf16 infinite loop

noreply@sourceforge.net
Sun Jul 28 18:07:02 2002


Bugs item #434664, was opened at 2001-06-19 22:44
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=434664&group_id=10127

Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 4
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: utf8_toutf16 infinite loop

Initial Comment:
I believe this is a low-priority bug, since I can't see
how it could ever be triggered by calling only the
interface in expat.h, but I came across it while
calling into xmltok directly. If the input buffer
contains a BT_LEAD4 character (which needs two output
characters) and there is space left for only one output
character, the routine goes into an infinite loop: the
break leaves only the switch, not the while, so neither
pointer advances. I fixed it with the following goto ---
your tolerance of goto may vary.

Michael Isard.

[ I am waiting for sourceforge to fix their new 
account procedure which seems to have been broken for 
some days --- in the meantime you can mail me directly 
at michael.isard@compaq.com ]

--------

from expat/xmltok/xmltok.c

static
void utf8_toUtf16(const ENCODING *enc,
                  const char **fromP, const char *fromLim,
                  unsigned short **toP, const unsigned short *toLim)
{
  unsigned short *to = *toP;
  const char *from = *fromP;
  while (from != fromLim && to != toLim) {
    switch (((struct normal_encoding *)enc)->type[(unsigned char)*from]) {
    case BT_LEAD2:
      *to++ = ((from[0] & 0x1f) << 6) | (from[1] & 0x3f);
      from += 2;
      break;
    case BT_LEAD3:
      *to++ = ((from[0] & 0xf) << 12) | ((from[1] & 0x3f) << 6) | (from[2] & 0x3f);
      from += 3;
      break;
    case BT_LEAD4:
      {
        unsigned long n;
        if (to + 1 == toLim)
          goto after; /* BUGBUG this used to say break; which keeps looping */

        n = ((from[0] & 0x7) << 18) | ((from[1] & 0x3f) << 12)
            | ((from[2] & 0x3f) << 6) | (from[3] & 0x3f);
        n -= 0x10000;
        to[0] = (unsigned short)((n >> 10) | 0xD800);
        to[1] = (unsigned short)((n & 0x3FF) | 0xDC00);
        to += 2;
        from += 4;
      }
      break;
    default:
      *to++ = *from++;
      break;
    }
  }
 after: /* BUGBUG new jump target added to escape from loop */

  *fromP = from;
  *toP = to;
}
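
For anyone who wants to reproduce the condition outside expat, here is a
minimal standalone sketch. It is not expat code: the function name
sketch_utf8_to_utf16, its signature, and the bit-pattern classification
of lead bytes (in place of the normal_encoding type table) are all
assumptions made for illustration, and the input is assumed to be
well-formed, complete UTF-8. It uses if/else rather than a switch, so a
plain break leaves the while loop and no goto is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced converter (not the expat routine).
   Converts UTF-8 to UTF-16 until input or output space runs out.
   Returns the number of UTF-16 units written; *fromP is advanced
   past the bytes that were consumed. */
static size_t
sketch_utf8_to_utf16(const unsigned char **fromP,
                     const unsigned char *fromLim,
                     unsigned short *to, size_t toCap)
{
  const unsigned char *from = *fromP;
  size_t nOut = 0;

  assert(from <= fromLim);
  while (from != fromLim && nOut != toCap) {
    unsigned char c = *from;
    if (c < 0x80) {                       /* one-byte (ASCII) */
      to[nOut++] = c;
      from += 1;
    } else if ((c & 0xE0) == 0xC0) {      /* two-byte lead */
      to[nOut++] = (unsigned short)(((c & 0x1F) << 6) | (from[1] & 0x3F));
      from += 2;
    } else if ((c & 0xF0) == 0xE0) {      /* three-byte lead */
      to[nOut++] = (unsigned short)(((c & 0x0F) << 12)
                                    | ((from[1] & 0x3F) << 6)
                                    | (from[2] & 0x3F));
      from += 3;
    } else if ((c & 0xF8) == 0xF0) {      /* four-byte lead */
      unsigned long n;
      /* A surrogate pair needs two output units. With only one left,
         stop; no switch encloses this break, so it exits the while. */
      if (toCap - nOut < 2)
        break;
      n = ((unsigned long)(c & 0x07) << 18)
          | ((unsigned long)(from[1] & 0x3F) << 12)
          | ((unsigned long)(from[2] & 0x3F) << 6)
          | (unsigned long)(from[3] & 0x3F);
      n -= 0x10000;
      to[nOut++] = (unsigned short)((n >> 10) | 0xD800);
      to[nOut++] = (unsigned short)((n & 0x3FF) | 0xDC00);
      from += 4;
    } else {
      break;                              /* malformed lead byte: stop */
    }
  }
  *fromP = from;
  return nOut;
}
```

Calling it with a two-unit output buffer on the bytes 'A', 0xF0, 0x9F,
0x98, 0x80, 'B' should consume only the 'A' and return, leaving the
four-byte sequence for the next call, which is the case that made the
original routine spin.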


----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-07-28 21:06

Message:
Logged In: YES 
user_id=290026

Finally applied the patch.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-08-09 14:20

Message:
Logged In: YES 
user_id=3066

This is not XML::Parser specific, so I'm changing the
categorization.  Forcing SF to send me a copy of the
properly-indented code is a nice side-effect.
