[Expat-bugs] [ expat-Bugs-2894085 ] expat: buffer over-read and crash in big2_toUtf8()

SourceForge.net noreply at sourceforge.net
Sun Nov 8 12:06:53 CET 2009

Bugs item #2894085, was opened at 2009-11-08 12:06
Message generated for change (Tracker Item Submitted) made by iankko
You can respond by visiting: 

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: XML::Parser (inactive)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: Yes
Submitted By: Jan Lieskovsky (iankko)
Assigned to: Nobody/Anonymous (nobody)
Summary: expat: buffer over-read and crash in big2_toUtf8() 

Initial Comment:
Hello SourceForge expat maintainers,

  originally CVE-2009-3720 was reported in expat:
  [1]  http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-3720

  Non-public, original bug report for CVE-2009-3720:
  [2] http://sourceforge.net/tracker/?func=detail&aid=1990430&group_id=10127&atid=110127

  And relevant patch for CVE-2009-3720:
  [3] http://expat.cvs.sourceforge.net/viewvc/expat/expat/lib/xmltok_impl.c?r1=1.13&r2=1.15&view=patch

While the above patch [3] fixes the issue in expat itself and in various other packages
(PyXML, 4Suite) which embed expat, and also when expat is called via perl-XML-Parser-Expat,
it does not help when the same reproducer is run via the perl-XML-Twig module. In this case
the crash (buffer over-read) occurs in expat's big2_toUtf8() routine, more exactly in the
DEFINE_UTF16_TO_UTF8(big2_) macro in lib/xmltok.c:626.

 I have investigated the issue in more detail, and assume the crash occurs in the
 540 E ## toUtf8(const ENCODING *enc, \...) routine, as present in
 expat-2.0.1/lib/xmltok.c (at line 540).

 I assume the problematic line of code is this one (lib/xmltok.c):

545   for (from = *fromP; from != fromLim; from += 2) { \

'from' is a pointer to the start of the XML data we are about
 to parse; 'fromLim' is the upper bound, the point where parsing
 should end. In each pass of the for loop we increment 'from'
 by two (because on lines:

    548     unsigned char lo = GET_LO(from); \
    549     unsigned char hi = GET_HI(from); \

we consumed both bytes at 'from'). This works perfectly
when the addresses of 'from' and 'fromLim' are aligned,
i.e. both are multiples of 2 (more generally, when they
are an even number of bytes apart). The problem arises
when 'fromLim' is not divisible by two
(for example 165218551). In that case 'from' can
never be equal to 'fromLim': in the last valid round
from == fromLim - 1, we increment it by two, and
now we have already skipped past it (from == fromLim + 1).
The loop keeps incrementing 'from' (still trying to reach
the from == fromLim condition) in what is effectively an
infinite loop, until the operating system recognizes that
we tried to access a memory location which doesn't belong
to us and kills the process.
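
The stepping problem described above can be sketched in isolation. The
following is only an illustration, not code from expat: the function
names and the max_steps safety cap are hypothetical, and offsets are
used instead of pointers so that nothing is really read out of bounds.
The second function shows one possible guard (rounding the limit down
to an even length); it is an assumption for the sake of the example,
not necessarily the fix the maintainers would choose.

```c
#include <stddef.h>

/* Hypothetical model of a loop shaped like
 *   for (from = *fromP; from != fromLim; from += 2)
 * 'len' plays the role of (fromLim - from) at loop entry.
 * 'max_steps' is a demo-only cap; the real loop has no such cap
 * and would keep reading past the end of the buffer. */
size_t steps_with_not_equal(size_t len, size_t max_steps)
{
    size_t from = 0;
    size_t steps = 0;

    while (from != len && steps < max_steps) {
        from += 2;   /* one 2-byte unit consumed per pass */
        steps++;
    }
    return steps;
}

/* Same loop, but with the limit first rounded down to an even
 * number of bytes, so that 'from' is guaranteed to hit it. */
size_t steps_with_even_limit(size_t len)
{
    size_t lim = len & ~(size_t)1;   /* drop a trailing odd byte */
    size_t from = 0;
    size_t steps = 0;

    for (; from != lim; from += 2)
        steps++;
    return steps;
}
```

With an even length both functions stop after len / 2 passes. With an
odd length the first one only stops because of the artificial cap,
which mirrors the over-read described in this report, while the second
one stops after the last complete 2-byte unit.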

