[Expat-bugs] [ expat-Bugs-2894085 ] expat: buffer over-read and crash in big2_toUtf8()

SourceForge.net noreply at sourceforge.net
Sun Nov 8 12:09:59 CET 2009


Bugs item #2894085, was opened at 2009-11-08 12:06
Message generated for change (Comment added) made by iankko
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=2894085&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: XML::Parser (inactive)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: Yes
Submitted By: Jan Lieskovsky (iankko)
Assigned to: Nobody/Anonymous (nobody)
Summary: expat: buffer over-read and crash in big2_toUtf8() 

Initial Comment:
Hello SourceForge expat maintainers,

  originally CVE-2009-3720 was reported in expat:
  [1]  http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-3720

  Non-public, original bug report for CVE-2009-3720:
  [2] http://sourceforge.net/tracker/?func=detail&aid=1990430&group_id=10127&atid=110127

  And relevant patch for CVE-2009-3720:
  [3] http://expat.cvs.sourceforge.net/viewvc/expat/expat/lib/xmltok_impl.c?r1=1.13&r2=1.15&view=patch

While the above patch [3] solves the issue in expat itself and in various other packages
(PyXML, 4Suite) which embed expat, and when expat is called via perl-XML-Parser-Expat, it
 does not help when the same reproducer is run via the perl-XML-Twig module. In that case
 the crash (buffer over-read) occurs in expat's big2_toUtf8() routine, more exactly in
 the DEFINE_UTF16_TO_UTF8(big2_) macro in lib/xmltok.c:626.

 I have investigated the issue in more detail. The crash appears to occur in the
 540 E ## toUtf8(const ENCODING *enc, \...) routine, as present in
 expat-2.0.1/lib/xmltok.c (at line 540).

 The problematic line of code appears to be this one (lib/xmltok.c):

545   for (from = *fromP; from != fromLim; from += 2) { \

'from' is a pointer to the start of the XML data we are about to
 convert, and 'fromLim' is the upper bound, i.e. the point where
 conversion should end. In each pass of the for loop we increment
 'from' by two (because on lines:

    548     unsigned char lo = GET_LO(from); \
    549     unsigned char hi = GET_HI(from); \

we consumed both bytes at 'from'). This works perfectly
when the distance between 'from' and 'fromLim' is even,
for example when both addresses are multiples of two.
The problem arises when 'from' is even but 'fromLim' is
not divisible by two (for example 165218551). In that case
'from' can never become equal to 'fromLim': in the last valid
round from == fromLim - 1, we increment it by two, and
now from == fromLim + 1, so the limit has been skipped.
The loop then keeps incrementing 'from' (in an effort to
reach the from == fromLim condition) and keeps reading past
the end of the buffer, until the operating system detects
an access to a memory location which doesn't belong to
the process and kills it.
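
To make the failure mode above concrete, here is a minimal sketch that
models the loop with plain offsets instead of real pointers, so nothing
is actually read out of bounds. The function names and the 'cap'
parameter are illustrative assumptions and are not part of expat.

```c
#include <assert.h>
#include <stddef.h>

/* Model of `for (from = start; from != lim; from += 2)` using offsets.
 * Returns the number of iterations performed. `cap` is a hypothetical
 * safety limit that exists only so that this model always halts. */
size_t steps_until_equal(size_t start, size_t lim, size_t cap)
{
    size_t steps = 0;
    size_t from;

    assert(start <= lim);
    for (from = start; from != lim && steps < cap; from += 2)
        steps++;
    return steps;
}

/* Same walk, but the loop only continues while a full 2-byte unit
 * remains before the limit, so an odd distance cannot be stepped over. */
size_t steps_while_pair_left(size_t start, size_t lim, size_t cap)
{
    size_t steps = 0;
    size_t from;

    assert(start <= lim);
    for (from = start; from < lim && lim - from >= 2 && steps < cap; from += 2)
        steps++;
    return steps;
}
```

With an even distance both variants agree. With an odd distance the
first variant only stops because of the artificial cap, which is the
situation described above.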


----------------------------------------------------------------------

>Comment By: Jan Lieskovsky (iankko)
Date: 2009-11-08 12:09

Message:
Here is my further analysis of the issue (some of the information
duplicates the initial comment, but there are also additional details):

While running "perl XML-Parser-Expat.pl" reports an error on expat packages
with the CVE-2009-3720 fix applied, running "perl XML-Twig.pl" still crashes:

$ perl XML-Twig.pl 
Segmentation fault (core dumped)

gdb output:
...
Core was generated by `perl XML-Twig.pl'.
Program terminated with signal 11, Segmentation fault.
[New process 23957]
#0  0x009e9cb9 in big2_toUtf8 (enc=0xa00900, fromP=0xbffa17b0,
    fromLim=0x8ceca2f "", toP=0xbffa179c, toLim=0x88115f4 "\201")
    at lib/xmltok.c:634
634 DEFINE_UTF16_TO_UTF8(big2_)

The problem is in the macro that defines toUtf8() in
expat-2.0.1/lib/xmltok.c:

    538 #define DEFINE_UTF16_TO_UTF8(E) \
    539 static void  PTRCALL \
    540 E ## toUtf8(const ENCODING *enc, \
    541             const char **fromP, const char *fromLim, \
    542             char **toP, const char *toLim) \
    543 { \
    544   const char *from; \
    545   for (from = *fromP; from != fromLim; from += 2) { \
    546     int plane; \
    547     unsigned char lo2; \
    548     unsigned char lo = GET_LO(from); \
    549     unsigned char hi = GET_HI(from); \
    550     switch (hi) { \
    551     case 0: \
    552       if (lo < 0x80) { \
    553         if (*toP == toLim) { \
    554           *fromP = from; \
    555           return; \
    556         } \
    557         *(*toP)++ = lo; \
    558         break; \
    559       } \
    560       /* fall through */ \
    561     case 0x1: case 0x2: case 0x3: \
    562     case 0x4: case 0x5: case 0x6: case 0x7: \
    563       if (toLim -  *toP < 2) { \
    564         *fromP = from; \
    565         return; \
    566       } \
    567       *(*toP)++ = ((lo >> 6) | (hi << 2) |  UTF8_cval2); \
    568       *(*toP)++ = ((lo & 0x3f) | 0x80); \
    569       break; \
    570     default: \
    571       if (toLim -  *toP < 3)  { \
    572         *fromP = from; \
    573         return; \
    574       } \

"from" should point to the start of the data and "fromLim" is the upper
bound up to which the above for loop should run. In each pass of the
loop, we increment "from" by 2 because we have already consumed both
of its bytes:

    548     unsigned char lo = GET_LO(from); \
    549     unsigned char hi = GET_HI(from); \

and can move on. The problem arises when the distance between "from"
and "fromLim" is odd, e.g. when "from" is a multiple of two and
"fromLim" is not. In that case (assume from == fromLim - 1) we increment
"from" (because it != fromLim), step over "fromLim", and end up in a
loop that never meets its exit condition, until the OS detects the
buffer over-read and kills the process.
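
One way to guard against this, sketched under the assumption that an odd
trailing byte should simply be left unconsumed, is to round the limit
down so that its distance from "from" is even before entering the loop.
The helper even_limit() and the simplified utf16be_ascii_to_utf8()
converter below are illustrative only: they handle just the
"hi == 0, lo < 0x80" case from the macro and are neither the expat code
nor a confirmed upstream fix.

```c
#include <assert.h>
#include <stddef.h>

/* Round lim down so that (result - from) is a multiple of 2. */
const char *even_limit(const char *from, const char *lim)
{
    assert(from <= lim);
    return from + (((lim - from) >> 1) << 1);
}

/* Simplified stand-in for big2_toUtf8 (hypothetical name): copies only
 * ASCII code units (high byte 0, low byte < 0x80) and stops at anything
 * else or when the output is full. Returns the number of bytes written;
 * *fromP is advanced past the consumed input. */
size_t utf16be_ascii_to_utf8(const char **fromP, const char *fromLim,
                             char **toP, const char *toLim)
{
    const char *from;
    char *start = *toP;

    fromLim = even_limit(*fromP, fromLim);  /* the guard */
    for (from = *fromP; from != fromLim; from += 2) {
        unsigned char hi = (unsigned char)from[0];
        unsigned char lo = (unsigned char)from[1];
        if (hi != 0 || lo >= 0x80 || *toP == toLim)
            break;
        *(*toP)++ = (char)lo;
    }
    *fromP = from;
    return (size_t)(*toP - start);
}
```

With the guard in place the original "from != fromLim" test is reachable
again, and the caller can see from the updated *fromP that one byte was
left over.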

Running "perl XML-Twig.pl" demonstrates this issue.

I patched expat-2.0.1 to report which branch the code went through.
After finding out that, while processing "pythontest1.xml", it loops in
"case 0:" for "hi", I added functions to print the values of the "from"
and "fromLim" variables. Here is the output:

fromLim (end) has value = 165218551
from has value = 165218548
Went by default branch
fromLim (end) has value = 165218551
from has value = 165218552
fromLim (end) has value = 165218551
from has value = 165218554
...
from has value = 165416942
fromLim (end) has value = 165218551
from has value = 165416944
seg fault

So at the start from < fromLim. We increment "from" by 2, so the distance
is < 3 and we go through the "default:" branch ("Went by default branch"),
find that "from" still isn't equal to "fromLim", and increment "from"
by two again. From then on we are in an endless loop, until killed by the
OS.
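
The arithmetic behind the trace can be checked directly. The sketch
below is only an assumption about the kind of instrumentation involved
(the actual debugging patch is not included in this report);
format_trace() and loop_can_terminate() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats one trace record in the same style as the output above.
 * Returns the value returned by snprintf(). */
int format_trace(char *buf, size_t n, uintptr_t from, uintptr_t from_lim)
{
    return snprintf(buf, n,
                    "fromLim (end) has value = %lu\nfrom has value = %lu\n",
                    (unsigned long)from_lim, (unsigned long)from);
}

/* A `from != fromLim; from += 2` loop can only stop if `from` has not
 * yet passed the limit and the remaining distance is even. */
int loop_can_terminate(uintptr_t from, uintptr_t from_lim)
{
    return from <= from_lim && (from_lim - from) % 2 == 0;
}
```

For the first record of the trace (from = 165218548, fromLim = 165218551)
the distance is 3, so loop_can_terminate() returns 0. The same holds for
every later record, where "from" is already past the limit.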

Further note:
-------------
When you add one more character (even a space) to 'pythontest1.xml',
save it, and try to process it again, a syntax error is reported while
processing the XML file:

$ perl XML-Twig.pl 

syntax error at line 1, column 1, byte 2 at
/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi/XML/Parser.pm line 187
 at XML-Twig.pl line 4
 at XML-Twig.pl line 4

----------------------------------------------------------------------
