[Expat-bugs] [ expat-Bugs-1990430 ] Parser crash with specially formatted UTF-8 sequences

SourceForge.net noreply at sourceforge.net
Sat Jan 17 17:15:44 CET 2009


Bugs item #1990430, was opened at 2008-06-11 00:45
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1990430&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: www.libexpat.org
>Group: Test Required
Status: Open
Resolution: Fixed
Priority: 5
Private: Yes
Submitted By: Peter Valchev (petervalchev)
Assigned to: Karl Waclawek (kwaclaw)
Summary: Parser crash with specially formatted UTF-8 sequences

Initial Comment:
I have discovered a way to crash libexpat's xml parser with certain specially formatted UTF-8 sequences. All applications that link w/ expat and use it to render user-provided XML files are affected. As far as I see, the issue is not exploitable, just denial of service.

This is the patch that I have come up with, also attached to this email:

+++ lib/xmltok_impl.c 2007-12-21 11:11:42.054417000 -0800
@@ -1745,6 +1745,9 @@
 switch (BYTE_TYPE(enc, ptr)) {
 #define LEAD_CASE(n) \
 case BT_LEAD ## n: \
+ if (end - ptr < n) { \
+   return; \
+ } \
 ptr += n; \
 break;
 LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)

The parser's updatePosition function which keeps track of the current position pointer increments the ptr by {2, 3, 4} to skip past multibyte character ombinations, and this causes ptr in the "while (ptr != end)" loop to jump past the terminating condition, causing the loop to continue reading past 'end' and into out of bounds memory until a crash.

In general this parser does not appear the most robust and could be the source of some security issues.

A fault file is attached. To reproduce, compile examples/outline.c and run against it. This patch may not be 100% complete...

Contact:
Peter Valchev <pvalchev at google.com>

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2009-01-17 11:15

Message:
Has anyone tested current CVS for this issue's fix?

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2008-06-13 13:48

Message:
Logged In: YES 
user_id=290026
Originator: NO

I know what you mean and I had the exact same thoughts. I think at least
for older CPUs doing a "<" comparison might have been slower than doing a
comparison for equality, so I am not sure how that would work out now.
Also, we have to consider that this could modify the logic, with a slight
risk of introducing an error.

Anyway, I think there are other parts of the code (like the encoding
checking) that ensures that pathological situations like the one you found
don't happen. It is also important to observe that the crash seems to
happen only *after* the parser returned with an error, and the code gets
called again to get the error location while the parser is already in an
error state.

Don't forget, Expat is by default installed on most Linux distributions
and it is used by Firefox/Mozilla, so Expat has been given a real good
workout with millions of installations. The bugs you reported are the first
bug reports we received in a long time (except for platform specific build
issues).

Nevertheless, please keep digging, we really want Expat to be resistant to
pathological input.

Thanks for the efforts,

Karl

----------------------------------------------------------------------

Comment By: Peter Valchev (petervalchev)
Date: 2008-06-13 12:41

Message:
Logged In: YES 
user_id=2114255
Originator: YES

Yes, I originally had the same patch (while ptr < end) but chose the
latter to keep in line with the rest of the file, sigh.

Now the next question is, there are many while (ptr != end) loops still
left in that file if you search.. some of the look like they could have the
same problem. Should all of them be changed, to be on the paranoid side?
The original construct is just evil and waiting for bugs like this to
happen...

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2008-06-13 09:24

Message:
Logged In: YES 
user_id=290026
Originator: NO

I applied the original fix I had in mind, which I didn't use because I
thought yours might perform better. It is quite simple: replace "while (ptr
!= end)" with "while (ptr < end)".
Seems to work so far. Fixed in xmltok_impl.c rev. 1.15.

Karl

----------------------------------------------------------------------

Comment By: Peter Valchev (petervalchev)
Date: 2008-06-13 01:15

Message:
Logged In: YES 
user_id=2114255
Originator: YES

Ugh the padding I inserted broke the example. New attached tarball with
expat-fault3.xml crashes both outline and xmlwf

(gdb) r < /tmp/expat-fault3.
Starting program: /usr/src/lib/libexpat/examples/outline <
/tmp/expat-fault3.
/bin/ksh: cannot open /tmp/expat-fault3.: No such file or directory

Program exited with code 01.
You can't do that without a process to debug.
(gdb) r < /tmp/expat-fault3.xml
Starting program: /usr/src/lib/libexpat/examples/outline <
/tmp/expat-fault3.xml

Program received signal SIGSEGV, Segmentation fault.
little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address 0x7f4ba000
out of bounds>, 
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
1748        switch (BYTE_TYPE(enc, ptr)) {
(gdb) bt
#0  little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address
0x7f4ba000 out of bounds>, 
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
#1  0x0676b8a3 in XML_GetCurrentLineNumber (parser=0x7de87000)
    at /usr/src/lib/libexpat/lib/xmlparse.c:1799
#2  0x1c000a24 in main ()

File Added: expat-fault3.tar.gz

----------------------------------------------------------------------

Comment By: Peter Valchev (petervalchev)
Date: 2008-06-13 01:14

Message:
Logged In: YES 
user_id=2114255
Originator: YES

Ugh the padding I inserted broke the example. New attached tarball with
expat-fault3.xml crashes both outline and xmlwf

(gdb) r < /tmp/expat-fault3.
Starting program: /usr/src/lib/libexpat/examples/outline <
/tmp/expat-fault3.
/bin/ksh: cannot open /tmp/expat-fault3.: No such file or directory

Program exited with code 01.
You can't do that without a process to debug.
(gdb) r < /tmp/expat-fault3.xml
Starting program: /usr/src/lib/libexpat/examples/outline <
/tmp/expat-fault3.xml

Program received signal SIGSEGV, Segmentation fault.
little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address 0x7f4ba000
out of bounds>, 
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
1748        switch (BYTE_TYPE(enc, ptr)) {
(gdb) bt
#0  little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address
0x7f4ba000 out of bounds>, 
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
#1  0x0676b8a3 in XML_GetCurrentLineNumber (parser=0x7de87000)
    at /usr/src/lib/libexpat/lib/xmlparse.c:1799
#2  0x1c000a24 in main ()

File Added: expat-fault3.tar.gz

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2008-06-12 22:40

Message:
Logged In: YES 
user_id=290026
Originator: NO

What is supposed to happen with the second test case?
It reports an error, but does not crash with xmlwf or outline.
IE reports the same error, so that is probably OK.

----------------------------------------------------------------------

Comment By: Peter Valchev (petervalchev)
Date: 2008-06-12 18:08

Message:
Logged In: YES 
user_id=2114255
Originator: YES

Thanks.

Actually rechecking in this area again, I think I found another issue,
which seems to not be covered with the patch I provided :( I am attaching
the new test case.

I haven't found many other issues at this point... I stumbled upon this
one fairly quickly and my above observations were more general than
anything. If I do find more I'll be sure to tell you.

File Added: expat-fault2.xml

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2008-06-11 10:46

Message:
Logged In: YES 
user_id=290026
Originator: NO

Can reproduce. The problem is that this code can be called *after* an
error has been found (to report line and column number). Therefore it
should not rely on correct byte counts for multibyte characters.

Patch applied in xmltok_impl.c rev. 1.14.

Would you please also report all the other issues you have found?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1990430&group_id=10127


More information about the Expat-bugs mailing list