[issue9054] pyexpat configured with "--with-system-expat" is incompatible with expat 2.0.1

Dave Malcolm report at bugs.python.org
Tue Jun 22 00:12:50 CEST 2010


New submission from Dave Malcolm <dmalcolm at redhat.com>:

pyexpat configured with "--with-system-expat" segfaults in one selftest when built against expat 2.0.1

SVN trunk:
$ ./configure --with-system-expat
(with expat-2.0.1)
$ make
$ ./python Lib/test/test_pyexpat.py
[snip]
test_parse_only_xml_data (__main__.sf1296433Test) ... Segmentation fault (core dumped)

The issue occurs when an exception happens within a character handler.

When a Python exception is raised inside pyexpat.c:call_character_handler(), it invokes pyexpat.c:flag_error() which clears the pyexpat object's handlers, and it then sets the expat structure's handler to a no-op handler:
        XML_SetCharacterDataHandler(self->itself,
                                    noop_character_data_handler);

The reason why appears to be that python's copy of that code in xmlparse.c:doContent() looks like this:
            characterDataHandler(handlerArg, dataBuf,
                                 (int)(dataPtr - (ICHAR *)dataBuf));

(see
http://svn.python.org/view/python/trunk/Modules/expat/xmlparse.c?annotate=77680&pathrev=77680#l2544
)

i.e. using:
  #define characterDataHandler (parser->m_characterDataHandler)
to look up the handler each time, thus picking up on the newly installed noop handler, whereas the corresponding code in expat 2.0.1 looks like this:

        XML_CharacterDataHandler charDataHandler = characterDataHandler;
        if (charDataHandler) {
            ...
            for (;;) {
              ...
=>            charDataHandler(handlerArg, dataBuf,
                              (int)(dataPtr - (ICHAR *)dataBuf));
              ...
            }
          }

i.e. keeping the old function pointer cached

(gdb) p self->itself->m_characterDataHandler 
$30 = (XML_CharacterDataHandler) 0xf8b370 <noop_character_data_handler>

(gdb) p charDataHandler
$31 = (XML_CharacterDataHandler) 0xf8daf0 <my_CharacterDataHandler>

So when the exception is raised and the handler is changed, the local in the loop doesn't change, and we're calling the old handler which doesn't allow for being in the error-recovery state, and self->handlers[3] is NULL, leading to an eventual segfault.

This is with expat-2.0.1-10.fc13.i686 on a Fedora 13 box.

expat's xmlparse.c's caching of the character handler appears to be this
change:
http://expat.cvs.sourceforge.net/viewvc/expat/expat/lib/xmlparse.c?r1=1.157&r2=1.158

> Revision 1.158 - (view) (download) (annotate) - [select for diffs] 
> Mon Jul 10 18:59:52 2006 UTC (3 years, 10 months ago) by kwaclaw 
> Branch: MAIN 
> Changes since 1.157: +39 -37 lines 
> Diff to previous 1.157
> Improved fix for issues # 1515266 and # 1515600. Will now preserve the
> "failover to default handler" logic. Note: clearing the character data handler
> does not take effect immediately anymore.

which is almost 4 years old (2006-07-10)

For reference the issues referred to above are:
http://sourceforge.net/tracker/?func=detail&aid=1515266&group_id=10127&atid=110127
http://sourceforge.net/tracker/?func=detail&aid=1515600&group_id=10127&atid=110127

expat CVS R_2_0_0 has xmlparse.c r1.151 from 2005-12-23
expat CVS R_2_0_1 has xmlparse.c r1.161 from 2007-05-08

So it appears that this "clearing the character data handler does not take
effect immediately anymore." is in expat 2.0.1 onwards, but not in 2.0.0

Running:
$ diff -rup ~/coding/python-svn/trunk-clean/Modules/expat
/usr/src/debug/expat-2.0.1/lib/|less
to show the difference between system expat and latest python SVN code
indicates that we're using 2.0.1 as the system copy, and latest Python SVN code
is using a copy of 2.0.0

Looking at http://bugs.python.org/issue1296433 (as per the test name), this was fixed in r47191 (see http://svn.python.org/view?view=rev&revision=47191 ) on 2006-07-01, adding a termination clause to the loop within its copy of xmlparse.c but this was followed five days later (2006-07-06) with:
http://svn.python.org/view?view=rev&revision=47253
which had the log:
> - back out Expat change; the final fix to Expat will be different
> - change the pyexpat wrapper to not be so sensitive to this detail of the
>  Expat implementation (the ex-crasher test still passes)

though this change happened before the change in expat on 2006-07-10

So, _if_ I'm reading this right, it appears that Python's pyexpat.c module is incompatible with expat 2.0.1: pyexpat expects that when it sets:
    XML_SetCharacterDataHandler(self->itself,
                                noop_character_data_handler);
that my_CharacterDataHandler will no longer be called, but it is, and it segfaults during the test.

Working around it appears trivial; I will attach a patch.

----------
components: XML
messages: 108323
nosy: dmalcolm
priority: normal
severity: normal
status: open
title: pyexpat configured with "--with-system-expat" is incompatible with expat 2.0.1

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9054>
_______________________________________


More information about the Python-bugs-list mailing list