[Expat-bugs] [ expat-Bugs-1515600 ] Segfault after removing character data handler

SourceForge.net noreply at sourceforge.net
Mon Jul 10 21:03:33 CEST 2006


Bugs item #1515600, was opened at 2006-07-01 13:21
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1515600&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: Fixed
Priority: 5
Submitted By: Fred L. Drake, Jr. (fdrake)
Assigned to: Nobody/Anonymous (nobody)
Summary: Segfault after removing character data handler

Initial Comment:
Removing the character data handler from within the
character data handler while character data remains to
be reported causes a call to a NULL pointer (generally
followed by a memory access violation of your
platform's favorite flavor).

If the XML_StopParser() API has been called, this is
not a problem with the version in CVS.

This is admittedly an odd use case.  The recent fixes
to make the XML_StopParser() calls supported makes the
parser behave well when accessed from languages that
support exceptions (the host language API can call
XML_StopParser to abort further work from Expat when an
exception occurs).  The case of a character data
handler removing itself is unusual (in context, there
can be no calls to anything else other than a decoding
handler).

I think there are two possible solutions:

1) Document that the character data handler cannot
remove itself without calling XML_StopParser().  This
avoids introducing a performance penalty for really
this really odd case, but I don't know how bad testing
for a NULL value would really be at this point, since
there are a few other checks and an indirect assignment.

2) Add a check that the character data handler is still
set before the loop goes around again, and fall back to
the defaultHandler for the remaining data.  This would
introduce a single check for a NULL pointer in the loop
in the XML_TOK_DATA_CHARS case in doContent().

I've attached a patch with a test case that
demonstrates this bug; the test generates a segfault on
Unix.

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-10 15:03

Message:
Logged In: YES 
user_id=290026

Applied an improved patch that preserves default handler
failover logic. See xmlparse.c rev. 1.158. Docs updated as
well. Python compatibility still needs testing.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-05 09:09

Message:
Logged In: YES 
user_id=290026

Applied patch in xmlparse.c rev. 1.156 and reference.html
rev. 1.71. Please let nme know if we should discuss special
treatment of aborting vs. suspending.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-04 13:42

Message:
Logged In: YES 
user_id=290026

I replaced my last attachment with one that includes an
update to the docs (reference.html). This solution should
fix issue # 1515266 as well. I intend to commit this soon,
if no objections are made.

Note to Fred: I took out your test for XML_FINISHED and
XML_SUSPENDED, as it currently introduces an issue for
XML_SUSPENDED, and inconsistent behaviour for XML_FINISHED.

We can discuss special treatment of aborting vs. suspending
(i.e. ensure no more call-backs when aborting) later, but
even as it is, subsequent call-backs can be suppressed by
setting the affected handlers to NULL.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-04 09:55

Message:
Logged In: YES 
user_id=290026

The same issue also exists in the doCdataSection() function,
and I think the solution I suggested (putting the check if
the character data handler is set into the internal loop)
also solves bug # 1515266, as I described there.

For the case where there is only one call-back, this should
not be a performance penalty at all, as there still would be
only one check if the handler is set.

Attached as xmlparse.c.diff (Internal loop solution) - this
also fixes the doCdataSection() function.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-03 16:40

Message:
Logged In: YES 
user_id=290026

I think in most cases this is not a problem. The general
parsing loop in doContent() always checks if the
characterDataHandler is set first. In the specific case you
mentioned, there is a loop within the general loop, and in
that internal loop there is no check for NULL. We could, for
instance, pull the NULL check inside the loop, like your 2nd
case, and the result would look like this:

    case XML_TOK_DATA_CHARS:
      if (MUST_CONVERT(enc, s)) {
        for (;;) {
          if (characterDataHandler) {
            ICHAR *dataPtr = (ICHAR *)dataBuf;
            XmlConvert(enc, &s, next, &dataPtr, (ICHAR
*)dataBufEnd);
            *eventEndPP = s;
            characterDataHandler(handlerArg, dataBuf,
                                 (int)(dataPtr - (ICHAR
*)dataBuf));
            if (s == next)
              break;
            *eventPP = s;
          }
        }
      }
      else if (characterDataHandler) {
        characterDataHandler(handlerArg,
                             (XML_Char *)s,
                             (int)((XML_Char *)next -
(XML_Char *)s));
      }
      else if (defaultHandler)
        reportDefault(parser, enc, s, next);
      break;

I am not sure if the performance penalty is that high.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1515600&group_id=10127


More information about the Expat-bugs mailing list