[Expat-bugs] [ expat-Bugs-1515266 ] missing check of stopped parser in doContent() 'for' loop

SourceForge.net noreply at sourceforge.net
Sun May 25 01:50:42 CEST 2008


Bugs item #1515266, was opened at 2006-06-30 19:04
Message generated for change (Comment added) made by dleverton
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1515266&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: Test Required
Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Brett Cannon (bcannon)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: missing check of stopped parser in doContent() 'for' loop

Initial Comment:
In Expat 2.0.0, in expat.c:doConvert() there is a 'for'
loop for the XML_TOK_DATA_CHARS case.  There is
unfortunately no check in that loop whether the parser
was stopped during that call because of an error.

This was discovered in Python
(Lib/test/crashers/xml_parsers.py) because pyexpat,
upon error where there is no error return code like
with characterDataHandlers, sets all handlers to 0,
sets parsingStatus to XML_FINISHED, and sets errorCode.
 This leads to a segfault if the 'for' loop goes around
again because parser->m_characterDataHandler is set to 0.

A simple check if the parser is stopped fixes the
problem.  I have attached a simple patch that just
breaks out of the loop and lets execution fall through
to the bottom of the 'switch' statement.  I don't know
if returning errorCode directly would be better or if
checking for XML_SUSPENDED is also desirable.

----------------------------------------------------------------------

Comment By: David Leverton (dleverton)
Date: 2008-05-25 00:50

Message:
Logged In: YES 
user_id=2011415
Originator: NO

> Python compatibility still needs testing.

This does indeed break Python: when it clears the handler, it sets the
variable containing the Python-side handler to NULL, and then assumes that
said variable is non-null when expat calls the old handler.

(The vanilla Python distribution uses a bundled copy of expat from before
this change was made, which is probably why no-one's complained until now,
but some Linux distros like to use the system copy instead of the bundled
one.)

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-11-25 17:41

Message:
Logged In: YES 
user_id=290026
Originator: NO

Is anyone testing CVS for this solution?

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-10 20:02

Message:
Logged In: YES 
user_id=290026

Applied the patch preserving default handler failover.
See xmlparse.c rev. 1.158. Docs updated as well.
Python compatibility still needs testing.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-06 18:19

Message:
Logged In: YES 
user_id=290026

I am attaching a patch to current CVS that preserves the
default handler failover logic by saving the character data
handler to a local variable instead of moving the NULL check
into the inner for loop (file "localCharDataHandlerPatch.diff").

The drawback: Even if the handler is cleared, it will be
called back on as long as the inner for loop is active.

Could be a problem for Python, if it cannot deal with a few
more call-backs despite clearing the handlers.

----------------------------------------------------------------------

Comment By: Brett Cannon (bcannon)
Date: 2006-07-06 18:03

Message:
Logged In: YES 
user_id=357491

Yes, I'm listening, Fred.  =)

If you look at PEP 356
(http://www.python.org/dev/peps/pep-0356/) it seems like b2
is due on July 12 and rc1 August 1.  So there is still time
to get whatever change/fix needed to Python's wrapper before
we hit final release.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-06 13:56

Message:
Logged In: YES 
user_id=290026

One way to preserver the old default handler logic would be
this:

Revert back to the original code, but save the character
data handler into a local variable for the duration of the
inner for loop. This would prevent the segfault, but would
enforce the call-backs in the loop to go on until the loop
terminates, even if the character data handler was cleared.

I personally like this solution, but the question is how
Python could handle it if there were more call-backs even
after the handlers were cleared.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-07-06 06:15

Message:
Logged In: YES 
user_id=3066

Python (on the trunk) is no longer quite as sensitive to the
Expat implementation for this, so that's not a source of
time pressure to come up with the final fix for this.

Reducing priority back to "Medium"

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-07-06 04:23

Message:
Logged In: YES 
user_id=3066

The tests now pass, but agree that the lack of falling back
to the default handler is undesirable.  As noted, I'm not
sure how much we want to worry about this in code, though,
rather than through documentation.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-06 03:55

Message:
Logged In: YES 
user_id=290026

The bug is quite obvious when you look at it.
When the character data handler is cleared, the for loop
will do nothing forever. Please check again with xmlparse.c
rev. 1.157. 

However this quick fix is not quite satisfying. There is one
piece of logic that becomes ill-fitted now: the "fail-over"
to the default handler does not work as expected anymore, so
I'll have some more thinking to do.


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-07-06 03:38

Message:
Logged In: YES 
user_id=3066

The tests no longer complete, but take up all the CPU the
system will let them have.  Hallmarks of an infinite loop,
if you ask me.  :-)

Are you able to run the tests on Windows?  I don't know if a
MSVC++ target was ever built for them, and don't have access
to a Windows development machine most of the time.

One thing that can be done is to document that the character
data handler can't be removed (though it can be replaced),
during parsing, except from some non-character data (and
non-decoding-related) handler.  Then the Python bindings can
use an alternate approach, replacing the character handler
with a completely no-op handler until it can be safely
removed completely.

Brett, are you still paying attention?  I can make the
needed changes to the Python bindings to isolate those from
some of the changes in Expat, hopefully no later than
sometime this weekend.  Not sure what the release schedule
is, though.

Karl, I'm generally inclined to make Expat as safe from
segfaults as possible, so I'd like things to "just work" in
even some of the oddball scenarios that exception-handling
wrappers built to support scripting languages might present,
though I don't object to making them go through a bit of
extra work.  I know our main audience is very
performance-sensitive, so I don't want to pay too high a
cost on that front.  It might be worth taking the discussion
of alternatives to the mailing list, but I vaguely recall
that we've done that before.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-05 14:26

Message:
Logged In: YES 
user_id=290026

Corrected Summary.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-05 14:14

Message:
Logged In: YES 
user_id=290026

Applied the patch for bug # 1515600 which solves this issue
as well. Removed the check for XML_FINISHED/XML_SUSPENDED.
We could discuss special treatment of XML_FINISHED, but if
one is clearing all handlers anyway, then special treatment
of XML_FINISHED is not necessary.

For Fred: I have not re-run the test cases. Please do so and
close the issue if successful.



----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2006-07-04 14:37

Message:
Logged In: YES 
user_id=290026

I am re-opening this issue because in the case of a
suspended parser, breaking out of the inner loop in
XML_TOK_DATA_CHARS means that character call-backs are
missed when resuming the parser. We should let the inner
loop finish reporting all characters.

The documentation already states that after calling
XML_StopParser() there may still be a few call-backs that
would otherwise be missed, so this would not be new
behaviour, but consistent with existing behaviour.

The solution to the problem described is the same as
suggested for bug # 1515600 (Segfault after removing
character data handler). Just put the NULL check for the
character data handler inside the internal loop.

Btw, the same problem exists in the doCdataSection()
function. I'll attach a patch suggestion to bug # 1515600.

We might decide to treat XML_FINISHED different from
XML_SUSPENDED such that no other call-backs will happen, but
in that case we need to review all the other places where
this would need to be done as well (and update the
documentation, of course).

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-07-01 16:32

Message:
Logged In: YES 
user_id=3066

Confirmed that the suspend behavior parallels the abort
behavior Brett's patch fixed; fixed and added a regression
test in lib/xmlparse.c 1.155 and tests/runtests.c 1.66.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-07-01 16:02

Message:
Logged In: YES 
user_id=3066

Added a regression test in tests/runtests.c revision 1.65.

Closing this report.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-07-01 05:00

Message:
Logged In: YES 
user_id=3066

That seems fine, but can be done faster within the Expat
implementation.  I've committed the simplified patch as
lib/xmlparse.c revision 1.154.

I'll have a test case committed tomorrow as well.  Leaving
this report open for now since I need to finish up the test
case.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2006-06-30 19:40

Message:
Logged In: YES 
user_id=3066

The Python folks need this dealt with before Python 2.5, so 
I'll try and take a look at it this weekend if no one beats 
me to it.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1515266&group_id=10127


More information about the Expat-bugs mailing list