[XML-SIG] Nastiness in xml/extensions/pyexpat.c

Uche Ogbuji uche.ogbuji@fourthought.com
Sat, 08 Jun 2002 09:40:35 -0600


The following simple program dumps core currently.


from xml.sax import *
from xml.sax.handler import feature_namespaces
from xml.dom.ext.reader.Sax2 import XmlDomGenerator
from Ft.Xml import cDomlette

handler = XmlDomGenerator(implementation=cDomlette.implementation)

parser = make_parser()
parser.setFeature(feature_namespaces, 1)
parser.setContentHandler(handler)
parser.parse("test.xml")

doc = handler.getRootNode()

Commenting out the line "parser.setFeature(feature_namespaces, 1)" stops it 
from doing so.

In tracing through xml/extensions/pyexpat.c (and a tremendous chore that 
was), I found an instance of sheer nastiness:

static void
flag_error(xmlparseobject *self)
{
    clear_handlers(self, 0);
}

This function doesn't really do anything to "flag" an error.  It merely clears the handlers.  The problem is that in most cases the code just continues on, and then it dumps core the next time it comes to invoking a handler (jump to address 0) since all handler pointers are now NULL.

The way we handle this in cDomlette is with a full-blown state machine.  We leave handlers untouched on error, but we have them all do nothing while in the error state.  This would probably be overkill for pyexpat.so, but we should at least set an actual flag on the parser instance that can be used to short-cut execution on error, and move the clear_handlers() invocation to a safer spot.
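
To make that suggestion concrete, here is a rough model in Python rather than C.  It is only a sketch of the idea, not pyexpat's actual structure: ToyParser, in_error, dispatch() and finish() are all hypothetical names invented for illustration.  The point is that a failing callback sets a flag and leaves the handler table alone, later callbacks short-cut on the flag, and the handlers are cleared only once the parse loop is over.

```python
class ToyParser:
    """Hypothetical stand-in for a parser object; not pyexpat's real layout."""

    def __init__(self, handlers):
        self.handlers = dict(handlers)  # event name -> callable
        self.in_error = False           # the "actual flag" suggested above
        self.error = None               # first exception seen, kept for reporting

    def dispatch(self, event, *args):
        # Short-cut: once an error is flagged, every later callback is a no-op.
        if self.in_error:
            return
        handler = self.handlers.get(event)
        if handler is None:
            return
        try:
            handler(*args)
        except Exception as exc:
            # Flag the error but leave the handler table untouched.
            self.in_error = True
            self.error = exc

    def finish(self):
        # A safer spot to clear handlers: no more callbacks can arrive here.
        self.handlers.clear()
        if self.error is not None:
            raise self.error


def demo():
    seen = []

    def start_prefix_mapping(prefix, uri):
        raise AttributeError("handler state was never initialised")

    parser = ToyParser({
        "start_ns": start_prefix_mapping,
        "start_element": lambda name: seen.append(name),
    })
    parser.dispatch("start_ns", "x", "urn:example")
    parser.dispatch("start_element", "x:root")  # skipped instead of crashing
    return parser, seen
```

Calling demo() leaves the parser flagged with the AttributeError stored, and the start_element callback is never reached; finish() then clears the handlers and re-raises the stored exception to the caller.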

It turns out that the direct cause of the bug is that I didn't call initState() on the XmlDomGenerator instance.  That eventually caused an AttributeError exception in the startPrefixMapping callback, which caused a NULL return to the low-level StartNamespaceDeclHandler in pyexpat.c, which invoked flag_error(), which cleared the handlers without stopping the parse, which caused the NULL address violation when the subsequent StartElementHandler was invoked.
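
The first link in that chain can be exercised without the DOM machinery.  The sketch below assumes the standard-library xml.parsers.expat binding in a current Python 3, not the PyXML build of pyexpat discussed here, so it says nothing about the crash itself; it only raises an AttributeError from the namespace-declaration callback and checks whether the exception reaches the caller of Parse().  The function name and the sample document are made up for the example.

```python
import xml.parsers.expat


def parse_with_failing_ns_handler(data):
    """Parse data with a namespace-declaration handler that always fails."""
    started = []
    # A namespace separator turns on namespace processing, which is what
    # makes expat deliver namespace-declaration callbacks at all.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def start_ns(prefix, uri):
        # Stands in for the AttributeError from the uninitialised handler.
        raise AttributeError("simulated uninitialised handler state")

    parser.StartNamespaceDeclHandler = start_ns
    parser.StartElementHandler = lambda name, attrs: started.append(name)

    try:
        parser.Parse(data, True)
    except AttributeError as exc:
        return exc, started
    return None, started


exc, started = parse_with_failing_ns_handler(b'<r xmlns:x="urn:example"/>')
print(type(exc).__name__ if exc is not None else "no exception")
```

If the exception comes back from Parse(), the failure is at least visible to the application, which is the behaviour one would want from the SAX driver as well.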

The following code works as expected:

from xml.sax import *
from xml.sax.handler import feature_namespaces
from xml.dom.ext.reader.Sax2 import XmlDomGenerator
from Ft.Xml import cDomlette

handler = XmlDomGenerator(implementation=cDomlette.implementation)
handler.initState()

parser = make_parser()
parser.setFeature(feature_namespaces, 1)
parser.setContentHandler(handler)
parser.parse("test.xml")

doc = handler.getRootNode()


So the question is: how do we make the handling of the startPrefixMapping exception less confusing and drastic?  I have ideas, but I only dimly understand pyexpat.c, and these would invariably be hacks.


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Track chair, XML/Web Services One (San Jose, Boston): http://www.xmlconference.com/
DAML Reference - http://www.xml.com/pub/a/2002/05/01/damlref.html
The Languages of the Semantic Web - http://www.newarchitectmag.com/documents/s=2453/new1020218556549/index.html
XML, The Model Driven Architecture, and RDF @ XML Europe - http://www.xmleurope.com/2002/kttrack.asp#themodel