[XML-SIG] Bug in exception handling?

Rob Hooft r.hooft@euromail.net
Thu, 24 Jun 1999 13:19:36 +0200 (MZT)


>>>>> "JJ" == Jack Jansen <jack@oratrix.nl> writes:

 JJ> my first guess would be a mismatch in the Python build: if pyexpat is
 JJ> compiled as a dynamic library it may have been linked against an older 
 JJ> version of Python, or one of the "critical" build options (refcount
 JJ> debugging and such) was different. 

It doesn't look like that is the problem.

What I did find (by accident) is that it has something to do with
reference counting. The current code (in drv_pyexpat) looks like this:

        if not self.parser.Parse(fileobj.read(),1):
            self.__report_error()

If I replace that with

        buf=fileobj.read()
        if not self.parser.Parse(buf,1):
            self.__report_error()

the exception no longer causes a core dump.
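
For illustration, the same "keep the buffer bound to a name" pattern
carries over to feeding the parser in chunks instead of reading the
whole file at once. This is only a sketch, not the code in drv_pyexpat:
it assumes a modern Python 3 with xml.parsers.expat, and the names
parse_in_chunks and chunk_size are made up for the example.

```python
import io
import xml.parsers.expat


def parse_in_chunks(fileobj, parser, chunk_size=64 * 1024):
    """Feed a binary file object to an expat parser one chunk at a time."""
    while True:
        # The chunk stays bound to 'buf' for the whole Parse call.
        buf = fileobj.read(chunk_size)
        if not buf:
            break
        parser.Parse(buf, False)
    # An empty final call tells expat that the input is complete.
    parser.Parse(b"", True)


# Usage: collect element names from a small in-memory document.
names = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: names.append(name)
parse_in_chunks(io.BytesIO(b"<a><b x='1'/><b/></a>"), parser, chunk_size=4)
print(names)
```

In current Python versions Parse raises xml.parsers.expat.ExpatError on
malformed input rather than returning a false value, so the error
check from the snippet above is not repeated here.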

The accident was that I was trying to eliminate the SAX layer from
the code, because in the profile listing of a test parse the most
expensive routines were all in drv_pyexpat:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    21989    4.600    0.000    6.930    0.000 evaly.py:87(HandleReflection)
    21989    5.070    0.000    7.950    0.000 evaly.py:102(HandleEndReflection)
   117706    7.490    0.000    7.490    0.000 saxutils.py:86(__init__)
    21989    8.760    0.000   13.080    0.001 evaly.py:95(HandleIntensity)
    22733   10.130    0.000   16.400    0.001 evaly.py:90(HandleIndex)
   134166   12.920    0.000   12.920    0.000 saxutils.py:113(__getitem__)
   154259   14.020    0.000   14.020    0.000 evaly.py:55(characters)
   117706   14.190    0.000   22.140    0.000 evaly.py:63(endElement)
   117706   16.540    0.000   38.680    0.000 drv_pyexpat.py:45(endElement)
   117706   19.330    0.000   55.740    0.000 evaly.py:50(startElement)
   154259   28.090    0.000   42.110    0.000 drv_pyexpat.py:48(characters)
   117706   41.440    0.000  104.670    0.001 drv_pyexpat.py:38(startElement)
        1   47.530   47.530  232.990  232.990 drv_pyexpat.py:58(parseFile)
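
A listing of this kind can be produced with the standard profiler. The
following is a minimal sketch assuming a modern Python 3 (cProfile and
pstats); parse_document and the generated test data are hypothetical
stand-ins for the real parse, not taken from evaly.py.

```python
import cProfile
import io
import pstats
import xml.parsers.expat


def parse_document(data):
    """Hypothetical stand-in for the real parse: counts start tags."""
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(data, True)
    return count


# Synthetic input: one root element with 1000 children.
data = b"<root>" + b"<item a='1'/>" * 1000 + b"</root>"

profiler = cProfile.Profile()
n_elements = profiler.runcall(parse_document, data)

# Write the report to a string so it can be inspected or saved.
out = io.StringIO()
stats = pstats.Stats(profiler, stream=out)
stats.sort_stats("tottime").print_stats(10)
report = out.getvalue()
print(report)
```

Note that sort_stats("tottime") puts the most expensive routine first,
which is the reverse of the order in the listing above.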

I think that this method in particular:

    def startElement(self,name,attrs):
        at = {}
        for i in range(0, len(attrs), 2):
            at[attrs[i]] = attrs[i+1]
            
        self.doc_handler.startElement(name,saxutils.AttributeMap(at))

is very expensive, as I normally don't use the attributes of most
elements. For me, a lazy version of AttributeMap would help a bit.
Bypassing SAX altogether and using pyexpat directly reduces parsing
time by 40%. But 45 seconds for a "moderately sized" file is still
rather long (some of my clients have files that are going to be 20
times bigger, i.e. 60MB of XML), so I'll need to speed it up a bit
more to make it really usable.
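
To make the "lazy AttributeMap" idea concrete, here is a rough sketch
that also talks to pyexpat directly. It is not the saxutils
implementation: it assumes a modern Python 3, where setting
ordered_attributes on the parser delivers attributes as a flat
[name, value, ...] list, and the names LazyAttributeMap,
IndexCollector and the element/attribute names are hypothetical.

```python
import xml.parsers.expat


class LazyAttributeMap:
    """Read-only mapping over expat's flat [name, value, ...] list.

    The dictionary is built only on the first lookup, so elements
    whose attributes are never read cost one small object each.
    """

    __slots__ = ("_flat", "_map")

    def __init__(self, flat):
        self._flat = flat
        self._map = None

    def _build(self):
        if self._map is None:
            flat = self._flat
            # Even positions are names, odd positions are values.
            self._map = dict(zip(flat[0::2], flat[1::2]))
        return self._map

    def __getitem__(self, name):
        return self._build()[name]

    def get(self, name, default=None):
        return self._build().get(name, default)

    def __len__(self):
        # The length is known without building the dictionary.
        return len(self._flat) // 2


class IndexCollector:
    """Hypothetical handler that reads attributes of 'index' only."""

    def __init__(self):
        self.indices = []
        self.lookups = 0

    def start_element(self, name, flat_attrs):
        attrs = LazyAttributeMap(flat_attrs)
        if name == "index":
            self.indices.append(attrs["h"])
            self.lookups += 1


def parse(data, handler):
    parser = xml.parsers.expat.ParserCreate()
    parser.ordered_attributes = True  # attributes arrive as a flat list
    parser.StartElementHandler = handler.start_element
    parser.Parse(data, True)


collector = IndexCollector()
parse(b"<r><index h='1' k='2'/><intensity v='9'/><index h='3'/></r>",
      collector)
print(collector.indices)
```

Only the two "index" elements trigger a dictionary build here; the
other elements pay for the wrapper object alone. Whether that saves
much in practice would have to be measured on real data.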

Regards,

Rob Hooft.

-- 
=====   R.Hooft@EuroMail.net   http://www.xs4all.nl/~hooft/rob/  =====
=====   R&D, Nonius BV, Delft  http://www.nonius.nl/             =====
===== PGPid 0xFA19277D ========================== Use Linux! =========