[XML-SIG] Bug in exception handling?
Rob Hooft
r.hooft@euromail.net
Thu, 24 Jun 1999 13:19:36 +0200 (MZT)
>>>>> "JJ" == Jack Jansen <jack@oratrix.nl> writes:
JJ> my first guess would be a mismatch in the Python build: if pyexpat is
JJ> compiled as a dynamic library it may have been linked against an older
JJ> version of Python, or one of the "critical" build options (refcount
JJ> debugging and such) was different.
It doesn't look like that....
What I did find (by accident) is that it has something to do with
refcounting. The current code (drv_pyexpat) looks like:

    if not self.parser.Parse(fileobj.read(),1):
        self.__report_error()

If I replace that by

    buf=fileobj.read()
    if not self.parser.Parse(buf,1):
        self.__report_error()

then the exception no longer dumps core.
The "by accident" I'm talking about is that I tried to eliminate the
"sax" layer from the code, because in the profile listing of a test
parse, the top routines were all in drv_pyexpat:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    21989    4.600    0.000    6.930    0.000 evaly.py:87(HandleReflection)
    21989    5.070    0.000    7.950    0.000 evaly.py:102(HandleEndReflection)
   117706    7.490    0.000    7.490    0.000 saxutils.py:86(__init__)
    21989    8.760    0.000   13.080    0.001 evaly.py:95(HandleIntensity)
    22733   10.130    0.000   16.400    0.001 evaly.py:90(HandleIndex)
   134166   12.920    0.000   12.920    0.000 saxutils.py:113(__getitem__)
   154259   14.020    0.000   14.020    0.000 evaly.py:55(characters)
   117706   14.190    0.000   22.140    0.000 evaly.py:63(endElement)
   117706   16.540    0.000   38.680    0.000 drv_pyexpat.py:45(endElement)
   117706   19.330    0.000   55.740    0.000 evaly.py:50(startElement)
   154259   28.090    0.000   42.110    0.000 drv_pyexpat.py:48(characters)
   117706   41.440    0.000  104.670    0.001 drv_pyexpat.py:38(startElement)
        1   47.530   47.530  232.990  232.990 drv_pyexpat.py:58(parseFile)

I think that especially:

    def startElement(self,name,attrs):
        at = {}
        for i in range(0, len(attrs), 2):
            at[attrs[i]] = attrs[i+1]
        self.doc_handler.startElement(name,saxutils.AttributeMap(at))

is very expensive, as I normally don't use the attributes of most of
the elements. For me, a lazy version of AttributeMap would help a bit.
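
A minimal sketch of what such a lazy map could look like. The class
name LazyAttributeMap and its methods are hypothetical, not part of
saxutils; the sketch only assumes that the driver receives the
attributes as a flat [name, value, name, value, ...] list, as in the
loop above. The dictionary is built on the first lookup, so elements
whose attributes are never read only pay for storing the list.

```python
class LazyAttributeMap:
    """Hypothetical stand-in for saxutils.AttributeMap (not the real class).

    Keeps the flat [name, value, ...] list and converts it to a dict
    only when an attribute is actually looked up.
    """

    def __init__(self, flat_attrs):
        self._flat = flat_attrs
        self._map = None  # built on demand

    def _build(self):
        # Pair up even-indexed names with odd-indexed values, once.
        if self._map is None:
            flat = self._flat
            self._map = dict(zip(flat[0::2], flat[1::2]))
        return self._map

    def __getitem__(self, name):
        return self._build()[name]

    def get(self, name, default=None):
        return self._build().get(name, default)

    def keys(self):
        return list(self._build().keys())

    def __len__(self):
        # The length is known without building the dict.
        return len(self._flat) // 2


attrs = LazyAttributeMap(["h", "1", "k", "2"])
n_attrs = len(attrs)   # does not trigger the conversion
k_value = attrs["k"]   # first lookup builds the dict
```

With something like this, startElement could pass
LazyAttributeMap(attrs) straight through and skip the per-element loop.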
Bypassing sax altogether and using pyexpat directly reduces parsing
time by 40%. But 45 seconds on a "moderately sized" file is still
quite long, and some of my clients will have files that are 20 times
bigger still (i.e. 60MB of XML), so I'll need to speed it up a bit
more to make it really usable.
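
For illustration, a sketch of driving expat directly without the sax
layer. It is written against xml.parsers.expat, the name under which
pyexpat is exposed in current Python, so it is not the drv_pyexpat code
discussed above. The function count_elements and the chunk size are
assumptions made for the example. Feeding the file in chunks avoids
holding a 60MB document in one string, and each chunk stays bound to a
local name while Parse runs.

```python
import io
import xml.parsers.expat


def count_elements(fileobj, chunk_size=64 * 1024):
    """Count start tags per element name, feeding expat chunk by chunk."""
    counts = {}

    def start_element(name, attrs):
        # attrs is ignored here; only the element name is used.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element

    while True:
        buf = fileobj.read(chunk_size)  # keep a reference for the call
        final = not buf                 # empty read means end of input
        parser.Parse(buf, final)        # raises ExpatError on bad XML
        if final:
            break
    return counts


doc = io.BytesIO(b"<root><a/><a/><b>text</b></root>")
result = count_elements(doc, chunk_size=5)
```

The small chunk_size in the usage line is only there to show that tags
split across chunks are handled by the parser.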
Regards,
Rob Hooft.
--
===== R.Hooft@EuroMail.net http://www.xs4all.nl/~hooft/rob/ =====
===== R&D, Nonius BV, Delft http://www.nonius.nl/ =====
===== PGPid 0xFA19277D ========================== Use Linux! =========