[XML-SIG] Bug in exception handling?

Rob Hooft r.hooft@euromail.net
Thu, 24 Jun 1999 14:37:23 +0200 (MZT)


>>>>> "FL" == Fredrik Lundh <fredrik@pythonware.com> writes:

 FL> Rob Hooft <r.hooft@euromail.net> wrote:

 >> Bypassing sax altogether and using pyexpat directly reduces parsing
 >> time by 40%. 45 seconds on a "moderately sized" file (some of my
 >> clients have files that are going to be 20 times bigger still,
 >> i.e. 60MB of XML) is still considerably long, so I'll need to speed it
 >> up a bit more to make it really usable.

 FL> with a little luck, you might be able to use sgmlop instead
 FL> (it cannot handle all possible XML constructs yet, but it
 FL> might work on your material).

 FL> here's a simple benchmark, run on an old 200 MHz pentium
 FL> box, under NT:

 FL> > dir big.xml
 FL> 99-06-24  13:47             62 078 532 big.xml
 FL> > python benchxml.py big.xml
 FL> sgmlop/null parser: 8.567 seconds; 7246131 bytes per second
 FL> sgmlop/dummy parser: 51.943 seconds; 1195134 bytes per second
 FL> ^C

I'm using a 200 MHz Pentium as well, but I think the biggest problem
is the kind of data I'm handling: it is mostly numerical. We're
still working on the DTD, but I can show you a typical fragment:

...
<REFLECTION NR="14" BATCH="1">
<INDEX H="-7" K="-3" L="7"/>
<INTENSITY I="8384.55" SIGMA="25.05"/>
<IMPACT HOR="-5.24" VER="-20.09" ROT="-163.146"/>
</REFLECTION>
<REFLECTION NR="15" BATCH="1">
<INDEX H="-9" K="-3" L="8"/>
<INTENSITY I="40.61" SIGMA="4.05"/>
<IMPACT HOR="0.608" VER="-23.893" ROT="-163.24"/>
<FLAG>
<WEAK/>
</FLAG>
</REFLECTION>
<REFLECTION NR="16" BATCH="1">
<INDEX H="-4" K="5" L="2"/>
<INTENSITY I="66.57" SIGMA="2.5"/>
<IMPACT HOR="-9.787" VER="10.048" ROT="-163.12"/>
</REFLECTION>
...

I think a large part of my time, with any parser, will be spent in
atof() and atoi() turning these attribute strings into numbers.
I'll try sgmlop as soon as I can.
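
To make the conversion cost concrete, here is a minimal sketch of the kind
of handler I have in mind. It is written against xml.parsers.expat (the
module behind pyexpat) in present-day Python 3 syntax, and it is only an
illustration, not my actual code. The <DATA> root element, the INT_ATTRS
set, and the names convert() and parse_reflections() are assumptions made
up for this example; the DTD is not final. Every attribute value passes
through int() or float(), which is where I expect the time to go.

```python
import xml.parsers.expat

# Assumption: these attributes hold integers; all others are parsed as floats.
INT_ATTRS = {"NR", "BATCH", "H", "K", "L"}


def convert(attrs):
    """Convert a dict of attribute strings to ints or floats."""
    return {k: (int(v) if k in INT_ATTRS else float(v)) for k, v in attrs.items()}


def parse_reflections(data):
    """Return one dict per <REFLECTION>, with numeric attributes converted."""
    reflections = []
    stack = []  # names of the elements that are currently open

    def start(name, attrs):
        if name == "REFLECTION":
            record = convert(attrs)
            record["FLAGS"] = []
            reflections.append(record)
        elif stack and stack[-1] == "FLAG":
            # Children of <FLAG> are empty marker elements such as <WEAK/>.
            reflections[-1]["FLAGS"].append(name)
        elif name != "FLAG" and reflections:
            # <INDEX>, <INTENSITY>, <IMPACT>: store converted attributes by tag.
            reflections[-1][name] = convert(attrs)
        stack.append(name)

    def end(name):
        stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(data, True)
    return reflections


# Hypothetical wrapper: the fragment needs a single root element to be well-formed.
SAMPLE = """<DATA>
<REFLECTION NR="15" BATCH="1">
<INDEX H="-9" K="-3" L="8"/>
<INTENSITY I="40.61" SIGMA="4.05"/>
<IMPACT HOR="0.608" VER="-23.893" ROT="-163.24"/>
<FLAG>
<WEAK/>
</FLAG>
</REFLECTION>
</DATA>"""
```

Calling parse_reflections(SAMPLE) gives a list with one record. Timing the
same run with convert() replaced by a function that returns attrs unchanged
would be one way to see how much of the total is spent on conversion.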

Rob

-- 
=====   R.Hooft@EuroMail.net   http://www.xs4all.nl/~hooft/rob/  =====
=====   R&D, Nonius BV, Delft  http://www.nonius.nl/             =====
===== PGPid 0xFA19277D ========================== Use Linux! =========