[XML-SIG] Bug in exception handling?

Fredrik Lundh fredrik@pythonware.com
Thu, 24 Jun 1999 14:10:49 +0200


Rob Hooft <r.hooft@euromail.net> wrote:
> Bypassing sax altogether and using pyexpat directly reduces parsing
> time with 40%. 45 seconds on a "moderately sized" file (some of my
> clients have files that are going to be 20 times bigger still,
> i.e. 60MB of XML) is still considerably long, so I'll need to speed it
> up a bit more to make it really usable.

with a little luck, you might be able to use sgmlop instead
(it cannot handle all possible XML constructs yet, but it
might work on your material).

here's a simple benchmark, run on an old 200 MHz pentium
box, under NT:

> dir big.xml
99-06-24  13:47             62 078 532 big.xml
> python benchxml.py big.xml
sgmlop/null parser: 8.567 seconds; 7246131 bytes per second
sgmlop/dummy parser: 51.943 seconds; 1195134 bytes per second
^C

(didn't have time to wait for the standard xmllib
implementation to finish...)

in this test, the null parser defines no parser callbacks
at all, so it basically measures the time it takes sgmlop
to read the file from disk, and to split it into elements.
the dummy parser defines all Python callbacks as empty
methods. as you can see, calling Python methods from C is
quite expensive: the same file takes about six times longer
(52 vs. 8.6 seconds). if you're going to DO things with the
data, it gets even slower... (but a few hundred kilobytes
per second on a similar box should be no problem).
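
to make the null/dummy comparison concrete, here's a minimal
sketch of the same kind of measurement. note the assumptions:
this is not benchxml.py, and it uses the expat bindings from the
standard library (xml.parsers.expat) rather than sgmlop, so the
absolute numbers will differ. the names make_sample, noop and run
are made up for this illustration, and the test data is generated
in memory instead of being read from big.xml.

```python
import time
import xml.parsers.expat


def make_sample(n_elements=20000):
    # hypothetical test data: one root element holding many small items
    body = b"".join(
        b"<item id='%d'>text %d</item>" % (i, i) for i in range(n_elements)
    )
    return b"<root>" + body + b"</root>"


def noop(*args):
    # empty callback: does no work, but each call still crosses
    # the C-to-Python boundary
    pass


def run(data, with_callbacks):
    """Parse data once; return (seconds, bytes per second)."""
    parser = xml.parsers.expat.ParserCreate()
    if with_callbacks:
        # "dummy" parser: every event calls an empty Python function
        parser.StartElementHandler = noop
        parser.EndElementHandler = noop
        parser.CharacterDataHandler = noop
    # otherwise "null" parser: no handlers set, the parser just scans
    start = time.perf_counter()
    parser.Parse(data, True)
    # guard against a zero reading on very small inputs
    elapsed = max(time.perf_counter() - start, 1e-9)
    return elapsed, len(data) / elapsed


if __name__ == "__main__":
    data = make_sample()
    for label, flag in (("null", False), ("dummy", True)):
        seconds, rate = run(data, flag)
        print("expat/%s parser: %.3f seconds; %d bytes per second"
              % (label, seconds, rate))
```

the gap between the two lines of output is the cost of the
callbacks alone; any real work done inside them comes on top
of that.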

get your copy from:
http://www.pythonware.com/madscientist/

</F>