[XML-SIG] SAX prettyprinter V2 and SGMLOP

Christian Tismer tismer@appliedbiometrics.com
Sat, 23 Jan 1999 16:44:02 +0100

Lars Marius Garshol wrote:
> * Christian Tismer
> |
> | the appended version of Indenter.py can use sgmlop to format large
> | XML files. It then processes a few megabytes in a few seconds.
> How is the performance when you use sgmlop directly compared to when
> you use its SAX driver?

I haven't tried that yet, since I was very happy with the speed.

> | BTW - is sgmlop deprecated?
> If it works with your XML it should be OK, but it does not conform
> very closely to the standard, unlike expat.

I could not use pyexpat yet, since a pyexpat DLL is missing.
I will build one for Windows (as I also did before with sgmlop,
since the binary in the CVS was broken). I just wasn't aware that
I need to get an extra tar file for that.

When I find the time, I will also provide a patch for sgmlop for
a couple of things.
What I need to find is the fastest acceptable parser that allows
me to turn masses of XML data into Python structures. We don't
work with complicated but smaller documents; we are processing
XML-encoded database records which are quite irregular (so a
relational database is of little use) and quite simple, but the standard
size is some 50MB. This is why I'm after speed, much more than

A general question (comes up because I had to hack my Indenter
especially for sgmlop):
Is a SAX parser required to report ignorableWhitespace events?
Or is it also allowed to never call this method, as sgmlop does?
If so, then the interface doesn't make too much sense since I have
to collect all data and handle whitespace when the next tag appears.
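To make the workaround concrete, here is a minimal sketch of "collect all data
and handle whitespace when the next tag appears". It is not the Indenter.py
code; it uses the xml.sax module from the Python standard library rather than
sgmlop, and the names BufferingHandler and parse_events are hypothetical. It
assumes the parser may deliver whitespace through characters() only, so
ignorableWhitespace() is treated as just another source of data.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferingHandler(ContentHandler):
    """Buffers character data and classifies it when the next tag arrives."""

    def __init__(self):
        super().__init__()
        self._chunks = []   # character data seen since the last tag
        self.events = []    # recorded (kind, value) pairs, for illustration

    def _flush(self):
        # Called at every tag: decide what the buffered data was.
        text = "".join(self._chunks)
        self._chunks = []
        if not text:
            return
        if text.strip():
            self.events.append(("text", text.strip()))
        else:
            self.events.append(("whitespace", text))

    def characters(self, content):
        # Parsers may split one run of text into several calls.
        self._chunks.append(content)

    def ignorableWhitespace(self, whitespace):
        # Handled like ordinary data, so the result is the same whether
        # or not the parser ever calls this method.
        self._chunks.append(whitespace)

    def startElement(self, name, attrs):
        self._flush()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush()
        self.events.append(("end", name))


def parse_events(xml_text):
    """Parse a string and return the recorded event list."""
    handler = BufferingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events
```

A prettyprinter built this way could drop the ("whitespace", ...) entries and
emit its own indentation, without depending on how a particular driver
reports whitespace.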

ciao - chris

Christian Tismer             :^)   <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaiserin-Augusta-Allee 101   :    *Starship* http://starship.skyport.net
10553 Berlin                 :     PGP key -> http://pgp.ai.mit.edu/
     we're tired of banana software - shipped green, ripens at home