[XML-SIG] How do I write an efficient parser?

Thomas B. Passin tpassin@home.com
Sun, 23 Sep 2001 10:39:00 -0400


[Martin v. Loewis]
    [Don Allingham]
> > My project has been using the SAX parser under 1.5.2 and 2.X. The XML
> > file contains genealogy information. When I have about 2000 people in
> > the database, the expat parser reads it in a reasonable amount of time -
> > a few seconds, not too long for the user.
> >
> > However, when it starts reaching 6000-7000 entries, it can take up to
> > a minute or longer. At 50000, it takes several minutes, which is just
> > unacceptable.
>
> This suggests linear complexity, right? This is how it should be;
> parsers have linear complexity by nature.
>
> > Is there a way to write a more efficient parser without having to resort
> > to C?
>

It doesn't sound linear to me, from the sketchy information that Don
provided.  If it were linear, 50,000 entries would take about 25 times the
few seconds that 2,000 take - roughly a minute - not several minutes.
Still, a minute might be too long for him.

One thing that has been known to slow things down non-linearly is building
up a really long output string (tens of thousands of bytes) by
concatenating little pieces.  Each concatenation copies the whole string
accumulated so far, so the total work grows quadratically with the length
of the output.  This is about Python strings, not XML processing per se.
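
Don hasn't shown his code, so this is only a guess at the shape of the
problem.  A SAX handler that accumulates text like this hypothetical one
(SlowWriter and its attribute names are made up for illustration) copies
the whole string on every call:

    from xml.sax import handler

    class SlowWriter(handler.ContentHandler):
        def __init__(self):
            self.output = ""    # one ever-growing string

        def characters(self, content):
            # Each concatenation copies all of self.output so far,
            # so total work grows with the square of the output size.
            self.output = self.output + content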

If that is what Don's program does, an easy way to get a dramatic
improvement would be to append each little text fragment to a list, then do
string.join() on the list at the end.  I've seen this turn a dog of a
program into a snappy performer.
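
Here is the same hypothetical handler rewritten that way - append the
fragments to a list (appends don't copy what's already there) and join
them once at the end:

    import string
    from xml.sax import handler

    class FastWriter(handler.ContentHandler):
        def __init__(self):
            self.fragments = []    # cheap appends, no copying

        def characters(self, content):
            self.fragments.append(content)

        def endDocument(self):
            # One pass builds the final string in linear time.
            self.output = string.join(self.fragments, "")

(In recent Pythons the method form, "".join(self.fragments), does the
same thing.)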

Let's hope that's Don's situation.

Cheers,

Tom P