Fri, 11 Aug 2000 08:24:50 -0700 (PDT)
> PyExpat has the additional cost of having to cross the C->Python line
> all of the time. Still, I haven't heard that anyone has made a
> reasonably complete pure-Python parser that is as fast as PyExpat. I
> don't know anything about Aaron's.
That comment of Robin's wasn't supposed to have leaked!
Now that the cat is out of the bag, here's what is
happening: ReportLab (me, Robin and Aaron) have been
looking at all the parsers around, trying to figure
out a natural way to map XML onto Python object models
for a whole load of customer projects. What we wanted
was the easiest way to get a tree structure, without
caring whether it was DOM or not.
Aaron sat down to write a simple recursive-descent parser
using string.find and nothing else, which outputs a
tree of dictionaries. It handles tags, text and CDATA,
and very little else. This was mostly a learning
exercise and took half a day. Amazingly, it gets
similar speeds to qp_xml on Hamlet. We reckon this is
because essentially the same thing is going on: C code
(string.find) grabs the next token, then hands it back
to Python to do something with it. We've found
in the past that extensions don't give much of a
speedup when you make lots of little calls to them.
Don't get too excited, as there are probably a whole
bunch of less common cases it doesn't handle yet, and
supporting them may slow it down. It may not be
"reasonably complete" by your definition, unlike
PyExpat, which is extremely well proven.
It should hopefully get released in a week or two, but
there's some more to do first.