Lars Marius Garshol larsga@ifi.uio.no
19 Apr 1999 10:32:49 +0200

* Lars Marius Garshol
| I think it makes sense to have something a bit more lightweight and
| easier to use than the DOM. However, why not build it on top of SAX
| instead of pyexpat? No reason to restrict ourselves to just one
| parser, is there?

* Greg Stein
| No particular reason, although it will be somewhat slower if based
| on SAX. 

It will, so maybe we should consider making two builders?

| I see in drv_pyexpat.py that the startElement handler does a good
| bit of work before getting to the "real" start handler. It would be
| nice to skip that :-) (honestly, though, I don't know what kind of
| overhead it creates).

If you have a lot of attributes I guess it will be slow, but I think
applications using your qp_xml will essentially have to redo that work
(and quite possibly in a less efficient manner), since they can't just
do a simple lookup to get the attribute values.

So your qp_xml would be nicer if it had a hash of attributes instead
of a list, and applications based on it would very likely be faster.
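
To make the difference concrete, here is a minimal sketch. It assumes the
attributes arrive as a flat [name, value, name, value, ...] list; that
layout and the helper names are illustrative assumptions on my part, not
qp_xml's actual interface.

```python
def lookup_in_list(flat, name, default=None):
    """Linear scan over a flat [name, value, name, value, ...] list."""
    for i in range(0, len(flat), 2):
        if flat[i] == name:
            return flat[i + 1]
    return default


def attrs_to_dict(flat):
    """Build the hash once: names are the even slots, values the odd ones."""
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical attribute list for a single start tag.
flat = ["href", "http://example.org/", "title", "Example"]

# With the list, every lookup is a scan the application has to write itself.
title_from_list = lookup_in_list(flat, "title")

# With the hash, the conversion happens once and lookups are direct.
attrs = attrs_to_dict(flat)
title_from_dict = attrs.get("title")
```

If the builder does the conversion once per element, every application
gets the direct lookup for free instead of repeating the scan.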

Also, I think it might make sense to modify pyexpat to create a hash
in the PyAPI wrapping instead of a list as it does now. That would
most likely be both the fastest and the nicest solution.

| It might be nice to switch it to SAX and bench the pure pyexpat
| version against the SAX version.

Feel free. I don't have the time, I'm afraid.
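
For whoever does pick this up, the comparison could start from something
like the rough sketch below. It uses the expat and SAX modules from
today's Python standard library as stand-ins for pyexpat and the SAX
driver; the test document and the function names are made up for the
example, and it only counts start tags rather than building a tree.

```python
import timeit
import xml.sax
from xml.parsers import expat

# Synthetic test document; its size and shape are arbitrary assumptions.
DOC = ("<root>" + '<item id="1" name="x">text</item>' * 1000
       + "</root>").encode("utf-8")


def count_with_expat(data):
    """Count start tags by driving expat directly."""
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(data, True)
    return count


class Counter(xml.sax.ContentHandler):
    """SAX handler that does the same minimal work as the expat callback."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_with_sax(data):
    """Count start tags through the SAX layer."""
    handler = Counter()
    xml.sax.parseString(data, handler)
    return handler.count


if __name__ == "__main__":
    # Print wall-clock time for 20 parses with each approach.
    for fn in (count_with_expat, count_with_sax):
        seconds = timeit.timeit(lambda: fn(DOC), number=20)
        print(fn.__name__, round(seconds, 3))
```

Both functions do the same trivial work per element, so any difference
in the timings should come mostly from the SAX layer itself.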

| I do agree that SAX-based would be the Right Thing, but I'm also
| willing to trade that for speed since people can always use the DOM
| if they need to use a different, underlying parser (such as
| xmlproc).

Or sgmlop, or htmllib, or sgmllib. Or, when I get round to it, SP or
Java parsers under JPython. Maybe also RXP.

--Lars M.