uche at ogbuji.net
Sat Feb 21 16:34:54 CET 2004
"James Kew" <james.kew at btinternet.com> wrote in message news:<c0rkhb$1ai18a$1 at ID-71831.news.uni-berlin.de>...
> "Chris Herborth" <chrish at cryptocard.com> wrote in message
> news:5z3Yb.3920$Cd6.177500 at news20.bellglobal.com...
> > PyXML on Sourceforge (http://pyxml.sourceforge.net/) has faster
> > DOM-producing routines.
> Which are? I like PyXML, but well-documented it ain't. I tend to use PyXML's
> minidom, fed by either the validating (== xmlproc) or non-validating (==
> expat) parsers -- are there faster PyXML alternatives?
> > pyRXP (http://www.reportlab.org/pyrxp.html) is probably the fastest XML
> > parser for Python, but it doesn't produce a DOM or have a SAX API...
> And recent threads here suggest it's not fully XML-compliant either, unless
> you can work in an ASCII-only XML subset.
Yes, and this is a very serious problem. Anyone entering into XML
processing with the belief that they'll never need anything but
Unicode characters below U+0100 (code point 256) is fooling himself.
Heck, even XML exports from MS Office will generate high Unicode
characters for "smart" quotes, em and en dashes, ellipses and a lot of
other common punctuation. All of these will blow up with PyRXP.
You can use PyRXPU, which is compliant, but indications are that it
isn't as fast.
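
A minimal sketch of the point about high characters, assuming Python's
standard xml.dom.minidom as the compliant parser. The sample document
and the names SAMPLE and high_chars are made up for illustration; this
does not exercise PyRXP or PyRXPU themselves.

```python
from xml.dom import minidom

# Made-up sample: curly quotes, an em dash and an ellipsis, the sort of
# punctuation a word processor typically substitutes automatically.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<p>\u201cSmart\u201d quotes \u2014 and an ellipsis\u2026</p>'
)


def high_chars(xml_bytes):
    """Return 'U+XXXX' labels for each character above U+00FF in the root's text."""
    doc = minidom.parseString(xml_bytes)
    # Join all text children in case the parser splits the text into several nodes.
    text = "".join(
        node.data
        for node in doc.documentElement.childNodes
        if node.nodeType == node.TEXT_NODE
    )
    return ["U+%04X" % ord(ch) for ch in text if ord(ch) > 0xFF]


found = high_chars(SAMPLE.encode("utf-8"))
print(found)
```

Every character it reports sits well above U+00FF, so a parser limited
to 8-bit characters cannot represent this document's text.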
> For raw speed, libxml2 (and its Python wrapper) seems to get a lot of
> glowing reviews. It's not a standard DOM API, though, and again
> documentation is a problem (lots of C-API-level documentation, but not much
> in terms of how to put it together into a working Python app).
> I gave it a whirl and it certainly seemed to fly, but getting to grips with
> the API and converting my existing DOM-manipulating code to it felt like too
> much of a hurdle given that my app runs fast enough as it is.
This was my biggest problem with libxml2/Python as documented here:
If documentation for Python users is improved, it will be hard to beat.
But your criteria lead me to suggest that you give cDomlette a try. It
is also implemented in C for performance. It's about as DOM-compliant
as libxml2's DOM API (which is to say not fully so), but we do try to
document it from the Python POV. See: