[DOC-SIG] XML extension module.
Stefane Fermigier
fermigie@math.jussieu.fr
Fri, 19 Dec 1997 21:28:00 +0100 (MET)
Forwarded message:
>> From: Sean Mc Grath <digitome@iol.ie>
>>
>> I would like to see basic XML support provided as a portable C extension module.
>> I believe XML will take off and that the XML support in Python
>> will be sufficiently useful to go into the standard
>> distribution.
>>
>> 1) XML processing must be fast. Microsoft builds Java XML parsers but criticises
>> them over speed, and trumpets Windows-specific ActiveX components as
>> a "solution" to the speed issues. Let's not allow the same charge to be leveled
>> at Python.
>>
>> 2) I believe the speed difference will be *significant*. I have both Python and C
>> implementations of a LoadEsis tree-building library for Python (the tree-building
>> process for XML will be analogous, I suspect). The C one is massively faster.
>>
>> At the very least, we could include a C-based XML lexer to take care of
>> some of the hairy bits and spit out basic XML tokens. Python layers could
>> sit on top of that to do well-formedness checks, tree building, and validation.
>> James Clark has written such a thing (in ANSI C) and made it freely available.
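
To make the layering Sean describes concrete, here is a minimal sketch in
Python. The lex() function is only a toy stand-in for a C lexer, and both
names (lex, is_well_nested) are hypothetical, not taken from any existing
module. It handles attribute-free tags and character data only, so it is an
illustration of the split between lexer and checking layer, not a real XML
tokenizer.

```python
import re

# Toy stand-in for the C lexer (hypothetical): no attributes, comments,
# entities, or CDATA; anything the pattern does not match is skipped.
_TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")


def lex(text):
    """Yield (kind, value) tokens: 'start', 'end', or 'data'."""
    for m in _TOKEN.finditer(text):
        closing, name, chars = m.groups()
        if chars is not None:
            yield ("data", chars)
        elif closing:
            yield ("end", name)
        else:
            yield ("start", name)


def is_well_nested(tokens):
    """Python layer on top of the lexer: check that tags nest properly."""
    stack = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            # An end tag must close the most recently opened element.
            if not stack or stack.pop() != value:
                return False
    # Every opened element must have been closed.
    return not stack
```

The checking layer only sees tokens, so the lexer underneath could be swapped
for a faster one without touching it.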
Another option would be to use Marc-Andre Lemburg's fast tag engine, which
is built around a kernel written in C (a kind of enhanced finite state automaton)
and already has an HTML parser written for it (if you don't know about it, it's on
his page on Starship, I guess).
My own opinion is that we should strive to write a modular system where
several parsers can be used to the same effect: slow but portable parsers
written in Python, parsers built on sgmls ESIS output, or fast parsers written
as C modules.
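
As a rough sketch of what "to the same effect" could mean, the Python below
has two backends report through one shared handler interface. Every name in
it (Handler, PurePythonParser, EsisParser, run) is hypothetical and invented
for illustration. The ESIS backend assumes only the "(" start, ")" end and
"-" data line codes and ignores attributes and escape sequences; a C backend
would simply be a third class with the same parse() method.

```python
import re


class Handler:
    """Receives parse events; every backend reports through these methods."""

    def __init__(self):
        self.events = []

    def start(self, name):
        self.events.append(("start", name))

    def end(self, name):
        self.events.append(("end", name))

    def data(self, text):
        self.events.append(("data", text))


class PurePythonParser:
    """Slow but portable backend: a toy tokenizer for attribute-free tags."""

    _token = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")

    def parse(self, text, handler):
        for m in self._token.finditer(text):
            closing, name, chars = m.groups()
            if chars is not None:
                handler.data(chars)
            elif closing:
                handler.end(name)
            else:
                handler.start(name)


class EsisParser:
    """Backend fed by ESIS-style lines: '(' start, ')' end, '-' data."""

    def parse(self, text, handler):
        for line in text.splitlines():
            if not line:
                continue
            code, rest = line[0], line[1:]
            if code == "(":
                handler.start(rest)
            elif code == ")":
                handler.end(rest)
            elif code == "-":
                handler.data(rest)
            # Other ESIS line codes (attributes etc.) are ignored here.


def run(parser, text):
    """Drive any backend and return the events it produced."""
    handler = Handler()
    parser.parse(text, handler)
    return handler.events
```

Application code written against Handler would not need to know which
backend produced the events, which is the point of the modular design.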
Cheers,
S.
_______________
DOC-SIG - SIG for the Python Documentation Project
send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
_______________