[DOC-SIG] XML extension module.

Stefane Fermigier fermigie@math.jussieu.fr
Fri, 19 Dec 1997 21:28:00 +0100 (MET)


Forwarded message:
>> From: Sean Mc Grath <digitome@iol.ie>
>> 
>> I would like to see basic XML support provided as a portable C extension module.
>> I believe XML will take off and that the XML support in Python
>> will be sufficiently useful to go into the standard
>> distribution.
>> 
>> 1) XML processing must be fast. Microsoft builds Java XML parsers but
>> criticises them over speed issues, and trumpets Windows-specific ActiveX
>> components as a "solution" to those issues. Let's not allow the same charge
>> to be leveled at Python.
>> 
>> 2) I believe the speed difference will be *significant*. I have Python and C
>> implementations of a LoadEsis tree-building library for Python (the tree-building
>> process for XML will be analogous, I suspect). The C one is massively faster.
>> 
>> At the very least, we could include a C-based XML lexer to take care of
>> some of the hairy bits and spit out basic XML tokens. Python layers could
>> sit on top of that to do well-formedness checks/tree building/validation.
>> James Clark has written such a thing (in ANSI C) and made it freely available.

Another option would be to use Marc-Andre Lemburg's fast tag engine, which
is built around a kernel written in C (a kind of enhanced finite state
automaton) and already has an HTML parser written for it (if you don't know
about it, it's on his page on starship, I guess).

My own opinion is that we should strive to write a modular system where
several parsers can be used to the same effect: slow but portable parsers
written in Python, parsers built on sgmls ESIS output, or fast parsers written
as C modules.
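
To make the idea concrete, here is a minimal sketch of what "several parsers
used to the same effect" might look like: two interchangeable backends that
drive one shared handler. Every name in it (Handler, parse_markup, parse_esis,
PARSERS, parse) is hypothetical and made up for illustration, the pure-Python
backend only understands attribute-free tags, and the ESIS backend only
handles the "(", ")" and "-" record types. A C module would simply be a third
entry in the same table.

```python
import re


class Handler:
    """Receives parse events. The method names are hypothetical."""

    def __init__(self):
        self.events = []

    def start(self, name):
        self.events.append(("start", name))

    def end(self, name):
        self.events.append(("end", name))

    def data(self, text):
        self.events.append(("data", text))


# Matches an attribute-free start tag, an end tag, or a run of character data.
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*>|([^<]+)")


def parse_markup(text, handler):
    """Slow but portable backend: tokenizes the markup in pure Python."""
    for close, name, chars in _TOKEN.findall(text):
        if chars:
            handler.data(chars)
        elif close:
            handler.end(name)
        else:
            handler.start(name)


def parse_esis(text, handler):
    """Backend fed by sgmls-style ESIS lines: '(' start, ')' end, '-' data."""
    for line in text.splitlines():
        if line.startswith("("):
            handler.start(line[1:])
        elif line.startswith(")"):
            handler.end(line[1:])
        elif line.startswith("-"):
            handler.data(line[1:])
        # Other ESIS record types (attributes etc.) are ignored in this sketch.


# Backend table: a fast C extension could be registered here as well.
PARSERS = {"python": parse_markup, "esis": parse_esis}


def parse(text, handler, backend="python"):
    """Run the chosen backend and hand back the handler it filled in."""
    PARSERS[backend](text, handler)
    return handler
```

Application code would only ever talk to the handler, so swapping
backend="python" for backend="esis" (or a future C backend) should not change
the events it sees.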

Cheers,

	S.

_______________
DOC-SIG  - SIG for the Python Documentation Project

send messages to: doc-sig@python.org
administrivia to: doc-sig-request@python.org
_______________