adding the XML to 2.0 to be a mistake?

Robert Roy rjroy at takingcontrol.com
Tue Jan 16 11:48:33 EST 2001


On 15 Jan 2001 22:20:38 -0500, Andrew Kuchling
<akuchlin at mems-exchange.org> wrote:

>John Schmitt <jschmitt at vmlabs.com> writes:
>> Pardon the ignorance, but where is the mistake?  Is it in adding PyXML to
>> 2.0 or is it the way it was done?  Is there no development strategy that
>> makes this less of a burden?  If a previous release of PyXML had been added
>> to 2.0, would you still consider it a mistake?
>
>Duplicating complex code in two different projects, so that they have
>to be kept in sync manually at the cost of time and effort, is the
>mistake.  Another one is tying a fast-moving project such as PyXML to
>the slower releases of Python; Python 2.0 was released on October 16,
>and there have been two PyXML releases (0.6.2 and 0.6.3) since then.
>
>--amk
>
I agree with what you are saying. Another aspect that concerns me is
that with the addition of the XML tools, xmllib is now deprecated. The
recommended alternative, SAX, does not offer the level of control that
xmllib does.

For several tasks (e.g. translation to another DTD/Schema) it is
desirable not to resolve any character entities, including the
standard XML entity defs. In xmllib this is easy: set the entitydefs
dict to {} and override unknown_entityref, unknown_charref and
handle_charref to write the reference back into the data stream. Using
SAX you have to do string substitutions on the output data stream to
put things back the way they were. While the code to do this is fairly
simple, it is arguably counterintuitive.
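
To make the SAX side of this concrete, here is a minimal sketch of the
string substitution workaround, using only xml.sax from the standard
library. The names CollectText and resolved_and_restored are made up
for illustration and do not come from any real project.

```python
import xml.sax
from xml.sax.saxutils import escape


class CollectText(xml.sax.ContentHandler):
    """Collect character data exactly as SAX delivers it (entities resolved)."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # SAX may deliver the text in several pieces; keep them all.
        self.chunks.append(content)


def resolved_and_restored(doc):
    """Return (text as SAX reports it, text with markup characters re-escaped)."""
    handler = CollectText()
    xml.sax.parseString(doc, handler)
    resolved = "".join(handler.chunks)
    # escape() puts back &amp;, &lt; and &gt; after the fact.
    return resolved, escape(resolved)
```

Calling resolved_and_restored(b"<a>x &amp; y &lt; z</a>") gives back the
resolved text and a re-escaped copy. Note that the substitution is
only a guess at the original: a numeric character reference and the
literal character it stands for look identical by the time
characters() is called, so that distinction cannot be restored this
way.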

Undeclared entities are a problem in SAX but can be handled cleanly
using the unknown_entityref mechanism in xmllib.

SAX, as it now stands, handles CDATA marked sections as part of the
data stream. This may be fine for most uses but may be undesirable
when you are using XML as a container wrapping sections of HTML, etc.


SAX swallows comments. Sometimes I need to pass the comments through,
e.g. when doing a translation.
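
A small sketch of the last two points, assuming nothing beyond a plain
xml.sax ContentHandler (the names Recorder and sax_view are
hypothetical): the handler has callbacks for elements and character
data only, so CDATA content shows up as ordinary text and the comment
never shows up at all.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Record every event a plain ContentHandler gets to see."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("chars", content))


def sax_view(doc):
    """Return (event kinds seen, all character data joined together)."""
    handler = Recorder()
    xml.sax.parseString(doc, handler)
    kinds = set(kind for kind, _ in handler.events)
    text = "".join(value for kind, value in handler.events if kind == "chars")
    return kinds, text


# A wrapper document with a comment and a CDATA section holding HTML.
SAMPLE = b"<doc><!-- keep me --><![CDATA[<b>bold</b>]]> tail</doc>"
```

Running sax_view(SAMPLE) yields only start, end and chars events. The
joined text contains the HTML from the marked section with nothing to
show where the section began or ended, and the comment text is absent.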

Granted, most of what I have outlined here is best done at a lower
level than SAX, say pyexpat, upon which SAX is built. However, pyexpat
is for all intents and purposes undocumented, and SAX is put forward
as the "officially blessed" alternative in the deprecation notice.

pyexpat is presently not a viable option since several of the above
issues with SAX are due to the underlying expat library.

So where does that leave us?

I believe that, in light of its unique capabilities, xmllib deserves a
permanent place in the Python library. To improve performance (in
sgmllib too), Fredrik's sgmlop extension should become part of the
standard distribution as well. If (py)expat eventually evolves to the
point where it can do everything that xmllib does (allow access to
ALL entity refs, identification of marked sections, comments ...),
then it could become the underlying library.



Bob


