[XML-SIG] saxlib.py, package structure, & HOWTO outline (repost)

Andrew Kuchling akuchlin@cnri.reston.va.us
Tue, 17 Mar 1998 11:50:08 -0500 (EST)


[I'm reposting this earlier message of mine, because I think many
 people joined the list after it went out and haven't seen it.
 This message bounces over several topics; some bits are questions
 that we should discuss, others are just random musings of mine.]

The current status of the XML-SIG is quite promising; we've already
got prototype implementations of the two XML APIs (SAX and DOM), and a
prototype interface to XMLTok.

A bit of explanation (that will probably get recycled into the HOWTO):
SAX and DOM are two sides of the same coin; they're different ways to
access representations of XML documents.  DOM is a tree-based
representation, so you have the whole document in memory at once
(unless you either do something extremely clever with lazy
construction of the tree, or place constraints on how you can traverse
the tree and then build it on the fly).  SAX is an event-based API, so
you write callbacks, which get called by the XML parser as elements
begin and end.  Both are useful for different tasks; you can wander
all over the tree at random with DOM, but SAX is lower-level and lets
you construct only the data structures you require--perhaps none at
all.
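
To make the difference concrete, here is a minimal sketch. It assumes the
xml.sax and xml.dom.minidom modules found in today's Python standard library,
not the prototype saxlib.py or DOM code discussed in this message, whose
interfaces may differ. The sample document and the names ElementCounter,
count_with_sax, and texts_with_dom are made up for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for this illustration.
DOC = ("<quotations><quotation>First</quotation>"
       "<quotation>Second</quotation></quotations>")


class ElementCounter(xml.sax.ContentHandler):
    """SAX style: the parser calls back as each element begins."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Keep only the data we care about; no tree is ever built.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_with_sax(text):
    """Count element names using event callbacks."""
    handler = ElementCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.counts


def texts_with_dom(text, tag):
    """DOM style: build the whole tree, then wander through it."""
    doc = minidom.parseString(text)
    return [node.firstChild.data for node in doc.getElementsByTagName(tag)]


if __name__ == "__main__":
    print(count_with_sax(DOC))
    print(texts_with_dom(DOC, "quotation"))
```

The SAX handler holds nothing but a dictionary of counts, while the DOM
version keeps the entire document in memory so it can be queried in any order.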

This distinction is nicely explained at
<http://www.microstar.com/XML/SAX/event.html>.  I'll add a link to
this page to the XML-SIG's Resources page, at
<http://www.python.org/sigs/xml-sig/links.html>. Suggestions for more
links are welcome.

I've taken a brief look at saxlib.py, and it looks very neat and
understandable; I quite agree with Paul Prescod's favorable impression
of it.  What's missing from it?  As far as I can tell, documentation
is the only thing missing, but I'm no XML expert.  Tutorial
information on SAX seems hard to come by, but that's what the HOWTO
will be for...

One minor nit: saxdemo.py has a problem with the following lines:

    import xmlproc
    p = xmlproc.Parser()

There doesn't seem to be a Parser class or function in xmlproc.py, so
an AttributeError is raised.  Have I messed something up?
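
One way to check what a module really exports is to list its public names
before calling anything. The sketch below is only an assumption about how one
might diagnose this; it uses a stand-in module object rather than the real
xmlproc.py, and the names fake_xmlproc and SomeOtherParser are hypothetical.

```python
import types


def public_names(module):
    """Return the sorted names a module defines without a leading underscore."""
    return sorted(n for n in dir(module) if not n.startswith("_"))


def has_callable(module, name):
    """True if the module has an attribute 'name' that can be called."""
    return callable(getattr(module, name, None))


# Stand-in for a module that lacks a Parser class (hypothetical contents).
fake_xmlproc = types.ModuleType("fake_xmlproc")
fake_xmlproc.SomeOtherParser = type("SomeOtherParser", (), {})

if __name__ == "__main__":
    print(public_names(fake_xmlproc))
    print(has_callable(fake_xmlproc, "Parser"))
```

If Parser is absent from the listing, the demo script and the module are
probably out of step with each other, rather than anything being wrong with
the installation.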

I've done no more than download Stefane Fermigier's DOM code; I haven't
actually looked at it yet.  One thing I've noticed is that it uses
packages ("from dom.transformer import *") while the SAX library just
uses top-level modules.  Perhaps we should try to pin down the layout
of the XML package first.  Should there be subpackages (XML.SAX.foo,
XML.DOM.foo, ...) or is it enough to put everything in a package named
'XML'?

I'd like to settle the question of package organization first, so the
SAX and DOM implementations can be modified accordingly.  Then I'll
try rewriting my quotation-file handling using the new code, and see
what problems I run into.

Fuzzily thinking about the organization of an XML-HOWTO, my outline
looks like:

Overview: (a few paragraphs)
        What is XML?  Why do you care?
Introduction to XML: (a few pages)
        Extremely brief intro to XML syntax & ideas, w/ pointers to complete
        resources
Glossary:
        Glossaries usually come at the end, but there are enough
        acronyms and concepts that it might be better placed here.
DOM:
        The tree-based interface to XML documents.  Explanations,
        sample code, ...
SAX:
        The event-based interface.  Explanations, sample code, ...



A.M. Kuchling			http://starship.skyport.net/crew/amk/
Technology is a gift of God. After the gift of life it is perhaps the greatest
of God's gifts. It is the mother of civilizations, of arts and of sciences.
	-- Freeman Dyson, _Infinite in All Directions_