[XML-SIG] SAX 2.0, again

Ken MacLeod ken@bitsko.slc.ut.us
21 Feb 2000 13:37:46 -0600

Lars Marius Garshol <larsga@garshol.priv.no> writes:

> ### XML names
> The first problem is that of how to represent XML names. SAX 2.0 can
> handle namespaces, and so we must somehow represent namespace-names.
> I can see several different ways of doing this, all with their
> advantages and disadvantages, and would very much like to hear the
> opinion of the XML-SIG on this.
> The alternatives I've thought of are
>  - use (uri, localpart) tuple for namespace-names, simple strings for
>    ordinary names
>  - use (uri, localpart, rawname) for namespace-names, simple strings
>    for ordinary names; rawname must be communicated out of band
>    somehow
>  - use XMLName objects for names, regardless of kind. If these were
>    made immutable and drivers used hashtables of these this might not
>    be too inefficient.
>  - use separate parameters for uri, localpart and rawname, letting
>    some of these be None depending on what was in the document and
>    what the parser supports.

The proposal I made earlier (passing objects instead of positional
parameters) is another solution.  From my proposal and Paul's miniDOM
proposal earlier, start_element would be passed an Element object:

class Element(Node):
        string tagName
        {Dictionary of Name->Value} attributes
        string namespaceURI
        string prefix
        string localName

I believe tagName is the raw name and the remaining three are set
depending on whether NS processing is turned on.  For attributes to be
a dictionary and support both NS and no-NS processing, I like keying
it by (uri, localName) for NS and by (None, tagName) for no-NS.
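
A minimal sketch of that keying scheme, assuming a plain Python class
for the Element object.  The constructor layout, the attr_key helper
and the example URIs are invented here for illustration; they are not
part of either proposal.

```python
class Element:
    """Illustrative stand-in for the Element object described above."""

    def __init__(self, tagName, attributes=None,
                 namespaceURI=None, prefix=None, localName=None):
        self.tagName = tagName            # raw name as written in the document
        self.attributes = attributes or {}
        self.namespaceURI = namespaceURI  # stays None when NS processing is off
        self.prefix = prefix
        self.localName = localName


def attr_key(name, uri=None, ns_processing=False):
    """Build the dictionary key for one attribute (hypothetical helper)."""
    if ns_processing:
        return (uri, name)   # (uri, localName)
    return (None, name)      # (None, raw name)


XLINK = "http://www.w3.org/1999/xlink"

# NS processing on: attributes are keyed by (uri, localName).
ns_elem = Element(
    "doc:link",
    attributes={attr_key("href", XLINK, ns_processing=True): "a.xml"},
    namespaceURI="http://example.org/doc", prefix="doc", localName="link")

# NS processing off: attributes are keyed by (None, raw name).
plain_elem = Element("doc:link",
                     attributes={attr_key("xlink:href"): "a.xml"})
```

Either way the keys are two-item tuples, so code that walks the
dictionary does not need to know which mode produced it.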

> ### Unicode support
> Python 1.6 will have Unicode support, and so we should make PySAX 2.0
> Unicode-ready. The main part of this is really adding the InputSource
> object to the library, since this allows applications to feed byte or
> character streams to the parser in a convenient way.

Adding InputSource may not be necessary if there were a method
parseCharFile() that accepts character streams.
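
A rough sketch of how such a pair of entry points might look.  ToyParser,
its parseFile method and the encoding parameter are assumptions made up
for this example (a real parser would detect the encoding itself and
would tokenize instead of just reading the text).

```python
import io


class ToyParser:
    """Hypothetical parser with one entry point per kind of stream."""

    def __init__(self):
        self.text = None

    def parseFile(self, byte_stream, encoding="utf-8"):
        # Byte stream: decoding is the parser's job, then hand over
        # to the character-stream entry point.
        self.parseCharFile(io.TextIOWrapper(byte_stream, encoding=encoding))

    def parseCharFile(self, char_stream):
        # Character stream: already decoded by the application.
        self.text = char_stream.read()


p = ToyParser()
p.parseCharFile(io.StringIO("<doc/>"))   # application supplies characters

q = ToyParser()
q.parseFile(io.BytesIO(b"<doc/>"))       # application supplies bytes
```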

> ### easySAX vs Pyxie
> What should we do with this? Should we try to turn Pyxie into what we
> envisioned easySAX to be, or should we maintain two such libraries? I
> see advantages and disadvantages to both approaches.
> One idea I've had for easySAX is something inspired by John Aycock's
> Spark parser generator, that one could write SAX document handlers
> with three kinds of special methods: start-element, end-element and
> element content methods. These could use the 's_', 'e_' and 'c_'
> prefixes, respectively.

> I'm fairly confident that a layer on top of SAX 2.0 to enable such
> easySAX applications could be made fairly fast and it should be pretty
> easy to implement as well. (I've made an early sketch of this.)

If I understand correctly, yes, having a SAX filter that calls
methods named after the tags should be really easy.
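
A minimal sketch of such a dispatching layer, using the 's_', 'e_' and
'c_' prefixes from the quoted proposal.  It is written against the
xml.sax module in today's Python standard library, not the PySAX 2.0
binding under discussion, and as a ContentHandler base class rather
than a separate filter object.  PrefixDispatcher and TitleCollector
are names invented for the example.

```python
import xml.sax


class PrefixDispatcher(xml.sax.ContentHandler):
    """Forward SAX events to s_<tag>, e_<tag> and c_<tag> methods, if defined."""

    def __init__(self):
        super().__init__()
        self._open = []   # stack of currently open element names

    def _call(self, kind, name, *args):
        # Look up e.g. "s_title"; silently skip tags with no handler method.
        method = getattr(self, "%s_%s" % (kind, name), None)
        if method is not None:
            method(*args)

    def startElement(self, name, attrs):
        self._open.append(name)
        self._call("s", name, attrs)

    def endElement(self, name):
        self._call("e", name)
        self._open.pop()

    def characters(self, content):
        # Character data goes to the innermost open element's c_ method.
        if self._open:
            self._call("c", self._open[-1], content)


class TitleCollector(PrefixDispatcher):
    """Example application: collect the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def s_title(self, attrs):
        self.titles.append("")

    def c_title(self, text):
        # Parsers may deliver text in several chunks, so accumulate.
        self.titles[-1] += text


handler = TitleCollector()
xml.sax.parseString(b"<doc><title>SAX 2.0</title><p>text</p></doc>", handler)
```

One limitation of this naming scheme: tag names containing characters
such as ':' or '-' do not map onto Python method names, so a fuller
version would need some translation rule for them.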

I think the part I don't understand about easySAX and Pyxie (probably
because I haven't had the opportunity to use them) is: why isn't the
SAX binding already this easy?

  -- Ken