Lars Marius Garshol
04 Oct 1999 13:49:47 +0200
* uche ogbuji
| The module is attached.
Uche, this is great! It duplicates what I have already done (and
already posted), but that doesn't matter. If we can thrash out the
issues on the list and arrive at one set of interfaces then that would
I've sent your proposal to the printer and will look at it tonight.
For comparison, here is mine:
The list below is copied directly from David Megginson's latest
proposal. Note that all features are optional.
Validate (true) or don't validate (false).
Expand external general entities (true) or don't expand (false).
Expand external parameter entities including the external DTD subset
(true) or don't expand (false).
Preprocess namespaces (true) or don't preprocess (false). See also
the http://xml.org/sax/properties/namespace-sep property.
Ensure that all consecutive text is returned in a single callback to
DocumentHandler.characters or DocumentHandler.ignorableWhitespace
(true) or explicitly do not require it (false).
Provide a Locator using the DocumentHandler.setDocumentLocator
callback (true), or explicitly do not provide one (false).
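An application can approximate the "all consecutive text in a single callback" feature on its own side by buffering. The following is a minimal sketch in modern Python and is not part of any proposal. TextCoalescer and Recorder are hypothetical names. Only characters() is coalesced here (ignorableWhitespace is left out), and a real implementation would also need to flush before processing instructions and other events.

```python
class TextCoalescer:
    """Hypothetical wrapper that buffers consecutive characters() calls
    and forwards them to the wrapped handler as a single callback."""

    def __init__(self, handler):
        self._handler = handler
        self._buffer = []

    def characters(self, data):
        # Accumulate text instead of forwarding it immediately.
        self._buffer.append(data)

    def _flush(self):
        # Deliver all buffered text as one characters() call, if any.
        if self._buffer:
            self._handler.characters("".join(self._buffer))
            self._buffer = []

    def startElement(self, name, attrs):
        self._flush()
        self._handler.startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self._handler.endElement(name)

    def endDocument(self):
        self._flush()
        self._handler.endDocument()


class Recorder:
    """Minimal handler that records the events it receives."""

    def __init__(self):
        self.events = []

    def characters(self, data):
        self.events.append(("characters", data))

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def endDocument(self):
        self.events.append(("endDocument",))


rec = Recorder()
h = TextCoalescer(rec)
h.startElement("p", {})
h.characters("Hello, ")  # a parser may split text at buffer boundaries
h.characters("world")
h.endElement("p")
h.endDocument()
print(rec.events)
# [('start', 'p'), ('characters', 'Hello, world'), ('end', 'p'), ('endDocument',)]
```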
This handler is supposed to be used by applications that need
information about lexical details in the document such as comments and
entity boundaries. Most applications won't need this, but the DOM will
find it useful. Support for this handler will be optional.
This handler has the handlerID http://xml.org/sax/handlers/lexical.
class LexicalHandler:

    def xmlDecl(self, version, encoding, standalone):
        """All three parameters are strings. If encoding and standalone
        are not specified on the XML declaration, their values will be
        None."""

    def startDTD(self, root, publicID, systemID):
        """This event is reported when the DOCTYPE declaration is
        encountered. root is the name of the root element type, while the
        two last parameters are the public and system identifiers of the
        external DTD subset."""

    def endDTD(self):
        "This event is reported after the DTD has been parsed."

    def startEntity(self, name):
        """Reports the beginning of a new entity. If the entity is the
        external DTD subset the name will be '[dtd]'."""

    def endEntity(self, name):
        "Reports the end of an entity."
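To illustrate the entity-boundary events, here is a minimal sketch of a handler that keeps track of which entity the parser is currently inside. EntityTracker and its helper methods are hypothetical. The sketch assumes that startEntity/endEntity pairs nest properly and that the external DTD subset is reported under the name '[dtd]', as described above.

```python
class EntityTracker:
    """Keeps a stack of the entities currently open, driven only by the
    startEntity/endEntity callbacks."""

    DTD = "[dtd]"  # name reported for the external DTD subset

    def __init__(self):
        self.stack = []

    def startEntity(self, name):
        self.stack.append(name)

    def endEntity(self, name):
        # Entities nest, so the entity that ends must be the innermost one.
        if not self.stack or self.stack[-1] != name:
            raise ValueError("unbalanced entity boundary: %r" % name)
        self.stack.pop()

    def current(self):
        """Name of the innermost open entity, or None in the document entity."""
        return self.stack[-1] if self.stack else None

    def in_external_dtd(self):
        return self.DTD in self.stack


t = EntityTracker()
t.startEntity("[dtd]")
print(t.in_external_dtd())  # True
t.endEntity("[dtd]")
t.startEntity("chap1")
t.startEntity("copyright")
print(t.current())          # copyright
t.endEntity("copyright")
print(t.current())          # chap1
```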
--- Extended parser
def setFeature(self, featureID, state):
This turns support for a particular feature (such as namespaces or
validation) on or off, depending on whether state is true or false.
The parser can raise SAXNotSupportedException (or its subclass
SAXUnrecognizedException) if it doesn't support the feature.
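A parser-side sketch of setFeature, assuming the parser keeps a small table of the features it knows. NonValidatingParser, try_set and the exception classes defined here are illustrative stand-ins, and the validation feature ID is assumed to follow the same http://xml.org/sax/features/ pattern as the namespaces feature. Because SAXUnrecognizedException subclasses SAXNotSupportedException, a caller that wants to tell the two apart must catch the subclass first.

```python
class SAXException(Exception):
    """Base class for the sketch's SAX errors."""

class SAXNotSupportedException(SAXException):
    """The feature is recognised but the requested state cannot be set."""

class SAXUnrecognizedException(SAXNotSupportedException):
    """The feature ID is not known to the parser at all."""

NAMESPACES = "http://xml.org/sax/features/namespaces"
VALIDATION = "http://xml.org/sax/features/validation"  # ID assumed


class NonValidatingParser:
    def __init__(self):
        # feature ID -> [current state, can the state be changed?]
        self._features = {
            NAMESPACES: [False, True],
            VALIDATION: [False, False],  # this parser can never validate
        }

    def setFeature(self, featureID, state):
        entry = self._features.get(featureID)
        if entry is None:
            raise SAXUnrecognizedException(featureID)
        state = bool(state)
        if state != entry[0] and not entry[1]:
            raise SAXNotSupportedException(featureID)
        entry[0] = state


def try_set(parser, featureID, state):
    # The subclass must be caught first, or the base clause would swallow it.
    try:
        parser.setFeature(featureID, state)
    except SAXUnrecognizedException:
        return "unrecognized"
    except SAXNotSupportedException:
        return "not supported"
    return "ok"


p = NonValidatingParser()
print(try_set(p, NAMESPACES, True))   # ok
print(try_set(p, VALIDATION, True))   # not supported
print(try_set(p, VALIDATION, False))  # ok: validation is already off
print(try_set(p, "http://example.com/features/bogus", True))  # unrecognized
```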
def setHandler(self, handlerID, handler):
This registers an event handler with the parser (LexicalHandler,
NamespaceHandler or perhaps some special parser-defined handler).
The parser can raise SAXNotSupportedException (or its subclass
SAXUnrecognizedException) if it doesn't support the handler.
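A minimal sketch of how setHandler might work, assuming the parser keeps registered handlers in a dictionary keyed by handler ID and calls the lexical handler only when one has been registered. SketchParser, _expand_entity and EntityLog are hypothetical names; _expand_entity stands in for whatever internal code path a real parser uses when it expands an entity.

```python
class SAXNotSupportedException(Exception):
    pass

class SAXUnrecognizedException(SAXNotSupportedException):
    pass

LEXICAL = "http://xml.org/sax/handlers/lexical"


class SketchParser:
    def __init__(self):
        # handler ID -> registered handler object
        self._handlers = {}

    def setHandler(self, handlerID, handler):
        if handlerID != LEXICAL:
            raise SAXUnrecognizedException(handlerID)
        self._handlers[handlerID] = handler

    def _expand_entity(self, name, parse_content):
        # Lexical events are optional: skip them if no handler is registered.
        lexical = self._handlers.get(LEXICAL)
        if lexical is not None:
            lexical.startEntity(name)
        parse_content()  # the entity's content is parsed either way
        if lexical is not None:
            lexical.endEntity(name)


class EntityLog:
    def __init__(self):
        self.events = []

    def startEntity(self, name):
        self.events.append(("start", name))

    def endEntity(self, name):
        self.events.append(("end", name))


p = SketchParser()
p._expand_entity("chap1", lambda: None)  # no handler yet: nothing reported
log = EntityLog()
p.setHandler(LEXICAL, log)
p._expand_entity("chap1", lambda: None)
print(log.events)  # [('start', 'chap1'), ('end', 'chap1')]
```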
def set(self, propertyID, value):
This sets the value of a parser property (such as the namespace
separator string or something parser-defined). The parser can
raise SAXNotSupportedException (or its subclass
SAXUnrecognizedException) if it doesn't support the property.
This returns the value of a property. The parser can raise
SAXNotSupportedException (or its subclass SAXUnrecognizedException)
if it doesn't support the property.
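The sketch below shows one way a parser could enforce the (read-only) and (write-only) annotations given for the properties in this post, including the rule that the namespace separator cannot change while a parse is in progress. The name get for the read method is an assumption, and SketchParser and attempt are hypothetical.

```python
class SAXNotSupportedException(Exception):
    pass

class SAXUnrecognizedException(SAXNotSupportedException):
    pass

NS_SEP = "http://xml.org/sax/properties/namespace-sep"
XML_STRING = "http://xml.org/sax/properties/xml-string"


class SketchParser:
    def __init__(self):
        # property ID -> access mode, following the annotations in this post
        self._modes = {NS_SEP: "w", XML_STRING: "r"}
        self._values = {NS_SEP: " ", XML_STRING: None}
        self._parsing = False  # would be True while a parse is running

    def _mode(self, propertyID):
        mode = self._modes.get(propertyID)
        if mode is None:
            raise SAXUnrecognizedException(propertyID)
        return mode

    def set(self, propertyID, value):
        if "w" not in self._mode(propertyID):
            raise SAXNotSupportedException(propertyID + " is read-only")
        if propertyID == NS_SEP and self._parsing:
            raise SAXNotSupportedException("separator is fixed during a parse")
        self._values[propertyID] = value

    def get(self, propertyID):  # method name assumed
        if "r" not in self._mode(propertyID):
            raise SAXNotSupportedException(propertyID + " is write-only")
        return self._values[propertyID]


def attempt(method, *args):
    """Run a parser call and report which outcome it had."""
    try:
        method(*args)
    except SAXUnrecognizedException:
        return "unrecognized"
    except SAXNotSupportedException:
        return "not supported"
    return "ok"


p = SketchParser()
print(attempt(p.set, NS_SEP, "|"))      # ok
print(attempt(p.get, NS_SEP))           # not supported (write-only)
print(attempt(p.set, XML_STRING, "x"))  # not supported (read-only)
print(attempt(p.get, "http://example.com/properties/bogus"))  # unrecognized
```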
The first three properties come from the Java SAX proposal, while the
last one was invented by yours truly.
http://xml.org/sax/properties/namespace-sep <String> (write-only)
Set the separator to be used between the URI part of a name and the
local part of a name when namespace processing is being performed
(see the http://xml.org/sax/features/namespaces feature). By
default, the separator is a single space. This property may not be
set while a parse is in progress (throws a SAXNotSupportedException).
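With namespace processing turned on, an application has to split each reported name back into its URI and local part. Here is a minimal sketch, assuming that names without a namespace are reported as the bare local name and that the separator never occurs inside a local name; split_name is a hypothetical helper. Splitting at the last separator (rpartition) matters if a custom separator such as ':' also occurs inside the URI.

```python
def split_name(name, sep=" "):
    """Split a namespace-processed name into (uri, localname).

    Names with no separator are assumed to have no namespace URI."""
    uri, found, local = name.rpartition(sep)
    if not found:
        return None, name
    return uri, local


print(split_name("http://www.w3.org/1999/xhtml p"))  # ('http://www.w3.org/1999/xhtml', 'p')
print(split_name("p"))                               # (None, 'p')
print(split_name("urn:x:y:item", sep=":"))           # ('urn:x:y', 'item')
```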
http://xml.org/sax/properties/dom-node <Node> (read-only)
Get the DOM node currently being visited, if the SAX parser is
iterating over a DOM tree. If the parser recognises and supports
this property but is not currently visiting a DOM node, it should
return null (this is a good way to check for availability before the
parse begins).
This property doesn't make much sense for Python, but I see no point
in leaving it out, either.
http://xml.org/sax/properties/xml-string <String> (read-only)
Get the literal string of characters associated with the current
event. If the parser recognises and supports this property but is
not currently parsing text, it should return null (this is a good
way to check for availability before the parse begins). I stole
this idea from Expat.
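The "check for availability before the parse begins" idiom described for these read-only properties could look like this. The sketch assumes the read method is called get, that Python's None stands in for null, and that the exceptions behave as described above; supports_property and the two stand-in parser classes are hypothetical.

```python
class SAXNotSupportedException(Exception):
    pass

class SAXUnrecognizedException(SAXNotSupportedException):
    pass

XML_STRING = "http://xml.org/sax/properties/xml-string"


def supports_property(parser, propertyID):
    """True if the parser recognises and supports propertyID. Meant to be
    called before parsing, when a supported property just returns None."""
    try:
        parser.get(propertyID)
    except SAXNotSupportedException:  # also catches SAXUnrecognizedException
        return False
    return True


class ExpatLikeParser:
    # Stand-in for a parser that supports xml-string.
    def get(self, propertyID):
        if propertyID == XML_STRING:
            return None  # supported, but no event is being reported yet
        raise SAXUnrecognizedException(propertyID)


class MinimalParser:
    # Stand-in for a parser with no extra properties at all.
    def get(self, propertyID):
        raise SAXUnrecognizedException(propertyID)


print(supports_property(ExpatLikeParser(), XML_STRING))  # True
print(supports_property(MinimalParser(), XML_STRING))    # False
```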
In addition, I think PySAX needs the following property:
http://python.org/sax/properties/data-encoding <String> (read/write)
This property can be used to control which character encoding is
used for data events that come from the parser. Throws
SAXEncodingNotSupportedException if the encoding is not supported
by the parser.
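Here is a sketch of what the data-encoding property could do, written as a filter between parser and handler in modern Python, where parsed text is a str and the handler receives bytes. EncodingFilter, set_encoding and Sink are hypothetical names. Unknown encodings are detected with codecs.lookup, which raises LookupError. Characters that cannot be represented in the chosen encoding would still raise UnicodeEncodeError in this sketch.

```python
import codecs


class SAXEncodingNotSupportedException(Exception):
    pass


class EncodingFilter:
    """Sits between a parser and a handler and passes character data to
    the handler encoded in the configured encoding."""

    def __init__(self, handler, encoding="utf-8"):
        self._handler = handler
        self.set_encoding(encoding)

    def set_encoding(self, encoding):
        # codecs.lookup raises LookupError for unknown encodings.
        try:
            self._encoding = codecs.lookup(encoding).name
        except LookupError:
            raise SAXEncodingNotSupportedException(encoding) from None

    def characters(self, data):
        # data arrives as text (str); the handler receives bytes.
        self._handler.characters(data.encode(self._encoding))


class Sink:
    def __init__(self):
        self.chunks = []

    def characters(self, data):
        self.chunks.append(data)


sink = Sink()
f = EncodingFilter(sink, "latin-1")
f.characters("café")
f.set_encoding("utf-8")
f.characters("café")
print(sink.chunks)  # [b'caf\xe9', b'caf\xc3\xa9']
```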
This posting specifies an extended AttributeList interface that
provides information needed by the DOM (and possibly by other
applications) as well as what is needed for full XML 1.0 conformance.
I'm not really sure whether we should actually use all of this, so
opinions are welcome.
"""Returns true if the attribute was explicitly specified in the
document and false otherwise. attr can be the attribute name or
its index in the AttributeList."""
"""This returns the EntityRefList (see below) for an attribute,
which can be specified by name or index."""
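Since attr can be either an attribute name or an index, an implementation needs a small resolution step. The names of the two methods above are not given here, so is_specified is a placeholder and ExtendedAttributeList is hypothetical; getValue mirrors the SAX 1.0 AttributeList method of that name.

```python
class ExtendedAttributeList:
    """Hypothetical extended AttributeList. Each attribute is stored as a
    (name, value, specified) triple in document order."""

    def __init__(self, attrs):
        self._attrs = list(attrs)

    def _index(self, attr):
        # attr may be a position (int) or an attribute name (str).
        if isinstance(attr, int):
            return attr
        for i, (name, _value, _specified) in enumerate(self._attrs):
            if name == attr:
                return i
        raise KeyError(attr)

    def getValue(self, attr):
        return self._attrs[self._index(attr)][1]

    def is_specified(self, attr):  # placeholder name
        return self._attrs[self._index(attr)][2]


# 'lang' was not in the start-tag; its value came from a DTD default.
attrs = ExtendedAttributeList([("href", "a.html", True), ("lang", "en", False)])
print(attrs.is_specified("href"))  # True
print(attrs.is_specified(1))       # False
print(attrs.getValue("lang"))      # en
```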
The class below is intended to be used for discovering entity reference
boundaries inside attribute values. This is needed because the XML 1.0
recommendation requires parsers to report unexpanded entity references,
even inside attribute values. Whether this is really something we want
is another matter.
"Returns the number of entity references inside this attribute value."
def getEntityName(self, ix):
    "Returns the name of entity reference number ix (zero-based index)."

def getEntityRefStart(self, ix):
    """Returns the index of the first character inside the attribute
    value that stems from entity reference number ix."""

def getEntityRefEnd(self, ix):
    "Returns the index of the last character in entity reference ix."
One redeeming feature of this interface is that it lives entirely
outside the attribute value, and so can be ignored completely by those
who are not interested.
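To make the index conventions concrete (zero-based, with the end index pointing at the last character rather than one past it), here is a sketch that expands entity references in a raw attribute value and records where each replacement text lands. getLength is a placeholder for the count method, whose name is not given above. expand_attribute is hypothetical: it ignores attribute-value normalization, leaves character references untouched, and assumes replacement texts are non-empty and contain no further references.

```python
import re


class EntityRefList:
    """Sketch of the EntityRefList above; getLength is a placeholder name."""

    def __init__(self, refs):
        self._refs = refs  # list of (name, start, end), end index inclusive

    def getLength(self):
        return len(self._refs)

    def getEntityName(self, ix):
        return self._refs[ix][0]

    def getEntityRefStart(self, ix):
        return self._refs[ix][1]

    def getEntityRefEnd(self, ix):
        return self._refs[ix][2]


ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand_attribute(raw, entities):
    """Expand general entity references in raw, recording where each
    replacement text ends up in the expanded value."""
    out, refs = [], []
    pos = length = 0
    for m in ENTITY_REF.finditer(raw):
        literal = raw[pos:m.start()]
        out.append(literal)
        length += len(literal)
        text = entities[m.group(1)]
        refs.append((m.group(1), length, length + len(text) - 1))
        out.append(text)
        length += len(text)
        pos = m.end()
    out.append(raw[pos:])
    return "".join(out), EntityRefList(refs)


value, refs = expand_attribute("&co; Ltd, &yr;", {"co": "Acme", "yr": "1999"})
print(value)  # Acme Ltd, 1999
for ix in range(refs.getLength()):
    start, end = refs.getEntityRefStart(ix), refs.getEntityRefEnd(ix)
    print(refs.getEntityName(ix), start, end, value[start:end + 1])
# co 0 3 Acme
# yr 10 13 1999
```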