[XML-SIG] SAX: Names with no namespace

Thomas B. Passin tpassin@home.com
Tue, 20 Feb 2001 20:02:56 -0500

Martin v. Loewis wrote -

> >   Actually, I had thought we *had* decided, and None was the
> > consensus.
> That is also my recollection - there is even a PEP document somewhere;
> you can get a copy from the archives, or from Tom Passin.
I don't recall that anyone actually declared that it was decided, but almost
everyone who posted on this issue agreed that using "None" is the way to go.
I propose that we do declare that it has been decided - Martin, are you
willing to be the temporary benevolent dictator on this?

Here's a copy of the draft PEP:

<?xml version='1.0'?>
  <pep_title>Values for Null Or Empty Namespace URIs</pep_title>
   <author name='Thomas B. Passin' email='tpassin@home.com'/>
  <type>Standards Track</type>
   <post date='29-Jan-2001'/>
   <post date='4-Feb-2001'/>
  This PEP specifies the proper values of the Namespace URI property
  when its value might otherwise appear to be either "null", "None", or the
  empty string.

  Such Namespace URIs are discussed in SAX[1], DOM2[2], and XML-Namespaces[3].
  These three recommendations do not appear to be in full agreement.  This,
  and differences between Java and Python, have led to some confusion and
  some disagreement between various implementations supported by PyXML.  The
  language in these three Recommendations is reviewed.

  The recommendation is made to use None as the URI value in all cases where
  no URI applies to an element or attribute.

  The XMLPEP, when approved, will apply to all namespace-aware software
  maintained by the pyxml interest group.

  <para title='Namespace-aware applications'>
   When no namespace has been declared whose scope applies to a
   particular element or attribute, the application MUST report the
   URI of the namespace of the element or attribute as None.  When there is no
   namespace prefix, the application MUST report the value of the prefix as

  <para title='Namespace-ignorant applications'>
   This requirement does not apply for applications that are not

  <para title='Applicability'>
   This requirement applies to all XML processing software maintained by the
   interest group.

  <para title='Definitive Treatment Needed'>
  This PEP is needed because of continued uncertainty among various PyXML
  developers as to the proper values to use, and because of inconsistency
  among various PyXML products.  Differences between Python, IDL, and Java
  make an unambiguous interpretation difficult.

  A definitive and consistent treatment is needed so that all the PyXML
  software may be made consistent.

  <para title='W3C Namespaces Recommendation'>
   The Namespaces Recommendation recognizes that a namespace URI may
   be given no value - called "empty" in the Recommendation - even
   though a structure for a URI is provided in the document.  Two relevant
   passages are quoted here:

    <quote>Section 2. ...
      [Definition:] If the attribute name matches DefaultAttName,
      then the namespace name in the attribute value is that of the
      default namespace in the scope of the element to which the declaration
      is attached. In such a default declaration, the attribute value
      may be empty.
    <quote>5.2 Namespace Defaulting
      A default namespace is considered to apply to the element where
      it is declared (if that element has no namespace prefix), and to
      all elements with no prefix within the content of that element.
      If the URI reference in a default namespace declaration is empty,
      then unprefixed elements in the scope of the declaration are not
      considered to be in any namespace. Note that default namespaces
      do not apply directly to attributes.

      ...The default namespace can be set to the empty string. This has the
      same effect, within the scope of the declaration, of there being no
      default namespace.

     The term "empty" is not defined further, but in the context of the
     Recommendation, it must mean a missing string value.  The last
     fragment quoted above suggests, but does not require, that an
     empty string may be returned for an "empty" URI value.

     This has no direct applicability to values returned by implementations,
     because:
       1) the word "can" is used, rather than "must", and
       2) the Recommendation seems to apply to XML documents,
          not to implementations.

  <para title='W3C DOM Level 2 Recommendation'>
    The W3C DOM Level 2 Recommendation refers to "null" namespaces in
    several places.  The thrust is clear and consistent: a "null" value
    is to be used to indicate a non-existent namespace URI value. Here
    are some relevant extracts from the Recommendation:

     <quote>Note that because the DOM does no lexical checking, the
       empty string will be treated as a real namespace URI in DOM Level 2
       methods. Applications must use the value null as the namespaceURI
       parameter for methods if they wish to have no namespace.

    The IDL definition for the createAttributeNS() method creates an
    attribute with these characteristics:
        A new Attr object with the following attributes:
        Attribute          Value
        Node.nodeName      qualifiedName
        Node.namespaceURI  namespaceURI
        Node.prefix        prefix, extracted from qualifiedName,
                           or null if there is no prefix
        Node.localName     local name, extracted from qualifiedName
        Attr.name          qualifiedName
        Node.nodeValue     the empty string

  <para>For the older, non-NS aware createAttribute() method, the
Recommendation says
    <quote>...localName, prefix, and namespaceURI set to null. </quote>

  <para>This is typical - a "null" is returned if there is no prefix or

  <para>It is clear that the IDL specifies the use of "null" for empty
    values rather than the empty string.  The Java binding does not specify any
    way value.

    Thus there seems to be nothing in the DOM Recommendation that suggests
    empty strings should be used, and there is clear language that "null"
    should be used.
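
The DOM convention described above can be sketched in Python. This example
uses the standard library's xml.dom.minidom, which is an assumption made
purely for illustration; it is not one of the PyXML packages discussed in
this PEP, and the element names are arbitrary. It only shows what the
namespaceURI and prefix properties look like when None is passed where the
IDL says "null".

```python
from xml.dom.minidom import Document

XHTML_NS = "http://www.w3.org/1999/xhtml"


def describe(node):
    """Return the (namespaceURI, prefix) pair reported by a DOM node."""
    return (node.namespaceURI, node.prefix)


doc = Document()

# Namespace-aware creation with an explicit namespace and prefix.
qualified = doc.createElementNS(XHTML_NS, "html:hr")

# Namespace-aware creation with no namespace: pass None, not "".
unqualified = doc.createElementNS(None, "hr")

# Older, non-namespace-aware creation.
legacy = doc.createElement("hr")

for node in (qualified, unqualified, legacy):
    print(node.nodeName, describe(node))
```

In this sketch both the unqualified and the legacy element report None for
the URI and for the prefix, which mirrors the "null" language quoted from
the Recommendation.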

  <para title='SAX2'>
    The SAX2 Java API clearly says that an empty string is to be
    returned.  The following extracts demonstrate this:

    <quote>In SAX2, the startElement and endElement callbacks in a content
      handler look like this:
            public void startElement (String uri, String localName,
                 String qName, Attributes atts)
                 throws SAXException;

            public void endElement (String uri, String localName,
                 String qName)
                 throws SAXException;
      By default, an XML reader will report a Namespace URI and a local name
      for every element, in both the start and end handler. Consider the
      following example:
        <html:hr xmlns:html="http://www.w3.org/1999/xhtml"/>
      With the default SAX2 Namespace processing, the XML reader would report
      a start and end element event with the Namespace URI
      "http://www.w3.org/1999/xhtml" and the local name "hr". The XML
       reader might also report the original qName "html:hr", but that
       parameter might simply be an empty string.

        <h:hello xmlns:h="http://www.greeting.com/ns/" id="a1"
        If namespaces is true and namespace-prefixes is true,
        then a SAX2 XML reader will report the following:
           an element with the Namespace URI "http://www.greeting.com/ns/",
           the local name "hello", and the qName "h:hello";
           an attribute with no Namespace URI (empty string),
             no local name (empty string), and the qName "xmlns:h";
           an attribute with no Namespace URI (empty string), the
             local name "id", and the qName "id"; and an attribute
             with the Namespace URI "http://www.greeting.com/ns/",
             the local name "person", and the qName "h:person".

  <para title='Discussion of The Three Recommendations'>
    To summarize, the Namespace Recommendation is essentially silent
    on the subject, the DOM clearly specifies "null" values, and SAX2
    clearly specifies the use of empty strings.
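
To see what a Python SAX2 driver reports for names with no namespace, a
small probe along the following lines may help. It is only a sketch: it
assumes the standard library's xml.sax package with its default expat
driver, and NameCollector and collect_names are hypothetical helper names
introduced here, not part of any PyXML API.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Record the (uri, localname) pairs seen for elements and attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.attributes = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, `name` is a (uri, localname) tuple.
        self.elements.append(name)
        self.attributes.extend(attrs.getNames())


def collect_names(xml_bytes):
    """Parse xml_bytes with namespace processing enabled and return names."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = NameCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.elements, handler.attributes


sample = b'<doc xmlns:h="http://www.greeting.com/ns/"><h:hello id="a1"/></doc>'
elements, attributes = collect_names(sample)
print(elements)
print(attributes)
```

Under these assumptions the unprefixed element and the unprefixed attribute
come back with None in the URI position rather than an empty string, while
the prefixed element carries its declared URI.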



  <para title='Arguments Favoring the Use of "None"'>
   The "highest" level Recommendation is presumably the DOM.
   Python offers a data object similar to "null" - the None object.
   The None object can be tested for exactly as for an empty string:

    <code>if uri:

   Alternatively, None can be tested for explicitly, as in:

    <code>if uri is not None:

   Thus, None is flexible enough to be useful for this purpose.
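
The two tests above are not interchangeable once empty strings are in play,
which is worth keeping in mind given the DOM note that an empty string is
treated as a real namespace URI. A minimal sketch follows; the function
names are hypothetical and chosen only for this illustration.

```python
def has_namespace_truthy(uri):
    """Truth-value test: treats both None and "" as 'no namespace'."""
    return bool(uri)


def has_namespace_explicit(uri):
    """Identity test: only None means 'no namespace'."""
    return uri is not None


for uri in (None, "", "http://www.w3.org/1999/xhtml"):
    print(repr(uri), has_namespace_truthy(uri), has_namespace_explicit(uri))
```

Code written with the truth-value test tolerates either convention; code
written with the explicit test distinguishes None from the empty string.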

    Many posts to the PyXML list have favored the use of None,
    although not all.  Either None or the empty string would seem to
    work in this context.  "None" agrees with the DOM Recommendation,
    and would seem (in a mnemonic sense) to suggest the absence of
    a prefix or URI.

  <para title='4DOM Handling of None URIs and Prefixes'>
    The 4DOM code will handle a None URI correctly in many places,
     since it uses tests like this typical example:

          if namespaceURI and namespaceURI != XML_NAMESPACE:
            # ...

    This code works correctly if the namespaceURI is None.

  <para>Another test used in 4DOM is as follows:

    <code>def getElementsByTagNameNS(self,namespaceURI,localName):
        root = self.documentElement
        if root == None:
            return implementation.createNodeList([])
        py = root.getElementsByTagNameNS(namespaceURI,localName)
        if namespaceURI == '*' or namespaceURI == root.namespaceURI:
            if localName == '*' or localName == root.localName:
        return py

    The expression "namespaceURI == '*'" also evaluates correctly when
    the URI is None.

  <para>If handling code is consistent throughout 4DOM, then it will handle
     None correctly.
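
The two styles of test quoted from 4DOM can be exercised in isolation. The
sketch below is not 4DOM code: the function names are hypothetical, and
XML_NAMESPACE is taken from the standard library's xml.dom module as a
stand-in for the constant 4DOM uses.

```python
from xml.dom import XML_NAMESPACE


def needs_namespace_handling(namespaceURI):
    """Truth-value style test: false for None, "" and the XML namespace."""
    return bool(namespaceURI and namespaceURI != XML_NAMESPACE)


def uri_matches(requested, actual):
    """Equality style test, as used when matching '*' or a specific URI."""
    return requested == '*' or requested == actual


print(needs_namespace_handling(None))
print(needs_namespace_handling("http://www.greeting.com/ns/"))
print(uri_matches(None, None))
print(uri_matches("", None))
```

The last call shows why consistency matters: an equality test does not
treat the empty string and None as the same "no namespace" value, so mixing
the two conventions would make such comparisons fail.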

  <para title='SAX2'>
   [Need material here]

 <reference_implementation>[Should there be a reference here to one
  particular processor, such as xmlproc?]
 <copyright>This PEP may be used by anyone.</copyright>