[XML-SIG] Draft PEP for Using None in Namespace URIs

Thomas B. Passin tpassin@home.com
Mon, 5 Feb 2001 00:38:22 -0500


Here is a revised version of the PEP about using None for namespace URIs.
It's extended quite a bit.  I've tried to use the suggestions of Ken MacLeod,
Uchi, and Martin, among others, and I've spent a fair amount of time rummaging
through the various Recs.

I was disappointed to learn that the PyXML docs, as well as the 4DOM docs,
don't really say anything about this issue.  So I made a start on reading
through some of the code (not the most recent versions in the CVS tree, but
what I've got from version 0.6.2 and from downloading from 4Thought).

I'd appreciate it if anyone who is quite familiar with the SAX and SAX2 code
would help out and verify whether using None would cause any problems for the
existing code.

The PEP (below) makes for a longish posting, but I didn't want to use an
attachment unless everyone agrees it's OK to do so.  What do you all think
about using attachments for this kind of thing?

Cheers,

Tom P

=============================================
<?xml version='1.0'?>
<xmlpep>
 <headers>
  <pep_number>xmlpep-1</pep_number>
  <pep_title>Values for Null Or Empty Namespace URIs</pep_title>
  <pep_version>0.20</pep_version>
  <cvs_version_string/>
  <list_of_authors>
   <author name='Thomas B. Passin' email='tpassin@home.com'/>
  </list_of_authors>
  <status>Draft</status>
  <type>Standards Track</type>
  <created>29-Jan-2001</created>
  <history>
   <post date='29-Jan-2001'/>
   <post date='4-Feb-2001'/>
  </history>
 </headers>
 <abstract>
  This PEP specifies the proper values of the Namespace URI property
  when its value might otherwise appear to be either "null", "None", or the
  empty string.

  Such Namespace URIs are discussed in SAX[1], DOM2[2], and XML-Namespaces[3].
  These three recommendations do not appear to be in full agreement.  This
  fact, and differences between Java and Python, have led to some confusion
  and some disagreement between various implementations supported by PyXML.
  The language in these three Recommendations is reviewed below.

  The recommendation is made to use None as the URI value in all cases where
  no URI applies to an element or attribute.

  The XMLPEP, when approved, will apply to all namespace-aware software
  maintained by the PyXML interest group.
 </abstract>

 <specification>
  <para title='Namespace-aware applications'>
   When no namespace has been declared whose scope applies to a
   particular element or attribute, the application MUST report the
   URI of the namespace of the element or attribute as None.  When there is no
   namespace prefix, the application MUST report the value of the prefix as
   None.
  </para>
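
  <para>
   As an illustration only, the rule above can be sketched in Python.  The
   function name resolve_name and the in_scope mapping (prefix to URI, with
   the key None holding the default namespace) are hypothetical names made up
   for this sketch; they are not part of any PyXML API.

```python
def resolve_name(qname, in_scope, is_attribute=False):
    """Return (namespace_uri, prefix, local_name) for a qualified name.

    in_scope is a hypothetical mapping of prefix to URI; the key None
    holds the default namespace.  An empty-string URI is treated as
    "no namespace" and normalized to None, as the rule above requires.
    """
    if ":" in qname:
        prefix, local_name = qname.split(":", 1)
        uri = in_scope[prefix]  # KeyError here means an undeclared prefix
    else:
        prefix, local_name = None, qname
        # Default namespaces do not apply to attributes.
        uri = None if is_attribute else in_scope.get(None)
    return (uri or None, prefix, local_name)


scope = {None: "", "h": "http://www.greeting.com/ns/"}
print(resolve_name("h:hello", scope))
print(resolve_name("hello", scope))
print(resolve_name("id", scope, is_attribute=True))
```

   The point of the sketch is only that both the missing prefix and the
   missing URI come back as None, never as the empty string.
  </para>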

  <para title='Namespace-ignorant applications'>
   This requirement does not apply for applications that are not
   namespace-aware.
  </para>

  <para title='Applicability'>
   This requirement applies to all XML processing software maintained by the
   PyXML interest group.
  </para>
 </specification>


 <rationale>
  <para title='Definitive Treatment Needed'>
  This PEP is needed because of continued uncertainty among various PyXML
  developers as to the proper values to use, and because of inconsistency
  among various PyXML products.  Differences between Python, IDL, and Java
  make an unambiguous interpretation difficult.
  </para>

  <para>
  A definitive and consistent treatment is needed so that all the PyXML
  software may be made consistent.
  </para>

  <para title='W3C Namespaces Recommendation'>
   The Namespaces Recommendation recognizes that a namespace URI may
   be given no value - called "empty" in the Recommendation - even
   though a structure for a URI is provided in the document.  Two relevant
   passages are quoted here:

    <quote>Section 2. ...
      [Definition:] If the attribute name matches DefaultAttName,
      then the namespace name in the attribute value is that of the
      default namespace in the scope of the element to which the declaration
      is attached. In such a default declaration, the attribute value
      may be empty.
    </quote>
    <quote>5.2 Namespace Defaulting
      A default namespace is considered to apply to the element where
      it is declared (if that element has no namespace prefix), and to
      all elements with no prefix within the content of that element.
      If the URI reference in a default namespace declaration is empty,
      then unprefixed elements in the scope of the declaration are not
      considered to be in any namespace. Note that default namespaces
      do not apply directly to attributes.

      ...The default namespace can be set to the empty string. This has the
      same effect, within the scope of the declaration, of there being no
      default namespace.
    </quote>
  </para>

  <para>
     The term "empty" is not defined further, but in the context of the
     Recommendation, it must mean a missing string value.  The last
     fragment quoted above suggests, but does not require, that an
     empty string may be returned for an "empty" URI value.

     This has no direct applicability to values returned by implementations,
     since
       1) the word "can" is used, rather than "must", and
       2) the Recommendation seems to apply to XML documents,
          not to implementations.
  </para>
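
  <para>
   To make the document-level case concrete, here is a small sketch of an
   empty default namespace declaration and what one parser reports for it.
   It assumes the xml.dom.minidom module of a current CPython standard
   library, which is only one implementation; the URI urn:example:x and the
   element names are invented for the example.

```python
from xml.dom import minidom

# The inner element cancels the default namespace with xmlns="".
doc = minidom.parseString(
    '<a xmlns="urn:example:x"><b xmlns=""/></a>'
)

outer = doc.documentElement
inner = outer.firstChild

print(outer.tagName, repr(outer.namespaceURI))
print(inner.tagName, repr(inner.namespaceURI))
```

   With this parser the inner element, which the Namespaces Recommendation
   says is not in any namespace, is expected to report None rather than the
   empty string.
  </para>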

  <para title='W3C DOM Level 2 Recommendation'>
    The W3C DOM Level 2 Recommendation refers to "null" namespaces in
    several places.  The thrust is clear and consistent: a "null" value
    is to be used to indicate a non-existent namespace URI value. Here
    are some relevant extracts from the Recommendation:

     <quote>Note that because the DOM does no lexical checking, the
       empty string will be treated as a real namespace URI in DOM Level 2
       methods. Applications must use the value null as the namespaceURI
       parameter for methods if they wish to have no namespace.
     </quote>
  </para>

  <para>
    The IDL definition for the createAttributeNS() method creates an
    attribute with these characteristics:
     <quote>
        A new Attr object with the following attributes:
          Attribute           Value
          Node.nodeName       qualifiedName
          Node.namespaceURI   namespaceURI
          Node.prefix         prefix, extracted from qualifiedName,
                              or null if there is no prefix
          Node.localName      local name, extracted from qualifiedName
          Attr.name           qualifiedName
          Node.nodeValue      the empty string
     </quote>
  </para>
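
  <para>
   A short sketch of how this table maps onto Python, assuming the
   xml.dom.minidom module of a current CPython standard library as the DOM
   implementation (other implementations may differ).  None stands in for the
   IDL "null"; the attribute names are taken from the examples in this PEP.

```python
from xml.dom import minidom

doc = minidom.Document()

# No namespace: pass None, not "", as the namespaceURI argument.
plain = doc.createAttributeNS(None, "id")
print(plain.name, plain.namespaceURI, plain.prefix, plain.localName)

# A prefixed name: prefix and local name are split out of the qualified name.
qualified = doc.createAttributeNS("http://www.greeting.com/ns/", "h:person")
print(qualified.name, qualified.namespaceURI, qualified.prefix,
      qualified.localName)

# No lexical checking is done, so an empty string is stored as given.
odd = doc.createAttributeNS("", "id")
print(repr(odd.namespaceURI))
```

   The last call shows why the Recommendation warns that the empty string is
   treated as a real namespace URI: the value is kept as given, so it no
   longer compares equal to None.
  </para>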

  <para>For the older, non-NS aware createAttribute() method, the
    Recommendation says
    <quote>...localName, prefix, and namespaceURI set to null. </quote>
  </para>

  <para>This is typical - a "null" is returned if there is no prefix or
    URI.</para>

  <para>It is clear that the IDL specifies the use of "null" for empty
    namespaces, rather than the empty string.  The Java binding does not
    specify any particular value.
  </para>

  <para>
    Thus there seems to be nothing in the DOM Recommendation that suggests
    that empty strings should be used, and there is clear language that
    "null" values should be used.
  </para>

  <para title='SAX2'>
    The SAX2 Java API clearly says that an empty string is to be
    returned.  The following extracts demonstrate this:

    <quote>In SAX2, the startElement and endElement callbacks in a content
      handler look like this:
            public void startElement (String uri, String localName,
                 String qName, Attributes atts)
                 throws SAXException;

            public void endElement (String uri, String localName,
                 String qName)
                 throws SAXException;
      By default, an XML reader will report a Namespace URI and a local name
      for every element, in both the start and end handler. Consider the
      following example:
        <html:hr xmlns:html="http://www.w3.org/1999/xhtml"/>
      With the default SAX2 Namespace processing, the XML reader would report
      a start and end element event with the Namespace URI
      "http://www.w3.org/1999/xhtml" and the local name "hr". The XML
       reader might also report the original qName "html:hr", but that
       parameter might simply be an empty string.
    </quote>

     <quote>
        <h:hello xmlns:h="http://www.greeting.com/ns/" id="a1"
                 h:person="David"/>
        If namespaces is true and namespace-prefixes is true,
        then a SAX2 XML reader will report the following:
           an element with the Namespace URI "http://www.greeting.com/ns/",
           the local name "hello", and the qName "h:hello";
           an attribute with no Namespace URI (empty string),
             no local name (empty string), and the qName "xmlns:h";
           an attribute with no Namespace URI (empty string), the
             local name "id", and the qName "id"; and an attribute
             with the Namespace URI "http://www.greeting.com/ns/",
             the local name "person", and the qName "h:person".
     </quote>
  </para>
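
  <para>
   For comparison with the Java extracts above, the following sketch feeds
   the same example element to a Python SAX parser with Namespace processing
   turned on.  It assumes the xml.sax package and expat-based reader of a
   current CPython standard library; the NameCollector class is a
   hypothetical helper written for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NameCollector(ContentHandler):
    """Record the (uri, localname) pairs reported by the parser."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.attributes = []

    def startElementNS(self, name, qname, attrs):
        # name is a (uri, localname) pair; so is every key of attrs.
        self.elements.append(name)
        self.attributes.extend(attrs.keys())


DOC = b'<h:hello xmlns:h="http://www.greeting.com/ns/" id="a1" h:person="David"/>'

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)
collector = NameCollector()
parser.setContentHandler(collector)
parser.parse(io.BytesIO(DOC))

print(collector.elements)
print(collector.attributes)
```

   In this binding the unqualified "id" attribute is expected to be reported
   with None in the URI position, where the Java text above calls for an
   empty string.
  </para>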

  <para title='Discussion of The Three Recommendations'>
    To summarize, the Namespace Recommendation is essentially silent
    on the subject, the DOM clearly specifies "null" values, and SAX2
    clearly specifies the use of empty strings.
  </para>

  <para title='Arguments Favoring the Use of "None"'>
   The "highest" level Recommendation is presumably the DOM.
   Python offers an object similar to "null": the None object.
   Because None, like the empty string, is false in a boolean context,
   it can be tested for in exactly the same way as an empty string:

    <code>
    if uri:
        doYourThing()
    </code>

   Alternatively, None can be tested for explicitly, as in:

    <code>
    if uri is not None:
        doYourThing()
    </code>

   Thus, None is flexible enough to be useful for this purpose.
  </para>
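
  <para>
   The difference between the two styles of test can be seen in a few lines.
   This is a minimal sketch; the helper name describe is made up for the
   example.

```python
def describe(uri):
    """Classify a namespace URI value using both styles of test."""
    truthy = bool(uri)           # what "if uri:" checks
    present = uri is not None    # what "if uri is not None:" checks
    return truthy, present


for value in (None, "", "http://www.w3.org/1999/xhtml"):
    print(repr(value), describe(value))
```

   The truth test treats None and the empty string alike, so code written
   that way keeps working whichever value is reported.  Only the explicit
   test tells the two apart.
  </para>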

  <para>
    Many posts to the PyXML list have favored the use of None,
    although not all.  Either None or the empty string would seem to
    work in this context.  "None" agrees with the DOM Recommendation,
    and would seem (in a mnemonic sense) to suggest the absence of
    a prefix or URI.
  </para>

  <para title='4DOM Handling of None URIs and Prefixes'>
    The 4DOM code will handle a None URI correctly in many places,
     since it uses tests like this typical example:

      <code>
          if namespaceURI and namespaceURI != XML_NAMESPACE:
            # ...
      </code>

    This code works correctly if the namespaceURI is None.
  </para>

  <para>Another test used in 4DOM is as follows:

    <code>def getElementsByTagNameNS(self,namespaceURI,localName):
        root = self.documentElement
        if root == None:
            return implementation.createNodeList([])
        py = root.getElementsByTagNameNS(namespaceURI,localName)
        if namespaceURI == '*' or namespaceURI == root.namespaceURI:
            if localName == '*' or localName == root.localName:
                py.insert(0,root)
        return py
     </code>

    The expression "namespaceURI == '*'" also evaluates correctly when
    the URI is None.
  </para>
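
  <para>
   The comparison logic in that excerpt can be isolated and tried with None
   values.  The function below is a hypothetical stand-alone rewrite of the
   two comparisons, not 4DOM code.

```python
def matches(wanted_uri, wanted_local, node_uri, node_local):
    """Mimic the root-element comparisons in the excerpt above."""
    uri_ok = wanted_uri == "*" or wanted_uri == node_uri
    local_ok = wanted_local == "*" or wanted_local == node_local
    return uri_ok and local_ok


# None on both sides: the element is in no namespace and matches.
print(matches(None, "hr", None, "hr"))
# None against a real URI: no match, and no exception either.
print(matches(None, "hr", "http://www.w3.org/1999/xhtml", "hr"))
# Mixing the two conventions: "" does not compare equal to None.
print(matches("", "hr", None, "hr"))
```

   The last case is the practical reason to settle on one value: equality
   tests like these only work when every component agrees on None.
  </para>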

  <para>If handling code is consistent throughout 4DOM, then it will handle
     None correctly.
  </para>

  <para title='SAX2'>
   [Need material here]
  </para>

 </rationale>
 <reference_implementation>[Should there be a reference here to one
  particular processor, such as xmlproc?]
 </reference_implementation>
 <notes></notes>
 <references></references>
 <copyright>This PEP may be used by anyone.</copyright>
</xmlpep>