writing Unicode objects to XML

Steven Taschuk staschuk at telusplanet.net
Tue May 6 12:05:59 EDT 2003


Quoth Alex Martelli:
  [...]
> XML does not forbid programs from supplying all kinds of information,
> but that doesn't make that information part of XML.  And the infoset
> does define exactly what information IS or ISN'T part of XML -- it's
> "just a specific kind of equivalence" which happens to be THE official
> definition of what information IS "XML" and what isn't.

How about content models in element declarations in (say) the
internal DTD subset?  They're not in the infoset, so by the
argument above, they're not XML information.

But a difference in content models is hardly mere syntactic
variation.  On the contrary, validating processors are required to
interpret content models and, for example, distinguish whitespace
in element content from whitespace elsewhere on that basis -- and
that distinction *is* in the infoset.  That is, we can have two
documents which differ only in that they declare different content
models, and yet which (a class of) conforming processors must
treat differently, in ways that are visible to the application.
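
To make that concrete, here is a rough sketch using the expat
bindings from the Python standard library.  Expat is not a
validating processor, so the classification below is hand-rolled
for illustration only; the two sample documents and the helper
name whitespace_flags are my own inventions, not anything from a
spec.  The documents are identical except for the content model
declared for <a>.

```python
from xml.parsers import expat

# Two hypothetical documents that differ only in the content model
# declared for <a> in the internal DTD subset.
DOC_ELEMENT_CONTENT = """<!DOCTYPE a [
<!ELEMENT a (b)>
<!ELEMENT b EMPTY>
]>
<a> <b/> </a>"""

DOC_MIXED_CONTENT = """<!DOCTYPE a [
<!ELEMENT a (#PCDATA | b)*>
<!ELEMENT b EMPTY>
]>
<a> <b/> </a>"""

# Content types under which whitespace does not count as
# "whitespace in element content".
NOT_ELEMENT_CONTENT = (
    expat.model.XML_CTYPE_ANY,
    expat.model.XML_CTYPE_MIXED,
    expat.model.XML_CTYPE_EMPTY,
)


def whitespace_flags(document):
    """Return one bool per whitespace-only text chunk inside an element.

    True means the enclosing element was declared with element
    content, so the chunk is whitespace in element content.
    """
    models = {}         # element name -> declared content type
    open_elements = []  # stack of currently open element names
    flags = []

    def element_decl(name, model):
        # model is a tuple (type, quantifier, name, children).
        models[name] = model[0]

    def start_element(name, attrs):
        open_elements.append(name)

    def end_element(name):
        open_elements.pop()

    def character_data(data):
        # Ignore non-whitespace text and anything outside the root.
        if not open_elements or data.strip():
            return
        ctype = models.get(open_elements[-1])
        flags.append(ctype is not None and ctype not in NOT_ELEMENT_CONTENT)

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = element_decl
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.Parse(document, True)
    return flags


print(whitespace_flags(DOC_ELEMENT_CONTENT))
print(whitespace_flags(DOC_MIXED_CONTENT))
```

The character data is the same in both documents; only the
declaration differs, and that alone changes how each whitespace
chunk is classified.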

Thus it seems absurd to me to consider the content model of an
element not to be XML information, its omission from the infoset
notwithstanding.

Besides this example, in the infoset recommendation I find the
following statements which strike me as relevant to this question:

    [This specification's] purpose is to provide a consistent set of
    definitions for use in other specifications that need to refer to
    the information in a well-formed XML document.

    *It does not attempt to be exhaustive* [...]

        -- <http://www.w3.org/TR/xml-infoset/#intro> (emphasis added)

    Since the purpose of the Information Set is to provide a set of
    definitions, conformance is a property of specifications that use
    those definitions, rather than of implementations.

    Specifications referring to the Infoset must:
    [...]
    - Note any information required from an XML document that is not
      defined by the Infoset.

        -- <http://www.w3.org/TR/xml-infoset/#conformance>

These statements do not seem to me to support in any way the idea
"if it's not in the infoset, it's not XML".

> [...] Which is why I have
> to reach the conclusion that you're just being deliberately stubborn
> rather than sincerely believing you are making any sort of real
> contribution to the discussion.

*blink*

FWIW, I assure you that is not my intent.

-- 
Steven Taschuk              Aral: "Confusion to the enemy, boy."
staschuk at telusplanet.net    Mark: "Turn-about is fair play, sir."
                             -- _Mirror Dance_, Lois McMaster Bujold




