[Python-3000] String comparison

Stephen J. Turnbull stephen at xemacs.org
Sun Jun 10 10:03:19 CEST 2007


Rauli Ruohonen writes:

 > On 6/9/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > > Rauli Ruohonen writes:
 > >  > The ones it absolutely prohibits in interchange are surrogates.
 > >
 > > Excuse me?  Surrogates are code points with a specific interpretation
 > > if it is "purported that the stream is in UTF-16".  Otherwise, Unicode
 > > 4.0 explicitly says that there is nothing illegal about an isolated
 > > surrogate (p.75, where an example is given of how such a surrogate
 > > might occur).
 > 
 > I meant interchange instead of strings. Anything is allowed in
 > strings.

I think you misunderstand.  Anything in Unicode that is normative is
about interchange.  Strings are also a means of interchange: between
modules (separate Unicode processes) within a program (a single OS
process).  The Python language and library implementations are going
to be primarily concerned with interchange in the intermodule sense.
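
To make the intermodule point concrete, here is a minimal sketch.  It
assumes a current Python 3 interpreter in which str is a sequence of
code points; encodes_as_utf8 is a hypothetical helper name, not
anything in the library.  The isolated surrogate is a perfectly good
string inside the process, but a module that purports to emit UTF-8
has to refuse it.

```python
# An isolated high surrogate, U+D800, held in an ordinary str.
lone = "\ud800"


def encodes_as_utf8(text):
    """Hypothetical helper: True if text can be emitted as well-formed UTF-8."""
    try:
        text.encode("utf-8")  # the strict codec rejects lone surrogates
    except UnicodeEncodeError:
        return False
    return True


print(len(lone))              # the string holds a single code point
print(encodes_as_utf8(lone))  # whether it may cross a UTF-8 boundary
print(encodes_as_utf8("abc"))
```

Whether the string is "legal" therefore depends on which boundary it
is about to cross, not on the string type itself.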

Your complaint about Python mixing "pseudo-UTF-16" with "pseudo-UCS-2"
is precisely a statement that various modules in Python do not specify
which encoding forms they purport to accept or emit.  The purpose of
the definitions in chapter 3 is to clarify the requirements of
conformance.  The discussion of strings is implicitly about
interchange; otherwise it would be somewhere other than the chapter
about conformance.
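
As a rough sketch of what "specifying the encoding form" could mean
for a module that receives 16-bit code units: the function below
(is_well_formed_utf16 is a name made up for illustration) checks the
surrogate-pairing rule.  A module purporting to accept UTF-16 would
reject input that fails the check; a module that treats its input as
plain 16-bit code units would not care.

```python
def is_well_formed_utf16(units):
    """Hypothetical check: True if every surrogate code unit is properly paired."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


paired = [0xD83D, 0xDE00]    # a surrogate pair
isolated = [0x0041, 0xD83D]  # 'A' followed by an isolated high surrogate

print(is_well_formed_utf16(paired))
print(is_well_formed_utf16(isolated))
```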

 > My understanding is that it is a goal, but practicality beats purity.
 > I think the only disagreement is on what's practical.

It is not a goal of the *language*; there is no object in the
*language* that we can say is buggy if it doesn't conform to the
Unicode standard.  Unicode conformance for Python, as of today, is a
WIBNI ("wouldn't it be nice if").

As Guido points out, the goal is a language that can be used to write
efficient implementations of Unicode *if the users want to pay that
cost*, not to provide an implementation so the users don't have to.


