[Python-3000] String comparison
Stephen J. Turnbull
stephen at xemacs.org
Sun Jun 10 10:03:19 CEST 2007
Rauli Ruohonen writes:
> On 6/9/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> > Rauli Ruohonen writes:
> > > The ones it absolutely prohibits in interchange are surrogates.
> >
> > Excuse me? Surrogates are code points with a specific interpretation
> > if it is "purported that the stream is in UTF-16". Otherwise, Unicode
> > 4.0 explicitly says that there is nothing illegal about an isolated
> > surrogate (p.75, where an example is given of how such a surrogate
> > might occur).
>
> I meant interchange instead of strings. Anything is allowed in
> strings.
I think you misunderstand. Anything in Unicode that is normative is
about interchange. Strings are also a means of interchange---between
modules (separate Unicode processes) in a program (single OS process).
The Python language and library implementations are going to be
primarily concerned with interchange in the intermodule sense.
Your complaint about Python mixing "pseudo-UTF-16" with "pseudo-UCS-2"
is precisely a statement that various modules in Python do not specify
what encoding forms they purport to accept or emit. The purpose of
the definitions in chapter 3 is to clarify the requirements of
conformance. The discussion of strings is implicitly about
interchange; otherwise it would be somewhere other than the chapter
about conformance.
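
To make the string-versus-purported-encoding distinction concrete, here
is a minimal sketch. It assumes a present-day CPython 3 interpreter, which
is not the implementation under discussion in this thread, and the helper
name is_well_formed_utf16 is hypothetical, introduced only for
illustration. An isolated surrogate can sit in a string object, but the
strict codec rejects it once the data is purported to be UTF-16.

```python
def is_well_formed_utf16(s: str) -> bool:
    """Return True if s can be emitted as well-formed UTF-16.

    Hypothetical helper: it relies on the strict 'utf-16-le' codec
    refusing to encode isolated surrogate code points.
    """
    try:
        s.encode("utf-16-le")
        return True
    except UnicodeEncodeError:
        return False


# An isolated high surrogate: a legal code point inside a string object.
lone_surrogate = "\ud800"

# A supplementary-plane character, which UTF-16 represents as a surrogate pair.
supplementary = "\U00010000"

print(len(lone_surrogate))                   # number of code points held
print(is_well_formed_utf16(lone_surrogate))  # rejected at the encoding boundary
print(is_well_formed_utf16(supplementary))   # accepted; encoded as two code units

# A module that deliberately passes isolated surrogates through has to say
# so explicitly, here via the 'surrogatepass' error handler.
raw = lone_surrogate.encode("utf-16-le", "surrogatepass")
print(raw)
```

In this sketch the check happens where a module emits data, not where the
string is constructed, which is the sense of "interchange" used above.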
> My understanding is that it is a goal, but practicality beats purity.
> I think the only disagreement is on what's practical.
It is not a goal of the *language*; there is no object in the
*language* that we can say is buggy if it doesn't conform to the
Unicode standard. Unicode conformance for Python, as of today, is a
WIBNI ("wouldn't it be nice if").
As Guido points out, the goal is a language that can be used to write
efficient implementations of Unicode *if the users want to pay that
cost*, not to provide an implementation so the users don't have to.