[Python-Dev] PEP 393 Summer of Code Project

Guido van Rossum guido at python.org
Fri Aug 26 18:51:00 CEST 2011


On Fri, Aug 26, 2011 at 2:29 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> IronPython and Jython can retain UTF-16 as their native form if that
>> makes interop cleaner, but in doing so they need to ensure that basic
>> operations like indexing and len work in terms of code points, not
>> code units, if they are to conform.
>
> That means that they won't conform, period. There is no efficient
> maintainable implementation strategy to achieve that property, and
> it may well take years until somebody provides an efficient
> unmaintainable implementation.
>
>> Does this make sense, or have I completely misunderstood things?
>
> You seem to assume it is ok for Jython/IronPython to provide indexing in
> O(n). It is not.

Indeed.
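
To make the cost concrete, here is a minimal sketch (not taken from
any implementation; the helper names utf16_units and code_point_at are
hypothetical) of code-point indexing over UTF-16 code units. Because a
non-BMP character occupies two code units, the position of the n-th
code point is not known without scanning from the start, hence O(n):

```python
def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (illustrative helper)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def code_point_at(units, index):
    """Return the code point at code-point position `index`.

    Walks the code units from the start, so the cost is linear in `index`.
    """
    if index < 0:
        raise IndexError(index)
    i = 0
    remaining = index
    while i < len(units):
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one code point.
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if remaining == 0:
            if is_pair:
                return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return u
        i += 2 if is_pair else 1
        remaining -= 1
    raise IndexError(index)
```

For "a\U0001F600b" the code-unit length is 4 while the code-point
length is 3, which is exactly the discrepancy under discussion.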

> However, non-conformance may not be that much of an issue. They already
> fail to conform in so many other respects (such as not supporting
> Python 3, or not supporting the C API) that they may well choose to
> ignore such a minor requirement if there was one. For BMP strings,
> they conform fine, and it may well be that Jython users either don't
> have non-BMP strings, or don't care whether len() or indexing of their
> non-BMP strings is "correct".

I think this is fine. I had been hoping that all Python
implementations claiming compatibility with version 3.3 of the
language reference would be free of worries about surrogates, but it
simply doesn't make sense.

And yes, I'm well aware that PEP 393 is only for CPython. It's just
that I had hoped that it would get rid of some of Tom C's specific
complaints for all Python implementations; but it really seems
impossible to do so.

One consequence is that the standard library, to the extent it is
shared by other implementations, may still have to worry about
surrogates and other issues inherent in narrow builds or other
16-bit-based string types. We'll cross that bridge when we get to it.
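
As a rough illustration of what "worrying about surrogates" could look
like in shared library code, here is a sketch (the function name
count_code_points is hypothetical, not an existing stdlib API) that
counts code points the same way whether the string type stores
non-BMP characters directly or as surrogate pairs:

```python
def count_code_points(s):
    """Count code points, treating a high+low surrogate pair as one."""
    count = 0
    i = 0
    n = len(s)
    while i < n:
        # On a 16-bit-based string type a non-BMP character shows up
        # as two elements: a high surrogate followed by a low surrogate.
        if (
            "\ud800" <= s[i] <= "\udbff"
            and i + 1 < n
            and "\udc00" <= s[i + 1] <= "\udfff"
        ):
            i += 2
        else:
            i += 1
        count += 1
    return count
```

With this, a string holding U+1F600 directly and one holding the pair
U+D83D U+DE00 both count as a single character, at the price of an
O(n) scan instead of a constant-time len().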

-- 
--Guido van Rossum (python.org/~guido)

