[Python-Dev] PEP 393 Summer of Code Project

Fri Aug 26 19:13:42 CEST 2011

On 26 August 2011 17:51, Guido van Rossum <guido at python.org> wrote:
> On Fri, Aug 26, 2011 at 2:29 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

(Regarding my comments on code point semantics)

>> You seem to assume it is ok for Jython/IronPython to provide indexing in
>> O(n). It is not.
>
> Indeed.

On 26 August 2011 18:02, Guido van Rossum <guido at python.org> wrote:

> Eek. No, please. Those platforms' native string types have length and
> slicing operations that are O(1) and work in terms of 16-bit code
> points. Python should use those. It would be awful if Java and Python
> code doing the same manipulations on the same string would come to
> different conclusions because Python tried to paper over surrogates.

*That* is actually the erroneous assumption I had made - that the Java
and .NET native string type had code point semantics (i.e., took
surrogates into account). As that isn't the case, my comments aren't
valid - and I agree that having common semantics (and hence exposing
surrogates) is too important to lose.
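
To make the distinction concrete, here is a minimal sketch of the two ways of
counting. It assumes a CPython build where len() counts code points (what
PEP 393 aims to provide on every build). The UTF-16 count is only my stand-in
for what Java's String.length() or .NET's String.Length would report, and the
variable names are purely illustrative.

```python
# One character outside the Basic Multilingual Plane:
# U+1D11E MUSICAL SYMBOL G CLEF.
s = "\U0001D11E"

# Code point semantics: the string holds a single code point.
# (Assumes a build where len() counts code points, not UTF-16 units.)
code_points = len(s)

# UTF-16 code unit semantics: the same character needs a surrogate pair,
# so a 16-bit-unit string type sees two elements. Encoding to UTF-16
# (little-endian, no BOM) and dividing the byte count by 2 gives the count.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points, utf16_units)
```

On such a build the two counts differ, which is exactly the disagreement that
exposing surrogates on Jython/IronPython avoids with respect to Java and .NET
code.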

On the other hand, that pretty much establishes that, whatever PEP 393
achieves in letting all builds of CPython offer code point semantics,
the language definition can't mandate those semantics.

Thanks for the clarification.
Paul.