[Python-3000] How will unicode get used?

"Martin v. Löwis" martin at v.loewis.de
Sat Sep 23 14:01:56 CEST 2006

Gábor Farkas wrote:
> while i understand the constraints, i think it's not a good decision to 
> leave this to be implementation-dependent.
> the strings seem to me as such a basic functionality, that its 
> behaviour should not depend on the platform.
> for example, how is an application developer then supposed to write 
> their applications?

An application developer should always know what the target platforms
are. For example, does the code need to work with IronPython or not?
Python is not aiming at 100% portability at all costs. Many aspects
are platform dependent, and while this has complicated some
applications, it has simplified others (which could make use of
platform details that otherwise would not have been exposed to the
Python programmer).

> should he write his own slicing/whatever functions to get consistent 
> behaviour on linux/windows?

Depends on the application, and the specific slicing operations.
If the slicing appears in the processing of .ini files (say),
no platform-dependent slicing should be necessary.

> i think this is not just a 'theoretical' issue. it's a very practical 
> issue. the only reason why it does not seem to be important, because 
> currently not much of the non-16-bit unicode characters are used.

No, there is a deeper reason. A typical program only performs substring
operations on selected boundaries (such as whitespace, or punctuation).
Those are typically in the BMP (not sure whether *any* punctuation
is outside the BMP).
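
As a rough illustration of this point (a sketch, not part of the original
message): the helpers below are hypothetical names that emulate a string
stored as 16-bit code units and split it on a space. Because a surrogate
code unit can never equal U+0020, the split lands on the same boundaries
as a split on code points, even when the text contains a non-BMP
character. It assumes Python 3, where str.split works on code points.

```python
def utf16_units(text):
    """Return the text as a list of 16-bit code units (UTF-16, no BOM)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def split_units_on_space(units):
    """Split a list of code units on U+0020, mimicking str.split(' ')."""
    pieces, current = [], []
    for unit in units:
        if unit == 0x20:
            pieces.append(current)
            current = []
        else:
            current.append(unit)
    pieces.append(current)
    return pieces


def units_to_text(units):
    """Turn a list of 16-bit code units back into a str."""
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le")


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP and needs a
# surrogate pair in UTF-16.
sample = "clef \U0001D11E here"

by_code_point = sample.split(" ")
by_code_unit = [units_to_text(p) for p in split_units_on_space(utf16_units(sample))]
```

Both lists contain the same three pieces. The results would only differ
for an operation that cuts at an arbitrary numeric index, which could
fall between the two halves of a surrogate pair.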

> but the same way i could say, that because most of the unix-world is 
> utf-8, for those pythons the best way is to handle it internally as 
> utf-8, couldn't i?

I think you live in a free country: you can certainly say that.
I think you would be wrong. The common on-disk/on-wire representation
of text should not influence the design of an in-memory representation.

> it simply seems to me strange to make compromises that makes the life of 
> the cpython-users harder, just to make the life for the 
> jython/ironpython developers (i mean the 'creators') easier.

Guido didn't say that the life of the CPython user needs to be hard.
He said it will be implementation-dependent, referring to Jython
and IronPython. Whether or not CPython uses a consistent representation,
or offers a consistent Python-level experience, across platforms is a
different issue. CPython could behave absolutely consistently, and use
four-byte Unicode on all systems, and the length of a non-BMP string
would still be implementation-defined.
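
A minimal sketch of what "implementation-defined length" means here (not
from the original message): it counts the same one-character string two
ways. The name utf16_length is hypothetical. The code-point count of 1
assumes CPython 3.3 or later, which postdates this thread; an
implementation whose native strings are sequences of UTF-16 code units
would be expected to report the second figure instead.

```python
def utf16_length(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# U+1D11E (MUSICAL SYMBOL G CLEF): one code point, outside the BMP.
clef = "\U0001D11E"

code_points = len(clef)          # counts code points on CPython 3.3+
code_units = utf16_length(clef)  # counts UTF-16 code units

# For BMP-only text the two counts agree.
bmp_only = "abc"
```

Code that only needs a portable answer can compute the count it wants
explicitly, as utf16_length does, rather than relying on the native
length of the string type.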

