[Python-Dev] Divorcing str and unicode (no more implicit conversions).
Neil Hodgson
nyamatongwe at gmail.com
Mon Oct 24 05:41:50 CEST 2005
Guido van Rossum:
> Folks, please focus on what Python 3000 should do.
>
> I'm thinking about making all character strings Unicode (possibly with
> different internal representations a la NSString in Apple's Objective
> C) and introduce a separate mutable bytes array data type. But I could
> use some validation or feedback on this idea from actual
> practitioners.
I'd like to more tightly define Unicode strings for Python 3000.
Currently, Unicode strings may be implemented with either 2 byte
(UCS-2) or 4 byte (UTF-32) elements. Python should allow strings to
contain any Unicode character and should be indexable yielding
characters rather than half characters. Therefore Python strings
should appear to be UTF-32. There could still be multiple
implementations (using UTF-16 or UTF-8) to preserve space but all
implementations should appear to be the same apart from speed and
memory use.
Neil
More information about the Python-Dev
mailing list