
Guido van Rossum wrote:
<PEP: 261>
The problem I have with this PEP is that it is a compile-time option, which makes it hard to work with both 32-bit and 16-bit strings in one program. Cannot the 32-bit string type be introduced as an additional type?
Not without an outrageous amount of additional coding (every place in the code that currently uses PyUnicode_Check() would have to be bifurcated into a 16-bit and a 32-bit variant).
Alternatively, a Unicode object could *internally* be either 8, 16 or 32 bits wide (to be clear: not per character, but per string). Also a lot of work, but it'll be a lot less wasteful.
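This per-string (rather than per-character) width selection is roughly the idea CPython later adopted. A minimal sketch of the selection rule, in modern Python for illustration only (the names here are hypothetical, not CPython's internals):

```python
def storage_width(s: str) -> int:
    """Return the smallest unit size (in bits) that can represent every
    character of s directly, without resorting to surrogate pairs."""
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        return 8    # fits in one byte per character (Latin-1 range)
    if max_cp < 0x10000:
        return 16   # fits in the Basic Multilingual Plane
    return 32       # astral characters need full 32-bit units

print(storage_width("hello"))         # pure ASCII: 8 bits per character
print(storage_width("\u20ac"))        # euro sign, BMP: 16 bits
print(storage_width("\U0001F40D"))    # astral character: 32 bits
```

The point is that the wide representation is paid for only by the strings that actually need it.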
I doubt that the desire to work with both 16- and 32-bit characters in one program is typical for folks using Unicode -- that's mostly limited to folks writing conversion tools. Python will offer the necessary codecs so you shouldn't have this need very often.
Not a lot of people will want to work with 16- or 32-bit chars directly, but I think a less wasteful solution to the surrogate pair problem *will* be desired by people. Why use 32 bits for all strings in a program when only a tiny percentage actually *needs* more than 16? (Or even 8...)
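The waste is easy to quantify: a mostly-ASCII string stored in 32-bit units costs four bytes per character, versus two in a 16-bit representation and one in UTF-8. A small illustration using the encoded byte lengths as a proxy for the in-memory cost:

```python
# One astral character in an otherwise ASCII string.
text = "mostly ASCII with one astral char: \U0001F40D"

# Encoded size stands in for storage cost at each unit width.
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, len(text.encode(enc)), "bytes")
```

The UTF-32 form is roughly four times the size of the UTF-8 form for text like this, which is exactly the overhead a blanket 32-bit build imposes on every string.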
Iteration through the code units in a string is a problem waiting to bite you and string APIs should encourage behaviour which is correct when faced with variable width characters, both DBCS and UTF style.
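The trap is that naive iteration sees code units, not characters: an astral character stored as a surrogate pair shows up as two units, and any consumer of a 16-bit string API must do the pairing itself to be character-correct. A sketch of that bookkeeping (the helper names are invented for illustration):

```python
def utf16_code_units(s: str):
    """Yield the raw 16-bit code units of s, as a 16-bit build would store them."""
    data = s.encode("utf-16-le")
    for i in range(0, len(data), 2):
        yield int.from_bytes(data[i:i + 2], "little")

def chars_from_units(units):
    """Recombine surrogate pairs into code points -- the extra step naive
    per-unit iteration silently skips."""
    units = iter(units)
    for u in units:
        if 0xD800 <= u < 0xDC00:          # high surrogate: consume its partner
            low = next(units)             # assumes well-formed input
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
        else:
            yield u

s = "a\U0001F600b"
units = list(utf16_code_units(s))
print(len(units))                         # 4 code units...
print(len(list(chars_from_units(units)))) # ...but only 3 characters
```

Any API that hands out bare code units invites every caller to get this wrong independently, which is the argument for keeping it behind the string type.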
But this is not the Unicode philosophy. All the variable-length character manipulation is supposed to be taken care of by the codecs, and then the application can deal in arrays of characters.
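That workflow can be illustrated in modern Python: decode once at the boundary, and from then on indexing and length are per character, regardless of how many bytes each character took on the wire.

```python
# Bytes arriving from the outside world, in a variable-width encoding.
raw = "ma\u00f1ana \U0001F40D".encode("utf-8")

# The codec absorbs all variable-width handling at the boundary...
text = raw.decode("utf-8")

# ...so the application sees a plain array of characters.
print(len(text))   # 8 characters, though the UTF-8 form is longer
print(text[7])     # the astral character, addressed by character index
```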
Right: this is the way it should be. My difficulty with PEP 261 is that I'm afraid few people will actually enable 32-bit support (*what*?! all unicode strings become 32 bits wide? no way!), therefore making programs non-portable in very subtle ways. Just