[Python-Dev] Support for "wide" Unicode characters

Paul Prescod paulp@ActiveState.com
Sun, 01 Jul 2001 11:19:17 -0700

Just van Rossum wrote:
> Guido van Rossum wrote:
> > > <PEP: 261>
> > >
> > >    The problem I have with this PEP is that it is a compile-time option
> > > which makes it hard to work with both 32-bit and 16-bit strings in one
> > > program. Can't the 32-bit string type be introduced as an additional type?
> >
> > Not without an outrageous amount of additional coding (every place in
> > the code that currently uses PyUnicode_Check() would have to be
> > bifurcated in a 16-bit and a 32-bit variant).
> Alternatively, a Unicode object could *internally* be either 8, 16 or 32 bits
> wide (to be clear: not per character, but per string). Also a lot of work, but
> it'll be a lot less wasteful.

I hope this is where we end up one day. But the compile-time option is
better than where we are today. Even though PEP 261 is not my favorite
solution, it buys us a couple of years of wait-and-see time.
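The per-string alternative quoted above could look something like the following. This is a minimal sketch, not anything from PEP 261 or CPython. The function names `unit_width_bits` and `storage_bytes` are hypothetical. The idea is that one code unit width (8, 16 or 32 bits) is chosen for the whole string, based on its largest code point, and that width is compared with a fixed 4 bytes per character under UCS-4. The sketch assumes a Python where `len()` counts code points.

```python
# Hypothetical sketch: pick one code unit width per string (not per
# character), based on the largest code point the string contains.

def unit_width_bits(text: str) -> int:
    """Return the narrowest width (8, 16 or 32 bits) that holds every character."""
    max_cp = max((ord(ch) for ch in text), default=0)
    if max_cp <= 0xFF:
        return 8
    if max_cp <= 0xFFFF:
        return 16
    return 32


def storage_bytes(text: str) -> int:
    """Bytes of character data needed at the chosen per-string width."""
    # Assumes len() counts code points, so every character takes one unit.
    return len(text) * unit_width_bits(text) // 8


for sample in ["hello", "price: \u20ac5", "clef: \U0001D11E"]:
    print(repr(sample), unit_width_bits(sample), "bits,",
          storage_bytes(sample), "bytes vs", len(sample) * 4, "as fixed UCS-4")
```

An all-ASCII string costs one byte per character here instead of four. Only strings that actually contain characters outside the 16-bit range pay the full 32-bit price.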

Consider that computer memory is growing much faster than textual data.
People's text processing techniques get more and more "wasteful" because
it is now almost always possible to load the entire "text" into memory
at once. I remember how some text editors used to boast that they only
loaded your text "on demand". 

Maybe so much data will be passed to us from UCS-4 APIs that trying to
"compress" it into a narrower representation will actually be inefficient.

Maybe two years from now Guido will make UCS-4 the default and only a
tiny minority will notice or care.

> ...
> My difficulty with PEP 261 is that I'm afraid few people will actually enable
> 32-bit support (*what*?! all unicode strings become 32 bits wide? no way!),
> therefore making programs non-portable in very subtle ways.

It really depends on what the default build option is.
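The subtle non-portability concerns characters outside the Basic Multilingual Plane. On a narrow (UCS-2) build, such a character is stored as a UTF-16 surrogate pair and counts as two units. On a wide (UCS-4) build, it counts as one. The sketch below is an illustration only, and the helper names are hypothetical. It computes both lengths. `sys.maxunicode` is the real attribute that is 0xFFFF on a narrow build and 0x10FFFF on a wide one. On modern Python 3 it is always 0x10FFFF.

```python
import sys


def utf16_units(cp: int) -> tuple:
    """Code units a narrow (UCS-2/UTF-16) build stores for one code point."""
    if cp < 0x10000:
        return (cp,)
    v = cp - 0x10000
    # Outside the BMP: split into a high surrogate and a low surrogate.
    return (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))


def narrow_len(text: str) -> int:
    """Length as a narrow build would report it (surrogates count twice)."""
    return sum(len(utf16_units(ord(ch))) for ch in text)


def wide_len(text: str) -> int:
    """Length as a wide build reports it: one unit per code point."""
    return len(text)  # assumes a Python whose len() counts code points


def is_wide_build() -> bool:
    """True when the interpreter can hold any code point in one character."""
    return sys.maxunicode > 0xFFFF


s = "clef: \U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF
print(narrow_len(s), wide_len(s), is_wide_build())
```

Code that slices or indexes by position gets different answers for the same string depending on the build. That is why the default build option matters so much.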