
Just van Rossum wrote:
Guido van Rossum wrote:
<PEP: 261>
The problem I have with this PEP is that it is a compile-time option, which makes it hard to work with both 32-bit and 16-bit strings in one program. Couldn't the 32-bit string type be introduced as an additional type?
Not without an outrageous amount of additional coding (every place in the code that currently uses PyUnicode_Check() would have to be split into a 16-bit and a 32-bit variant).
Alternatively, a Unicode object could *internally* be either 8, 16 or 32 bits wide (to be clear: not per character, but per string). Also a lot of work, but it'll be a lot less wasteful.
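To make the per-string idea concrete, here is a rough sketch of how a width could be picked for a whole string from its highest code point. The names minimal_width and storage_size are made up for illustration; nothing like this is part of PEP 261, and a real implementation would live in C inside the Unicode object.

```python
def minimal_width(text):
    """Return the bytes per character (1, 2 or 4) needed to store text.

    Hypothetical helper: the width is chosen once per string, from the
    highest code point it contains, not per character.
    """
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1   # fits in 8 bits
    if highest < 0x10000:
        return 2   # fits in 16 bits
    return 4       # needs the full 32 bits


def storage_size(text, width=None):
    """Bytes of character data, using the minimal width unless one is forced."""
    if width is None:
        width = minimal_width(text)
    return len(text) * width


ascii_text = "hello"
print(storage_size(ascii_text))           # per-string width
print(storage_size(ascii_text, width=4))  # every string forced to 32 bits
```

The comparison between the two calls is the "wasteful" part of the argument: with a fixed 32-bit build, plain ASCII text takes four times the character storage it needs.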
I hope this is where we end up one day. But the compile-time option is better than where we are today. Even though PEP 261 is not my favorite solution, it buys us a couple of years of wait-and-see time. Consider that computer memory is growing much faster than textual data. People's text processing techniques get more and more "wasteful" because it is now almost always possible to load the entire "text" into memory at once. I remember how some text editors used to boast that they only loaded your text "on demand". Maybe so much data will be passed to us from UCS-4 APIs that trying to "compress it" will actually be inefficient. Maybe two years from now Guido will make UCS-4 the default and only a tiny minority will notice or care.
... My difficulty with PEP 261 is that I'm afraid few people will actually enable 32-bit support (*what*?! all unicode strings become 32 bits wide? no way!), therefore making programs non-portable in very subtle ways.
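One way to picture the "very subtle" part: on a 16-bit build a character outside the BMP is stored as a surrogate pair, so len() and indexing count two units where a 32-bit build counts one. The sketch below only simulates that difference on a current interpreter by counting UTF-16 code units; narrow_len and wide_len are illustrative names, not existing APIs.

```python
import sys


def narrow_len(text):
    """Length as a 16-bit build would see it: the number of UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2


def wide_len(text):
    """Length as a 32-bit build would see it: one unit per code point.

    Assumes an interpreter where len() counts code points (Python 3.3+).
    """
    return len(text)


sample = "a\U00010000b"  # one character outside the BMP

print(narrow_len(sample))      # code units on a 16-bit build
print(wide_len(sample))        # code points on a 32-bit build
print(hex(sys.maxunicode))     # largest code point the running build supports
```

Code that slices or counts characters in such a string gets different answers on the two builds, even though the same source runs without error on both.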
It really depends on what the default build option is.
--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook