Support for "wide" Unicode characters
The problem I have with this PEP is that it is a compile-time option, which makes it hard to work with both 32-bit and 16-bit strings in one program.
Can you elaborate why you think this is a problem?
Could the 32-bit string type not be introduced as an additional type?
Yes, but not just "like that". You'd have to define an API for creating values of this type, you'd have to teach all functions which ought to accept it to process it, and you'd have to define conversion operations and all that: in short, you'd have to go through all the trouble that the introduction of the Unicode type gave us, once again. Also, I cannot see any advantage in introducing yet another type. Implementing this PEP is straightforward, with almost no visible effect on Python programs. People have suggested making it a run-time decision, having the internal representation switch on demand, but that would be an API nightmare for C code that has to access such values.
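For readers unfamiliar with how this compile-time choice surfaced in Python code: the build's widest representable code point is exposed as sys.maxunicode, so a program can at least detect which build it is running on. A minimal sketch (note that since Python 3.3, PEP 393 made every build behave as a "wide" build, so the narrow branch is historical):

```python
import sys

# sys.maxunicode reveals the build's internal Unicode width:
# 0xFFFF on a historical narrow (UTF-16) build,
# 0x10FFFF on a wide (UCS-4) build and on all Python >= 3.3.
if sys.maxunicode == 0x10FFFF:
    print("wide build: every code point is one string element")
else:
    print("narrow build: non-BMP code points occupy two elements")
```

This is exactly the situation the PEP discussion describes: the decision is baked in at compile time, and pure-Python code can only inspect it, not change it.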
u[i] is a character. If u is Unicode, then u[i] is a Python Unicode character.
This wasn't usefully true in the past for DBCS strings and is not the right way to think of either narrow or wide strings now. The idea that strings are arrays of characters gets in the way of dealing with many encodings and is the primary difficulty in localising software for Japanese.
While I don't know much about localising software for Japanese (*), I agree that 'u[i] is a character' isn't useful to say in many cases. If this is the old Python string type, I'd much prefer calling u[i] a 'byte'.

Regards,
Martin

(*) Methinks that the primary difficulty still is translating all the documentation and messages. Actually, keeping the translations up to date is even more challenging.
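The point under debate, that indexing does not always yield a "character", can be illustrated with a non-BMP code point. A hedged sketch (the narrow-build behaviour described in the comments is historical; on any modern Python the assertions hold as written):

```python
# A string containing a code point outside the Basic Multilingual Plane.
s = "a\U0001F600b"  # 'a', an emoji, 'b'

# On a wide build (and on every Python since 3.3), indexing yields
# whole code points, so the string has three elements:
assert len(s) == 3
assert s[1] == "\U0001F600"

# On a historical narrow build, the emoji was stored as a UTF-16
# surrogate pair: len(s) would have been 4, and s[1] a lone surrogate
# rather than a character.

# And for the old byte-oriented string type the thread mentions,
# indexing the encoded form gives bytes, not characters at all:
b = s.encode("utf-8")
assert len(b) == 6  # the emoji alone occupies four bytes in UTF-8
```

This is why "u[i] is a character" breaks down for DBCS and narrow-build strings alike: the index walks storage units, which only coincide with characters under favourable encodings.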
participants (2)
- Martin von Loewis
- Neil Hodgson