
Paul Prescod: <PEP: 261>

The problem I have with this PEP is that it is a compile-time option, which makes it hard to work with both 32-bit and 16-bit strings in one program. Cannot the 32-bit string type be introduced as an additional type?
Are we going to change chr() and unichr() to one_element_string() and unicode_one_element_string()?
u[i] is a character. If u is Unicode, then u[i] is a Python Unicode character.
This wasn't usefully true in the past for DBCS strings, and it is not the right way to think of either narrow or wide strings now. The idea that strings are arrays of characters gets in the way of dealing with many encodings and is the primary difficulty in localising software for Japanese.

Iteration through the code units in a string is a problem waiting to bite you, and string APIs should encourage behaviour which is correct when faced with variable-width characters, both DBCS and UTF style. Iteration over variable-width characters should be performed in a way that preserves the integrity of the characters. M.-A. Lemburg's proposed set of iterators could be extended to indicate the encoding, as in "for c in s.asCharacters('utf-8')", and to provide for the various intended string uses, such as "for c in s.inVisualOrder()", which would reverse the order in which right-to-left substrings are received.

Neil
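To illustrate the difference between iterating over code units and iterating over whole characters, here is a minimal sketch in present-day Python. The helper names utf16_code_units() and iter_characters() are hypothetical and are not part of any proposed API. The sketch assumes UTF-16 as the variable-width form, which is roughly what indexing a narrow build exposes.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of ints.

    Hypothetical helper: this approximates what u[i] would index
    on a narrow (16-bit) build.
    """
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def iter_characters(units):
    """Yield code points from UTF-16 code units, keeping surrogate pairs whole.

    Hypothetical helper: an unpaired surrogate is passed through unchanged.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the high and low surrogate into one code point.
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1


text = "a\U00010000b"
units = utf16_code_units(text)
characters = list(iter_characters(units))
print(len(units), len(characters))
```

Iterating over units visits the two halves of the surrogate pair separately, whereas iter_characters() delivers the character intact, which is the property argued for above.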