Paul Prescod:
Are we going to change chr() and unichr() to one_element_string() and unicode_one_element_string()?
u[i] is a character. If u is Unicode, then u[i] is a Python Unicode character.
This wasn't usefully true in the past for DBCS strings and is not the right way to think of either narrow or wide strings now. The idea that strings are arrays of characters gets in the way of dealing with many encodings and is the primary difficulty in localising software for Japanese.

Iteration through the code units in a string is a problem waiting to bite you, and string APIs should encourage behaviour which is correct when faced with variable width characters, both DBCS and UTF style. Iteration over variable width characters should be performed in a way that preserves the integrity of the characters.

M.-A. Lemburg's proposed set of iterators could be extended to indicate encoding, "for c in s.asCharacters('utf-8')", and to provide for the various intended string uses such as "for c in s.inVisualOrder()", reversing the receipt of right-to-left substrings.

Neil
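To make the point about character integrity concrete, here is a rough sketch of what an asCharacters-style iterator might look like over encoded bytes. The name as_characters and its signature are hypothetical, not part of any proposal or of the standard library. The sketch assumes a current Python with the codecs module's incremental decoders. It feeds one code unit at a time to the decoder and only yields once a whole character has arrived.

```python
import codecs


def as_characters(data, encoding):
    """Yield whole characters from bytes in a variable-width encoding.

    Hypothetical helper, loosely modelled on the asCharacters() idea;
    it is not an existing string method.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for i in range(len(data)):
        # Feed a single code unit. The decoder returns '' until it has
        # seen every byte of the current character.
        yield from decoder.decode(data[i:i + 1])
    # Flush the decoder; this raises if the data ends mid-character.
    yield from decoder.decode(b"", final=True)


text = "日本語"
utf8 = text.encode("utf-8")       # UTF style: 3 bytes per character here
sjis = text.encode("shift_jis")   # DBCS style: 2 bytes per character here

chars_utf8 = list(as_characters(utf8, "utf-8"))
chars_sjis = list(as_characters(sjis, "shift_jis"))
```

Both lists hold the same three characters even though the byte strings differ in length. Indexing utf8[i] or sjis[i] directly would hand back fragments of characters. Visual-order iteration is a separate problem (it needs the bidirectional algorithm) and is not attempted here.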