On 8/31/2011 10:12 AM, Guido van Rossum wrote:

> On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python@g.nevcal.com> wrote:
>
>> So from reading all this discussion, I think this point is rather a key
>> one... and it has been made repeatedly in different ways:  Arrays are not
>> suitable for manipulating Unicode character sequences, and the str type is
>> an array with a veneer of text manipulation operations, which do not, and
>> cannot, by themselves, efficiently implement Unicode character sequences.
>
> I think this is too strong. The str type is indeed an array, and you
> can build useful Unicode manipulation APIs on top of it. Just like
> bytes are not UTF-8, but can be used to represent UTF-8 and a
> fully-compliant UTF-8 codec can be implemented on top of it.

This statement is a logical conclusion of arguments presented in this thread.

1) Applications that wish to do grapheme access wish to do it by grapheme array indexing, because that is the efficient way to do it.

2) As long as str is restricted to holding Unicode code units or code points, it cannot support grapheme array indexing efficiently.

I have not declared that useful Unicode manipulation APIs cannot be built on top of str, only that efficiency will suffer.
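
To make point 2 concrete, here is a minimal sketch in Python 3. It is only an approximation: the helper name simple_clusters is hypothetical, and it treats a grapheme as a base code point plus any following combining marks, which is a simplification of the real Unicode segmentation rules (UAX #29). It shows that str indexing lands on code points, and that getting to the Nth grapheme means scanning the string first.

```python
import unicodedata

def simple_clusters(s):
    """Split s into approximate grapheme clusters.

    Assumption: a cluster is a base code point followed by zero or more
    combining marks. Real grapheme segmentation (UAX #29) has more rules.
    """
    clusters = []
    for ch in s:
        # A nonzero combining class means ch attaches to the previous base.
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "ete" with combining acute accents on both e's: 5 code points, 3 graphemes.
text = "e\u0301te\u0301"
clusters = simple_clusters(text)

print(len(text))            # number of code points
print(len(clusters))        # number of approximate graphemes
print(text[1] == "\u0301")  # str index 1 is the combining accent, not "t"
print(clusters[1])          # grapheme index 1 is "t"
```

The list of clusters gives O(1) grapheme indexing, but building it is an O(n) pass over the str, and it has to be rebuilt or maintained separately whenever the text changes. That extra pass is the efficiency cost I am talking about.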