[Python-Dev] PEP 393 Summer of Code Project

Guido van Rossum guido at python.org
Wed Aug 31 20:56:03 CEST 2011


On Wed, Aug 31, 2011 at 11:51 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:

>  On 8/31/2011 10:12 AM, Guido van Rossum wrote:
>
> On Wed, Aug 31, 2011 at 1:09 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
>
>  So from reading all this discussion, I think this point is rather a key
> one... and it has been made repeatedly in different ways:  Arrays are not
> suitable for manipulating Unicode character sequences, and the str type is
> an array with a veneer of text manipulation operations, which do not, and
> cannot, by themselves, efficiently implement Unicode character sequences.
>
>  I think this is too strong. The str type is indeed an array, and you
> can build useful Unicode manipulation APIs on top of it. Just like
> bytes are not UTF-8, but can be used to represent UTF-8 and a
> fully-compliant UTF-8 codec can be implemented on top of it.
>
>
>
> This statement is a logical conclusion of arguments presented in this
> thread.
>
> 1) Applications that wish to do grapheme access, wish to do it by grapheme
> array indexing, because that is the efficient way to do it.
>

I don't believe that should be taken as gospel. In Perl, they don't do array
indexing on strings at all, and use regex matching instead. An API that uses
some kind of cursor on a string might work fine in Python too (for grapheme
matching).
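A minimal sketch of the cursor idea, layered on an ordinary str, follows. Everything in it is an illustrative assumption: the class name GraphemeCursor and its methods are hypothetical. The boundary rule (one code point plus any following code points with a nonzero canonical combining class, as reported by unicodedata.combining) is a deliberate simplification of the Unicode segmentation rules in UAX #29, which also handle emoji ZWJ sequences, Hangul syllables and CR LF. For the Perl-style regex route, the third-party regex module supports \X. The standard re module does not.

```python
import unicodedata
from typing import Iterator


class GraphemeCursor:
    """Hypothetical cursor that steps through a str one approximate grapheme at a time.

    Simplified boundary rule (an assumption, not UAX #29): a cluster is one
    code point plus any following code points whose canonical combining
    class is nonzero.
    """

    def __init__(self, text: str, pos: int = 0) -> None:
        self.text = text
        self.pos = pos  # offset in code points, i.e. an ordinary str index

    def at_end(self) -> bool:
        return self.pos >= len(self.text)

    def next(self) -> str:
        """Return the cluster starting at pos and advance pos past it."""
        if self.at_end():
            raise IndexError("cursor is at end of text")
        start = self.pos
        end = start + 1
        # Absorb trailing combining marks into the current cluster.
        while end < len(self.text) and unicodedata.combining(self.text[end]):
            end += 1
        self.pos = end
        return self.text[start:end]

    def __iter__(self) -> Iterator[str]:
        # Consumes the cursor: each step advances self.pos.
        while not self.at_end():
            yield self.next()


if __name__ == "__main__":
    cur = GraphemeCursor("nai\u0308ve")  # "naive" with i + combining diaeresis
    clusters = list(cur)
    print(len(clusters), cur.pos)  # 5 6
```

The cursor never needs grapheme indexes at all. It only moves forward over code point offsets, so each step costs time proportional to the cluster it returns.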

> 2) As long as str is restricted to holding Unicode code units or code
> points, then it cannot support grapheme array indexing efficiently.
>
> I have not declared that useful Unicode manipulation APIs cannot be built
> on top of str, only that efficiency will suffer.
>

But you have not proven it.
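One way to probe the efficiency claim is to pay one linear pass to record grapheme boundaries. After that, indexing by grapheme takes constant time, much as the str itself is an array of code points. The class below is a hypothetical sketch, not an existing API. It assumes the same simplified boundary rule (a code point plus any following code points with a nonzero combining class) rather than full UAX #29. It trades memory, one stored offset per cluster, for speed. Whether that trade is acceptable in practice would need measurement.

```python
import unicodedata
from typing import List


class GraphemeIndex:
    """Hypothetical read-only view of a str indexed by approximate grapheme.

    One O(n) pass records where each cluster starts; afterwards len() and
    indexing are O(1), at the cost of one stored offset per cluster.
    Simplified boundary rule (an assumption, not UAX #29): a cluster is one
    code point plus any following code points with a nonzero combining class.
    """

    def __init__(self, text: str) -> None:
        self.text = text
        # A cluster starts at position 0 and at every non-combining code point.
        self.starts: List[int] = [
            i for i, ch in enumerate(text)
            if i == 0 or not unicodedata.combining(ch)
        ]

    def __len__(self) -> int:
        return len(self.starts)

    def __getitem__(self, k: int) -> str:
        n = len(self.starts)
        if k < 0:
            k += n  # support negative indices like a list
        if not 0 <= k < n:
            raise IndexError("grapheme index out of range")
        start = self.starts[k]
        # A cluster runs up to the next cluster's start, or to the end of text.
        end = self.starts[k + 1] if k + 1 < n else len(self.text)
        return self.text[start:end]


if __name__ == "__main__":
    g = GraphemeIndex("e\u0301te\u0301")  # "ete" with decomposed acute accents
    print(len(g.text), len(g))  # 5 3
    print(g[1])  # t
```

A plain list keeps the sketch short. If memory matters, the offsets could be stored in a more compact container such as array.array.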

-- 
--Guido van Rossum (python.org/~guido)


More information about the Python-Dev mailing list