tuples, index method, Python's design

Rhamphoryncus rhamph at gmail.com
Fri Apr 13 14:38:54 EDT 2007


On Apr 13, 1:32 am, Antoon Pardon <apar... at forel.vub.ac.be> wrote:
> Suppose someone writes a function that acts on a sequence.
> The algorithm used depending on the following invariant.
>
>   i = s.index(e) => s[i] = e
>
> Then this algorithm is no longer guaranteed to work with strings.

It never worked correctly on unicode strings anyway (which becomes the
canonical string in python 3.0).  The base unit exposed by the
implementation is rarely what you want to operate upon.

The terminology is pretty confusing, but let's see if I can lay out
the relationships here:

byte ≤ code unit ≤ code point ≤ scalar value ≤ grapheme cluster ~
character ≤ syllable ≤ word ≤ sentence ≤ paragraph

"12" in "123" allows you to handle bytes through scalar values the
same way, glossing over the implementation details (such as UTF-32 on
linux and UTF-16 on windows).

--
Adam Olsen, aka Rhamphoryncus




More information about the Python-list mailing list