[Python-3000] How will unicode get used?

Adam Olsen rhamph at gmail.com
Wed Sep 20 14:55:56 CEST 2006


Before we can decide on the internal representation of our unicode
objects, we need to decide on their external interface.  My thoughts
so far:

* Most transformation and testing methods (.lower(), .islower(), etc.)
can be copied directly from 2.x.  They require no special
implementation to perform reasonably.
* Indexing and slicing are the big issue.  Do we need constant-time
integer slicing?  .find() could be changed to return a token that
could be used as a constant-time offset.  Incrementing the token would
have linear costs, but that's no big deal if the offsets are always
small.
* Grapheme clusters, words, lines, and other groupings: do we need or
want ways to slice based on them too?
* Cheap slicing and concatenation (between O(1) and O(log(n))): do we
want to support them?  Now would be the time.
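
To make the token idea from the indexing point concrete, here is a rough
sketch, not a proposal for the actual implementation.  It assumes UTF-8
internal storage, and the names Utf8Text, Token, advance() and slice() are
all hypothetical, made up for this illustration.  find() hands back an
opaque position, stepping it forward costs time linear in the number of
code points stepped, and slicing between two tokens needs no scan from the
start of the string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical opaque position: a byte offset into UTF-8 storage."""
    byte_offset: int


class Utf8Text:
    """Hypothetical string type backed by UTF-8 bytes (illustration only)."""

    def __init__(self, text: str):
        self._data = text.encode("utf-8")

    def find(self, needle: str):
        """Return a Token for the first match, or None if absent."""
        pos = self._data.find(needle.encode("utf-8"))
        return None if pos < 0 else Token(pos)

    def advance(self, token: Token, count: int = 1) -> Token:
        """Move forward by `count` code points; cost is linear in `count`."""
        pos = token.byte_offset
        for _ in range(count):
            if pos >= len(self._data):
                raise IndexError("token advanced past end of text")
            pos += 1
            # Skip UTF-8 continuation bytes (bit pattern 10xxxxxx).
            while pos < len(self._data) and (self._data[pos] & 0xC0) == 0x80:
                pos += 1
        return Token(pos)

    def slice(self, start: Token, stop: Token) -> str:
        """Both ends are already byte offsets, so no scan is needed to find them."""
        return self._data[start.byte_offset:stop.byte_offset].decode("utf-8")
```

For example, with text = Utf8Text("naïve café au lait"), start =
text.find("café") and stop = text.advance(start, 4), the call
text.slice(start, stop) gives back "café" even though an earlier character
takes two bytes.  This sketch still copies the bytes of the slice; truly
cheap slicing would need shared buffers or a rope-like structure, which is
a separate question.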

-- 
Adam Olsen, aka Rhamphoryncus

