[Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)

Terry Reedy tjreedy at udel.edu
Tue Jul 9 23:17:31 CEST 2013


On 7/9/2013 12:51 PM, Bruce Leban wrote:

> If you want to do any operation on the clusters other than in iteration
> order, without indexed access you're going to end up doing
> list(grapheme_clusters(...)) first to give you indexed access. Maybe
> that's the right thing to do sometimes but I wouldn't force it on
> people. The string already provides indexed access but I need to know
> cluster boundaries.

I think the best alternative to a list subclass of grapheme substrings
(a subclass so that methods can be added) might be a GraphemeSeq wrapper
class that contains a string (perhaps in a known normal form) and a list
of indexes to grapheme start positions. That would also allow
grapheme-oriented methods. If this has not already been done, either or
both of these would make good PyPI modules.
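
To make the wrapper idea concrete, here is a rough sketch of what such a
class might look like. Everything beyond the name GraphemeSeq is an
assumption for illustration: the method names, the default normal form,
and especially the boundary rule, which simply starts a new cluster at
every code point whose combining class is 0. That is a crude stand-in
for the real Unicode segmentation rules (UAX #29) and will mis-split
things like Hangul jamo sequences or emoji joined with ZWJ.

```python
import unicodedata
from bisect import bisect_right


class GraphemeSeq:
    """Hypothetical wrapper: a string plus the indexes where clusters start."""

    def __init__(self, text, form="NFC"):
        # Keep the string in a known normal form.
        self.text = unicodedata.normalize(form, text)
        # Simplified boundary rule (an assumption, NOT UAX #29): a cluster
        # starts at index 0 and at every code point with combining class 0.
        self.starts = [
            i for i, ch in enumerate(self.text)
            if i == 0 or unicodedata.combining(ch) == 0
        ]

    def __len__(self):
        # Number of clusters, not number of code points.
        return len(self.starts)

    def __getitem__(self, index):
        # Indexed access by cluster number; slices return a list of clusters.
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("grapheme index out of range")
        begin = self.starts[index]
        if index + 1 < len(self.starts):
            end = self.starts[index + 1]
        else:
            end = len(self.text)
        return self.text[begin:end]

    def cluster_at(self, code_point_index):
        """Map a code point index in the string to its cluster number."""
        if not 0 <= code_point_index < len(self.text):
            raise IndexError("code point index out of range")
        return bisect_right(self.starts, code_point_index) - 1
```

With this layout the underlying string stays available through the
text attribute for ordinary code point indexing, while len(), indexing,
and iteration on the wrapper all work in units of clusters. The list of
start positions is what answers the "I need to know cluster boundaries"
question without copying the string into a list of substrings.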

-- 
Terry Jan Reedy


