[Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)

Masklinn masklinn at masklinn.net
Tue Jul 9 10:31:27 CEST 2013


On 2013-07-09, at 07:30 , Stephen J. Turnbull wrote:

> Bruce Leban writes:
> 
>> On Sun, Jul 7, 2013 at 3:29 AM, David Kendal <me at dpk.io> wrote:
>>> But there's no way to iterate over Unicode graphemes
> 
>>  A common case is wanting to extract the current grapheme or move
>> forward or backward one.  Please consider these other use cases
>> rather than just adding an iterator.
> 
>>   g = unicodedata.grapheme_cluster(str, i)
>>   # extracts cluster that includes index i (i may be in the middle
>>   # of the cluster)
> 
> Why is indexing a string and returning a grapheme a common case?

I don't know about that, but I do know NSString provides two messages
for it: rangeOfComposedCharacterSequenceAtIndex: takes an index into a
string and returns the range of the grapheme containing that index, and
rangeOfComposedCharacterSequencesForRange: takes a range and returns the
range covering every grapheme that overlaps it.

Of course that might just be because it does not provide a higher-level
iterator on graphemes.
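
For illustration, here is a rough Python sketch of what those two
lookups, plus an iterator built on top of them, might look like. None of
these functions exist in unicodedata; the names grapheme_range_at,
grapheme_range_for_range and itergraphemes are made up for this example.
It also assumes a deliberately simplified notion of a cluster (a
character followed by any combining marks), not the full UAX #29
segmentation rules, so it will mishandle Hangul jamo, emoji sequences,
CRLF and the like.

```python
import unicodedata


def _is_extender(ch):
    # Simplifying assumption: only combining marks (Mn, Mc, Me) extend a
    # cluster. Real grapheme segmentation (UAX #29) has many more rules.
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")


def grapheme_range_at(s, i):
    """Return (start, end) of the approximate cluster containing index i.

    i may point into the middle of the cluster.
    """
    if not 0 <= i < len(s):
        raise IndexError("string index out of range")
    start = i
    # Walk back from a combining mark to the character it attaches to.
    while start > 0 and _is_extender(s[start]):
        start -= 1
    end = i + 1
    # Walk forward over any trailing combining marks.
    while end < len(s) and _is_extender(s[end]):
        end += 1
    return start, end


def grapheme_range_for_range(s, lo, hi):
    """Expand the half-open range [lo, hi) outward to cluster boundaries."""
    if lo >= hi:
        return lo, hi
    start, _ = grapheme_range_at(s, lo)
    _, end = grapheme_range_at(s, hi - 1)
    return start, end


def itergraphemes(s):
    """Yield approximate clusters of s in order, built on the index lookup."""
    i = 0
    while i < len(s):
        start, end = grapheme_range_at(s, i)
        yield s[start:end]
        i = end
```

With s = "e\u0301a" (an "e" followed by U+0301 COMBINING ACUTE ACCENT,
then "a"), grapheme_range_at(s, 1) gives (0, 2) even though index 1 is
the combining mark, and the iterator yields the two clusters in order.
The point is only that the iterator falls out of the index-based lookup
fairly cheaply, so providing one does not rule out the other.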

