[Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)

Bruce Leban bruce at leapyear.org
Tue Jul 9 18:51:07 CEST 2013


On Mon, Jul 8, 2013 at 10:30 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

>
> Why is indexing a string and returning a grapheme a common case?  I
> would think the common case would be indexing or iterating over a
> grapheme sequence.  At least, if we provided such a type, it would
> be.[1]
>

If you want to do anything with the clusters other than visit them in
iteration order, then without indexed access you'll end up calling
list(grapheme_clusters(...)) first just to get it. Maybe that's the right
thing to do sometimes, but I wouldn't force it on people. The string
already provides indexed access; what I need to know is where the cluster
boundaries are.

Note that str.find returns an int, not the found string. What do I do with
that index if I can't extract clusters in the middle?
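
To make the str.find case concrete, here is a rough sketch of the kind of
index-based lookup in question. The helper name cluster_span is
hypothetical, and the boundary rule is a deliberate simplification: it
treats a base character plus any following combining marks (per
unicodedata.combining) as one cluster, which is not the full set of
grapheme cluster rules.

```python
import unicodedata

def cluster_span(s, i):
    """Return (start, end) of the simplified cluster containing index i.

    Assumption: a cluster is a base character followed by zero or more
    combining marks. This is an approximation for illustration only.
    """
    start = i
    # Walk back over combining marks to reach the base character.
    while start > 0 and unicodedata.combining(s[start]):
        start -= 1
    end = i + 1
    # Walk forward over any combining marks attached to the base.
    while end < len(s) and unicodedata.combining(s[end]):
        end += 1
    return start, end

text = "cafe\u0301 noir"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
i = text.find("e")         # an int index, as str.find always returns
start, end = cluster_span(text, i)
cluster = text[start:end]  # the 'e' together with its combining accent
```

With something like this, the int from str.find is enough to recover the
whole cluster without first building a list of every cluster in the string.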

Imagine you're writing code that works on English words. Would the only API
you provide be one that iterates over the words? How would you write the
function that finds the word after 'the' in a string?
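
As an illustration of the word analogy, a minimal sketch using the re
module: the search yields a position in the string, and the next word is
found by resuming from that position rather than by iterating over every
word. The function name word_after and the pattern used for "a word" are
assumptions made for this example.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")

def word_after(text, target="the"):
    """Return the word following the first whole-word match of target."""
    m = re.search(r"\b%s\b" % re.escape(target), text, re.IGNORECASE)
    if m is None:
        return None
    # Resume scanning at the index where the match ended.
    nxt = WORD.search(text, m.end())
    return nxt.group() if nxt else None

result = word_after("Pet the cat gently")
```

The key step is WORD.search(text, m.end()): it only works because the
string can be entered at an arbitrary index.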

--- Bruce