[Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)

M.-A. Lemburg mal at egenix.com
Mon Jul 8 13:54:22 CEST 2013


On 07.07.2013 12:29, David Kendal wrote:
> Hi,
> 
> Python provides a way to iterate over the characters of a string by using the string as an iterable. But there's no way to iterate over Unicode graphemes (clusters of characters consisting of a base character plus any number of combining marks and other modifiers, or what the human eye would consider to be one "character").
> 
> I think this ought to be provided either in the unicodedata module (as unicodedata.itergraphemes(string)), which exposes the character database information needed to make this work, or as a method on the built-in str type (str.itergraphemes() or str.graphemes()).
> 
> Below is my own implementation of this as a generator, as an example and for reference.
> 
> ---
> import unicodedata
> 
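> def itergraphemes(string):
>     def ismodifier(char):
>         # General categories Mn, Mc and Me are the combining marks.
>         return unicodedata.category(char)[0] == 'M'
>     start = 0
>     for end, char in enumerate(string):
>         # A non-mark character starts a new grapheme; emit the previous one.
>         if not ismodifier(char) and start != end:
>             yield string[start:end]
>             start = end
>     yield string[start:]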
> ---

Sounds like a good idea.

Could you open a ticket for this to hash out the details?
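
One detail worth settling there is how closely this should follow the grapheme cluster rules in Unicode Standard Annex #29. The generator quoted above groups a base character with the category-M characters that follow it. A minimal sketch of that behaviour, restated so it runs on its own, might look like the following. The sample string and the choice to yield nothing for an empty input are my assumptions, not part of the original proposal (the quoted version yields a single empty string in that case):

```python
import unicodedata


def itergraphemes(string):
    """Yield each base character together with the combining marks after it.

    Sketch only: this uses the general category (Mn, Mc, Me) as the sole
    criterion, which is simpler than the UAX #29 cluster rules.
    """
    start = 0
    for end, char in enumerate(string):
        is_mark = unicodedata.category(char).startswith("M")
        if not is_mark and start != end:
            yield string[start:end]
            start = end
    # Assumption: an empty input yields nothing rather than one empty string.
    if string:
        yield string[start:]


# Hypothetical sample: "e" + combining acute, "a" + diaeresis + dot below, "z".
text = "e\u0301a\u0308\u0323z"
clusters = list(itergraphemes(text))
print(len(text), len(clusters))
```

A category-based approach like this would still split "\r\n" in two and would not join conjoining Hangul jamo, both of which UAX #29 treats as single clusters, so the ticket would need to decide whether the simpler definition is enough.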

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


