[Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)
M.-A. Lemburg
mal at egenix.com
Mon Jul 8 13:54:22 CEST 2013
On 07.07.2013 12:29, David Kendal wrote:
> Hi,
>
> Python provides a way to iterate characters of a string by using the string as an iterable. But there's no way to iterate over Unicode graphemes (a cluster of characters consisting of a base character plus a number of combining marks and other modifiers -- or what the human eye would consider to be one "character").
>
> I think this ought to be provided either in the unicodedata library, (unicodedata.itergraphemes(string)) which exposes the character database information needed to make this work, or as a method on the built-in str type. (str.itergraphemes() or str.graphemes())
>
> Below is my own implementation of this as a generator, as an example and for reference.
>
> ---
> import unicodedata
>
> def itergraphemes(string):
>     def ismodifier(char): return unicodedata.category(char)[0] == 'M'
>     start = 0
>     for end, char in enumerate(string):
>         if not ismodifier(char) and not start == end:
>             yield string[start:end]
>             start = end
>     yield string[start:]
> ---
Sounds like a good idea.
Could you open a ticket for this to hash out the details?
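
One detail that would probably come up on the ticket: the quoted generator splits purely on the general category (anything starting with 'M' is attached to the preceding character), which approximates the grapheme cluster rules of Unicode Standard Annex #29 but does not implement them in full. A minimal sketch of that behaviour follows. It reuses the same approach, and the guard for the empty string is an assumption added here, not part of the original posting:

```python
import unicodedata


def itergraphemes(string):
    """Yield each base character together with the combining marks after it."""
    start = 0
    for end, char in enumerate(string):
        # The general categories Mn, Mc and Me all start with 'M'.
        is_mark = unicodedata.category(char).startswith("M")
        if not is_mark and start != end:
            yield string[start:end]
            start = end
    if string:  # assumption: an empty string should yield no clusters
        yield string[start:]


# 'e' + COMBINING ACUTE ACCENT, then 'a', then 'o' + COMBINING DIAERESIS
sample = "e\u0301ao\u0308"
clusters = list(itergraphemes(sample))
print(len(sample), len(clusters))  # 5 code points, 3 clusters
```

Iterating over `sample` directly gives five items, while the generator groups them into three. Cases such as Hangul jamo sequences or CR LF pairs are not covered by a category check alone, so they would need separate handling.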
Thanks,
--
Marc-Andre Lemburg
eGenix.com