[Python-ideas] unicodedata.itergraphemes (or str.itergraphemes / str.graphemes)
M.-A. Lemburg
mal at egenix.com
Tue Jul 9 09:16:43 CEST 2013
On 08.07.2013 20:52, Bruce Leban wrote:
> On Sun, Jul 7, 2013 at 3:29 AM, David Kendal <me at dpk.io> wrote:
>
>> Python provides a way to iterate characters of a string by using the
>> string as an iterable. But there's no way to iterate over Unicode graphemes
>> (a cluster of characters consisting of a base character plus a number of
>> combining marks and other modifiers -- or what the human eye would consider
>> to be one "character").
>>
>> I think this ought to be provided either in the unicodedata library,
>> (unicodedata.itergraphemes(string)) which exposes the character database
>> information needed to make this work, or as a method on the built-in str
>> type. (str.itergraphemes() or str.graphemes())
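
To make the proposal concrete, here is a rough sketch of what such an iterator might do. It is not a real unicodedata function: the name itergraphemes is taken from the proposal, the helper _is_mark is hypothetical, and the clustering rule is a deliberate simplification (a base character followed by any characters in a Mark category). Full grapheme cluster segmentation as defined in UAX #29 has more rules, e.g. for Hangul syllables and CR LF.

```python
import unicodedata


def _is_mark(ch):
    # Simplifying assumption: any character whose general category is
    # Mn, Mc or Me extends the preceding cluster. This is NOT the full
    # UAX #29 rule set.
    return unicodedata.category(ch).startswith("M")


def itergraphemes(text):
    """Yield approximate grapheme clusters of text (hypothetical helper)."""
    cluster = ""
    for ch in text:
        if cluster and not _is_mark(ch):
            # A non-mark character starts a new cluster.
            yield cluster
            cluster = ch
        else:
            # Either the very first character or a mark that extends
            # the current cluster.
            cluster += ch
    if cluster:
        yield cluster
```

With this sketch, list(itergraphemes("e\u0301a")) gives two items, "e\u0301" and "a", whereas iterating the string directly gives three code points.
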
>
>
> A common case is wanting to extract the current grapheme or move forward or
> backward one. Please consider these other use cases rather than just adding
> an iterator.
>
> # Extract the cluster that includes index i (i may be in the middle of it).
> g = unicodedata.grapheme_cluster(str, i)
> # If i is the start of a cluster, return i; otherwise back up to the start.
> i = unicodedata.grapheme_start(str, i)
> # Move i to the first index of the previous cluster; None if there is none.
> i = unicodedata.previous_cluster(str, i)
> # Move i to the first index of the next cluster; None if there is none.
> i = unicodedata.next_cluster(str, i)
>
>
> I think these belong in unicodedata, not str.
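
The index-based functions suggested above could look roughly like the following. This is only a sketch under stated assumptions: none of these functions exist in unicodedata, the names are the ones proposed in the quoted message, and a cluster is approximated as one base character plus any following Mark-category characters rather than by the full UAX #29 rules. The helper _is_mark is hypothetical.

```python
import unicodedata


def _is_mark(ch):
    # Simplifying assumption: Mn, Mc and Me characters extend a cluster.
    return unicodedata.category(ch).startswith("M")


def grapheme_start(text, i):
    """Return the index where the cluster containing index i starts."""
    while i > 0 and _is_mark(text[i]):
        i -= 1
    return i


def next_cluster(text, i):
    """Return the start index of the following cluster, or None."""
    j = i + 1
    while j < len(text) and _is_mark(text[j]):
        j += 1
    return j if j < len(text) else None


def previous_cluster(text, i):
    """Return the start index of the preceding cluster, or None."""
    start = grapheme_start(text, i)
    if start == 0:
        return None
    return grapheme_start(text, start - 1)


def grapheme_cluster(text, i):
    """Return the cluster that includes index i."""
    start = grapheme_start(text, i)
    end = next_cluster(text, i)  # None means "slice to the end"
    return text[start:end]
```

For example, with s = "e\u0301a\u0308\u0323b", grapheme_start(s, 4) is 2 and grapheme_cluster(s, 3) is "a\u0308\u0323". As written, an index outside the string raises IndexError.
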
FWIW: Here's a pre-PEP I once wrote for these things:
http://mail.python.org/pipermail/python-dev/2001-July/015938.html
At the time there was little interest, so I dropped the idea.
--
Marc-Andre Lemburg
eGenix.com