<div dir="ltr"><div><div><div>I think the API Bruce suggests, along with its module location in 'unicodedata', makes more sense than an iterator alone.<br><br></div>But it seems to me that it would still be useful to explicitly break a string into its component clusters with a similar function. E.g.:<br>
<br></div> graphemes = unicodedata.grapheme_clusters(str) # Returns an iterator of strings, often single characters<br></div> for g in graphemes: ...<br><div><br><div><div><div><div><div class="gmail_extra">It wouldn't be very hard to implement 'grapheme_clusters' in terms of the API Bruce suggests, but I feel like it should have a standard name and API along with those others. Actually, I guess the implementation is just:<br>
<br></div><div class="gmail_extra"> def grapheme_clusters(s):<br></div> for i in range(len(s)):<br><div class="gmail_extra"> if i == unicodedata.grapheme_start(s, i):<br></div><div class="gmail_extra"><div class="gmail_extra">
yield unicodedata.grapheme_cluster(s, i)<br></div><br><div class="gmail_quote">On Mon, Jul 8, 2013 at 11:52 AM, Bruce Leban <span dir="ltr"><<a href="mailto:bruce@leapyear.org" target="_blank">bruce@leapyear.org</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br></div><div>On Sun, Jul 7, 2013 at 3:29 AM, David Kendal <span dir="ltr"><<a href="mailto:me@dpk.io" target="_blank">me@dpk.io</a>></span> wrote:<div class="im">
<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Python provides a way to iterate characters of a string by using the string as an iterable. But there's no way to iterate over Unicode graphemes (a cluster of characters consisting of a base character plus a number of combining marks and other modifiers -- or what the human eye would consider to be one "character").<br>
<br>I think this ought to be provided either in the unicodedata library (unicodedata.itergraphemes(string)), which exposes the character database information needed to make this work, or as a method on the built-in str type (str.itergraphemes() or str.graphemes()).</blockquote>
<div> </div></div></div><div>A common case is wanting to extract the current grapheme or move forward or backward one. Please consider these other use cases rather than just adding an iterator. </div><div><div><br></div>
</div>
<blockquote style="margin:0px 0px 0px 40px;border:medium none;padding:0px">
<div><div>g = unicodedata.grapheme_cluster(str, i) # extracts cluster that includes index i (i may be in the middle of the cluster)</div></div><div><div>i = unicodedata.grapheme_start(str, i) # if i is the start of the cluster, returns i; otherwise backs up to the start of the cluster</div>
</div><div><div>i = unicodedata.previous_cluster(str, i) # moves i to the first index of the previous cluster; returns None if no previous cluster in the string</div></div><div><div><div>i = unicodedata.next_cluster(str, i) # moves i to the first index of the next cluster; returns None if no next cluster in the string</div>
</div></div></blockquote><div><div><br></div><div>I think these belong in unicodedata, not str.<div><br clear="all"><div><font face="arial, helvetica, sans-serif">--- Bruce<br></font></div>
<br><br></div></div></div>
<br>_______________________________________________<br>
Python-ideas mailing list<br>
<a href="mailto:Python-ideas@python.org">Python-ideas@python.org</a><br>
<a href="http://mail.python.org/mailman/listinfo/python-ideas" target="_blank">http://mail.python.org/mailman/listinfo/python-ideas</a><br>
<br></blockquote></div>
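<div>For what it's worth, here is a minimal runnable sketch of the generator above. Note that unicodedata.grapheme_start and unicodedata.grapheme_cluster do not exist in the standard library today, so the module-level stand-ins below are my own assumptions. They treat a cluster as one character plus any trailing combining marks (using unicodedata.combining, which does exist), which is far simpler than the full UAX #29 segmentation rules: no Hangul, ZWJ sequences, or CRLF handling.<br></div>

```python
import itertools
import unicodedata


def grapheme_start(s, i):
    """Return the index at which the cluster containing s[i] begins.

    Simplifying assumption: a cluster is one character followed by a
    run of combining marks (nonzero canonical combining class).
    """
    # Back up over combining marks until we reach a non-combining
    # character or the start of the string.
    while i and unicodedata.combining(s[i]):
        i -= 1
    return i


def grapheme_cluster(s, i):
    """Return the cluster that includes index i (i may be mid-cluster)."""
    start = grapheme_start(s, i)
    # Collect the combining marks that directly follow the first character.
    tail = itertools.takewhile(unicodedata.combining, s[start + 1:])
    return s[start:start + 1] + "".join(tail)


def grapheme_clusters(s):
    """Yield each cluster of s in order, as a string."""
    for i in range(len(s)):
        # Only indices that begin a cluster produce output.
        if i == grapheme_start(s, i):
            yield grapheme_cluster(s, i)


# "e" + COMBINING ACUTE ACCENT forms one cluster; "a" forms another.
print([len(g) for g in grapheme_clusters("e\u0301a")])  # prints [2, 1]
```

<div>This mirrors the index-by-index form above, so it rescans backwards from every index; a real implementation would presumably walk forward once with something like next_cluster instead.<br></div>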
</div></div></div></div></div></div></div>