Python Unicode handling wins again -- mostly
ned at nedbatchelder.com
Mon Dec 2 22:14:13 CET 2013
On 12/2/13 3:38 PM, Ethan Furman wrote:
> On 11/29/2013 04:44 PM, Steven D'Aprano wrote:
>> Out of the nine tests, Python 3.3 passes six, with three tests being
>> failures or dubious. If you believe that the native string type should
>> operate on code-points, then you'll think that Python does the right
> I think Python is doing it correctly. If I want to operate on
> "clusters" I'll normalize the string first.
> Thanks for this excellent post.
This is where my knowledge about Unicode gets fuzzy. Isn't it the case
that some grapheme clusters (or whatever the right word is) can't be
normalized down to a single code point? A character can carry many
accents, for example. In that case, you can't always normalize and then
use the existing string methods; you'd need more specialized code.
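A small sketch of the point above, using only the standard library's
unicodedata module. naive_clusters is a hypothetical helper written for
illustration, and it is only a rough approximation of grapheme clusters:
it glues code points with a non-zero canonical combining class onto the
preceding character and ignores the full rules in Unicode's UAX #29
(Hangul jamo, emoji ZWJ sequences, regional indicators, and so on).

```python
import unicodedata


def naive_clusters(s):
    """Hypothetical helper: split s into base characters plus trailing combining marks.

    Rough approximation only: it groups code points whose canonical combining
    class is non-zero and does NOT implement the UAX #29 grapheme cluster rules.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster at a base character
    return clusters


# 'e' + COMBINING ACUTE ACCENT has a precomposed form, so NFC yields 1 code point.
print(len(unicodedata.normalize("NFC", "e\u0301")))  # 1

# 'x' + COMBINING ACUTE ACCENT has no precomposed form, so NFC leaves 2 code points.
print(len(unicodedata.normalize("NFC", "x\u0301")))  # 2

# Plain str operations work on code points, so reversing moves the accent
# onto the wrong base character.
word = unicodedata.normalize("NFC", "x\u0301y")
print(word[::-1] == "y\u0301x")  # True

# Reversing cluster by cluster keeps each accent with its own base.
print("".join(reversed(naive_clusters(word))) == "yx\u0301")  # True
```

So normalizing first helps for the common cases, but once a cluster has no
precomposed form, something cluster-aware is needed even for simple
operations like reversal or slicing.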