Python Unicode handling wins again -- mostly

Ned Batchelder ned at
Mon Dec 2 22:14:13 CET 2013

On 12/2/13 3:38 PM, Ethan Furman wrote:
> On 11/29/2013 04:44 PM, Steven D'Aprano wrote:
>> Out of the nine tests, Python 3.3 passes six, with three tests being
>> failures or dubious. If you believe that the native string type should
>> operate on code-points, then you'll think that Python does the right
>> thing.
> I think Python is doing it correctly.  If I want to operate on
> "clusters" I'll normalize the string first.
> Thanks for this excellent post.
> --
> ~Ethan~

This is where my knowledge about Unicode gets fuzzy.  Isn't it the case
that some grapheme clusters (or whatever the right word is) can't be
normalized down to a single code point?  A base character can carry
several combining accents, for example, and not every combination has a
precomposed form.  In that case, you can't always normalize and use
the existing string methods, but would need more specialized code.
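
Here is a quick sketch of what I mean, assuming NFC is the normalization
in question.  The helper name nfc_length is just something I made up for
illustration; unicodedata.normalize is the standard library call.  As far
as I know there is no precomposed "x with acute" in Unicode, so that pair
stays as two code points even after normalizing:

```python
import unicodedata

def nfc_length(text):
    """Return the number of code points in text after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))

# 'e' + COMBINING ACUTE ACCENT: a precomposed form (U+00E9) exists.
e_acute = "e\u0301"

# 'x' + COMBINING ACUTE ACCENT: assumed to have no precomposed form,
# so NFC leaves the base letter and the accent as separate code points.
x_acute = "x\u0301"

print(nfc_length(e_acute))  # 1
print(nfc_length(x_acute))  # 2
```

So len(), slicing, and reversal would still split the second cluster
apart, normalized or not.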

