Python Unicode handling wins again -- mostly

Ethan Furman ethan at
Tue Dec 3 15:26:45 CET 2013

On 12/02/2013 12:38 PM, Ethan Furman wrote:
> On 11/29/2013 04:44 PM, Steven D'Aprano wrote:
>> Out of the nine tests, Python 3.3 passes six, with three tests being
>> failures or dubious. If you believe that the native string type should
>> operate on code-points, then you'll think that Python does the right
>> thing.
> I think Python is doing it correctly.  If I want to operate on "clusters" I'll normalize the string first.

Hrmm, well, after being educated ;) I think I may have to reverse my position.  Given that not every cluster can be
normalized to a single code point, perhaps Python is doing it the best possible way.  On the other hand, we have a
uni*code* type, not a uni*char* type.  Maybe 3.5 can have that.  ;)
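
For anyone who wants to see the normalization point concretely, here is a minimal sketch (the helper name nfc_length and
the two sample strings are my own choices for illustration, not taken from the tests discussed in this thread).  It
assumes only the standard unicodedata module:

```python
import unicodedata

def nfc_length(text):
    """Return the number of code points left after NFC normalization.

    Hypothetical helper, used here only for illustration.
    """
    return len(unicodedata.normalize("NFC", text))

# 'e' + COMBINING ACUTE ACCENT has a precomposed form (U+00E9),
# so NFC folds the cluster into a single code point.
composable = "e\u0301"

# 'q' + COMBINING DOT ABOVE has no precomposed form in Unicode,
# so the cluster stays two code points even after NFC.
stubborn = "q\u0307"

print(len(composable), nfc_length(composable))  # 2 1
print(len(stubborn), nfc_length(stubborn))      # 2 2
```

So normalizing first only helps with len() and indexing when a precomposed character happens to exist.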

At any rate, definitely good to be aware of the issue.


