Python Unicode handling wins again -- mostly

Ben Finney ben+python at
Mon Dec 2 23:56:57 CET 2013

Ned Batchelder <ned at> writes:

> This is where my knowledge about Unicode gets fuzzy.  Isn't it the
> case that some grapheme clusters (or whatever the right word is) can't
> be normalized down to a single code point?  Characters can accept many
> accents, for example.

That's true, but it doesn't affect the point being made: Python's
‘unicode’ (now ‘str’) type can remain a “sequence of Unicode code
points”, and one can still deal with the “sequence of text the reader
will see”.
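
To illustrate the case being asked about, here is a minimal sketch,
assuming Python 3 and the standard library ‘unicodedata’ module. The
variable names are only for illustration. One accented letter has a
precomposed code point and one, as far as I know, does not:

```python
import unicodedata

# 'e' + COMBINING ACUTE ACCENT (U+0301) has a precomposed form, U+00E9,
# so NFC collapses the pair into a single code point.
e_acute = unicodedata.normalize("NFC", "e\u0301")

# 'x' + COMBINING CIRCUMFLEX ACCENT (U+0302) has no precomposed form,
# so NFC leaves it as two code points even though a reader sees one
# character.
x_circumflex = unicodedata.normalize("NFC", "x\u0302")

print(len(e_acute))       # 1
print(len(x_circumflex))  # 2
```

So ‘len’ on a ‘str’ counts code points, not what the reader perceives as
characters, whichever normalisation form is applied.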

> In that case, you can't always normalize and use the existing string
> methods, but would need more specialized code.

Specialised code may not be needed. It will at least be true that “any
two code-point sequences which normalise to the same value (under a
canonical form, NFC or NFD) will be visually the same for the reader”,
which is an important assertion for addressing the complaints from
Mortoray's article.
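
A rough sketch of that assertion, again assuming Python 3 and
‘unicodedata’. The helper ‘same_text’ is a hypothetical name of my own,
not something from the standard library:

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Compare two strings after applying the same normalisation form.

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed = "caf\u00e9"     # 'é' as one code point
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

# The raw code-point sequences differ...
print(composed == decomposed)           # False
# ...but they normalise to the same value.
print(same_text(composed, decomposed))  # True

# After normalising, the existing string methods behave as a reader
# would expect.
normalised = unicodedata.normalize("NFC", decomposed)
print(normalised.endswith("\u00e9"))    # True
```

This only shows that the existing ‘str’ methods work once both sides are
in the same form; it does nothing about clusters that have no single
code point to normalise to.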

