Python Unicode handling wins again -- mostly
ben+python at benfinney.id.au
Mon Dec 2 23:56:57 CET 2013
Ned Batchelder <ned at nedbatchelder.com> writes:
> This is where my knowledge about Unicode gets fuzzy. Isn't it the
> case that some grapheme clusters (or whatever the right word is) can't
> be normalized down to a single code point? Characters can accept many
> accents, for example.
That's true, but it doesn't affect the point being made: Python's
‘unicode’ (now ‘str’) type gives you a “sequence of Unicode code points”,
and on top of that you can still deal with the “sequence of text the
reader will see”.
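
To make the two views concrete, here is a rough sketch using only the
standard library's ‘unicodedata’ module. The helper name
‘approx_visible_length’ is made up for illustration, and skipping
combining marks is only an approximation of what the reader sees; it is
not full grapheme cluster segmentation as Unicode defines it.

```python
import unicodedata


def approx_visible_length(text):
    """Count code points that are not combining marks.

    Hypothetical helper: a rough stand-in for "characters the reader
    sees", not a real grapheme cluster counter.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points that
# are displayed as one accented letter.
s = "cafe\u0301"

code_point_count = len(s)                     # sequence of code points
visible_count = approx_visible_length(s)      # approximate reader's view
```

Here ‘len(s)’ counts five code points, while the helper counts four
base characters.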
> In that case, you can't always normalize and use the existing string
> methods, but would need more specialized code.
Specialised code may not always be needed. Under canonical normalisation
(NFC or NFD), it will at least be true that “any two code-point
sequences which normalise to the same value will be visually the same
for the reader”, which is an important assertion for addressing the
complaints from Mortoray's article.
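
A small sketch of that assertion, again with ‘unicodedata’ from the
standard library. The wrapper ‘nfc’ and the sample strings are my own
illustration, not anything from the article. The last case is the kind
Ned describes: a base letter plus accent with no single precomposed code
point, so normalising cannot collapse it to one.

```python
import unicodedata


def nfc(text):
    """Return text in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' + U+0301 COMBINING ACUTE ACCENT

# Different code-point sequences, so a plain comparison says "not equal".
raw_equal = composed == decomposed

# After normalising both sides, they compare equal.
normalised_equal = nfc(composed) == nfc(decomposed)

# Once normalised, the ordinary string methods behave as expected.
word = nfc("re\u0301sume\u0301")
accent_count = word.count("\u00e9")
sum_index = word.find("sum")

# 'q' + U+0303 COMBINING TILDE has no precomposed form, so it stays as
# two code points even in NFC.
stacked = nfc("q\u0303")
stacked_length = len(stacked)
```

So normalising first covers the common cases with the existing ‘str’
methods; text like the last example would still need cluster-aware code
if you want to count or slice by what the reader sees.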
“Pray, v. To ask that the laws of the universe be annulled in
behalf of a single petitioner confessedly unworthy.”
    -- Ambrose Bierce, _The Devil's Dictionary_, 1906