[I18n-sig] Re: [Python-Dev] Unicode debate
Tom Emerson
tree@basistech.com
Tue, 2 May 2000 13:14:24 -0400 (EDT)
M.-A. Lemburg writes:
> The details are on the www.unicode.org web-site buried
> in some of the tech reports on normalization and
> collation.
This is described in the Unicode standard itself, and in UTR #15 and
UTR #10. Normalization is an issue with wider implications than just
handling glyph variants: indeed, glyph variants are irrelevant to it.
The question is this: should
U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
compare equal to
U+0055 LATIN CAPITAL LETTER U
U+0308 COMBINING DIAERESIS
or not? It depends on the application. Certainly in a database system
I would want these to compare equal.
Perhaps normalization form needs to be an option of the string comparator?
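
As a rough illustration of that idea, here is a minimal sketch of a comparison that takes the normalization form as an option. It assumes a Python whose standard library provides unicodedata.normalize; the helper name equal_under and its "None means raw comparison" convention are hypothetical, made up for this example, and are not an existing or proposed API.

```python
import unicodedata

def equal_under(form, a, b):
    """Compare two strings after normalizing both to `form`.

    `form` is one of "NFC", "NFD", "NFKC", "NFKD". Passing None
    (a convention invented for this sketch) compares the raw
    code point sequences without any normalization.
    """
    if form is None:
        return a == b
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The two spellings from the question above.
precomposed = "\u00DC"        # LATIN CAPITAL LETTER U WITH DIAERESIS
decomposed = "\u0055\u0308"   # LATIN CAPITAL LETTER U + COMBINING DIAERESIS

print(equal_under(None, precomposed, decomposed))   # False: different code points
print(equal_under("NFC", precomposed, decomposed))  # True: both compose to U+00DC
print(equal_under("NFD", precomposed, decomposed))  # True: both decompose to U+0055 U+0308
```

With the form left as a parameter, an application that cares about exact code points can pass None, while something like a database lookup can pick a canonical form.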
-tree
--
Tom Emerson Basis Technology Corp.
Language Hacker http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"