[Python-3000] String comparison
Jim Jewett
jimjjewett at gmail.com
Fri Jun 8 16:27:40 CEST 2007
On 6/8/07, Rauli Ruohonen <rauli.ruohonen at gmail.com> wrote:
> The additional field is 8 bits, two bits for each normalization (a
> Yes/Maybe/No value). In Unicode 4.1 only 5 different combinations are
> used, but I don't know if that's true of later versions.
There are no "Maybe" values for the decomposed forms (NFD and NFKD).
A string cannot be in a Compatibility form without also being in the
corresponding Canonical form. (The definition of Compatibility
includes folding as much as possible under either form.)
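A quick way to see the Compatibility-implies-Canonical relationship, as a sketch only: it uses the standard unicodedata module, and is_in_form is a hypothetical helper of mine, not part of any proposal. The sample characters are arbitrary picks.

```python
import unicodedata


def is_in_form(form: str, text: str) -> bool:
    """Hypothetical helper: True if normalizing to `form` leaves text unchanged."""
    return unicodedata.normalize(form, text) == text


# Arbitrary samples: fi ligature, precomposed e-acute, Angstrom sign,
# circled digit one, and e followed by a combining acute accent.
samples = ["\ufb01", "\u00e9", "\u212b", "\u2460", "e\u0301"]

for s in samples:
    nfkd = unicodedata.normalize("NFKD", s)
    nfkc = unicodedata.normalize("NFKC", s)
    # Anything already in a Compatibility form is also in the Canonical one.
    assert is_in_form("NFD", nfkd)
    assert is_in_form("NFC", nfkc)

# The converse does not hold: the fi ligature has only a compatibility
# decomposition, so it is in NFD but not in NFKD.
ligature = "\ufb01"
print(is_in_form("NFD", ligature), is_in_form("NFKD", ligature))  # True False
```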
So there are really 3 possibilities (both, canonical only, neither)
for the decomposed forms, and (at most) 6 for the composed forms.
(I'm not sure all 6 of those can occur in practice.)
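The counts above can be reproduced by brute force. This is a sketch under one assumption of mine: the Compatibility value can never be "more normalized" than the Canonical value for the same string, with Yes > Maybe > No. The names YES, MAYBE and NO are made up for the illustration.

```python
from itertools import product

# Ordered so that a "more normalized" answer compares greater (assumed encoding).
YES, MAYBE, NO = 2, 1, 0

# Decomposed forms: no "Maybe", and NFKD cannot rank above NFD.
decomposed = [
    (nfd, nfkd)
    for nfd, nfkd in product((YES, NO), repeat=2)
    if nfkd <= nfd
]

# Composed forms: "Maybe" is allowed, and NFKC cannot rank above NFC.
composed = [
    (nfc, nfkc)
    for nfc, nfkc in product((YES, MAYBE, NO), repeat=2)
    if nfkc <= nfc
]

print(len(decomposed), len(composed))  # 3 6
```

This only bounds the number of states; it says nothing about which of the 6 composed combinations actually occur in real data.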
But there are other normalization forms that may be added later. The
ones I found references to are basically orthogonal (a string in an
existing normalization form may or may not meet them).
See the proposed changes at http://www.unicode.org/reports/tr15/tr15-28.html
-jJ