[Python-3000] How will unicode get used?
David Hopwood
david.nospam.hopwood at blueyonder.co.uk
Thu Sep 21 21:41:54 CEST 2006
Fredrik Lundh wrote:
> David Hopwood wrote:
>
>>For example, "ö" can be represented either as the precomposed character U+00F6,
>>or as "o" followed by a combining diaeresis (U+006F U+0308).
>
> normalization is a good thing, though:
>
> http://www.w3.org/TR/charmod-norm/
>
> (it would probably be a good idea to turn unicodedata.normalize into a
> method for the new unicode string type).
Normalization is certainly a good thing to support. But that's orthogonal to
my point above: some abstract characters are represented by sequences of more
than one code point, and those sequences must not be split. Avoiding such
splits automatically also avoids splitting within the representation of a
single code point.

Note that some abstract characters needed for living languages are representable
*only* by combining sequences.
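
A minimal Python sketch of the distinction, using the standard unicodedata
module. The helper name combining_sequences is hypothetical. Grouping by
nonzero canonical combining class is only an approximation of the grapheme
cluster rules in UAX #29. "q" followed by U+0303 is an arbitrary example of a
sequence that, as far as I know, has no precomposed form:

```python
import unicodedata


def combining_sequences(text):
    """Group each base code point with the combining marks that follow it.

    Hypothetical helper: treats any code point with a nonzero canonical
    combining class as belonging to the preceding group. This is a
    simplification, not a full grapheme cluster segmenter.
    """
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch  # attach the mark to its base character
        else:
            groups.append(ch)  # start a new group
    return groups


precomposed = "\u00f6"   # U+00F6, one code point
decomposed = "o\u0308"   # U+006F U+0308, two code points

# The two spellings differ as code point sequences...
print(precomposed == decomposed)
# ...but normalization maps one onto the other.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

# Normalization cannot help when no precomposed form exists:
# the sequence stays two code points even after NFC.
no_precomposed = "q\u0303"
print(len(unicodedata.normalize("NFC", no_precomposed)))

# Slicing by code point index separates the base from its mark,
# whereas grouping keeps the sequence intact.
print(decomposed[:1] == "o")
print(combining_sequences("o\u0308k"))
```

Splitting only at the boundaries returned by combining_sequences never cuts
through a combining sequence, whichever normalization form the string is in.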
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>