[Python-3000] How will unicode get used?

David Hopwood david.nospam.hopwood at blueyonder.co.uk
Thu Sep 21 21:41:54 CEST 2006


Fredrik Lundh wrote:
> David Hopwood wrote:
> 
>>For example, "ö" can be represented either as the precomposed character U+00F6,
>>or as "o" followed by a combining diaeresis (U+006F U+0308).
> 
> normalization is a good thing, though:
> 
>      http://www.w3.org/TR/charmod-norm/
> 
> (it would probably be a good idea to turn unicodedata.normalize into a 
> method for the new unicode string type).

Normalization is certainly a good thing to support. But that's orthogonal to
my point above -- that some abstract characters are representable by sequences
of more than one code point, which must not be split, and that avoidance of such
splitting automatically also avoids splitting within a code point representation.

Note that some abstract characters needed for living languages are representable
*only* by combining sequences.

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>





More information about the Python-3000 mailing list