[Python-3000] How will unicode get used?

David Hopwood david.nospam.hopwood at blueyonder.co.uk
Tue Sep 26 01:19:54 CEST 2006


Paul Prescod wrote:
> On 9/25/06, Jim Jewett <jimjjewett at gmail.com> wrote:
> 
>> As David Hopwood pointed out, to be fully correct, you already have to
>> create a custom function even with bmp characters, because of
>> decomposed characters.  (Example:  Representing a c-cedilla as a c and
>> a combining cedilla, rather than as a single code point.)  Separating
>> those two would be wrong.  Counting them as two characters for slicing
>> purposes would usually be wrong.
> 
> Even 32-bit representations are permitted to use surrogate pairs; it
> just doesn't often make sense.
> 
> There is at least one big difference between surrogate pairs and decomposed
> characters. The user can typically normalize away decompositions.

That depends on which script they're using. For some scripts they can't,
because many combining sequences have no precomposed form to normalize to.
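
A minimal sketch of the difference, assuming Python 3 and the standard
unicodedata module. The helper name nfc_code_points and the two sample
strings are only illustrations, not anything from the thread. NFC composes
the c-cedilla example into one code point, while a Devanagari consonant plus
vowel sign stays as two, since no precomposed code point exists for it:

```python
import unicodedata


def nfc_code_points(text):
    """Return the number of code points left after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


# Latin: c + COMBINING CEDILLA (U+0327) has a precomposed form, U+00E7.
decomposed_c_cedilla = "c\u0327"

# Devanagari: LETTER KA (U+0915) + VOWEL SIGN I (U+093F) has no
# precomposed form, so NFC leaves both code points in place.
devanagari_ki = "\u0915\u093f"

print(nfc_code_points(decomposed_c_cedilla))  # 1
print(nfc_code_points(devanagari_ki))         # 2
```

So code that slices by code point can still split what a user sees as one
character, even when the input has been normalized first.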

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>



