[Python-3000] How will unicode get used?
David Hopwood
david.nospam.hopwood at blueyonder.co.uk
Tue Sep 26 01:19:54 CEST 2006
Paul Prescod wrote:
> On 9/25/06, Jim Jewett <jimjjewett at gmail.com> wrote:
>
>> As David Hopwood pointed out, to be fully correct, you already have to
>> create a custom function even with bmp characters, because of
>> decomposed characters. (Example: Representing a c-cedilla as a c and
>> a combining cedilla, rather than as a single code point.) Separating
>> those two would be wrong. Counting them as two characters for slicing
>> purposes would usually be wrong.
>
> Even 32-bit representations are permitted to use surrogate pairs; it
> just doesn't often make sense.
>
> There is at least one big difference between surrogate pairs and decomposed
> characters. The user can typically normalize away decompositions.
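That depends on what script they're using. For some scripts, they can't:
Unicode defines no precomposed code points for many of their combining
sequences, so normalization leaves them as multiple code points.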
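
To make the distinction concrete, here is a minimal sketch using Python's
standard unicodedata module. The helper name nfc_len and the two sample
strings are illustrative choices, not anything from the thread. It compares
the decomposed c-cedilla from Jim's example with a Devanagari consonant plus
vowel sign, which has no precomposed form.

```python
import unicodedata

def nfc_len(s):
    """Return the number of code points in s after NFC normalization."""
    return len(unicodedata.normalize("NFC", s))

# Latin: c + COMBINING CEDILLA composes to the single code point U+00E7.
decomposed_c_cedilla = "c\u0327"

# Devanagari: KA + VOWEL SIGN I has no precomposed code point,
# so NFC leaves it as two code points.
devanagari_ki = "\u0915\u093f"

print(nfc_len(decomposed_c_cedilla))  # 1
print(nfc_len(devanagari_ki))         # 2
```

So slicing by code point after normalization is still able to split a
user-perceived character in the second case.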
--
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>