[Python-3000] How will unicode get used?

Paul Prescod paul at prescod.net
Mon Sep 25 17:50:16 CEST 2006


On 9/25/06, Jim Jewett <jimjjewett at gmail.com> wrote:
>
> As David Hopwood pointed out, to be fully correct, you already have to
> create a custom function even with bmp characters, because of
> decomposed characters.  (Example:  Representing a c-cedilla as a c and
> a combining cedilla, rather than as a single code point.)  Separating
> those two would be wrong.  Counting them as two characters for slicing
> purposes would usually be wrong.


> Even 32-bit representations are permitted to use surrogate pairs; it
> just doesn't often make sense.


There is at least one big difference between surrogate pairs and decomposed
characters. The user can typically normalize away decompositions. How do you
normalize away surrogate pairs in a language that only supports 16-bit
representations?
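
To make the contrast concrete, here is a minimal sketch. It assumes a
current Python 3 and its standard unicodedata module, not the interpreter
being designed in this thread, and the helper names (compose,
utf16_code_units) are made up for illustration. Jim's c-cedilla collapses
to one code point under NFC, while a character outside the BMP still
needs two 16-bit code units whatever normalization form is applied.

```python
import unicodedata


def compose(text):
    """Return text in NFC, which merges base + combining mark where possible."""
    return unicodedata.normalize("NFC", text)


def utf16_code_units(text):
    """Return the 16-bit code units used to store text as UTF-16."""
    raw = text.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


# Decomposed c-cedilla: 'c' followed by U+0327 COMBINING CEDILLA.
decomposed = "c\u0327"
composed = compose(decomposed)  # single code point U+00E7

# A character outside the BMP: U+1D11E MUSICAL SYMBOL G CLEF.
astral = "\U0001D11E"
units_before = utf16_code_units(astral)
units_after = utf16_code_units(compose(astral))  # normalization changes nothing

print(len(decomposed), len(composed))
print([hex(u) for u in units_before], [hex(u) for u in units_after])
```

Normalization only rearranges code points. The surrogate pair is a
property of the 16-bit encoding, so no normalization form can remove it.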

 Paul Prescod