[Python-Dev] PEP 393 Summer of Code Project

Terry Reedy tjreedy at udel.edu
Fri Sep 9 07:39:21 CEST 2011


On 9/8/2011 6:15 PM, fwierzbicki at gmail.com wrote:
> Oops, forgot to add the link for the gory details for Java and
> 2 byte unicode:
>
> http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

The article is dated 2004. Basically, they considered several options,
tried out 4, and ended up sticking with char[] (sequences) as UTF-16,
with char = one 16-bit code unit, and added int-based (32-bit) methods
to the Character class for low-level manipulation of code points.
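
To make the code-unit versus code-point distinction concrete, here is a
minimal Python sketch. It is not taken from the article, and the helper
name utf16_units is made up for illustration. It encodes a string as
UTF-16 and lists the 16-bit units, which is roughly what a Java char[]
would hold.

```python
import struct

def utf16_units(s):
    """Return the 16-bit UTF-16 code units of s (hypothetical helper)."""
    data = s.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

bmp = "A"               # U+0041, inside the Basic Multilingual Plane
astral = "\U00010400"   # U+10400, a supplementary character

print(len(utf16_units(bmp)))                  # one code unit
print(len(utf16_units(astral)))               # two code units (a surrogate pair)
print([hex(u) for u in utf16_units(astral)])  # high surrogate, then low surrogate
```

A supplementary character therefore takes two char slots, so the length
of the char sequence and the number of code points can differ.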

I did not see the indexing problem mentioned. I get the impression that 
they encourage sequence forward-backward iteration (cursor-based access) 
rather than random-access indexing.
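
As a rough illustration of cursor-based access, the following Python
sketch walks a sequence of UTF-16 code units forward and yields one code
point per step. The function name iter_code_points and the sample data
are assumptions made for this example, not anything from the article or
from Java's API.

```python
def iter_code_points(units):
    """Yield (start_index, code_point) while walking UTF-16 code units forward."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        # A high surrogate followed by a low surrogate encodes one
        # supplementary code point.
        if (0xD800 <= u <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            yield i, cp
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            yield i, u
            i += 1

# "A", U+10400, "B" as UTF-16 code units
units = [0x0041, 0xD801, 0xDC00, 0x0042]
print([(i, hex(cp)) for i, cp in iter_code_points(units)])
```

In this sample the third code point starts at unit index 3, not 2, and
units[2] on its own is only half of a character. Finding the k-th code
point requires a scan from one end, which is the indexing problem in a
nutshell.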

-- 
Terry Jan Reedy


