"Jeff Hobbs" jeffh@ActiveState.com writes:
BTW, I mentioned this because I'm not sure that the reasoning behind moving to a 32-bit integral type was due to RH's desire to support the extra chars in Unicode 4 (after all, without shipping fonts to display them ... what's the point?).
I guess a driving motivation is alignment with the C library; at least that is what drove me to add UCS-4 support to Python. On Unix, traditionally, wchar_t, if interpreted as Unicode, is a four-byte data type. The Unicode spec performed an interesting dance about that: Unicode 2.0 claimed that it was outright non-conforming to use a four-byte wchar_t for Unicode. Unicode 3.0 said "well, you can". Unicode 3.2 now says "why not, it's a reasonable thing to do".
So for us in the Unix world, the impression is that the C library's decision was always "right", and we are eager to support that decision. For libraries such as iconv, there is a performance advantage gained from matching the interpreter's Unicode type with the system's wchar_t.
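To check what the C library on a given box actually uses, something like the following should do. It is only a sketch: the helper names wchar_width_bytes and fits_in_wchar are made up for illustration, and it assumes the ctypes module is available in your Python.

```python
import ctypes


def wchar_width_bytes() -> int:
    """Size in bytes of the platform C library's wchar_t, as seen by ctypes."""
    return ctypes.sizeof(ctypes.c_wchar)


def fits_in_wchar(code_point: int) -> bool:
    """True if one wchar_t is wide enough to hold this code point directly."""
    return 0 <= code_point < (1 << (8 * wchar_width_bytes()))


if __name__ == "__main__":
    # Typically 4 on Unix-like systems; other platforms may report 2.
    print("sizeof(wchar_t) =", wchar_width_bytes())
    # U+10000 is the first code point outside the 16-bit range.
    print("U+10000 fits in one wchar_t:", fits_in_wchar(0x10000))
```

Where the width comes out as four bytes, every code point fits in a single wchar_t, so an interpreter using a four-byte Unicode type can hand its buffers to the C library unchanged.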
Apart from that, there is also the feeling that ISO 10646 got it right and the Unicode consortium got it wrong. You really do need more than 64k code points if you want to unify all writing systems. From that viewpoint, UTF-16 is an ugly hack, which should be avoided wherever possible.
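For anyone who has not looked at why UTF-16 feels like a hack: a code point above U+FFFF has to be split into two 16-bit units (a surrogate pair), whereas a four-byte type holds it in one. A rough illustration follows; the function name to_utf16_units is mine, not anything from Python or Tcl.

```python
from typing import List


def to_utf16_units(code_point: int) -> List[int]:
    """Split a Unicode code point into its UTF-16 code units."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values are not valid code points")
    if code_point < 0x10000:
        # Fits in a single 16-bit unit.
        return [code_point]
    # Above U+FFFF: subtract 0x10000 and spread the remaining 20 bits
    # over a high surrogate (top 10 bits) and a low surrogate (low 10 bits).
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    # U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit range.
    print([hex(u) for u in to_utf16_units(0x1D11E)])
```

With a four-byte type the same character is simply the integer 0x1D11E, so indexing and length never have to account for pairs.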