[Python-Dev] 2.2 Unicode questions

Fredrik Lundh fredrik@pythonware.com
Mon, 23 Jul 2001 12:00:16 +0200


MAL wrote:
> Please note that unichr() is a low-level API which is part
> of the Unicode implementation.

well, I thought unichr() was a built-in Python function...

> To simplify the picture: the implementation itself only sees
> UCS-2 or UCS-4 depending on the compile time option and these
> do not treat surrogates in any special way except reserve
> code points for their usage. Accordingly, unichr() should not
> create UTF-16 but UCS-2 for narrow builds and UCS-4 on wide
> builds
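For reference, the surrogate mechanism mentioned above splits a code point in the range U+10000..U+10FFFF into two 16-bit code units, which is how a "narrow" (UCS-2/UTF-16) build can store it at all. The following is a minimal sketch of that arithmetic in modern Python, assuming standard UTF-16 rules; the helper name `utf16_code_units` is hypothetical and not part of any Python release:

```python
def utf16_code_units(cp):
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]                     # BMP: a single code unit with the same value
    offset = cp - 0x10000               # 20-bit value to split across two units
    high = 0xD800 + (offset >> 10)      # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # low (trail) surrogate: bottom 10 bits
    return [high, low]

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP
print([hex(u) for u in utf16_code_units(0x1D11E)])  # ['0xd834', '0xdd1e']

# cross-check against the codec: big-endian UTF-16 bytes of the same character
print("\U0001D11E".encode("utf-16-be").hex())       # d834dd1e
```

Whether unichr() on a narrow build should hand back such a pair, or refuse, is exactly the question being argued here; the sketch only shows what the pair would be.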

you didn't answer my question: is there any reason why
unichr(0xXXXXXXXX) shouldn't return exactly the same
thing as "\UXXXXXXXX" ?

in 2.0 and 2.1, it doesn't.  in 2.2, it does.
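As a later point of comparison (not 2.x behaviour): in Python 3, unichr() was folded into chr(), and since Python 3.3 (PEP 393) every build stores full code points, so the two spellings discussed above produce the same one-character string. A minimal sketch, assuming a Python 3.3+ interpreter:

```python
# Python 3.3+: chr() takes over the role of 2.x unichr(), and there is no
# narrow/wide build distinction anymore, so chr() and the \U escape agree.
cp = 0x1D11E                    # MUSICAL SYMBOL G CLEF, outside the BMP
from_chr = chr(cp)
from_escape = "\U0001D11E"

print(from_chr == from_escape)  # True
print(len(from_chr))            # 1 (one code point, no surrogate pair)
```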

> (unichr() is a constructor for code units, not code points).

really?  according to the documentation, it creates unicode
*characters*.  so does \U, according to the documentation.

imo, it makes more sense to let "characters" mean code points
than code units, but that's me.  the important thing here is to
figure out if \U and unichr are the same thing, and fix the code
and the documentation to do/say what we mean.
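To make the code point / code unit distinction concrete: the same string has a different "length" depending on which unit is counted. A small sketch, assuming Python 3 (where `len()` on a `str` counts code points) and using the BOM-less `utf-16-le` and `utf-32-le` codecs so that byte counts divide evenly into code units:

```python
s = "a\U0001D11E"                               # one BMP character plus one outside the BMP

code_points = len(s)                            # Python 3 str length counts code points
utf16_units = len(s.encode("utf-16-le")) // 2   # 2 bytes per UTF-16 code unit
utf32_units = len(s.encode("utf-32-le")) // 4   # 4 bytes per UTF-32 code unit

print(code_points, utf16_units, utf32_units)    # 2 3 2
```

Under the "characters are code points" reading, both characters count once; under a UTF-16 code unit reading, the character outside the BMP counts twice.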

</F>