[Python-Dev] UCS2/UCS4 default

"Martin v. Löwis" martin at v.loewis.de
Fri Jul 4 00:31:49 CEST 2008


> Wrong term - code units and code points are equivalent in UTF-16 and
> UTF-32.  What you're looking for is unicode scalar values.

How so? Section 2.5, UTF-16 says

"code points in the supplementary planes, in the range
U+10000..U+10FFFF, are represented as pairs of 16-bit code units."

So clearly, code points in Unicode range from U+0000..U+10FFFF,
independent of encoding form.

In UTF-16, code units range from 0..65535.
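
To make the distinction concrete, here is a minimal Python sketch of how one code point maps to UTF-16 code units. The function name utf16_code_units is made up for illustration and is not part of Python or of the standard; the arithmetic is the usual surrogate-pair construction.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit code units encoding one code point.

    Hypothetical helper, for illustration only. It does not reject
    surrogate code points (U+D800..U+DFFF), which well-formed UTF-16
    cannot represent on their own.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x10000:
        # BMP code point: a single code unit with the same value.
        return [code_point]
    # Supplementary plane: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high-surrogate code unit
    low = 0xDC00 + (offset & 0x3FF)   # low-surrogate code unit
    return [high, low]
```

Every value returned fits in 0..65535, while the input ranges over 0..0x10FFFF, so one code point may need two code units.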

OTOH, "Unicode scalar value" is nearly synonymous with "code point":

D76 Unicode Scalar Value. Any Unicode code point except high-surrogate
and low-surrogate code points.
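
As a rough sketch of what D76 says, the two notions can be written as predicates on integers. The names is_code_point and is_scalar_value are invented here for illustration; the ranges are the ones quoted above, plus the surrogate range U+D800..U+DFFF.

```python
def is_code_point(n):
    """True if n lies in the Unicode codespace U+0000..U+10FFFF."""
    return 0 <= n <= 0x10FFFF


def is_scalar_value(n):
    """True if n is a code point that is not a surrogate (per D76)."""
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)


# The two sets differ only by the 2048 surrogate code points.
code_points = sum(1 for n in range(0x110000) if is_code_point(n))
scalar_values = sum(1 for n in range(0x110000) if is_scalar_value(n))
```

So the only code points that are not scalar values are the surrogates themselves.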

So "code point" in Terry's message was the right term.

Regards,
Martin

