[Python-Dev] UCS2/UCS4 default
"Martin v. Löwis"
martin at v.loewis.de
Fri Jul 4 00:31:49 CEST 2008
> Wrong term - code units and code points are equivalent in UTF-16 and
> UTF-32. What you're looking for is unicode scalar values.
How so? Section 2.5 of the Unicode Standard, under UTF-16, says
"code points in the supplementary planes, in the range
U+10000..U+10FFFF, are represented as pairs of 16-bit code units."
So clearly, code points in Unicode range from U+0000..U+10FFFF,
independent of encoding form.
In UTF-16, code units range from 0..65535.
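
A small sketch of that distinction, assuming Python 3 (where chr() accepts the full code point range). The helper name utf16_code_units is made up for illustration and is not from the standard; it just encodes one code point and splits the result into 16-bit units.

```python
import struct

def utf16_code_units(code_point):
    """Return the 16-bit UTF-16 code units for one (non-surrogate) code point.

    Hypothetical helper for illustration only.
    """
    encoded = chr(code_point).encode("utf-16-be")
    # Each code unit is 2 bytes; unpack them as big-endian unsigned shorts.
    return list(struct.unpack(">%dH" % (len(encoded) // 2), encoded))

# A code point in the BMP takes one code unit.
bmp = utf16_code_units(0x20AC)

# A code point in the supplementary planes takes a pair of code units.
supplementary = utf16_code_units(0x1D11E)

print([hex(u) for u in bmp])            # ['0x20ac']
print([hex(u) for u in supplementary])  # ['0xd834', '0xdd1e']
```

The code point U+1D11E is above 65535, yet each of the code units that represent it stays within 0..65535.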
OTOH, "Unicode scalar value" is nearly synonymous with "code point":
D76 Unicode Scalar Value. Any Unicode code point except high-surrogate
and low-surrogate code points.
So "code point" in Terry's message was the right term.
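
To make "nearly synonymous" concrete, here is a minimal sketch, again assuming Python 3. The three predicate names are invented for this example; the ranges are the code point range quoted above and the surrogate range U+D800..U+DFFF that D76 excludes.

```python
def is_code_point(n):
    """True if n lies in the Unicode code point range U+0000..U+10FFFF."""
    return 0x0000 <= n <= 0x10FFFF

def is_surrogate(n):
    """True if n is a high- or low-surrogate code point (U+D800..U+DFFF)."""
    return 0xD800 <= n <= 0xDFFF

def is_scalar_value(n):
    """True if n is a code point that is not a surrogate (per D76)."""
    return is_code_point(n) and not is_surrogate(n)

code_points = 0x10FFFF + 1
surrogates = 0xDFFF - 0xD800 + 1
scalar_values = code_points - surrogates

print(code_points, surrogates, scalar_values)  # 1114112 2048 1112064
```

The two sets differ only by the 2048 surrogate code points, which is why the terms are close but not interchangeable.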
Regards,
Martin