[Python-Dev] len(chr(i)) = 2?

R. David Murray rdmurray at bitdance.com
Mon Nov 22 18:30:29 CET 2010


On Mon, 22 Nov 2010 12:00:14 -0500, Alexander Belopolsky <alexander.belopolsky at gmail.com> wrote:
> I recently updated the chr() and ord() documentation and used
> "narrow/wide" terms.  I thought UCS2/4 proponents objected to that on
> the basis that these terms are imprecise.

For reference, a grep in py3k/Doc reveals that there are currently exactly
23 lines mentioning UCS2 or UCS4 (with or without the hyphen) in the docs.
The largest groups are in the unicode part of the c-api (7 lines) and in
what's new for 2.2 (6 lines):

c-api/arg.rst:      Convert a null-terminated buffer of Unicode (UCS-2 or UCS-4) data to a Python
c-api/arg.rst:      Convert a Unicode (UCS-2 or UCS-4) data buffer and its length to a Python

c-api/unicode.rst:   for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
c-api/unicode.rst:   possible to build a UCS4 version of Python (most recent Linux distributions come
c-api/unicode.rst:   with UCS4 builds of Python). These builds then use a 32-bit type for
c-api/unicode.rst:   :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
c-api/unicode.rst:   short` (UCS2) or :c:type:`unsigned long` (UCS4).
c-api/unicode.rst:Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
c-api/unicode.rst:   values is interpreted as an UCS-2 character.

whatsnew/2.2.rst:usually stored as UCS-2, as 16-bit unsigned integers. Python 2.2 can also be
whatsnew/2.2.rst:compiled to use UCS-4, 32-bit unsigned integers, as its internal encoding by
whatsnew/2.2.rst:supplying :option:`--enable-unicode=ucs4` to the configure script.   (It's also
whatsnew/2.2.rst:When built to use UCS-4 (a "wide Python"), the interpreter can natively handle
whatsnew/2.2.rst:compiled to use UCS-2 (a "narrow Python"), values greater than 65535 will still
whatsnew/2.2.rst:Marc-André Lemburg.  The changes to support using UCS-4 internally were

howto/unicode.rst:.. comment Additional topic: building Python w/ UCS2 or UCS4 support
howto/unicode.rst:           - [ ] Building Python (UCS2, UCS4)

library/sys.rst:   characters are stored as UCS-2 or UCS-4.

library/json.rst:   specified.  Encodings that are not ASCII based (such as UCS-2) are not

faq/extending.rst:When importing module X, why do I get "undefined symbol: PyUnicodeUCS2*"?
faq/extending.rst:If instead the name of the undefined symbol starts with ``PyUnicodeUCS4``, the
faq/extending.rst:   ...     print('UCS4 build')
faq/extending.rst:   ...     print('UCS2 build')
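
The two faq/extending.rst print() lines above belong to a build check based
on sys.maxunicode.  A minimal sketch of that kind of check follows, together
with the code-unit count behind the subject line.  The helper names
unicode_build_width and utf16_code_units are hypothetical, chosen only for
illustration; they are not taken from the docs.

```python
import sys


def unicode_build_width(maxunicode=sys.maxunicode):
    """Classify a build from its sys.maxunicode value (hypothetical helper).

    Narrow builds report 0xFFFF and wide builds report 0x10FFFF.  Python 3.3
    and later always report 0x10FFFF, so they classify as 'UCS4' here.
    """
    return 'UCS4' if maxunicode > 0xFFFF else 'UCS2'


def utf16_code_units(code_point):
    """Count the 16-bit units needed for one code point (hypothetical helper).

    This is the length a narrow build would report for chr(code_point):
    code points above 0xFFFF need a surrogate pair, hence a length of 2.
    """
    # 'surrogatepass' lets lone surrogates (0xD800-0xDFFF) be encoded too.
    encoded = chr(code_point).encode('utf-16-le', 'surrogatepass')
    return len(encoded) // 2


if __name__ == '__main__':
    print(unicode_build_width(), 'build')
    print(utf16_code_units(0x10000))
```

Passing the maxunicode value as an argument keeps the check testable on any
interpreter, whichever way it was built.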

--
R. David Murray                                      www.bitdance.com

