Unicode support in Python 2.7.8 - 16 bit
Steven D'Aprano
steve at pearwood.info
Wed Mar 8 02:32:29 EST 2017
On Tue, 07 Mar 2017 14:05:15 -0800, John Nagle wrote:
> How do I test if a Python 2.7.8 build was built for 32-bit Unicode?
sys.maxunicode will be 1114111 (0x10FFFF) if it is a "wide" (32-bit) build
and 65535 (0xFFFF) if it is a "narrow" (16-bit) build.
You can double-check with:
    unichr(0x10FFFF)     # raises ValueError in a narrow build
    len(u'\U0010FFFF')   # returns 1 in a wide build, or 2 in a narrow build
but the maxunicode test is the right way to do it.
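If you want both checks in a script you can drop onto the hosting
machine, here is a minimal sketch. The function names are hypothetical
(not anything from the standard library), and it assumes only that
sys.maxunicode behaves as described above. It avoids unichr() so the
same file also runs under Python 3.3+, where every build reports the
wide value.

```python
import sys

def unicode_build_width():
    """Return 32 for a "wide" (UCS-4) build, 16 for a "narrow" (UCS-2) build."""
    # sys.maxunicode is 0x10FFFF on wide 2.x builds and on every
    # Python 3.3+ (PEP 393); it is 0xFFFF on narrow 2.x builds.
    return 32 if sys.maxunicode > 0xFFFF else 16

def astral_char_length():
    """Return how many code units one non-BMP character occupies."""
    # A wide build stores U+10FFFF as one unit; a narrow build stores
    # it as a UTF-16 surrogate pair, so len() reports 2.
    return len(u'\U0010FFFF')

if __name__ == "__main__":
    width = unicode_build_width()
    print("maxunicode = %d -> %d-bit build" % (sys.maxunicode, width))
    # The cross-check should always agree with the maxunicode test.
    assert astral_char_length() == (1 if width == 32 else 2)
```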
> (I'm dealing with shared hosting, and I'm stuck with their provided
> versions.)
>
> If I give this to Python 2.7.x:
>
> sy = u'\U0001f60f'
>
> len(sy) is 1 on a Ubuntu 14.04LTS machine, but 2 on the Red Hat shared
> hosting machine. I assume "1" indicates 32-bit Unicode capability, and
> "2" indicates 16-bit.
> It looks like Python 2.x in 16-bit mode is using a UTF-16 pair
> encoding, like Java. Is that right?
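Correct. A narrow build stores characters outside the Basic Multilingual
Plane as surrogate pairs, so len() counts UTF-16 code units rather than
code points.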
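To make the pair encoding concrete, here is a rough sketch of the
standard UTF-16 surrogate arithmetic, plus a helper that walks a string
by code point on either kind of build. Both function names are
hypothetical, made up for illustration; it assumes you want code that
gives the same answer on your Ubuntu box and on the narrow Red Hat host.

```python
def to_surrogate_pair(cp):
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point: %#x" % cp)
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return high, low

def code_points(s):
    """Yield the code points of s, joining any surrogate pairs found in it."""
    i = 0
    while i < len(s):
        c = ord(s[i])
        # A high surrogate followed by a low surrogate is one character
        # (this is how a narrow build stores u'\U0001f60f').
        if 0xD800 <= c <= 0xDBFF and i + 1 < len(s):
            d = ord(s[i + 1])
            if 0xDC00 <= d <= 0xDFFF:
                yield 0x10000 + ((c - 0xD800) << 10) + (d - 0xDC00)
                i += 2
                continue
        # Anything else (including an unpaired surrogate) passes through.
        yield c
        i += 1
```

For the character in the question, to_surrogate_pair(0x1F60F) gives
(0xD83D, 0xDE0F), which are the two units a narrow build stores, and
list(code_points(u'\U0001f60f')) gives [0x1F60F] on either build.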
> Is it documented somewhere?
https://docs.python.org/2/library/sys.html#sys.maxunicode
https://docs.python.org/3/library/sys.html#sys.maxunicode
Here's the PEP that introduced the distinction in the first place:
https://www.python.org/dev/peps/pep-0261/
And here's the PEP that removes the distinction once and for all (at
least in CPython):
https://www.python.org/dev/peps/pep-0393/
I know the narrow/wide distinction was documented in the build
instructions for compiling Python from source; that has been obsolete
since 3.3. I believe the ./configure options were --enable-unicode=ucs4
and --enable-unicode=ucs2 (but don't quote me on that).
--
Steve