[issue8781] 32-bit wchar_t doesn't need to be unsigned to be usable (I think)

Daniel Stutzbach report at bugs.python.org
Fri May 21 14:44:00 CEST 2010


New submission from Daniel Stutzbach <daniel at stutzbachenterprises.com>:

If ./configure detects that the system's wchar_t type is compatible, it emits "#define PY_UNICODE_TYPE wchar_t" in pyconfig.h, which enables certain optimizations when converting between Py_UNICODE and wchar_t (i.e., the conversion can just do a memcpy).
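
A minimal sketch of why a matching type matters, not taken from the CPython sources: the names demo_unicode, demo_copy_fast and demo_copy_slow are hypothetical, and the unsigned long source in the slow path is only an assumed example of a mismatched type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for Py_UNICODE in a build where configure
   picked wchar_t as the underlying type. */
typedef wchar_t demo_unicode;

/* Fast path: the two types are identical, so a block copy suffices. */
static void demo_copy_fast(wchar_t *dst, const demo_unicode *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n * sizeof(wchar_t));
}

/* Slow path: when the types differ, each element has to be converted
   individually.  unsigned long is an assumed example type. */
static void demo_copy_slow(wchar_t *dst, const unsigned long *src, size_t n)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        dst[i] = (wchar_t)src[i];
}
```

Both functions produce the same result for valid code points; the fast path simply avoids the per-element loop.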

Right now, ./configure considers wchar_t to be compatible if it is the same bit-width as Py_UNICODE and if wchar_t is unsigned.  In practice, that means Python only uses wchar_t on Windows, which uses an unsigned 16-bit wchar_t.  On Linux, wchar_t is 32-bit and signed.

In the original Unicode implementation for Python, Py_UNICODE was always 16-bit.  I believe the "unsigned" requirement harks back to that time.  A 32-bit wchar_t gives us plenty of space to hold the maximum Unicode code point of 0x10FFFF, regardless of whether wchar_t is signed or unsigned.
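
The arithmetic behind that claim can be checked with a small sketch. The helper names demo_signed_max and demo_signed_can_hold are hypothetical and only illustrate the range argument: a signed 32-bit type tops out at 0x7FFFFFFF, well above 0x10FFFF, whereas a signed 16-bit type tops out at 0x7FFF and cannot hold code units up to 0xFFFF.

```c
#include <assert.h>
#include <stdint.h>

/* Largest valid Unicode code point. */
#define DEMO_MAX_CODE_POINT 0x10FFFF

/* Largest value of a two's-complement signed integer with the given
   width in bits (valid for widths 1..63). */
static int64_t demo_signed_max(unsigned bits)
{
    assert(bits >= 1 && bits <= 63);
    return ((int64_t)1 << (bits - 1)) - 1;
}

/* Returns 1 if a signed integer of the given width can represent
   `value` as a non-negative number, 0 otherwise. */
static int demo_signed_can_hold(unsigned bits, int64_t value)
{
    return value >= 0 && value <= demo_signed_max(bits);
}
```

Under these definitions, demo_signed_can_hold(32, DEMO_MAX_CODE_POINT) is true while demo_signed_can_hold(16, 0xFFFF) is false, which matches the reasoning that signedness only matters in the 16-bit case.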

I believe the condition could be relaxed to the following:
- wchar_t must be the same bit-width as Py_UNICODE, and
- if wchar_t is 16-bit, it must be unsigned
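
The current and the proposed conditions can be sketched side by side as plain C predicates. This is only an illustration of the logic, not the actual ./configure test; the function names and parameters are hypothetical.

```c
#include <assert.h>

/* Current rule as described above: same width AND unsigned. */
static int demo_compatible_current(unsigned wchar_bits, int wchar_is_signed,
                                   unsigned unicode_bits)
{
    assert(wchar_bits > 0 && unicode_bits > 0);
    return wchar_bits == unicode_bits && !wchar_is_signed;
}

/* Proposed rule: same width, and unsigned only required at 16 bits. */
static int demo_compatible_relaxed(unsigned wchar_bits, int wchar_is_signed,
                                   unsigned unicode_bits)
{
    assert(wchar_bits > 0 && unicode_bits > 0);
    if (wchar_bits != unicode_bits)
        return 0;
    if (wchar_bits == 16)
        return !wchar_is_signed;
    return 1;
}
```

For a signed 32-bit wchar_t and a 32-bit Py_UNICODE (the Linux UCS4 case), the current rule rejects the type and the relaxed rule accepts it; for an unsigned 16-bit wchar_t (the Windows case), both rules accept it.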

That would allow a UCS4 Python to use wchar_t on Linux.

I experimented by manually tweaking my pyconfig.h to treat Linux's signed 32-bit wchar_t as compatible.  The unit test suite encountered no problems.

However, it's quite possible that I'm missing some important detail here.  Someone familiar with the guts of Python's Unicode implementation will presumably have a much better idea of whether I have this right or not. ;-)

----------
components: Interpreter Core, Unicode
messages: 106235
nosy: stutzbach
priority: normal
severity: normal
stage: needs patch
status: open
title: 32-bit wchar_t doesn't need to be unsigned to be usable (I think)
type: performance
versions: Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8781>
_______________________________________

