[Python-Dev] HAVE_USABLE_WCHAR_T
Thomas Heller
theller at python.net
Fri Oct 8 20:09:30 CEST 2004
The Include/unicodeobject.h file says (line 103):
/* If the compiler provides a wchar_t type we try to support it
through the interface functions PyUnicode_FromWideChar() and
PyUnicode_AsWideChar(). */
This isn't true - grepping the CVS sources for this symbol shows that it
is used in these ways:
- When defined together with the WANT_WCTYPE_FUNCTIONS symbol, the
compiler's wctype.h functions are used instead of the ones supplied with
Python. Include/unicodeobject.h, line 294.
- When defined together with MS_WINDOWS, it makes the mbcs_encode
and mbcs_decode functions available (in Modules/_codecsmodule.c), plus the
PyUnicode_DecodeMBCS and PyUnicode_AsMBCSString functions in
Objects/unicodeobject.c.
- Contrary to the comment at the top of this message, the
PyUnicode_FromWideChar and PyUnicode_AsWideChar functions are compiled
when HAVE_WCHAR_H is defined. The HAVE_USABLE_WCHAR_T symbol is only
used to determine whether memcpy is used, or the unicode characters are
copied one by one.
- Finally, again when defined together with MS_WINDOWS, it sets the
filesystem encoding to mbcs.
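
To illustrate the third point, here is a minimal sketch of the two copy
strategies. It is not the code from Objects/unicodeobject.c: demo_unicode,
demo_from_wide and DEMO_HAVE_USABLE_WCHAR_T are hypothetical stand-ins for
Py_UNICODE, the body of PyUnicode_FromWideChar and HAVE_USABLE_WCHAR_T, so
that the fragment compiles without the Python headers.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for Py_UNICODE (assumption, not Python's typedef). */
#ifdef DEMO_HAVE_USABLE_WCHAR_T
typedef wchar_t demo_unicode;          /* same representation as wchar_t */
#else
typedef unsigned short demo_unicode;   /* may differ in width from wchar_t */
#endif

/* Copy n wide characters from src into dst. */
static void demo_from_wide(demo_unicode *dst, const wchar_t *src, size_t n)
{
#ifdef DEMO_HAVE_USABLE_WCHAR_T
    /* Identical types: one block copy is enough. */
    memcpy(dst, src, n * sizeof(wchar_t));
#else
    /* Types may differ in size: convert the characters one by one. */
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (demo_unicode)src[i];
#endif
}
```

Either way the function exists and yields the same characters for input that
fits the target type; the symbol only selects how the copy is done.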
So, it seems that the HAVE_USABLE_WCHAR_T symbol doesn't play any role
for the extension programmer *at all*. The preprocessor flag that plays
a role for extensions seems to be HAVE_WCHAR_H, since it marks whether
the PyUnicode_FromWideChar and PyUnicode_AsWideChar functions are
available or not.
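
A rough sketch of the guard an extension would then need, assuming the
observation above is right. demo_wide_api_available is a hypothetical helper,
and in a real extension HAVE_WCHAR_H would come from pyconfig.h via Python.h
rather than from the compiler command line.

```c
#ifdef HAVE_WCHAR_H
#include <wchar.h>
#endif

/* Hypothetical helper: reports whether the wide-character conversion
   functions would have been compiled in, keyed on HAVE_WCHAR_H only. */
static int demo_wide_api_available(void)
{
#ifdef HAVE_WCHAR_H
    /* PyUnicode_FromWideChar / PyUnicode_AsWideChar may be called here. */
    return 1;
#else
    /* No wchar.h: the extension needs a fallback conversion path. */
    return 0;
#endif
}
```

Note that HAVE_USABLE_WCHAR_T does not appear in the guard at all.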
This has caused me quite some confusion, and so I suggest the comment
above in the Include/unicodeobject.h file should be fixed.
Finally, the docs also seem to get it wrong (although I haven't followed
that in detail). I can't reach python.org at the moment, but the Python/C
API manual, section 7.3.2, Unicode Objects, says:
Py_UNICODE
This type represents a 16-bit unsigned storage type which is used by
Python internally as basis for holding Unicode ordinals. On platforms
where wchar_t is available and also has 16-bits, Py_UNICODE is a
typedef alias for wchar_t to enhance native platform compatibility. On
all other platforms, Py_UNICODE is a typedef alias for unsigned short.
Isn't the size 32 bits for wide unicode builds?
Please, please fix this - unicode is already complicated enough even
without this confusion!
Thomas