[Python-Dev] HAVE_USABLE_WCHAR_T

Thomas Heller theller at python.net
Fri Oct 8 20:09:30 CEST 2004


The Include/unicodeobject.h file says (line 103):

/* If the compiler provides a wchar_t type we try to support it
   through the interface functions PyUnicode_FromWideChar() and
   PyUnicode_AsWideChar(). */

This isn't true: grepping the CVS sources for HAVE_USABLE_WCHAR_T shows
that it is used in these ways:

- When defined together with the WANT_WCTYPE_FUNCTIONS symbol, the
compiler's wctype.h functions are used instead of the ones supplied with
Python.  Include/unicodeobject.h, line 294.

- When defined together with MS_WINDOWS, it makes available mbcs_encode
and mbcs_decode functions (in Modules/_codecsmodule.c), plus the
PyUnicode_DecodeMBCS and PyUnicode_AsMBCSString functions in
Objects/unicodeobject.c.

- Contrary to the comment at the top of this message, the
PyUnicode_FromWideChar and PyUnicode_AsWideChar functions are compiled
when HAVE_WCHAR_H is defined.  The HAVE_USABLE_WCHAR_T symbol is only
used to determine whether memcpy is used, or the unicode characters are
copied one by one.

- Finally, again when defined together with MS_WINDOWS, it sets the
filesystem encoding to mbcs.
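To make the memcpy-versus-loop distinction from the third point concrete, here is a minimal standalone sketch. It is not the actual code in Objects/unicodeobject.c: `unicode_unit` and `copy_from_wchar` are hypothetical names, and where the real source makes the choice with an #ifdef on HAVE_USABLE_WCHAR_T, this sketch compares sizes at run time so that it compiles without Python's headers.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for Py_UNICODE in a narrow (16-bit) build. */
typedef unsigned short unicode_unit;

/* Copy n characters from a wchar_t buffer into a unicode_unit buffer.
   Assumption: the sizeof comparison below plays the role that the
   HAVE_USABLE_WCHAR_T preprocessor test plays in CPython. */
static void copy_from_wchar(unicode_unit *dst, const wchar_t *src, size_t n)
{
    if (sizeof(wchar_t) == sizeof(unicode_unit)) {
        /* Same storage width: a single bulk copy is enough. */
        memcpy(dst, src, n * sizeof(unicode_unit));
    } else {
        /* Different widths: convert one unit at a time.
           Values above 0xFFFF are simply truncated in this sketch. */
        for (size_t i = 0; i < n; i++)
            dst[i] = (unicode_unit)src[i];
    }
}
```

Either branch produces the same result for characters in the BMP; the flag only decides how the copy is done, which is why it is invisible to extension code.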


So it seems that the HAVE_USABLE_WCHAR_T symbol doesn't play any role
for the extension programmer *at all*.  The preprocessor flag that does
matter for extensions seems to be HAVE_WCHAR_H, since it determines
whether the PyUnicode_FromWideChar and PyUnicode_AsWideChar functions
are available.
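A minimal sketch of the guard an extension would use, assuming HAVE_WCHAR_H reaches the extension through pyconfig.h (which Python.h includes). The helper name `have_wide_char_api` is hypothetical, and the Python calls appear only in comments so that the fragment compiles on its own.

```c
/* Hypothetical helper in an extension module.  In a real extension,
   HAVE_WCHAR_H would be defined (or not) by pyconfig.h via Python.h. */
static int have_wide_char_api(void)
{
#ifdef HAVE_WCHAR_H
    /* PyUnicode_FromWideChar() and PyUnicode_AsWideChar() are compiled
       in here, whatever the value of HAVE_USABLE_WCHAR_T. */
    return 1;
#else
    /* No wchar_t conversion functions: use a path that avoids wchar_t. */
    return 0;
#endif
}
```

The point is which macro the #ifdef tests; testing HAVE_USABLE_WCHAR_T instead would wrongly hide the functions on platforms where wchar_t exists but has a different width than Py_UNICODE.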

This has caused me quite a bit of confusion, so I suggest that the
comment quoted above from Include/unicodeobject.h be fixed.

The docs also seem to get it wrong (although I haven't followed that in
detail).  I can't reach python.org at the moment, but the Python/C API
manual, section 7.3.2, "Unicode Objects", says:

  Py_UNICODE

  This type represents a 16-bit unsigned storage type which is used by
  Python internally as basis for holding Unicode ordinals. On platforms
  where wchar_t is available and also has 16-bits, Py_UNICODE is a
  typedef alias for wchar_t to enhance native platform compatibility. On
  all other platforms, Py_UNICODE is a typedef alias for unsigned short.

Isn't the size 32 bits for wide unicode builds?
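To illustrate why the width matters: with 16-bit storage units (a narrow build), a code point above U+FFFF does not fit in one Py_UNICODE and is stored as a UTF-16 surrogate pair, while a 32-bit unit (a wide build) holds it directly. A small standalone sketch; `units_needed` and `to_surrogates` are hypothetical helper names, not Python API.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of storage units needed for code point cp when each unit is
   unit_bits wide (16 for a narrow build, 32 for a wide build). */
static size_t units_needed(uint32_t cp, unsigned unit_bits)
{
    if (unit_bits >= 32 || cp <= 0xFFFF)
        return 1;
    return 2; /* needs a UTF-16 surrogate pair */
}

/* Split cp into a UTF-16 surrogate pair.
   Precondition: 0x10000 <= cp <= 0x10FFFF. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    cp -= 0x10000;                               /* 20-bit offset */
    *hi = (uint16_t)(0xD800 + (cp >> 10));       /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));     /* bottom 10 bits */
}
```

For example, U+1F600 needs two units in a 16-bit build (0xD83D 0xDE00) but only one in a 32-bit build, so a description of Py_UNICODE as always 16 bits cannot cover wide builds.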

Please, please fix this - unicode is already complicated enough even
without this confusion!

Thomas
