M.-A. Lemburg mal at egenix.com
Fri Oct 8 21:30:11 CEST 2004

Thomas Heller wrote:
> The Include/unicodeobject.h file says (line 103):
> /* If the compiler provides a wchar_t type we try to support it
>    through the interface functions PyUnicode_FromWideChar() and
>    PyUnicode_AsWideChar(). */
> This isn't true - grepping the CVS sources for this symbol shows that it
> is used in these ways:
> - When defined together with the WANT_WCTYPE_FUNCTIONS symbol, the
> compiler's wctype.h functions are used instead of the ones supplied with
> Python.  Include/unicodeobject.h, line 294.
> - When defined together with MS_WINDOWS, it makes available mbcs_encode
> and mbcs_decode functions (in Modules/_codecsmodule.c), plus the
> PyUnicode_DecodeMBCS and PyUnicode_AsMBCSString functions in
> Objects/unicodeobject.c.
> - Contrary to the comment at the top of this message, the
> PyUnicode_FromWideChar and PyUnicode_AsWideChar functions are compiled
> when HAVE_WCHAR_H is defined.  The HAVE_USABLE_WCHAR_T symbol is only
> used to determine whether memcpy is used, or the unicode characters are
> copied one by one.
> - Finally, again when defined together with MS_WINDOWS, it sets the
> filesystem encoding to mbcs.
> So, it seems that the HAVE_USABLE_WCHAR_T symbol doesn't play any role
> for the extension programmer *at all*.  

That symbol is defined by the configure script for use in the
interpreter - why did you think it was meant for use by extensions?

The HAVE_USABLE_WCHAR_T symbol only means that we can use wchar_t
as a synonym for Py_UNICODE, which makes some APIs
more efficient, e.g. on Windows - nothing more.
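To make the memcpy-versus-loop distinction concrete, here is a minimal standalone C sketch. It does not use Python.h: `unicode_unit` (a hypothetical stand-in for Py_UNICODE in a 4-byte, UCS-4 build) and `copy_to_wide()` are illustrative names, not CPython API. The size comparison is written as an ordinary `if` here, whereas the interpreter makes the choice at build time through the configure-defined HAVE_USABLE_WCHAR_T; the real configure test may check more than size (e.g. signedness), so treat the condition below as an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for Py_UNICODE in a wide (UCS-4) build. */
typedef uint32_t unicode_unit;

/* Copy n code units from src into the wchar_t buffer dst.
   Assumption: "usable" is approximated here as "same size as wchar_t". */
static void copy_to_wide(wchar_t *dst, const unicode_unit *src, size_t n)
{
    if (sizeof(wchar_t) == sizeof(unicode_unit)) {
        /* Fast path: identical unit size, so the bytes are copied as-is. */
        memcpy(dst, src, n * sizeof(unicode_unit));
    } else {
        /* Slow path: convert each unit individually. With a 2-byte
           wchar_t this narrows values above 0xFFFF (no surrogates). */
        for (size_t i = 0; i < n; i++)
            dst[i] = (wchar_t)src[i];
    }
}
```

With a 4-byte wchar_t (typical on Linux) the memcpy branch is taken; with the 2-byte wchar_t used on Windows each unit is converted one by one, and this sketch would silently truncate code points above U+FFFF rather than produce surrogate pairs.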

> The preprocessor flag that plays
> a role for extensions seems to be HAVE_WCHAR_H, since it marks whether
> the PyUnicode_FromWideChar and PyUnicode_AsWideChar functions are
> available or not.

Right, since wchar.h is the include file that defines the
wchar_t type.

> This has caused me quite some confusion, and so I suggest the comment
> above in the Include/unicodeobject.h file should be fixed.
> Finally, the docs also seem to get it wrong (although I haven't followed
> that in detail).  Can't reach python.org at the moment, but the Python/C API
> manual, section 7.3.2, "Unicode Objects", says:
>   This type represents a 16-bit unsigned storage type which is used by
>   Python internally as basis for holding Unicode ordinals. On platforms
>   where wchar_t is available and also has 16-bits, Py_UNICODE is a
>   typedef alias for wchar_t to enhance native platform compatibility. On
>   all other platforms, Py_UNICODE is a typedef alias for unsigned short.
> Isn't the size 32 bits for wide unicode builds?


> Please, please fix this - unicode is already complicated enough even
> without this confusion!

Please add a bug report to SF for this.

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Oct 08 2004)

More information about the Python-Dev mailing list