The Include/unicodeobject.h file says (line 103):

/* If the compiler provides a wchar_t type we try to support it through the interface functions PyUnicode_FromWideChar() and PyUnicode_AsWideChar(). */

This isn't true - grepping the CVS sources for this symbol shows that it is used in these ways:

- When defined together with the WANT_WCTYPE_FUNCTIONS symbol, the compiler's wctype.h functions are used instead of the ones supplied with Python (Include/unicodeobject.h, line 294).
- When defined together with MS_WINDOWS, it makes available mbcs_encode and mbcs_decode functions (in Modules/_codecsmodule.c), plus the PyUnicode_DecodeMBCS and PyUnicode_AsMBCSString functions in Objects/unicodeobject.c.
- Contrary to the comment quoted above, the PyUnicode_FromWideChar and PyUnicode_AsWideChar functions are compiled when HAVE_WCHAR_H is defined. The HAVE_USABLE_WCHAR_T symbol only determines whether memcpy is used or the Unicode characters are copied one by one.
- Finally, again when defined together with MS_WINDOWS, it sets the filesystem encoding to mbcs.

So it seems that the HAVE_USABLE_WCHAR_T symbol doesn't play any role for the extension programmer *at all*. The preprocessor flag that matters for extensions seems to be HAVE_WCHAR_H, since it marks whether PyUnicode_FromWideChar and PyUnicode_AsWideChar are available or not.

This has caused me quite some confusion, so I suggest the comment quoted above from Include/unicodeobject.h should be fixed.

Finally, the docs also seem to get it wrong (although I haven't followed that in detail). I can't reach python.org at the moment, but the Python C API manual, section 7.3.2 (Unicode Objects), says:

    Py_UNICODE
        This type represents a 16-bit unsigned storage type which is used by Python internally as basis for holding Unicode ordinals. On platforms where wchar_t is available and also has 16-bits, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. On all other platforms, Py_UNICODE is a typedef alias for unsigned short.

Isn't the size 32 bits for wide unicode builds?

Please, please fix this - unicode is already complicated enough even without this confusion!

Thomas
Thomas Heller wrote:
> The Include/unicodeobject.h file says (line 103):
>
> /* If the compiler provides a wchar_t type we try to support it through the interface functions PyUnicode_FromWideChar() and PyUnicode_AsWideChar(). */
>
> This isn't true - grepping the CVS sources for this symbol shows that it is used in these ways:
>
> - When defined together with the WANT_WCTYPE_FUNCTIONS symbol, the compiler's wctype.h functions are used instead of the ones supplied with Python (Include/unicodeobject.h, line 294).
> - When defined together with MS_WINDOWS, it makes available mbcs_encode and mbcs_decode functions (in Modules/_codecsmodule.c), plus the PyUnicode_DecodeMBCS and PyUnicode_AsMBCSString functions in Objects/unicodeobject.c.
> - Contrary to the comment quoted above, the PyUnicode_FromWideChar and PyUnicode_AsWideChar functions are compiled when HAVE_WCHAR_H is defined. The HAVE_USABLE_WCHAR_T symbol only determines whether memcpy is used or the Unicode characters are copied one by one.
> - Finally, again when defined together with MS_WINDOWS, it sets the filesystem encoding to mbcs.
>
> So it seems that the HAVE_USABLE_WCHAR_T symbol doesn't play any role for the extension programmer *at all*.
That symbol is defined by the configure script for use in the interpreter - why did you think it would be usable for extensions? The HAVE_USABLE_WCHAR_T symbol only means that we can use wchar_t as a synonym for Py_UNICODE and thus make some APIs more efficient, e.g. on Windows - nothing more.
> The preprocessor flag that matters for extensions seems to be HAVE_WCHAR_H, since it marks whether PyUnicode_FromWideChar and PyUnicode_AsWideChar are available or not.
Right, since wchar.h is the include file that defines the wchar_t type.
> This has caused me quite some confusion, and so I suggest the comment above in the Include/unicodeobject.h file should be fixed.
>
> Finally, the docs also seem to get it wrong (although I haven't followed that in detail). Can't reach python.org at the moment, but the Python C API manual, section 7.3.2 (Unicode Objects), says:
>
>     Py_UNICODE
>         This type represents a 16-bit unsigned storage type which is used by Python internally as basis for holding Unicode ordinals. On platforms where wchar_t is available and also has 16-bits, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. On all other platforms, Py_UNICODE is a typedef alias for unsigned short.
>
> Isn't the size 32 bits for wide unicode builds?
Yes.
> Please, please fix this - unicode is already complicated enough even without this confusion!
Please add a bug report to SF for this.

Thanks,
--
Marc-Andre Lemburg
eGenix.com -- Professional Python Services directly from the Source (#1, Oct 08 2004)
participants (2):
- M.-A. Lemburg
- Thomas Heller