Yesterday I ran into a bug in the C API docs. The top of this page:

http://docs.python.org/api/unicodeObjects.html

says:

Py_UNICODE This type represents a 16-bit unsigned storage type which is used by Python internally as basis for holding Unicode ordinals. On platforms where wchar_t is available and also has 16-bits, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. On all other platforms, Py_UNICODE is a typedef alias for unsigned short.

This is incorrect on some platforms: on Debian, Py_UNICODE turns out to be 32 bits. I'm not sure what the correct wording should be: does Python use wchar_t whenever it's available (16 bits or not)?

I solved my problem by realizing that I was going about things entirely wrong, and that I should use the Python codecs from C and not worry about what Py_UNICODE contains. However, I think we should fix the docs to avoid confusing others... or maybe it would be better to document what's in Py_UNICODE and suggest always using the codec methods? I don't have a strong opinion either way.

robey
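Whether the "16-bit" wording can hold depends on how wide the platform's wchar_t is. A minimal sketch for checking that from Python, assuming only the standard ctypes module (whose c_wchar type maps to C's wchar_t); the helper name wchar_t_bits is hypothetical:

```python
import ctypes


def wchar_t_bits():
    """Return the width of the platform's C wchar_t in bits."""
    # ctypes.c_wchar is ctypes' mapping of the C wchar_t type
    return ctypes.sizeof(ctypes.c_wchar) * 8


if __name__ == "__main__":
    print("wchar_t is", wchar_t_bits(), "bits")
```

On Windows this typically reports 16 bits; on Linux/glibc systems such as Debian it typically reports 32, which is consistent with the 32-bit Py_UNICODE observed on Debian.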
On 9/29/05, Robey Pointer wrote:
Yesterday I ran into a bug in the C API docs. The top of this page:
http://docs.python.org/api/unicodeObjects.html
says:
Py_UNICODE This type represents a 16-bit unsigned storage type which is used by Python internally as basis for holding Unicode ordinals. On platforms where wchar_t is available and also has 16-bits, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. On all other platforms, Py_UNICODE is a typedef alias for unsigned short.
I believe this is the same issue that was brought up in May[1]. My impression was that people could not agree on a documentation patch.

[1] http://www.python.org/dev/summary/2005-05-01_2005-05-15.html

STeVe
--
You can wordify anything if you just verb it.
        --- Bucky Katt, Get Fuzzy
Steven Bethard wrote:
I believe this is the same issue that was brought up in May[1]. My impression was that people could not agree on a documentation patch.
[1] http://www.python.org/dev/summary/2005-05-01_2005-05-15.html
The problem was not so much getting the documentation right, but the fact that Python builds as a UCS4 version if it finds a Tcl version built for UCS4, contrary to the UCS2 default that is documented.

If I ever get around to working on my Python todo list, this is one of the things I'd like to restore: UCS4 should always be an *explicit* compile-time option because of the consequences that go with it.

Unfortunately, many Linux distros nowadays build Python with UCS4, introducing yet another dimension to binary Python packages. In case you wonder, we now have these dimensions:

* Python version (2.3, 2.4, ...)
* OS version (Linux, Solaris, Windows, Mac OS X, ...)
* Architecture (PowerPC, x86, x86_64, SunSPARC, ...)
* Unicode variant (UCS2, UCS4)

Finding the right binary for his or her Python is getting increasingly complicated for the Python user (and we are seeing this every day in support requests).

Something we might want to introduce in Python 2.5 is a short identifier in the interpreter's interactive startup banner that provides easy-to-find values for all of the above dimensions. It already includes the Python version and OS name, but is missing the other bits and pieces.

Perhaps a flag that fires up Python and runs platform.py would help too.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 29 2005)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
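The identifier proposed above could look something like the following sketch. It is not an existing Python feature: build_identifier and unicode_variant are hypothetical helpers built on platform.python_version(), platform.system(), platform.machine() and sys.maxunicode. On Python 2 (and 3.0 through 3.2), sys.maxunicode is 0xFFFF on UCS2 ("narrow") builds and 0x10FFFF on UCS4 ("wide") builds; since Python 3.3 (PEP 393) there is no such build split, so the sketch reports that case separately.

```python
import platform
import sys


def unicode_variant():
    """Classify how this interpreter stores Unicode strings."""
    if sys.version_info >= (3, 3):
        # PEP 393 flexible string representation: no narrow/wide build split
        return "PEP393"
    # Narrow (UCS2) builds report 0xFFFF, wide (UCS4) builds report 0x10FFFF
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"


def build_identifier():
    """Hypothetical short identifier covering the four dimensions listed above."""
    return "-".join([
        platform.python_version(),  # Python version, e.g. "2.4.2"
        platform.system(),          # OS name, e.g. "Linux"
        platform.machine(),         # architecture, e.g. "x86_64"
        unicode_variant(),          # Unicode variant
    ])


if __name__ == "__main__":
    print(build_identifier())
```

Joining the fields with a fixed separator keeps the identifier short enough for a startup banner while still covering every dimension that affects which binary package a user needs.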
Fredrik Lundh wrote:
M.-A. Lemburg wrote:
* Unicode variant (UCS2, UCS4)
don't forget the "Py_UNICODE is wchar_t" subvariant.
True, but that's not relevant for binary compatibility of Python packages (at least not AFAIK). UCS2 vs. UCS4 matters because the two versions use and expose different C APIs, so an extension compiled for UCS2 doesn't run with a Python built for UCS4, and vice versa.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 29 2005)
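The binary incompatibility comes from symbol renaming: in the Python 2-era headers (through 3.2), unicodeobject.h maps the PyUnicode_* entry points to PyUnicodeUCS2_* or PyUnicodeUCS4_* names depending on the build, so an extension compiled for one variant references symbols the other interpreter does not export. A sketch that probes the running interpreter for one such renamed symbol (PyUnicode_FromUnicode, chosen as an example) through ctypes.pythonapi; unicode_abi is a hypothetical helper name:

```python
import ctypes


def unicode_abi():
    """Return "UCS2" or "UCS4" if the running interpreter exports the
    corresponding renamed Unicode C API symbol, else None."""
    api = ctypes.pythonapi
    for variant in ("UCS2", "UCS4"):
        # Looking up a symbol the library lacks raises AttributeError,
        # which hasattr() turns into False.
        if hasattr(api, "PyUnicode%s_FromUnicode" % variant):
            return variant
    # Neither name exists, e.g. on Python 3.3+ where the renaming is gone
    return None


if __name__ == "__main__":
    print(unicode_abi())
```

On a UCS4 Python 2 build this would typically report UCS4, and an extension compiled against a UCS2 interpreter would fail to import there with an undefined-symbol error for a PyUnicodeUCS2_* name.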
Bob Ippolito wrote:
On Sep 29, 2005, at 3:53 PM, M.-A. Lemburg wrote:
Perhaps a flag that fires up Python and runs platform.py would help too.
python -mplatform
Cool :-) Now we only need to add some more information to it (e.g. the Unicode variant).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 29 2005)
On 29 Sep 2005, at 12:06, Steven Bethard wrote:
I believe this is the same issue that was brought up in May[1]. My impression was that people could not agree on a documentation patch.
Would it help if I tried my hand at it? My impression so far is that extension coders should probably try not to worry about the size or content of Py_UNICODE. (The thread seems to have wandered off into nowhere again...)

Py_UNICODE
This type represents an unsigned storage type at least 16 bits long (but sometimes more) which is used by Python internally as basis for holding Unicode ordinals. On platforms where wchar_t is available and also has 16 bits, Py_UNICODE is a typedef alias for wchar_t to enhance native platform compatibility. In general, you should use PyUnicode_FromEncodedObject and PyUnicode_AsEncodedString to convert strings to/from unicode objects, and consider Py_UNICODE to be an implementation detail.

robey
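To try the two conversion functions named in the proposed text without compiling an extension, they can be called from Python through ctypes.pythonapi. This is only a sketch, assuming Python 3 (where the encoded result is a bytes object): in a real C extension you would call PyUnicode_AsEncodedString and PyUnicode_FromEncodedObject directly, and to_bytes/from_bytes are hypothetical helper names.

```python
import ctypes

_api = ctypes.pythonapi

# Item lookup returns a fresh function object, so setting argtypes/restype
# here does not alter the attributes cached on ctypes.pythonapi itself.
_encode = _api["PyUnicode_AsEncodedString"]
_encode.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p]
_encode.restype = ctypes.py_object  # new reference to the encoded object

_decode = _api["PyUnicode_FromEncodedObject"]
_decode.argtypes = [ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p]
_decode.restype = ctypes.py_object  # new reference to a str object


def to_bytes(text, encoding="utf-8"):
    """Encode a str through a named codec, without touching Py_UNICODE."""
    return _encode(text, encoding.encode("ascii"), b"strict")


def from_bytes(data, encoding="utf-8"):
    """Decode bytes into a str through a named codec."""
    return _decode(data, encoding.encode("ascii"), b"strict")


if __name__ == "__main__":
    raw = to_bytes("h\u00e9llo")
    print(raw, from_bytes(raw))
```

Because both calls go through named codecs, the same code behaves identically whether the interpreter stores strings as UCS2, UCS4 or (since Python 3.3) the PEP 393 representation.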
Robey Pointer wrote:
On 29 Sep 2005, at 12:06, Steven Bethard wrote:
I believe this is the same issue that was brought up in May[1]. My impression was that people could not agree on a documentation patch.
FYI, I've fixed the Py_UNICODE description now.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 10 2005)
participants (5)
- Bob Ippolito
- Fredrik Lundh
- M.-A. Lemburg
- Robey Pointer
- Steven Bethard