
[Tim]
So what if MAL amended his suggestion to
    reject signed 2-byte wchar_t as not-usable?
[M.-A. Lemburg]
That would not solve the problem.
[Tim]
Then what is the problem, specifically? I thought you agreed with Martin that a signed 32-bit type doesn't hurt, since the sign bit then remains clear in all cases of Unicode data.
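To make that point concrete (an illustrative sketch, not code from the thread; it uses C99 <stdint.h> for an exact-width type):

    /* The largest Unicode code point, U+10FFFF, is well below
       INT32_MAX (0x7FFFFFFF), so storing any Unicode code point in a
       signed 32-bit type leaves the sign bit clear. */
    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        int32_t ch = 0x10FFFF;   /* highest Unicode code point */
        assert(ch > 0);          /* sign bit is clear */
        assert(ch <= INT32_MAX); /* with plenty of room to spare */
        return 0;
    }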
[M.-A. Lemburg]
Note that we have proper conversion routines that allow converting between wchar_t and Py_UNICODE. These routines must be used for conversions anyway (even if Py_UNICODE and wchar_t happen to be the same type), so from a programmer's perspective, changing Py_UNICODE to be unsigned won't be noticed, and we don't lose much.
Again, I don't see the point in using a signed type for data that has no concept of signed values. It's just bad design, and we shouldn't go down the same route if we don't have to.
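The hazard behind that design point can be shown in a few lines (an illustrative sketch, not code from the thread; it assumes a platform where short is 16 bits):

    /* With a signed 16-bit character type, any value with the high
       bit set sign-extends when promoted to int, so a plain
       comparison against the code point's value fails. */
    #include <stdio.h>

    int main(void)
    {
        short ch = (short)0xFFFD;   /* U+FFFD REPLACEMENT CHARACTER */
        unsigned short uch = 0xFFFDu;

        if (ch == 0xFFFD)           /* ch promotes to -3, not 0xFFFD */
            printf("matched\n");
        else
            printf("sign extension broke it: ch == %d\n", ch);

        if (uch == 0xFFFD)          /* unsigned promotes to 65533 */
            printf("unsigned comparison works\n");
        return 0;
    }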
[Tim]
I don't know why Martin favors wchar_t when possible; the answer to that isn't clear. Neither is the answer to why there's an intractable problem if wchar_t happens to be a signed type wider than 2 bytes.
[M.-A. Lemburg]
The Unicode implementation has always defined Py_UNICODE to be an unsigned type; see the Unicode PEP 100:
""" Internal Format
The internal format for Unicode objects should use a Python specific fixed format <PythonUnicode> implemented as 'unsigned short' (or another unsigned numeric type having 16 bits). Byte order is platform dependent.
...
The configure script should provide aid in deciding whether Python can use the native wchar_t type or not (it has to be a 16-bit unsigned type). """
Python can also deal with UCS4 now, but the concept remains the same.
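A minimal sketch of the kind of compile-time decision the PEP describes (hypothetical, not CPython's actual configure machinery; WCHAR_MIN and WCHAR_MAX come from C99 <wchar.h>):

    /* Hypothetical sketch: accept the native wchar_t as the internal
       character type only if it is unsigned and at least 16 bits
       wide; otherwise fall back to an unsigned 16-bit integer. */
    #include <wchar.h>

    #if defined(WCHAR_MIN) && WCHAR_MIN == 0 && WCHAR_MAX >= 0xFFFF
    typedef wchar_t Py_UNICODE;         /* native wchar_t is usable */
    #else
    typedef unsigned short Py_UNICODE;  /* fallback: unsigned, >= 16 bits */
    #endif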
[Tim]
Well, it doesn't have to be a 16-bit type either, even in a UCS2 build. We had a long argument about that one before, because a particular Cray system didn't have any 16-bit type and the Unicode code wasn't working there. That got repaired when I rewrote the few bits of code that assumed "exactly 16 bits" to live with the weaker "at least 16 bits".

In this iteration, Martin agreed that a signed 16-bit wchar_t can be rejected. The question remaining is what actual problem exists when there's a signed wchar_t wider than 16 bits. Since Jeremy is running on exactly such a system, and the tests pass for him, there's no *obvious* problem with it (the segfault he experienced was due to reading uninitialized memory; that was a bug, and it's been fixed).
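The flavor of that repair can be sketched like this (illustrative only, not the actual change; rotate_char and the typedef here are hypothetical):

    /* With an exactly-16-bit Py_UNICODE, a cast alone truncated an
       arithmetic result back into range; with an "at least 16 bits"
       type (e.g. a 64-bit type on that Cray), an explicit mask is
       needed. */
    typedef unsigned long Py_UNICODE;   /* may be wider than 16 bits */

    Py_UNICODE rotate_char(Py_UNICODE ch, unsigned int offset)
    {
        /* (Py_UNICODE)(ch + offset) only wraps at 2**16 when the type
           is exactly 16 bits wide, so mask explicitly instead. */
        return (Py_UNICODE)((ch + offset) & 0xFFFFUL);
    }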