[Python-checkins] RE: [Python-Dev] Re: python/dist/src/Objects/unicodeobject.c, 2.204, 2.205

Fri Dec 19 10:40:13 EST 2003

[Hye-Shik Chang]
> BTW, do we really support architectures with 9bits-sized char?

I don't think so.  There are assumptions that a char is 8 bits scattered
throughout Python's code, not so much in the context of using characters
*as* characters, but more indirectly by assuming that the number of *bits*
in an object of a non-char type T can be computed as sizeof(T)*8.
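
To make the indirect assumption concrete, here is a minimal sketch contrasting the sizeof(T)*8 shortcut with the form that uses CHAR_BIT from <limits.h>. The macro and function names are made up for illustration and do not come from Python's source.

```c
#include <limits.h>   /* CHAR_BIT: number of bits in a char */
#include <stddef.h>   /* size_t */

/* The shortcut: only correct when a char is exactly 8 bits wide.
 * (Hypothetical name, for illustration only.) */
#define BITS_ASSUMING_8(T)  (sizeof(T) * 8)

/* The portable form: sizeof counts chars, CHAR_BIT counts bits per char.
 * (Hypothetical name, for illustration only.) */
#define BITS_PORTABLE(T)    (sizeof(T) * CHAR_BIT)

/* Returns 1 when both computations give the same answer for long
 * on the platform this is compiled for, 0 otherwise. */
int long_bits_agree(void)
{
    return BITS_ASSUMING_8(long) == BITS_PORTABLE(long);
}
```

On a platform with 8-bit char the two macros agree for every type; with a 9-bit char the first would undercount. Note that both count storage bits, which may include padding bits on unusual implementations.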

Skip's idea of making config smarter about this is a good one, but instead
of trying to "fix stuff" for a case that's probably never going to arise,
and that can't really be tested anyway until it does, I'd add a block like
this everywhere we know we're relying on 8-bit char:

#ifdef HAS_FUNNY_SIZE_CHAR
#error "The following code needs rework when a char isn't 8 bits"
#endif
/* A comment explaining why the following code needs rework
 * when a char isn't 8 bits.
 */
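
One way such a guard could be wired up is sketched below. How HAS_FUNNY_SIZE_CHAR gets defined is not specified above; deriving it from CHAR_BIT in <limits.h> is an assumption of this sketch (a configure-time test would serve equally well), and top_bit_mask is a hypothetical stand-in for real code that relies on 8-bit char.

```c
#include <limits.h>   /* CHAR_BIT */

/* Assumption: define the flag from CHAR_BIT. A configure script could
 * set it instead; the message above does not say which. */
#if CHAR_BIT != 8
#define HAS_FUNNY_SIZE_CHAR 1
#endif

#ifdef HAS_FUNNY_SIZE_CHAR
#error "The following code needs rework when a char isn't 8 bits"
#endif
/* top_bit_mask computes the shift count as sizeof(unsigned long)*8 - 1,
 * so it would shift by too little if a char had more than 8 bits.
 */
unsigned long top_bit_mask(void)
{
    /* Mask with only the most-significant bit of an unsigned long set. */
    return 1UL << (sizeof(unsigned long) * 8 - 1);
}
```

On every platform with 8-bit char this compiles silently; anywhere else the build stops at the #error, pointing at exactly the code that needs attention.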

Crays are a red herring here.  It's true that some Cray *hardware* can't
address anything smaller than 64 bits, and that's also true of some other
architectures.  char is nevertheless 8 bits on all such 64-bit boxes I know
of (and since I worked in a 64-bit world for 15 years, I know about most of
them <wink>).  On Crays, this is achieved (albeit at major expense) in
software:  by *software* convention, a pointer-to-char stores the byte
offset in the *most*-significant 3 bits of a pointer, and long-winded
generated code picks that apart at runtime, loading or storing 8 bytes at a
time (the HW can't do less than that), shifting and masking and or'ing to
give the illusion of byte addressing for char.  Some Alphas do something
similar, but that HW's loads and stores simply ignore the last 3 bits of a
memory address, and the CPU has special-purpose instructions to help
generated code do the subsequent extraction and insertion of 8-bit chunks
efficiently and succinctly.
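
The software scheme described for Crays can be sketched roughly as follows. This is an emulation in portable C, not actual Cray compiler output: the char_ptr type, the helper names, and the choice that byte offset 0 is the most-significant byte of a word are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical emulated pointer-to-char: the top 3 bits hold the byte
 * offset within a 64-bit word, the low 61 bits hold the word index. */
typedef uint64_t char_ptr;

#define OFFSET_SHIFT 61
#define WORD_MASK    ((UINT64_C(1) << OFFSET_SHIFT) - 1)

char_ptr make_char_ptr(uint64_t word_index, unsigned byte_offset)
{
    return ((uint64_t)(byte_offset & 7u) << OFFSET_SHIFT)
           | (word_index & WORD_MASK);
}

/* Bit position of the addressed byte inside its word.
 * Assumption: byte offset 0 is the most-significant byte. */
static unsigned shift_for(char_ptr p)
{
    unsigned byte_offset = (unsigned)(p >> OFFSET_SHIFT);
    return (7u - byte_offset) * 8u;
}

uint8_t load_byte(const uint64_t *memory, char_ptr p)
{
    uint64_t word = memory[p & WORD_MASK];      /* whole-word load */
    return (uint8_t)((word >> shift_for(p)) & 0xFFu);
}

void store_byte(uint64_t *memory, char_ptr p, uint8_t value)
{
    unsigned shift = shift_for(p);
    uint64_t word = memory[p & WORD_MASK];      /* load the whole word */
    word &= ~(UINT64_C(0xFF) << shift);         /* mask out the old byte */
    word |= (uint64_t)value << shift;           /* or in the new byte */
    memory[p & WORD_MASK] = word;               /* store the whole word */
}
```

Storing a single char thus costs a load, a shift, two logical operations, and a store, which gives some feel for the "major expense" mentioned above.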