[Python-Dev] ./configure support for the New Unicode Snapshot

M.-A. Lemburg mal@lemburg.com
Wed, 09 Feb 2000 10:11:14 +0100


Tim Peters wrote:
> 
> [M.-A. Lemburg]
> > ...
> > Also new in this snapshot is configuration code which figures
> > out the byte order on the installation machine... I looked
> > everywhere in the Python source code but couldn't find any
> > hint whether this was already done in some place,
> 
> [Tim]
> > There's a tiny bit of inline code for this in the "host byte
> > order" case of structmodule.c's function whichtable. ...
> 
> [MAL]
> > I looked there, but only found that it uses native byte order
> > by means of "letting the compiler do the right thing" -- there
> > doesn't seem to be any code which actually tests for it.
> 
> Here's the "tiny bit of (etc)":
> 
>                 int n = 1;
>                 char *p = (char *) &n;
>                 if (*p == 1)
>                         ...

Hmm, I hadn't noticed that one (but Jean posted the same idea
in private mail ;).

> > The autoconf stuff is pretty simple, BTW. The following code
> > is used for the test:
> >
> > main() {
> >  long x = 0x34333231; /* == "1234" on little endian machines */
> >  char *y = (char *)&x;
> >  if (strncmp(y,"1234",4))
> >   exit(0); /* big endian */
> >  else
> >   exit(1); /* little endian */
> > }
> 
> No, no, no -- that's one "no" for each distinct way I know of that can fail
> on platforms where sizeof(long) == 8 <wink>.  Don't *ever* use longs to test
> endianness; besides the obvious problems, it also sucks you into illusions
> unique to "mixed endian" architectures.  "ints" are iffy too, but less so.
> 
> Test what you're actually concerned about, as directly and simply as
> possible; e.g., if you're actually concerned about how the machine stores
> shorts, do what structmodule does but use a short instead of an int.  And if
> it's important, explicitly verify that sizeof(short)==2 (& raise an error if
> it's not).

I've switched to autoconf's predefined standard macro, as
suggested by Fredrik. It does the above, plus some other
magic, to determine endianness. On big endian machines
the configure script now defines WORDS_BIGENDIAN.
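
As a rough sketch of how C code could consume that define and
cross-check it at run time, following Tim's advice to probe a
short directly: the function names below are hypothetical and
not taken from the Python source, and WORDS_BIGENDIAN is assumed
to arrive via config.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the Python source): reports how this
   machine stores a 2-byte short.  Returns 1 for big endian, 0 for
   little endian, and -1 if short is not 2 bytes wide. */
int short_is_big_endian(void)
{
    unsigned short n = 0x0102;
    unsigned char first;

    if (sizeof(n) != 2)
        return -1;
    memcpy(&first, &n, 1);      /* first byte of n as laid out in memory */
    return first == 0x01;
}

/* What configure decided: WORDS_BIGENDIAN is defined only on
   big endian hosts, so its absence means little endian. */
int configured_big_endian(void)
{
#ifdef WORDS_BIGENDIAN
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical init-time sanity check: abort if the run-time probe
   disagrees with what configure recorded. */
void check_byte_order(void)
{
    int probed = short_is_big_endian();

    assert(probed != -1);
    assert(probed == configured_big_endian());
}
```

Calling check_byte_order() once at start-up would catch a stale
or hand-edited config.h that disagrees with the actual machine.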

The sizeof(Py_UNICODE)==2 assertion is currently tested at
init time of the Unicode implementation. I would like to
add Fredrik's proposed sizeof checks to the configure script
too, but there's a catch: the config.h in PC/ is hand-generated
and would need some updates for the various PC targets.
Any volunteers? We'd need the following extra data:

/* The number of bytes in a char.  */
#define SIZEOF_CHAR 1

/* The number of bytes in a double.  */
#define SIZEOF_DOUBLE 8

/* The number of bytes in a float.  */
#define SIZEOF_FLOAT 4

/* The number of bytes in a short.  */
#define SIZEOF_SHORT 2

plus maybe

/* Endianness. PCs are usually little endian, so we don't define this
   here... */
/* #undef WORDS_BIGENDIAN */
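
To illustrate how such defines might be used and verified, here
is a minimal sketch. It assumes the SIZEOF_* values normally come
from config.h (fallbacks are supplied so the fragment compiles on
its own), and the helper names are made up for this example.

```c
#include <assert.h>

/* Fallback values copied from the list above; in a real build these
   would come from config.h (assumed here, not included). */
#ifndef SIZEOF_CHAR
#define SIZEOF_CHAR 1
#endif
#ifndef SIZEOF_DOUBLE
#define SIZEOF_DOUBLE 8
#endif
#ifndef SIZEOF_FLOAT
#define SIZEOF_FLOAT 4
#endif
#ifndef SIZEOF_SHORT
#define SIZEOF_SHORT 2
#endif

/* The preprocessor cannot evaluate sizeof, which is why the sizes
   have to be recorded as plain numbers in the first place. */
#if SIZEOF_SHORT != 2
#error "this code assumes a 2-byte short"
#endif

/* Hypothetical helper: counts how many recorded sizes disagree
   with what the compiler actually uses. */
int count_sizeof_mismatches(void)
{
    int bad = 0;

    if (sizeof(char) != SIZEOF_CHAR)
        bad++;
    if (sizeof(double) != SIZEOF_DOUBLE)
        bad++;
    if (sizeof(float) != SIZEOF_FLOAT)
        bad++;
    if (sizeof(short) != SIZEOF_SHORT)
        bad++;
    return bad;
}

/* Hypothetical init-time check: abort if a hand-written config.h
   has drifted from the compiler's real type sizes. */
void verify_config_sizes(void)
{
    assert(count_sizeof_mismatches() == 0);
}
```

A check like this would be most useful for the hand-maintained
PC/config.h, where nothing else guarantees the numbers stay right.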

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/