[Python-Dev] RE: Unicode character name hashing

Bill Tutt billtut@microsoft.com
Sun, 16 Jul 2000 01:35:56 -0700

Not that any of this is terribly important given F bot's new patch, except
with respect to the perfect hash generation code.

> Tim wrote:
> [Bill Tutt]
> > I just had a rather unhappy epiphany this morning.
> > f1 and f2 in ucnhash.c might not work on machines where
> > a long is not exactly 32 bits wide.

> If "not work" means "may not return the same answer as when a long has
> exactly 32 bits", then yes, it's certain not to work.  Else I don't know:
> I don't understand the (undocumented) postconditions (== what does "work"
> mean, exactly?) for these functions.

"Works" means that f1 and f2 must always generate the same bits no matter
what platform they're executed on.

> If getting the same bits is what's important, f1 can be repaired by
> inserting this new block:

>     /* cut back to 32 bits */
>     x &= 0xffffffffL;
>     if (x & 0x80000000L) {
>         /* if negative as a 32-bit thing, extend sign bit to full
>            precision */
>         x -= 0x80000000L;  /* subtract 2**32 in a portable way */
>         x -= 0x80000000L;  /* by subtracting 2**31 twice */
>     }

> between the existing
>     x ^= cch + 10;
> and
>     if (x == -1)

> This assumes that negative numbers are represented in 2's-complement, but
> should deliver the same bits in the end on any machine for which that's
> true (I don't know of any Python platform for which it isn't).  The same
> shoe should work for f2 after replacing its negative literal with a
> 0x...L bit pattern too.
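
For illustration, the block above can be pulled out into a standalone
helper. This is only a sketch: wrap_to_signed32 is a made-up name, not
something in ucnhash.c, and it carries the same 2's-complement assumption.

```c
#include <assert.h>

/* Hypothetical helper (not part of ucnhash.c): keep only the low 32 bits
   of x and reinterpret them as a signed 32-bit quantity.
   Assumes negative numbers are represented in 2's-complement. */
static long
wrap_to_signed32(long x)
{
    /* cut back to 32 bits */
    x &= 0xffffffffL;
    if (x & 0x80000000L) {
        /* negative as a 32-bit thing: subtract 2**32 in a portable way
           by subtracting 2**31 twice */
        x -= 0x80000000L;
        x -= 0x80000000L;
    }
    /* postcondition: the result fits in a signed 32-bit range */
    assert(x >= -0x7fffffffL - 1 && x <= 0x7fffffffL);
    return x;
}
```

Called right after "x ^= cch + 10", something like this should leave f1
with the same value whether a long is 32 or 64 bits wide, as long as the
2's-complement assumption holds.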

> The assumption about 2's-comp, and the new "if" block, could be removed by
> making these functions compute with and return unsigned longs instead.  I
> don't know why they're using signed longs now (the bits produced are
> the same either way, up until the "%" operation, at which point C is
> ill-defined when using signed long).

The SF patch does indeed use unsigned longs.
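
To show the idea (this is not the code from the SF patch), here is a
minimal sketch of a hash that computes with unsigned longs. The name
hash32, the seed and nbuckets parameters, and the 1000003 multiplier are
all assumptions made for the example. Unsigned arithmetic wraps in a
well-defined way, and the low 32 bits of a multiply or xor depend only on
the low 32 bits of the inputs, so a single mask before the "%" should be
enough to get the same bits on every platform.

```c
#include <assert.h>
#include <stddef.h>

#define MASK32 0xffffffffUL

/* Hypothetical example, not the real f1/f2: hash len bytes of key into
   the range [0, nbuckets) using only unsigned arithmetic. */
static unsigned long
hash32(const char *key, size_t len, unsigned long seed,
       unsigned long nbuckets)
{
    unsigned long x = seed;
    size_t i;

    assert(nbuckets != 0);          /* "%" by zero is undefined */

    for (i = 0; i < len; i++) {
        /* unsigned overflow wraps; no sign bit to worry about */
        x = (x * 1000003UL) ^ (unsigned char)key[i];
    }
    x ^= (unsigned long)len;

    /* discard anything above 32 bits so wider longs agree with 32-bit ones */
    x &= MASK32;

    /* both operands are unsigned, so "%" is fully defined here */
    return x % nbuckets;
}
```

With no signed values involved, neither the sign-extension block nor the
"if (x < 0)" style fixup after the "%" is needed.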