[Python-Dev] RE: Unicode character name hashing

Sun, 16 Jul 2000 01:49:27 -0400

[Bill Tutt]
> After making use of the test drive Alphas by Compaq, I just
> uploaded a patch to SF that should fix this nasty issue.
> Ugh. Not fun....

Ah, I replied earlier before seeing this.  Please see my earlier response
for a simpler change that will actually work <wink>.  Briefly,

> >      len = cch;
> >      p = (unsigned char *) key;
> > -    x = 1694245428;
> > +    x = (long)0x64fc2234;

No point to this change (although there *is* for the *negative* literal in
f2).  And (for f2) note that the way to spell a long literal is with "L" at
the end, not via casting an int literal to long.
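
A small sketch of why the first hunk changes nothing: the hex and decimal
spellings name the same value, and the "L" suffix gives a long directly.  The
function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>

/* Starting value as spelled in the original line. */
long seed_decimal(void)    { return 1694245428; }

/* Starting value as spelled in the patched line (int literal cast to long). */
long seed_hex_cast(void)   { return (long)0x64fc2234; }

/* Same value spelled as a long literal with the "L" suffix. */
long seed_hex_suffix(void) { return 0x64fc2234L; }

/* The multiplier is likewise unchanged by the patch: 0xf4243 is 1000003. */
long mult_decimal(void)    { return 1000003; }
long mult_hex(void)        { return 0xf4243; }

/* Hypothetical self-check tying the spellings together. */
void check_spellings(void)
{
    assert(seed_decimal() == seed_hex_cast());
    assert(seed_decimal() == seed_hex_suffix());
    assert(mult_decimal() == mult_hex());
}
```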

> >      while (--len >= 0)
> > -        x = (1000003*x) ^ toupper(*(p++));
> > +        x = ((0xf4243 * x) & 0xFFFFFFFF) ^ toupper(*(p++));

No point to either of the changes here (you're effectively doing a mod 2**32
on each step, but that isn't necessary -- doing just one at the end (after
the loop) is equivalent).
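
To illustrate the equivalence, here is a sketch that masks on every step in
one function and only once at the end in the other.  It is not the patched
code: it assumes fixed-width unsigned types from <stdint.h> in place of long
so that the wraparound is well defined, and the function names are invented
for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Reduce mod 2**32 on every iteration, as the patched loop does. */
uint32_t hash_mask_each_step(const unsigned char *p, size_t len)
{
    uint64_t x = 0x64fc2234u;
    while (len-- > 0)
        x = ((1000003u * x) & 0xFFFFFFFFu) ^ (uint64_t)toupper(*p++);
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Let the 64-bit value run free and reduce mod 2**32 once, after the loop. */
uint32_t hash_mask_at_end(const unsigned char *p, size_t len)
{
    uint64_t x = 0x64fc2234u;
    while (len-- > 0)
        x = (1000003u * x) ^ (uint64_t)toupper(*p++);
    return (uint32_t)(x & 0xFFFFFFFFu);
}

/* Hypothetical self-check: both variants agree on a sample key. */
void check_masking(void)
{
    const unsigned char *key = (const unsigned char *)"latin small letter a";
    assert(hash_mask_each_step(key, 20) == hash_mask_at_end(key, 20));
}
```

The two agree because the low 32 bits of a product depend only on the low 32
bits of its operands, and the XOR with a character value touches only the low
bits.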

> >      x ^= cch + 10;
> > -    if (x == -1)
> > -        x = -2;
> > +    if (x == (long)0xFFFFFFFF)
> > +        x = (long)0xfffffffe;

This is wrong in too many ways to count <wink>.  But your tests won't show
the failure unless x *does* happen to equal -1 at the end of the preceding,
and you're extremely unlikely to bump into that in a small batch of test
cases.  Short course:  when that does happen, x will be -2 after this block
on the 32-bit long machine.  But on the 64-bit long machine, x will be
2**32-2 (a large and unrelated positive long).
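
A sketch of that divergence, simulating the two platforms with int32_t and
int64_t standing in for a 32-bit and a 64-bit long.  The function names are
hypothetical, and the 32-bit variant writes -1 and -2 directly, which is what
the casts amount to there under the usual 2's-complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Patched check as it behaves where long is 32 bits:
   (long)0xFFFFFFFF is -1 and (long)0xfffffffe is -2. */
int32_t patched_fixup_long32(int32_t x)
{
    if (x == -1)
        x = -2;
    return x;
}

/* Patched check as it behaves where long is 64 bits:
   both casts now yield large positive values. */
int64_t patched_fixup_long64(int64_t x)
{
    if (x == (int64_t)0xFFFFFFFF)
        x = (int64_t)0xfffffffe;
    return x;
}

/* Hypothetical self-check: same bit pattern in, different hash out. */
void check_fixup(void)
{
    assert(patched_fixup_long32(-1) == -2);
    assert(patched_fixup_long64(4294967295LL) == 4294967294LL);
}
```

Note also that a genuine -1 passes through the 64-bit variant untouched, since
it no longer compares equal to the cast constant.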

The change I suggested is simpler & avoids all those problems; alas, like
the above, it also assumes 2's-complement representation of negative longs.