[Python-Dev] RE: Unicode character name hashing
Sun, 16 Jul 2000 01:49:27 -0400
> After making use of the test drive Alphas by Compaq, I just
> uploaded a patch to SF that should fix this nasty issue.
> Ugh. Not fun....
Ah, I replied earlier before seeing this. Please see my earlier response
for a simpler change that will actually work <wink>. Briefly,
> > len = cch;
> > p = (unsigned char *) key;
> > - x = 1694245428;
> > + x = (long)0x64fc2234;
No point to this change (although there *is* for the *negative* literal in
f2). And (for f2) note that the way to spell a long literal is with "L" at
the end, not via casting an int literal to long.
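To make the point about spelling concrete, here is a minimal sketch (the function names are made up for illustration and are not part of the patch). It only shows that the cast and the "L" suffix produce the same value for this positive constant, which is why the change buys nothing here:

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only. */

/* The patch's spelling: an int-sized hex literal cast to long. */
static long seed_via_cast(void)
{
    return (long)0x64fc2234;
}

/* The usual spelling of a long literal: an "L" suffix, no cast. */
static long seed_via_suffix(void)
{
    return 0x64fc2234L;
}

/* Both spellings equal the original decimal constant. */
void check_seed_spellings(void)
{
    assert(seed_via_cast() == seed_via_suffix());
    assert(seed_via_suffix() == 1694245428L);
}
```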
> > while (--len >= 0)
> > - x = (1000003*x) ^ toupper(*(p++));
> > + x = ((0xf4243 * x) & 0xFFFFFFFF) ^ toupper(*(p++));
No point to either of the changes here (you're effectively doing a mod 2**32
on each step, but that isn't necessary: the low 32 bits of a product or xor
depend only on the low 32 bits of the operands, so doing just one mod at the
end, after the loop, is equivalent).
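A rough sketch of that equivalence, under some assumptions: uint64_t stands in for a 64-bit long (unsigned, so wraparound is well defined in C), the function names are hypothetical, and the trailing "x ^= cch + 10" step is left out. One version masks on every iteration as the patch does, the other masks once after the loop:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: mask to 32 bits on every step, as in the patch. */
static uint32_t hash_mask_each_step(const char *key)
{
    uint64_t x = UINT64_C(0x64fc2234);
    for (const unsigned char *p = (const unsigned char *)key; *p; p++)
        x = ((UINT64_C(0xf4243) * x) & UINT64_C(0xFFFFFFFF))
            ^ (uint64_t)toupper(*p);
    return (uint32_t)(x & UINT64_C(0xFFFFFFFF));
}

/* Hypothetical helper: let the 64-bit value wrap, mask once at the end. */
static uint32_t hash_mask_once(const char *key)
{
    uint64_t x = UINT64_C(0x64fc2234);
    for (const unsigned char *p = (const unsigned char *)key; *p; p++)
        x = (UINT64_C(0xf4243) * x) ^ (uint64_t)toupper(*p);
    return (uint32_t)(x & UINT64_C(0xFFFFFFFF));
}

/* The two versions agree on the low 32 bits for any input. */
void check_mask_equivalence(void)
{
    assert(hash_mask_each_step("") == hash_mask_once(""));
    assert(hash_mask_each_step("LATIN SMALL LETTER A")
           == hash_mask_once("LATIN SMALL LETTER A"));
}
```

The high bits differ between the two loops while they run, but they never feed back into the low 32 bits, so the final masked values match.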
> > x ^= cch + 10;
> > - if (x == -1)
> > - x = -2;
> > + if (x == (long)0xFFFFFFFF)
> > + x = (long)0xfffffffe;
This is wrong in too many ways to count <wink>. But your tests won't show
the failure unless x *does* happen to equal -1 at the end of the preceding
code, and you're extremely unlikely to bump into that in a small batch of test
cases. Short course: when it does happen, x will be -2 after this block on
the machine with 32-bit longs. But on the machine with 64-bit longs, x will
be 2**32-2 (a large and unrelated positive long).
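Here is a small sketch of that divergence. It is illustrative only: int32_t and int64_t stand in for 32-bit and 64-bit longs, the function names are invented, and the 32-bit case relies on the usual 2's-complement result of converting 0xFFFFFFFF to a signed type, which C leaves implementation-defined:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patched block on a 32-bit-long machine. */
static int32_t fixup_long32(int32_t x)
{
    /* The casts yield -1 and -2 on typical 2's-complement compilers. */
    if (x == (int32_t)0xFFFFFFFF)
        x = (int32_t)0xfffffffe;
    return x;
}

/* Hypothetical stand-in for the same block on a 64-bit-long machine. */
static int64_t fixup_long64(int64_t x)
{
    /* Here the casts yield 4294967295 and 4294967294, both positive. */
    if (x == (int64_t)0xFFFFFFFF)
        x = (int64_t)0xfffffffe;
    return x;
}

void check_fixup_divergence(void)
{
    /* 32-bit long: the bit pattern 0xFFFFFFFF is -1 and becomes -2. */
    assert(fixup_long32(-1) == -2);
    /* 64-bit long: the same bit pattern is 2**32-1 and becomes 2**32-2. */
    assert(fixup_long64(INT64_C(0xFFFFFFFF)) == INT64_C(4294967294));
    /* 64-bit long: a genuine -1 no longer matches the test at all. */
    assert(fixup_long64(-1) == -1);
}
```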
The change I suggested is simpler & avoids all those problems; alas, like
the above, it also assumes 2's-complement representation of negative longs.