[Python-Dev] Re: [Patches] fix float_hash and complex_hash for 64-bit *nix
Trent Mick
trentm@activestate.com
Wed, 10 May 2000 13:14:46 -0700
On Wed, May 10, 2000 at 12:53:16AM -0400, Tim Peters wrote:
> [Trent Mick]
> > Discussion:
> >
> > Okay, it is debatable to call float_hash and complex_hash broken,
> > but their code presumed that sizeof(long) was 32-bits. As a result
> > the hashed values for floats and complex values were not the same
> > on a 64-bit *nix system as on a 32-bit *nix system. With this
> > patch they are.
>
> The goal is laudable but the analysis seems flawed. For example, this new
> comment:
Firstly, I should have admitted my ignorance with regards to hash functions.
> Looks to me like the real problem in the original was here:
>
> x = hipart + (long)fractpart + (long)intpart + (expo << 15);
> ^^^^^^^^^^^^^
>
> The difficulty is that intpart may *not* fit in 32 bits, so the cast of
> intpart to long is ill-defined when sizeof(long) == 4.
>
> That is, the hash function truly is broken for "large" values with a
> fractional part, and I expect your after-patch code suffers the same
> problem:
Yes it did.
> The
> solution to this is to break intpart in this branch into pieces no larger
> than 32 bits too
Okay here is another try (only for floatobject.c) for discussion. If it looks
good then I will submit a patch for float and complex objects. So do the same
for 'intpart' as was done for 'fractpart'.
static long
float_hash(v)
PyFloatObject *v;
{
double intpart, fractpart;
long x;
fractpart = modf(v->ob_fval, &intpart);
if (fractpart == 0.0) {
// ... snip ...
}
else {
int expo;
long hipart;
fractpart = frexp(fractpart, &expo);
fractpart = fractpart * 2147483648.0;
hipart = (long)fractpart;
fractpart = (fractpart - (double)hipart) * 2147483648.0;
x = hipart + (long)fractpart + (expo << 15); /* combine the fract parts */
intpart = frexp(intpart, &expo);
intpart = intpart * 2147483648.0;
hipart = (long)intpart;
intpart = (intpart - (double)hipart) * 2147483648.0;
x += hipart + (long)intpart + (expo << 15); /* add in the int parts */
}
if (x == -1)
x = -2;
return x;
}
> Note this consequence under the Win32 Python:
With this change, on Linux32:
>>> base = 2.**40 + 0.5
>>> base
1099511627776.5
>>> for i in range(32, 45):
... x = base + 2.**i
... print x, hash(x)
...
1.10380659507e+12 -2141945856
1.10810156237e+12 -2137751552
1.11669149696e+12 -2129362944
1.13387136614e+12 -2112585728
1.16823110451e+12 -2079031296
1.23695058125e+12 -2011922432
1.37438953472e+12 -1877704704
1.64926744166e+12 -1609269248
2.19902325555e+12 -2146107392
3.29853488333e+12 -1609236480
5.49755813888e+12 -1877639168
9.89560464998e+12 -2011824128
1.86916976722e+13 -2078900224
On Linux64:
>>> base = 2.**40 + 0.5
>>> base
1099511627776.5
>>> for i in range(32, 45):
... x = base + 2.**i
... print x, hash(x)
...
1.10380659507e+12 2153021440
1.10810156237e+12 2157215744
1.11669149696e+12 2165604352
1.13387136614e+12 2182381568
1.16823110451e+12 2215936000
1.23695058125e+12 2283044864
1.37438953472e+12 2417262592
1.64926744166e+12 2685698048
2.19902325555e+12 2148859904
3.29853488333e+12 2685730816
5.49755813888e+12 2417328128
9.89560464998e+12 2283143168
1.86916976722e+13 2216067072
>-- and that should also fix your 64-bit woes "by magic".
>
As you can see it did not, but for another reason. The summation of the parts
overflows 'x'. Is this a problem? I.e., does it matter if a hash function
returns an overflowed integral value (my hash function ignorance is showing)?
And if this does not matter, does it matter that a hash returns different
values on different platforms?
> a hash function should never ignore any bit in its input.
Which brings up a question regarding instance_hash(), func_hash(),
meth_hash(), HKEY_hash() [or whatever it is called], and other which cast a
pointer to a long (discarding the upperhalf of the pointer on Win64). Do
these really need to be fixed. Am I nitpicking too much on this whole thing?
Thanks,
Trent
--
Trent Mick
trentm@activestate.com