[Python-Dev] Re: [Patches] fix float_hash and complex_hash for 64-bit *nix

Wed, 10 May 2000 13:14:46 -0700

On Wed, May 10, 2000 at 12:53:16AM -0400, Tim Peters wrote:
> [Trent Mick]
> > Discussion:
> >
> > Okay, it is debatable to call float_hash and complex_hash broken,
> > but their code presumed that sizeof(long) was 32-bits. As a result
> > the hashed values for floats and complex values were not the same
> > on a 64-bit *nix system as on a 32-bit *nix system. With this
> > patch they are.
> 
> The goal is laudable but the analysis seems flawed.  For example, this new
> comment:

Firstly, I should have admitted my ignorance with regards to hash functions.

> Looks to me like the real problem in the original was here:
> 
>     x = hipart + (long)fractpart + (long)intpart + (expo << 15);
>                                    ^^^^^^^^^^^^^
> 
> The difficulty is that intpart may *not* fit in 32 bits, so the cast of
> intpart to long is ill-defined when sizeof(long) == 4.

> 
> That is, the hash function truly is broken for "large" values with a
> fractional part, and I expect your after-patch code suffers the same
> problem: 

Yes it did.

> The
> solution to this is to break intpart in this branch into pieces no larger
> than 32 bits too 

Okay here is another try (only for floatobject.c) for discussion. If it looks
good then I will submit a patch for float and complex objects. So do the same
for 'intpart' as was done for 'fractpart'.

static long
float_hash(v)
    PyFloatObject *v;
{
    double intpart, fractpart;
    long x;

    fractpart = modf(v->ob_fval, &intpart);

    if (fractpart == 0.0) {
		// ... snip ...
    }
    else {
        int expo;
        long hipart;

        fractpart = frexp(fractpart, &expo);
        fractpart = fractpart * 2147483648.0; 
        hipart = (long)fractpart; 
        fractpart = (fractpart - (double)hipart) * 2147483648.0;

        x = hipart + (long)fractpart + (expo << 15); /* combine the fract parts */

        intpart = frexp(intpart, &expo);
        intpart = intpart * 2147483648.0;
        hipart = (long)intpart;
        intpart = (intpart - (double)hipart) * 2147483648.0;

        x += hipart + (long)intpart + (expo << 15); /* add in the int parts */
    }
    if (x == -1)
        x = -2;
    return x;
}

> Note this consequence under the Win32 Python:

With this change, on Linux32:

>>> base = 2.**40 + 0.5
>>> base
1099511627776.5
>>> for i in range(32, 45):
...     x = base + 2.**i
...     print x, hash(x)
...
1.10380659507e+12 -2141945856
1.10810156237e+12 -2137751552
1.11669149696e+12 -2129362944
1.13387136614e+12 -2112585728
1.16823110451e+12 -2079031296
1.23695058125e+12 -2011922432
1.37438953472e+12 -1877704704
1.64926744166e+12 -1609269248
2.19902325555e+12 -2146107392
3.29853488333e+12 -1609236480
5.49755813888e+12 -1877639168
9.89560464998e+12 -2011824128
1.86916976722e+13 -2078900224

On Linux64:

>>> base = 2.**40 + 0.5
>>> base
1099511627776.5
>>> for i in range(32, 45):
...     x = base + 2.**i
...     print x, hash(x)
...
1.10380659507e+12 2153021440
1.10810156237e+12 2157215744
1.11669149696e+12 2165604352
1.13387136614e+12 2182381568
1.16823110451e+12 2215936000
1.23695058125e+12 2283044864
1.37438953472e+12 2417262592
1.64926744166e+12 2685698048
2.19902325555e+12 2148859904
3.29853488333e+12 2685730816
5.49755813888e+12 2417328128
9.89560464998e+12 2283143168
1.86916976722e+13 2216067072

>-- and that should also fix your 64-bit woes "by magic".
> 

As you can see it did not, but for another reason. The summation of the parts
overflows 'x'. Is this a problem? I.e., does it matter if a hash function
returns an overflowed integral value (my hash function ignorance is showing)?
And if this does not matter, does it matter that a hash returns different
values on different platforms?

> a hash function should never ignore any bit in its input. 

Which brings up a question regarding instance_hash(), func_hash(),
meth_hash(), HKEY_hash() [or whatever it is called], and other which cast a
pointer to a long (discarding the upperhalf of the pointer on Win64). Do
these really need to be fixed. Am I nitpicking too much on this whole thing?

Thanks,
Trent

-- 
Trent Mick
trentm@activestate.com