[Python-Dev] Re: [Patches] fix float_hash and complex_hash for 64-bit *nix

Thu, 11 May 2000 22:58:35 -0400

[Guido]
> I have to admit I have no clue about the details of this debate any
> more,

Na, there's no debate here.  I believe I confused things by misunderstanding
what Trent's original claim was (sorry, Trent!), but we bumped into real
flaws in the current hash anyway (even on 32-bit machines).  I don't think
there's any actual disagreement about anything here.

> and I'm cowardly awaiting a patch submission that Tim approves
> of.

As am I <wink>.

> (I'm hoping a day will come when Tim can check it in himself. :-)

Well, all you have to do to make that happen is get a real job and then hire
me <wink>.

> In the mean time, I'd like to emphasize the key invariant here: we
> must ensure that (a==b) => (hash(a)==hash(b)).

Absolutely.  That's already true, and is so non-controversial that Trent
elided ("...") the code for that in his last post.

> One quick way to deal with this could be the following pseudo C:
>
>     PyObject *double_hash(double x)
>     {
>         long l = (long)x;
>         if ((double)l == x)
> 	    return long_hash(l);
> 	...double-specific code...
>     }
>
> This code makes one assumption: that if there exists a long l equal to
> a double x, the cast (long)x should yield l...

No, that fails on two counts:

1.  If x is "too big" to fit in a long (and a great many doubles are),
    the cast to long is undefined.  Don't know about all current platforms,
    but on the KSR platform such casts raised a fatal hardware
    exception.  The current code already accomplishes this part in a
    safe way (which Trent's patch improves by using a symbol instead of
    the current hard-coded hex constant).

2.  The key invariant needs to be preserved also when x is an exact
    integral value that happens to be (possibly very!) much bigger than
    a C long; e.g.,

>>> long(1.23e300)  # 1.23e300 is an integer! albeit not the one you think
12299999999999999456195024356787918820614965027709909500456844293279
60298864608335541984218516600989160291306221939122973741400364055485
57167627474369519296563706976894811817595986395177079943535811102573
51951343133141138298152217970719263233891682157645730823560232757272
73837119288529943287157489664L
>>> hash(1.23e300) == hash(_)
1
>>>

The current code already handles that correctly too.  All the problems occur
when the double has a non-zero fractional part, and Trent knows how to fix
that now.  hash(x) may differ across platforms because sizeof(long) differs
across platforms, but that's just as true of strings as floats (i.e., Python
has never computed platform-independent hashes -- if that bothers *you*
(doesn't bother me), that's the part you should chime in on).