[issue43475] Worst-case behaviour of hash collision with float NaN

Sat Mar 13 06:23:46 EST 2021

Cong Ma <m.cong at protonmail.ch> added the comment:

> Idea:  We could make this problem go away by making NaN a singleton.

Apart from the fact that NaN is not a specific value/object (as pointed out in the previous message by @mark.dickinson), currently each builtin singleton (None, True, False, etc.) in Python satisfies the following predicate:

`if s is a singleton, then s == s`

This is also satisfied by "flyweight" objects such as small ints.

It feels natural and unsurprising to have this, if not officially promised. Requiring NaN to be a singleton would violate this implicit understanding about singletons, and cause another surprise on top of the other surprises with NaNs (or with rich comparison in general).

Performance-wise, I think the current behaviour (returning _PyHASH_NAN == 0) is already nested inside two layers of conditionals in `_Py_HashDouble()` in Python/pyhash.c:

```
    if (!Py_IS_FINITE(v)) {
        if (Py_IS_INFINITY(v))
            return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
        else
            return _PyHASH_NAN;
    }
```
(v is the underlying C `double`, field `ob_fval` of `PyFloatObject`).

I don't feel performance would be hurt by rewriting `float_hash()` in Objects/floatobject.c to the effect of

```
    if (!Py_IS_NAN(v->ob_fval)) {
        return _Py_HashDouble(v->ob_fval)
    }
    else {
        return _Py_HashPointer(v);
    }
```
and simplify the conditionals in `_Py_HashDouble()`. But of course, only real benchmarking could tell. (Also, other float-like types would have to be adjusted, too).

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43475>
_______________________________________