[issue43475] Worst-case behaviour of hash collision with float NaN
Cong Ma
report at bugs.python.org
Sat Mar 13 06:23:46 EST 2021
Cong Ma <m.cong at protonmail.ch> added the comment:
> Idea: We could make this problem go away by making NaN a singleton.
Apart from the fact that NaN is not a specific value/object (as pointed out in the previous message by @mark.dickinson), currently each builtin singleton (None, True, False, etc.) in Python satisfies the following predicate:
`if s is a singleton, then s == s`
This is also satisfied by "flyweight" objects such as small ints.
It feels natural and unsurprising to have this, if not officially promised. Requiring NaN to be a singleton would violate this implicit understanding about singletons, and cause another surprise on top of the other surprises with NaNs (or with rich comparison in general).
Performance-wise, I think the current behaviour (returning _PyHASH_NAN == 0) is already nested inside two layers of conditionals in `_Py_HashDouble()` in Python/pyhash.c:
```
if (!Py_IS_FINITE(v)) {
if (Py_IS_INFINITY(v))
return v > 0 ? _PyHASH_INF : -_PyHASH_INF;
else
return _PyHASH_NAN;
}
```
(v is the underlying C `double`, field `ob_fval` of `PyFloatObject`).
I don't feel performance would be hurt by rewriting `float_hash()` in Objects/floatobject.c to the effect of
```
if (!Py_IS_NAN(v->ob_fval)) {
return _Py_HashDouble(v->ob_fval)
}
else {
return _Py_HashPointer(v);
}
```
and simplify the conditionals in `_Py_HashDouble()`. But of course, only real benchmarking could tell. (Also, other float-like types would have to be adjusted, too).
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43475>
_______________________________________
More information about the Python-bugs-list
mailing list