[Numpy-discussion] NaN as dictionary key?

Thu Apr 23 09:59:39 EDT 2009

2009/4/20 Wes McKinney <wesmckinn at gmail.com>:
> I assume that because NaN != NaN, Python treats every NaN double as a distinct
> key in a dictionary, even though they all have the same hash value
> (hash(NaN) == -32768).
>
> In [76]: a = np.repeat(nan, 10)
>
> In [77]: d = {}
>
> In [78]: for i, v in enumerate(a):
>    ....:     d[v] = i
>    ....:
>    ....:
>
> In [79]: d
> Out[79]:
> {nan: 0,
>  nan: 1,
>  nan: 6,
>  nan: 4,
>  nan: 3,
>  nan: 9,
>  nan: 7,
>  nan: 2,
>  nan: 8,
>  nan: 5}
>
> I'm not sure whether this ever worked in a past version of NumPy. However, I
> have code that does a "group by value", and occasionally in the real world
> those values are NaN. Any ideas or a way around this problem?
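A minimal sketch of why the dictionary ends up with ten NaN keys. It assumes Python 3 and a current NumPy rather than the 2009 versions in this thread. CPython's dict first checks whether a stored key is the very same object as the lookup key and only then falls back to ==. Iterating an array hands back a new scalar object for each element, so no two of those NaNs ever match. The names d, d2 and nan are illustrative only.

```python
import numpy as np

# A plain Python float NaN, reused as the same object every time.
nan = float("nan")

# dict lookup tests identity (`is`) before equality (`==`), so inserting
# the same NaN object twice just overwrites one entry.
d = {}
d[nan] = 0
d[nan] = 1
print(len(d))  # 1

# Iterating a NumPy array creates a new np.float64 scalar per element.
# They are distinct objects and NaN != NaN, so each becomes its own key.
a = np.repeat(np.nan, 10)
d2 = {}
for i, v in enumerate(a):
    d2[v] = i
print(len(d2))  # 10
```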

For keys that can't be used reliably in a dict (non-hashable ones, or NaN,
which is never equal to itself), I convert them to strings, e.g. with repr or
str or some other string representation of the floating-point value.

I use this, for example, to feed the values to unique1d.
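The string-key idea can be sketched for the "group by value" use case as follows. group_by_value is a hypothetical helper, not a NumPy function. The sketch assumes Python 3 and calls float() before repr() because NumPy 2 reprs scalars as np.float64(...), whereas a plain Python float always reprs NaN as 'nan'.

```python
from collections import defaultdict

import numpy as np


def group_by_value(values):
    """Map each distinct value to the list of indices where it occurs.

    Hypothetical helper: keys are repr(float(v)) strings, so every NaN
    lands under the single key 'nan' instead of one key per NaN object.
    """
    groups = defaultdict(list)
    for i, v in enumerate(values):
        # float() first: under NumPy 2, repr(np.float64(x)) is 'np.float64(x)'.
        groups[repr(float(v))].append(i)
    return dict(groups)


a = np.array([1.5, np.nan, 2.0, np.nan, 1.5])
print(group_by_value(a))
# {'1.5': [0, 4], 'nan': [1, 3], '2.0': [2]}
```

One side effect of keying on repr: 0.0 and -0.0 compare equal but have different reprs, so they end up in separate groups.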

Josef

>>> a
array([ NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN])
>>> np.unique1d(a)
array([ NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN])
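np.unique1d was later dropped from NumPy in favour of np.unique, and recent releases make np.unique itself return a single NaN. A version-independent sketch that masks the NaNs out explicitly and then adds one back could look like this. unique_one_nan is a hypothetical helper, and it assumes the input can be converted to a float array.

```python
import numpy as np


def unique_one_nan(a):
    """Sorted unique values of a float array, with all NaNs collapsed into
    a single trailing NaN (hypothetical helper, not part of NumPy)."""
    a = np.asarray(a, dtype=float)
    nan_mask = np.isnan(a)
    # np.unique sorts the non-NaN values and removes duplicates.
    uniq = np.unique(a[~nan_mask])
    if nan_mask.any():
        # Append exactly one NaN at the end, where np.sort places NaNs.
        uniq = np.append(uniq, np.nan)
    return uniq


print(unique_one_nan(np.repeat(np.nan, 10)).tolist())          # [nan]
print(unique_one_nan([3.0, np.nan, 1.0, 3.0, np.nan]).tolist())  # [1.0, 3.0, nan]
```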

Casting to the string dtype is not good with NaN (the NaNs are silently converted during the cast, here to '1'):
>>> np.unique1d(a.astype(str))
array(['1'],
      dtype='|S1')
>>> a.astype(str)
array(['1', '1', '1', '1', '1', '1', '1', '1', '1', '1'],
      dtype='|S1')

>>> np.unique1d([repr(ii) for ii in a])
array(['nan'],
      dtype='|S3')

But NaNs don't round-trip (at least not on Windows). Is this intended?

>>> np.unique1d(np.arange(10).astype(str)).astype(float)
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
>>> np.all(np.array([repr(ii) for ii in np.pi*np.arange(10)]).astype(float) == np.pi*np.arange(10))
True

>>> np.unique1d([repr(ii) for ii in a]).astype(float)
Traceback (most recent call last):
  File "<pyshell#120>", line 1, in <module>
ValueError: invalid literal for float(): nan
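This doesn't answer whether the 2009 Windows behaviour was intended. Under Python 3, though, the repr round trip does work for NaN as well, because float() accepts the string 'nan'. A small sketch, assuming Python 3 and a current NumPy, with np.unique standing in for unique1d:

```python
import math

import numpy as np

values = np.array([0.0, np.pi, np.nan, np.nan])

# Encode: float() gives plain Python floats, whose repr is 'nan' for NaN.
keys = [repr(float(v)) for v in values]
print(keys)  # ['0.0', '3.141592653589793', 'nan', 'nan']

# Decode with Python's float(), which parses 'nan' in Python 3.
decoded = [float(k) for k in keys]
print(decoded[:2] == [0.0, math.pi])            # True
print(all(math.isnan(x) for x in decoded[2:]))  # True

# The unique string keys can also be cast back to floats with NumPy.
unique_keys = np.unique(keys)
back = unique_keys.astype(float)
print(back)
```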