[Numpy-discussion] A change with minor compatibility questions

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed Oct 17 12:56:36 EDT 2012


On 10/17/2012 05:22 PM, Travis Oliphant wrote:
> Hey all,
>
> https://github.com/numpy/numpy/pull/482
>
> is a pull request that changes the hash function for numpy void
> scalars. These are the objects returned from fully indexing a
> structured array: array[i] if array is a 1-d structured array.
>
> Currently their hash function just hashes the pointer to the underlying
> data. This means that void scalars can be used as keys in a
> dictionary but the behavior is non-intuitive because another void scalar
> with the same data but pointing to a different region of memory will
> hash differently.
>
> The pull request makes it so that two void scalars with the same data
> will hash to the same value (using the same algorithm as a tuple hash).
> This pull request also only allows read-only scalars to be hashed.
>
> There is a small chance this will break someone's code if they relied on
> this behavior.  I don't believe anyone is currently relying on this
> behavior -- but I've been proven wrong before.   What do people on this
> list think?

I support working on fixing this, but if I understand your fix correctly 
this change just breaks things in a different way.

Specifically, in this example:

import numpy as np

arr = np.ones(4, dtype=[('a', np.int64)])
x = arr[0]            # void scalar viewing arr's memory
d = {x: 'value'}
arr[0]['a'] = 4       # mutates the data that x refers to
print(d[x])

Does the last line raise a KeyError? If I understand correctly it does.

(Of course, the current behavior just breaks lookups in a different case.)

I propose to BOTH make "x" unhashable (thus being a good Python citizen 
and following Python rules) AND provide "x.askey()" or "x.immutable()" 
which returns something immutable you can use as a key.
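
A minimal sketch of what such a key could look like, written as a free
function rather than a method. The name "askey" is hypothetical and is not
an existing NumPy API; the sketch assumes a flat dtype with scalar fields
and relies on the scalar's item() method returning a plain tuple.

```python
import numpy as np


def askey(void_scalar):
    """Hypothetical helper: immutable snapshot of a structured scalar."""
    # item() copies the field values out into a plain Python tuple,
    # so the result no longer shares memory with the array.
    return void_scalar.item()


arr = np.ones(4, dtype=[('a', np.int64)])
key = askey(arr[0])
d = {key: 'value'}

arr['a'][0] = 4        # mutate the array after the key was taken

print(d[key])          # the snapshot key still finds its entry
print(askey(arr[1]) in d)   # an element with equal data maps to the same key
print(askey(arr[0]) in d)   # the mutated element no longer matches
```

Because the key is a copy, later writes to the array cannot change its
hash, which is the property the dictionary depends on.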

The places where that breaks things are probably buggy code that must be 
fixed (one way or the other) anyway. Perhaps a warning period is 
in order, then: one would raise a warning in __hash__, telling people to 
use the "askey()" method.
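
A rough pure-Python illustration of such a warning period. The "Record"
class is a hypothetical stand-in for a mutable structured scalar, not
NumPy code; it keeps an identity-based hash for now but warns on every
use and points to askey().

```python
import warnings


class Record:
    """Hypothetical stand-in for a mutable structured scalar."""

    def __init__(self, *values):
        self._values = list(values)

    def askey(self):
        # Immutable snapshot that is safe to use as a dictionary key.
        return tuple(self._values)

    def __hash__(self):
        # Warning period: hashing still works, but callers are told to migrate.
        warnings.warn(
            "hashing a mutable Record is deprecated; use askey() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return id(self)


r = Record(1, 2)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hash(r)            # triggers the deprecation warning

print(caught[0].category.__name__)
print({r.askey(): 'value'})
```

After the warning period, __hash__ would raise TypeError instead, making
the object unhashable in the usual Python sense.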

(I would really prefer to always have "x" be immutable, but that 
probably breaks working code.)

Dag Sverre


