[Numpy-discussion] A change with minor compatibility questions
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Wed Oct 17 12:56:36 EDT 2012
On 10/17/2012 05:22 PM, Travis Oliphant wrote:
> Hey all,
>
> https://github.com/numpy/numpy/pull/482
>
> is a pull request that changes the hash function for numpy void
> scalars. These are the objects returned from fully indexing a
> structured array: array[i] if array is a 1-d structured array.
>
> Currently their hash function just hashes the pointer to the underlying
> data. This means that void scalars can be used as keys in a
> dictionary but the behavior is non-intuitive because another void scalar
> with the same data but pointing to a different region of memory will
> hash differently.
>
> The pull request makes it so that two void scalars with the same data
> will hash to the same value (using the same algorithm as a tuple hash).
> This pull request also only allows read-only scalars to be hashed.
>
> There is a small chance this will break someone's code if they relied on
> this behavior. I don't believe anyone is currently relying on this
> behavior -- but I've been proven wrong before. What do people on this
> list think?
I support working on fixing this, but if I understand your fix correctly
this change just breaks things in a different way.
Specifically, in this example:
    import numpy as np
    arr = np.ones(4, dtype=[('a', np.int64)])
    x = arr[0]             # x is a view into arr, not a copy
    d = {x: 'value'}
    arr[0]['a'] = 4        # mutates the data that x points at
    print(d[x])
Does the last line raise a KeyError? If I understand correctly it does.
(Of course, the current situation just breaks lookups in another situation.)
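To make the failure mode concrete without depending on a particular NumPy version, here is a minimal sketch using a pure-Python stand-in. The class ContentHashed is hypothetical and is not a NumPy type; it only mimics "mutable object whose hash is computed from its contents", which is my reading of what the pull request does for void scalars.

```python
class ContentHashed:
    """Hypothetical stand-in: a mutable record hashed by its contents."""

    def __init__(self, a):
        self.a = a

    def __eq__(self, other):
        return isinstance(other, ContentHashed) and self.a == other.a

    def __hash__(self):
        # Same idea as a tuple hash over the fields.
        return hash((self.a,))


def lookup(mapping, key):
    """Return the stored value, or None if the key is not found."""
    try:
        return mapping[key]
    except KeyError:
        return None


x = ContentHashed(1)
d = {x: 'value'}
hash_before = hash(x)

x.a = 4                      # mutate the key after it was inserted
hash_after = hash(x)

# Neither the old contents nor the new contents find the entry again.
found_old = lookup(d, ContentHashed(1))
found_new = lookup(d, ContentHashed(4))
```

The entry is still stored in the dictionary, but it was filed under the old hash while the key now compares equal only to the new contents, so no equal-valued key can reach it.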
I propose to BOTH make "x" unhashable (thus being a good Python citizen
and following Python rules) AND provide "x.askey()" or "x.immutable()"
which returns something immutable you can use as a key.
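A rough sketch of what such a key could look like, written as a free function because "askey" does not exist in NumPy; the name and behaviour here are assumptions for illustration only. It relies on .item() returning a plain tuple of Python values for a structured scalar whose fields are all simple scalars (nested or subarray fields may need extra handling).

```python
import numpy as np


def askey(void_scalar):
    """Hypothetical helper: immutable snapshot of a structured scalar."""
    # For a flat structured dtype, .item() copies the fields into a tuple.
    return void_scalar.item()


arr = np.ones(4, dtype=[('a', np.int64)])
x = arr[0]

key = askey(x)               # snapshot taken before the mutation
d = {key: 'value'}

arr[0]['a'] = 4              # changes arr, and therefore x, but not key

value_after_mutation = d[key]
new_snapshot = askey(x)      # reflects the mutated data
other_row_value = d[askey(arr[1])]   # arr[1] still holds the old contents
```

Because the key is a copy, the dictionary keeps working no matter what later happens to the array, and any row with the same contents maps to the same entry.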
The places where that breaks things are probably buggy code that must be
fixed (either one way or the other) anyway. Perhaps a warning period is
in order then (one would raise a warning in __hash__, telling people to
use the "askey()" method).
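The warning period could look roughly like the following sketch. VoidLike is a hypothetical pure-Python stand-in, not NumPy code, and the choice of DeprecationWarning and of keeping an identity-based hash during the transition are assumptions on my part.

```python
import warnings


class VoidLike:
    """Hypothetical stand-in for a void scalar during a warning period."""

    def __init__(self, *fields):
        self.fields = list(fields)   # mutable, like a writable void scalar

    def askey(self):
        # Immutable snapshot that is safe to use as a dictionary key.
        return tuple(self.fields)

    def __hash__(self):
        warnings.warn(
            "hashing this object is deprecated; use askey() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return id(self)              # stand-in for the old pointer-based hash


x = VoidLike(1)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    d = {x: 'value'}                 # calls __hash__, so this warns
    d2 = {x.askey(): 'value'}        # plain tuple key, no warning
```

Existing code keeps running during the transition, and each place that hashes the object directly gets pointed at the replacement.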
(I would really prefer to always have "x" be immutable, but that
probably breaks working code.)
Dag Sverre