A change with minor compatibility questions
Hey all, https://github.com/numpy/numpy/pull/482 is a pull request that changes the hash function for numpy void scalars. These are the objects returned from fully indexing a structured array: array[i] if array is a 1-d structured array. Currently their hash function just hashes the pointer to the underlying data. This means that void scalars can be used as keys in a dictionary but the behavior is non-intuitive because another void scalar with the same data but pointing to a different region of memory will hash differently. The pull request makes it so that two void scalars with the same data will hash to the same value (using the same algorithm as a tuple hash). This pull request also only allows read-only scalars to be hashed. There is a small chance this will break someone's code if they relied on this behavior. I don't believe anyone is currently relying on this behavior -- but I've been proven wrong before. What do people on this list think? Should we raise a warning in the next release when a hash function on a void scalar is called or just make the change, put it in the release notes and make a few people change their code if needed. The problem was identified by a couple of users of NumPy currently which is why I think that people who have tried using numpy void scalars as keys aren't doing it right now but are instead converting them to tuples first. -Travis
On Wed, Oct 17, 2012 at 10:22:28AM -0500, Travis Oliphant wrote:
There is a small chance this will break someone's code if they relied on this behavior. I don't believe anyone is currently relying on this behavior -- but I've been proven wrong before. What do people on this list think?
I think that the change is acceptable. Thanks for asking!
Should we raise a warning in the next release when a hash function on a void scalar is called or just make the change, put it in the release notes and make a few people change their code if needed.
I think that there should be a section in the release notes that list such changes in a very visible way. I don't think that a warning is necessary here, but I guess that others might have a different point of view. G
On 10/17/2012 05:22 PM, Travis Oliphant wrote:
Hey all,
https://github.com/numpy/numpy/pull/482
is a pull request that changes the hash function for numpy void scalars. These are the objects returned from fully indexing a structured array: array[i] if array is a 1-d structured array.
Currently their hash function just hashes the pointer to the underlying data. This means that void scalars can be used as keys in a dictionary but the behavior is non-intuitive because another void scalar with the same data but pointing to a different region of memory will hash differently.
The pull request makes it so that two void scalars with the same data will hash to the same value (using the same algorithm as a tuple hash). This pull request also only allows read-only scalars to be hashed.
There is a small chance this will break someone's code if they relied on this behavior. I don't believe anyone is currently relying on this behavior -- but I've been proven wrong before. What do people on this list think?
I support working on fixing this, but if I understand your fix correctly this change just breaks things in a different way. Specifically, in this example: arr = np.ones(4, dtype=[('a', np.int64)]) x = arr[0] d = { x : 'value' } arr[0]['a'] = 4 print d[x] Does the last line raise a KeyError? If I understand correctly it does. (Of course, the current situation just breaks lookups in another situation.) I propose to BOTH make "x" unhashable (thus being a good Python citizen and following Python rules) AND provide "x.askey()" or "x.immutable()" which returns something immutable you can use as a key. The places where that breaks things is probably buggy code that must be fixed (either one way or the other) anyway. Perhaps a warning period is in order then (one would raise a warning in __hash__, telling people to use the "askey()" method). (I would really prefer to always have "x" be immutable, but that probably breaks working code.) Dag Sverre
On 10/17/2012 06:56 PM, Dag Sverre Seljebotn wrote:
On 10/17/2012 05:22 PM, Travis Oliphant wrote:
Hey all,
https://github.com/numpy/numpy/pull/482
is a pull request that changes the hash function for numpy void scalars. These are the objects returned from fully indexing a structured array: array[i] if array is a 1-d structured array.
Currently their hash function just hashes the pointer to the underlying data. This means that void scalars can be used as keys in a dictionary but the behavior is non-intuitive because another void scalar with the same data but pointing to a different region of memory will hash differently.
The pull request makes it so that two void scalars with the same data will hash to the same value (using the same algorithm as a tuple hash). This pull request also only allows read-only scalars to be hashed.
There is a small chance this will break someone's code if they relied on this behavior. I don't believe anyone is currently relying on this behavior -- but I've been proven wrong before. What do people on this list think?
I support working on fixing this, but if I understand your fix correctly this change just breaks things in a different way.
Specifically, in this example:
arr = np.ones(4, dtype=[('a', np.int64)]) x = arr[0] d = { x : 'value' } arr[0]['a'] = 4 print d[x]
Does the last line raise a KeyError? If I understand correctly it does.
Argh. I overlooked both Travis' second commit, and the explicit mention of read-only above. Isn't it possible to produce a read-only array from a writeable one though, and so get a read-only scalar whose underlying value can still change? Anyway, sorry about being so quick to post. Dag Sverre
On Oct 17, 2012, at 12:48 PM, Dag Sverre Seljebotn wrote:
On 10/17/2012 06:56 PM, Dag Sverre Seljebotn wrote:
On 10/17/2012 05:22 PM, Travis Oliphant wrote:
Hey all,
https://github.com/numpy/numpy/pull/482
is a pull request that changes the hash function for numpy void scalars. These are the objects returned from fully indexing a structured array: array[i] if array is a 1-d structured array.
Currently their hash function just hashes the pointer to the underlying data. This means that void scalars can be used as keys in a dictionary but the behavior is non-intuitive because another void scalar with the same data but pointing to a different region of memory will hash differently.
The pull request makes it so that two void scalars with the same data will hash to the same value (using the same algorithm as a tuple hash). This pull request also only allows read-only scalars to be hashed.
There is a small chance this will break someone's code if they relied on this behavior. I don't believe anyone is currently relying on this behavior -- but I've been proven wrong before. What do people on this list think?
I support working on fixing this, but if I understand your fix correctly this change just breaks things in a different way.
Specifically, in this example:
arr = np.ones(4, dtype=[('a', np.int64)]) x = arr[0] d = { x : 'value' } arr[0]['a'] = 4 print d[x]
Does the last line raise a KeyError? If I understand correctly it does.
Argh. I overlooked both Travis' second commit, and the explicit mention of read-only above.
Isn't it possible to produce a read-only array from a writeable one though, and so get a read-only scalar whose underlying value can still change?
Yes, it is possible to do that (just like it is currently possible to change a tuple with a C-extension or even Cython or a string with NumPy). We won't be able to prevent people from writing code that will have odd behavior, but we can communicate correctly about what one should do. -Travis
Anyway, sorry about being so quick to post.
Dag Sverre _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (4)
-
Dag Sverre Seljebotn
-
Gael Varoquaux
-
Travis Oliphant
-
Travis Oliphant