
On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner lists@informa.tiker.net wrote:
Hi Robert,
On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern robert.kern@gmail.com wrote:
On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner lists@informa.tiker.net wrote:
Hi Robert,
On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern robert.kern@gmail.com wrote:
On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner lists@informa.tiker.net wrote:
Hi all,
Two questions:
- Are dtypes supposed to be comparable (i.e. implement '==', '!=')?
Yes.
- Are dtypes supposed to be hashable?
Yes, with caveats. Strictly speaking, we violate the condition that objects that equal each other should hash equal since we define == to be rather free. Namely,
np.dtype(x) == x
for all objects x that can be converted to a dtype.
np.dtype(float) == np.dtype('float')
np.dtype(float) == float
np.dtype(float) == 'float'
Since hash(float) != hash('float') we cannot implement np.dtype.__hash__() to follow the stricture that objects that compare equal should hash equal.
However, if you restrict the domain of objects to just dtypes (i.e. only consider dicts that use only actual dtype objects as keys instead of arbitrary mixtures of objects), then the stricture is obeyed. This is a useful domain that is used internally in numpy.
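The caveat above can be reproduced without numpy. The following is a minimal sketch, assuming a hypothetical class LooseKey (not a numpy type) whose == converts the other operand first, in the same spirit as np.dtype(x) == x. It only illustrates why such an == cannot be paired with a fully consistent __hash__.

```python
class LooseKey:
    """Hypothetical stand-in for an object whose == accepts several spellings."""

    # Every accepted spelling maps to one canonical name.
    _CANONICAL = {float: "float", "float": "float", int: "int", "int": "int"}

    def __init__(self, spec):
        self.name = spec.name if isinstance(spec, LooseKey) else self._CANONICAL[spec]

    def __eq__(self, other):
        # Liberal equality: convert the other operand before comparing.
        try:
            return self.name == LooseKey(other).name
        except (KeyError, TypeError):
            return False

    def __hash__(self):
        # Consistent among LooseKey instances, but unrelated to
        # hash(float) or hash("float").
        return hash(("LooseKey", self.name))


key = LooseKey(float)
print(key == float, key == "float", key == LooseKey("float"))

# Restricted to LooseKey keys, equal objects hash equal and lookups work.
only_keys = {LooseKey("float"): "found"}
print(only_keys[key])

# With mixed key types the dict compares hashes first, so this lookup will
# almost certainly miss even though key == float and key == "float".
mixed = {float: "by type", "float": "by name"}
print(key in mixed)
```

A dict keyed only by LooseKey instances behaves as expected, which mirrors the restricted domain described above.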
Is this the problem that you found?
Thanks for the reply.
It doesn't seem like this is our issue--instead, we're encountering two different dtype objects that claim to be float64, compare as equal, but don't hash to the same value.
I've asked the user who encountered the problem to investigate, and I'll be back with more detail in a bit.
I think we've run into this before and tried to fix it. Try to find the version of numpy the user has and a minimal example, if you can.
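One way to boil such a report down to a minimal example is a small helper that lists pairs of objects which compare equal but hash differently. This is only a sketch: hash_violations and the deliberately broken Tagged class are hypothetical names introduced here, and in practice one would pass in the suspect dtype objects instead.

```python
from itertools import combinations


def hash_violations(objects):
    """Return pairs that compare equal but hash differently."""
    return [
        (a, b)
        for a, b in combinations(objects, 2)
        if a == b and hash(a) != hash(b)
    ]


class Tagged:
    """Deliberately broken example type: always equal, hash depends on the tag."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        return isinstance(other, Tagged)

    def __hash__(self):
        return self.tag


# Built-in numbers obey the rule: 1 == 1.0 == True and all three hash alike.
print(hash_violations([1, 1.0, True]))

# The broken type does not: one offending pair is reported.
print(len(hash_violations([Tagged(1), Tagged(2)])))
```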
This is what Thomas found:
It looks like the .flags attribute is different between np.uintp and np.uint32. The .flags attribute forms part of the hashed information about the dtype (or PyArray_Descr at the C-level).
[~] |15> np.dtype(np.uintp).flags
1536
[~] |16> np.dtype(np.uint32).flags
2048
The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so unlike the comment in the ticket, they do have different hashes for me.
However, diving through the source a bit, I'm not entirely sure I trust the values being given at the Python level. It appears that the flags member of the PyArray_Descr struct is declared as a char. However, it is exposed as a T_INT member in the PyMemberDef table by direct addressing. Basically, a Python descriptor gets added to the np.dtype type that will look up sizeof(int) bytes from the starting position of the flags member in the struct. This includes 3 bytes of the following type_num member. Obviously, 2048 does not fit into a char. Nonetheless, the type_num is also part of the hash, so either the flags member or the type_num member is different between the two.
Two bugs for the price of one!
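The effect of reading an int-sized field at the address of a one-byte member can be imitated with the struct module. This is a sketch under stated assumptions: the layout (four one-byte members followed by a 4-byte type_num) is a simplification chosen for illustration, not copied from the numpy headers, and the helper names are hypothetical. The type numbers 6 and 8 are example values. With a zero flags byte they happen to produce 1536 and 2048, which is suggestive but does not by itself confirm the real struct layout.

```python
import struct

# Assumed layout for illustration only: kind, type, byteorder, flags (one byte
# each) followed by a 4-byte int type_num. "<" forces little-endian with no
# padding, so the result does not depend on the machine running the sketch.
FLAGS_OFFSET = 3


def pack_descr(flags, type_num):
    """Build the raw bytes of the assumed struct."""
    return struct.pack("<BBBBi", ord("u"), ord("I"), ord("="), flags, type_num)


def read_flags_as_char(raw):
    """Correct read: a single byte at the flags offset."""
    return raw[FLAGS_OFFSET]


def read_flags_as_int(raw):
    """Mimic an int-sized read that starts at the one-byte flags member."""
    (value,) = struct.unpack_from("<i", raw, FLAGS_OFFSET)
    return value


for type_num in (6, 8):
    raw = pack_descr(flags=0, type_num=type_num)
    # The int-sized read picks up the low three bytes of type_num,
    # shifted up by one byte.
    print(type_num, read_flags_as_char(raw), read_flags_as_int(raw))
```

In this sketch the over-wide read returns flags + 256 * type_num for small type numbers, so two descriptors with identical flags but different type numbers report different "flags" values.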