On Tue, Jan 17, 2012 at 9:28 AM, Robert Kern <robert.kern@gmail.com> wrote:
On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner <lists@informa.tiker.net> wrote:
Hi Robert,
On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern <robert.kern@gmail.com> wrote:
On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner <lists@informa.tiker.net> wrote:
Hi Robert,
On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern <robert.kern@gmail.com> wrote:
On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner <lists@informa.tiker.net> wrote:
Hi all,
Two questions:
- Are dtypes supposed to be comparable (i.e. implement '==', '!=')?
Yes.
- Are dtypes supposed to be hashable?
Yes, with caveats. Strictly speaking, we violate the condition that objects that equal each other should hash equal since we define == to be rather free. Namely,
np.dtype(x) == x
for all objects x that can be converted to a dtype.
np.dtype(float) == np.dtype('float')
np.dtype(float) == float
np.dtype(float) == 'float'
Since hash(float) != hash('float') we cannot implement np.dtype.__hash__() to follow the stricture that objects that compare equal should hash equal.
However, if you restrict the domain of objects to just dtypes (i.e. only consider dicts that use only actual dtype objects as keys instead of arbitrary mixtures of objects), then the stricture is obeyed. This is a useful domain that is used internally in numpy.
Is this the problem that you found?
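A minimal sketch of the caveat described above, using a hypothetical stand-in class (`LooseType`, not part of numpy) whose `==` accepts anything that converts to it, the way np.dtype's `==` is described as behaving. It assumes nothing about numpy's real hashing; it only shows that dict lookups stay consistent as long as every key is an instance of the same class.

```python
class LooseType:
    """Hypothetical stand-in for a dtype with a liberal ``==``."""

    # Objects that "convert" to each type name (illustrative only).
    _ALIASES = {"float64": (float, "float", "float64")}

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, LooseType):
            return self.name == other.name
        # Liberal comparison: equal to anything that converts to this type.
        return any(other == alias for alias in self._ALIASES.get(self.name, ()))

    def __hash__(self):
        # This can only agree with other LooseType instances: float and
        # 'float' are unequal objects with unrelated hashes, so no single
        # value could match both.
        return hash(self.name)


a = LooseType("float64")
b = LooseType("float64")

# Liberal equality, as in the thread.
print(a == float, a == "float")      # True True

# Restricted to LooseType keys, equal objects hash equal, so dicts work.
table = {a: "found"}
print(table[b])                      # found
```

Mixing `a`, `float` and `'float'` as keys of one dict is the case where the "equal objects hash equal" rule cannot be honoured.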
Thanks for the reply.
It doesn't seem like this is our issue--instead, we're encountering two different dtype objects that claim to be float64, compare as equal, but don't hash to the same value.
I've asked the user who encountered the issue to investigate, and I'll be back with more detail in a bit.
I think we've run into this before and tried to fix it. Try to find the version of numpy the user has and a minimal example, if you can.
This is what Thomas found:
It looks like the .flags attribute is different between np.uintp and np.uint32. The .flags attribute forms part of the hashed information about the dtype (or PyArray_Descr at the C-level).
[~] |15> np.dtype(np.uintp).flags
1536
[~] |16> np.dtype(np.uint32).flags
2048
The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so unlike the comment in the ticket, they do have different hashes for me.
However, diving through the source a bit, I'm not entirely sure I trust the values being given at the Python level. It appears that the flag member of the PyArray_Descr struct is declared as a char. However, it is exposed as a T_INT member in the PyMemberDef table by direct addressing. Basically, a Python descriptor gets added to the np.dtype type that will look up sizeof(long) bytes from the starting position of the flags member in the struct. This includes 3 bytes of the following type_num member. Obviously, 2048 does not fit into a char. Nonetheless, the type_num is also part of the hash, so either the flags member or the type_num member is different between the two.
Two bugs for the price of one!
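The misread Robert describes can be modelled with the standard `struct` module. This is a sketch under stated assumptions: a little-endian machine, a 4-byte int, and a one-byte flags member immediately followed by type_num with no padding. The helper name `read_flags_as_int` is made up for illustration. Under those assumptions, a flags byte of 0 followed by type_num values of 6 and 8 would reproduce the 1536 and 2048 reported above, though the thread does not confirm that those are the actual field values.

```python
import struct


def read_flags_as_int(flags, type_num):
    """Model reading a 4-byte int starting at a 1-byte flags member.

    Assumed layout (little-endian, no padding): char flags; int type_num.
    """
    raw = struct.pack("<bi", flags, type_num)   # 1 + 4 = 5 bytes
    # A T_INT-style read grabs 4 bytes from the start of flags, so it
    # picks up flags plus the low 3 bytes of type_num.
    (value,) = struct.unpack_from("<i", raw, 0)
    return value


for type_num in (6, 8):
    value = read_flags_as_int(0, type_num)
    # The low byte is the real flags; the rest is leaked from type_num.
    print(type_num, value, value & 0xFF, value >> 8)
```

Masking with `0xFF` recovers the modelled flags byte, and shifting right by 8 recovers the leaked part of type_num.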
Good catch! So basically, the flag was changed from a char to an int and back to a char, and some of the code did not follow. I could not really follow the exact history from the log alone, but basically:
- there is indeed a char vs int discrepancy (T_INT vs char)
- in most dtype functions handling the flag variable, temporary computations were made with an int (but every possible flag combination can fit in a char)
- there are quite a few uses of "i" instead of "c" in PyArg_ParseTuple and Py_BuildValue.
Even after fixing all those things, the original bug is still there, because uintp and uint32 have different type_num values, even on 32 bits. I would actually consider this a bug in PyArray_EquivTypes, but changing this now may be quite disruptive. Shall I remove type_num from the hash input (in which case the bug would be fixed)?
David
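A toy model of the question raised here, not numpy's actual hashing code: two descriptors that an equivalence check treats as the same type but that carry different type_num values. The `Descr` tuple, its fields, the type numbers, and both key functions are hypothetical simplifications chosen for illustration.

```python
from collections import namedtuple

# Hypothetical, simplified descriptor; the real PyArray_Descr has more fields.
Descr = namedtuple("Descr", "kind itemsize byteorder type_num")


def equivalent(a, b):
    """Toy equivalence check that ignores type_num."""
    return (a.kind, a.itemsize, a.byteorder) == (b.kind, b.itemsize, b.byteorder)


def key_with_type_num(d):
    """Hash input that includes type_num."""
    return (d.kind, d.itemsize, d.byteorder, d.type_num)


def key_without_type_num(d):
    """Hash input with type_num removed."""
    return (d.kind, d.itemsize, d.byteorder)


# Two 4-byte unsigned types with illustrative, distinct type numbers.
x = Descr("u", 4, "=", 6)
y = Descr("u", 4, "=", 8)

print(equivalent(x, y))                                     # True
print(key_with_type_num(x) == key_with_type_num(y))         # False
print(key_without_type_num(x) == key_without_type_num(y))   # True
```

Hashing the second key makes equivalent descriptors hash equal by construction. Hashing the first feeds unequal inputs to the hash, so equal hashes are no longer guaranteed.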