Re: [Numpy-discussion] dtype comparison, hash

17 Jan 2012

On Tue, Jan 17, 2012 at 05:11, Andreas Kloeckner wrote:
...
Hi Robert,
On Fri, 30 Dec 2011 20:05:14 +0000, Robert Kern  wrote:
...
On Fri, Dec 30, 2011 at 18:57, Andreas Kloeckner wrote:
...
Hi Robert,
On Tue, 27 Dec 2011 10:17:41 +0000, Robert Kern  wrote:
...
On Tue, Dec 27, 2011 at 01:22, Andreas Kloeckner wrote:
...
Hi all,
Two questions:
- Are dtypes supposed to be comparable (i.e. implement '==', '!=')?
Yes.
...
- Are dtypes supposed to be hashable?
Yes, with caveats. Strictly speaking, we violate the condition that
objects that equal each other should hash equal since we define == to
be rather free. Namely,
  np.dtype(x) == x
for all objects x that can be converted to a dtype.
  np.dtype(float) == np.dtype('float')
  np.dtype(float) == float
  np.dtype(float) == 'float'
Since hash(float) != hash('float') we cannot implement
np.dtype.__hash__() to follow the stricture that objects that compare
equal should hash equal.
However, if you restrict the domain of objects to just dtypes (i.e.
only consider dicts that use only actual dtype objects as keys instead
of arbitrary mixtures of objects), then the stricture is obeyed. This
is a useful domain that is used internally in numpy.
Is this the problem that you found?
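
The hashing caveat above can be illustrated without numpy. The sketch
below uses a hypothetical stand-in class, LooseKey (not part of numpy),
whose == accepts several spellings of the same type while its hash
depends only on a canonical name. It is meant only to show why a
permissive == cannot be paired with a fully consistent hash, and why
restricting dict keys to one kind of object avoids the problem.

```python
class LooseKey:
    """Hypothetical stand-in for a dtype-like object with a permissive ==."""

    # Assumed alias table, for illustration only.
    _ALIASES = {"float64": (float, "float", "float64")}

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, LooseKey):
            return self.name == other.name
        # Anything listed as an alias also compares equal.
        return other in self._ALIASES.get(self.name, ())

    def __hash__(self):
        # Consistent among LooseKey objects, but it cannot match both
        # hash(float) and hash("float"), since those two differ.
        return hash(self.name)


key = LooseKey("float64")
table = {key: "double"}

equal_to_type = key == float            # True: permissive comparison
equal_to_name = key == "float"          # True: permissive comparison
same_kind_lookup = table[LooseKey("float64")]   # works: same kind of key
# Almost certainly False: float compares equal to the key, but hashes differently.
mixed_lookup = float in table
```

Lookups that use only LooseKey objects behave as expected; a lookup
with a bare type or string relies on hashes that were never made to
agree.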
Thanks for the reply.
It doesn't seem like this is our issue--instead, we're encountering two
different dtype objects that claim to be float64, compare as equal, but
don't hash to the same value.
I've asked the user who encountered the issue to investigate, and I'll
be back with more detail in a bit.
I think we've run into this before and tried to fix it. Try to find
the version of numpy the user has and a minimal example, if you can.
This is what Thomas found:
http://projects.scipy.org/numpy/ticket/2017
It looks like the .flags attribute is different between np.uintp and
np.uint32. The .flags attribute forms part of the hashed information
about the dtype (or PyArray_Descr at the C-level).

[~]
|15> np.dtype(np.uintp).flags
1536

[~]
|16> np.dtype(np.uint32).flags
2048

The same goes for np.intp and np.int32 in numpy 1.6.1 on OS X, so
unlike the comment in the ticket, they do have different hashes for
me.

However, diving through the source a bit, I'm not entirely sure I
trust the values being given at the Python level. It appears that the
flags member of the PyArray_Descr struct is declared as a char.
However, it is exposed as a T_INT member in the PyMemberDef table by
direct addressing. Basically, a Python descriptor gets added to the
np.dtype type that will look up sizeof(int) bytes from the starting
position of the flags member in the struct. This includes 3 bytes of
the following type_num member. Obviously, 2048 does not fit into a
char. Nonetheless, the type_num is also part of the hash, so either
the flags member or the type_num member is different between the two.
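
The over-read described above can be worked through by hand. The
sketch below is only an illustration under stated assumptions: a
little-endian machine, a 4-byte read starting at the 1-byte flags
member, and a 4-byte type_num immediately after it. The helper names
are made up for this example and are not numpy functions.

```python
import struct


def read_flags_as_int(flags, type_num):
    """Mimic a 4-byte little-endian read starting at a 1-byte flags field.

    Assumes a 4-byte type_num follows flags directly in memory.
    """
    raw = struct.pack("<Bi", flags, type_num)  # 1 byte flags, 4 bytes type_num
    (value,) = struct.unpack("<i", raw[:4])    # flags + low 3 bytes of type_num
    return value


def split_reported_flags(value):
    """Split a reported value into (flags byte, low 3 bytes of type_num)."""
    return value & 0xFF, (value >> 8) & 0xFFFFFF


uintp_parts = split_reported_flags(1536)   # value shown for np.uintp
uint32_parts = split_reported_flags(2048)  # value shown for np.uint32
```

Under these assumptions both reported values decode to a flags byte of
0, with the remaining bytes giving 6 and 8. That would be consistent
with the type_num member, rather than the flags member, being what
differs between the two dtypes, though the thread itself does not
confirm this.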

Two bugs for the price of one!

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco