[Numpy-discussion] numpy type mismatch

Fri Jun 10 23:34:05 EDT 2011

2011/6/10 Benjamin Root <ben.root at ou.edu>

>
>
> On Fri, Jun 10, 2011 at 9:29 PM, Olivier Delalleau <shish at keba.be> wrote:
>
>>
>> 2011/6/10 Olivier Delalleau <shish at keba.be>
>>
>>> 2011/6/10 Charles R Harris <charlesr.harris at gmail.com>
>>>
>>>>
>>>>
>>>> On Fri, Jun 10, 2011 at 5:19 PM, Olivier Delalleau <shish at keba.be> wrote:
>>>>
>>>>> 2011/6/10 Charles R Harris <charlesr.harris at gmail.com>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 10, 2011 at 3:43 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jun 10, 2011 at 3:24 PM, Charles R Harris <
>>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 10, 2011 at 2:17 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jun 10, 2011 at 3:02 PM, Charles R Harris <
>>>>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jun 10, 2011 at 1:50 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Came across an odd error while using numpy master.  Note that my
>>>>>>>>>>> system is 32-bit.
>>>>>>>>>>>
>>>>>>>>>>> >>> import numpy as np
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32)) == np.int32
>>>>>>>>>>> False
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int64)) == np.int64
>>>>>>>>>>> True
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float32)) == np.float32
>>>>>>>>>>> True
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.float64)) == np.float64
>>>>>>>>>>> True
>>>>>>>>>>>
>>>>>>>>>>> So, only the summation performed with a np.int32 accumulator
>>>>>>>>>>> results in a type that doesn't match the expected type.  Now, for even more
>>>>>>>>>>> strangeness:
>>>>>>>>>>>
>>>>>>>>>>> >>> type(np.sum([1, 2, 3], dtype=np.int32))
>>>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>>>> >>> hex(id(type(np.sum([1, 2, 3], dtype=np.int32))))
>>>>>>>>>>> '0x9599a0'
>>>>>>>>>>> >>> hex(id(np.int32))
>>>>>>>>>>> '0x959a80'
>>>>>>>>>>>
>>>>>>>>>>> So, the type from the sum() reports itself as a numpy int, but
>>>>>>>>>>> its memory address is different from the memory address for np.int32.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> One of them is probably a long, print out the typecode,
>>>>>>>>>> dtype.char.
>>>>>>>>>>
>>>>>>>>>> Chuck
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Good intuition, but odd result...
>>>>>>>>>
>>>>>>>>> >>> import numpy as np
>>>>>>>>> >>> a = np.sum([1, 2, 3], dtype=np.int32)
>>>>>>>>> >>> b = np.int32(6)
>>>>>>>>> >>> type(a)
>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>> >>> type(b)
>>>>>>>>> <type 'numpy.int32'>
>>>>>>>>> >>> a.dtype.char
>>>>>>>>> 'i'
>>>>>>>>> >>> b.dtype.char
>>>>>>>>> 'l'
>>>>>>>>>
>>>>>>>>> So, the standard np.int32 is getting listed as a long somehow?  To
>>>>>>>>> further investigate:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Yes, long shifts around from int32 to int64 depending on the OS. For
>>>>>>>> instance, in 64 bit Windows it's 32 bits while in 64 bit Linux it's 64 bits.
>>>>>>>> On 32 bit systems it is 32 bits.
>>>>>>>>
>>>>>>>> Chuck
>>>>>>>>
>>>>>>>>
>>>>>>> Right, that makes sense.  But, the question is why does sum() put out
>>>>>>> a result dtype that is not identical to the dtype that I requested, or even
>>>>>>> the dtype of the input array?  Could this be an indication of a bug
>>>>>>> somewhere?  Even if the bug is harmless (it was only noticed within the test
>>>>>>> suite of larry), is this unexpected?
>>>>>>>
>>>>>>>
>>>>>> I expect sum is using a ufunc and it acts differently on account of
>>>>>> the cleanup of the ufunc casting rules. And yes, a long *is* int32 on your
>>>>>> machine. On mine
>>>>>>
>>>>>> In [4]: dtype('q') # long long
>>>>>> Out[4]: dtype('int64')
>>>>>>
>>>>>> In [5]: dtype('l') # long
>>>>>> Out[5]: dtype('int64')
>>>>>>
>>>>>> The mapping from C types to numpy width types isn't 1-1. Personally, I
>>>>>> think we should drop long ;) But it used to be the standard Python type in
>>>>>> the C API. Mark has also pointed out the problems/confusion this ambiguity
>>>>>> causes and someday we should probably think it out and fix it. But I don't
>>>>>> think it is the most pressing problem.
>>>>>>
>>>>>> Chuck
>>>>>>
>>>>>>
>>>>> But isn't it a bug if numpy.dtype('i') != numpy.dtype('l') on a 32 bit
>>>>> computer where both are int32?
>>>>>
>>>>>
>>>> Maybe yes, maybe no ;) They have different descriptors, so from numpy's
>>>> perspective they are different, but at the hardware/precision level they are
>>>> the same. It's more of a decision as to what  != means in this case. Since
>>>> numpy started as Numeric with only the c types the current behavior is
>>>> consistent, but that doesn't mean it shouldn't change at some point.
>>>>
>>>> Chuck
>>>>
>>>
>>> Well apparently it was actually changed recently, since in Numpy 1.5.1 on
>>> a Windows 32 bit machine, they are considered equal with '=='.
>>> Personally I think if the string representation of two dtypes is "int32",
>>> then they should be ==, otherwise it wouldn't make much sense given that you
>>> can directly test the equality of a dtype with a string like "int32" (like
>>> dtype('i') == "int32" and dtype('l') == "int32").
>>>
>>
>> I also just checked on a fresh install of numpy 1.6.0 on python 3.2, and
>> both types are equal as well.
>>
>
> Are you talking about the release of 1.6, or the continued development
> branch?  This is happening to me on the master branch, but I have not tried
> earlier versions.  Again, I think this bolsters the evidence that this is
> from a (very) recent change.
>
>
>> I've been playing quite a bit with numpy dtypes, and this is the first time
>> I've heard of two dtypes that represent the exact same kind of data not
>> comparing equal, so I'm still inclined to believe it should be considered a bug.
>>
>>
> Quite honestly, I really don't care that the dtypes aren't equal.  I
> usually work at a purely python level and performing actions based on types
> is generally bad practice anyway.  Anytime that I (rarely) check types, I
> would use isinstance() against one of the core numerical types rather than a
> numpy type.  The fact that I even found this issue was completely by
> accident while investigating a test failure in larry.
>
> What concerns me more is that the type coming out of the ufunc is not the
> same type that went in, or even the type requested through the dtype
> argument.  I think *that* should be the main concern here, and it should
> probably be tested for in the unit tests.
>
> Ben Root
>

The project I'm working on (http://deeplearning.net/software/theano/)
relies heavily on dtype.__eq__, because it uses typed objects associated with
data of a specific type (e.g. int32 or float64), and it needs to know whether
the provided numpy arrays are of the proper type.
So we do a lot of comparisons like:
   array.dtype == "int32"
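
As a rough sketch of what such a check looks like, here are the string
comparison and a width-based alternative that does not go through
dtype.__eq__ at all. The helper names is_int32_by_eq and is_int32_by_width
are made up for illustration and are not Theano's actual code:

```python
import numpy as np


def is_int32_by_eq(array):
    """Hypothetical helper: the comparison style described above."""
    # Relies on dtype.__eq__ accepting a type string.
    return bool(array.dtype == "int32")


def is_int32_by_width(array):
    """Hypothetical helper: check kind and item size instead of equality."""
    # kind 'i' means signed integer; itemsize is in bytes. This ignores
    # which C type (int or long) backs the dtype, and also byte order.
    return array.dtype.kind == "i" and array.dtype.itemsize == 4


if __name__ == "__main__":
    a = np.zeros(3, dtype=np.int32)
    print(is_int32_by_eq(a), is_int32_by_width(a))
```

The width-based version is only a possible workaround if the two checks ever
disagree; it says nothing about which behaviour numpy should have.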

I'd be curious to know what the following lines output in your case:
numpy.dtype('i') == "int32"
numpy.dtype('l') == "int32"
str(numpy.dtype('i'))
str(numpy.dtype('l'))
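
If it is easier, the same checks can be collected in one small script. This is
only a sketch: the describe() helper is a made-up name, and it also reports
dtype.char and dtype.itemsize so the C type behind each dtype is visible:

```python
import numpy as np


def describe(code):
    """Hypothetical helper: summarize the dtype built from a C typecode."""
    dt = np.dtype(code)
    return {
        "char": dt.char,                  # C typecode, e.g. 'i' or 'l'
        "str": str(dt),                   # string form, e.g. 'int32'
        "itemsize": dt.itemsize,          # width in bytes
        "eq_int32": bool(dt == "int32"),  # result of dtype.__eq__
    }


if __name__ == "__main__":
    for code in ("i", "l"):
        print(code, describe(code))
```

What it prints for 'l' depends on the platform, since the width of a C long
varies between systems.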

-=- Olivier