find_common_type broken?
Hi,

While documenting find_common_type I found two problems. As I understand it, "common type" is a type to which all input types can be cast without loss of precision.

1. Using any array types always returns dtype('object'):

>>> np.find_common_type([np.ndarray], [])
dtype('object')
>>> np.find_common_type([np.ndarray, np.ma.MaskedArray, np.recarray], [])
dtype('object')

2. The second example below seems to be wrong, it should return dtype('float64'):

>>> np.find_common_type([], [np.int64, np.float64])
dtype('float64')
>>> np.find_common_type([], [np.int64, np.float32])
dtype('int64')

One other question: why do type comparisons for numpy types and python built-ins do the opposite?
>>> np.int32 > np.float32
False
>>> np.int64 > np.float64
False
>>> int > float
True

The numpy result makes more sense to me, what's going on with the builtins?

Ralf
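[The builtin result is a historical artifact: Python 2 ordered objects of unrelated types by comparing the type names as strings, so int > float held simply because 'int' sorts after 'float'. A sketch, assuming Python 3, which dropped that fallback entirely:]

```python
# Comparing the type *names* as strings reproduces the Python 2 result:
print('int' > 'float')   # True

# Python 3 removed cross-type ordering, so the same comparison
# now raises instead of giving an arbitrary answer:
try:
    int > float
except TypeError:
    print("builtin types are no longer orderable")
```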
Hi,

I am not very confident with types but I will try to give you my opinion. As for the first part of the question:
>>> np.find_common_type([np.ndarray, np.ma.MaskedArray, np.recarray], [])
dtype('object')

I think that the first argument of np.find_common_type should be the type of an _element_ of the array, not the type of the array. In your case you are asking np.find_common_type for the common type between an array of arrays, an array of masked arrays, and an array of record arrays. Therefore the best thing numpy can do is to find object as the common type. Correctly:

>>> np.find_common_type([np.float64, np.int32], [])
dtype('float64')

As for the second part of the question, np.find_common_type internally uses np.dtype(t) for each type in input. While the comparison between types works as expected:
>>> np.complex128 > np.complex64 > np.float64 > np.float32 > np.int64 > np.int32
True
the comparison between dtype(t) gives different results:
>>> np.dtype(np.float64) > np.dtype(np.int64)
True
>>> np.dtype(np.float32) > np.dtype(np.int64)
False
>>> np.dtype(np.float32) > np.dtype(np.int32)
False
>>> np.dtype(np.float32) > np.dtype(np.int16)
True
At first I thought the comparison was made based on the number of bits in the mantissa or the highest integer N for which N-1 was still representable. But then I could not explain the first result. What is surprising is that
>>> np.dtype(np.float32) > np.dtype(np.int32)
False
>>> np.dtype(np.float32) < np.dtype(np.int32)
False
>>> np.dtype(np.float32) == np.dtype(np.int32)
False
Therefore the max() function in np.find_common_type cannot tell which to return, and returns the first. In fact:

>>> np.find_common_type([], [np.int64, np.float32])
dtype('int64')

but

>>> np.find_common_type([], [np.float32, np.int64])
dtype('float32')

which is unexpected.

Best,
Luca
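[Luca's diagnosis can be reproduced directly. A sketch assuming a current NumPy, where the dtype ordering still behaves as described and np.promote_types implements the safe-casting resolution:]

```python
import numpy as np

# The dtype ordering is only partial: int32 and float32 are incomparable,
# because neither can be safely cast to the other.
d_i, d_f = np.dtype(np.int32), np.dtype(np.float32)
print(d_i < d_f, d_i > d_f, d_i == d_f)      # False False False

# max() keeps its current candidate whenever the next element does not
# compare greater, so over an incomparable pair it simply returns
# whichever element it saw first -- hence the order-dependent results.
print(max([d_i, d_f]))   # int32
print(max([d_f, d_i]))   # float32

# Promotion based on safe casting resolves the pair to a third type
# instead of arbitrarily picking one of the two:
print(np.promote_types(np.int32, np.float32))   # float64
```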
On Sun, Jul 12, 2009 at 6:54 AM, Citi, Luca <lciti@essex.ac.uk> wrote:
> Hi, I am not very confident with types but I will try to give you my opinion.
>
> As for the first part of the question:
>
> >>> np.find_common_type([np.ndarray, np.ma.MaskedArray, np.recarray], [])
> dtype('object')
>
> I think that the first argument of np.find_common_type should be the type of an _element_ of the array, not the type of the array. In your case you are asking np.find_common_type for the common type between an array of arrays, an array of masked arrays, and an array of record arrays. Therefore the best thing numpy can do is to find object as the common type.
That is what I thought at first, but then what is the difference between array_types and scalar_types? The function signature is: *find_common_type(array_types, scalar_types)*. np.float64, np.int32 etc. are scalar types, so I thought they should go in the second argument. Maybe something else is supposed to go into the array_types list, but I have no clue what, if not actual array types.
> Correctly:
>
> >>> np.find_common_type([np.float64, np.int32], [])
> dtype('float64')
> As for the second part of the question, np.find_common_type internally uses np.dtype(t) for each type in input. While the comparison between types works as expected:
>
> >>> np.complex128 > np.complex64 > np.float64 > np.float32 > np.int64 > np.int32
> True
Yes, this makes sense.
> the comparison between dtype(t) gives different results:
>
> >>> np.dtype(np.float64) > np.dtype(np.int64)
> True
> >>> np.dtype(np.float32) > np.dtype(np.int64)
> False
> >>> np.dtype(np.float32) > np.dtype(np.int32)
> False
> >>> np.dtype(np.float32) > np.dtype(np.int16)
> True
>
> At first I thought the comparison was made based on the number of bits in the mantissa or the highest integer N for which N-1 was still representable. But then I could not explain the first result.
>
> What is surprising is that
>
> >>> np.dtype(np.float32) > np.dtype(np.int32)
> False
> >>> np.dtype(np.float32) < np.dtype(np.int32)
> False
> >>> np.dtype(np.float32) == np.dtype(np.int32)
> False
That is confusing. So I guess the dtype(t) conversion should not happen?
> therefore the max() function in np.find_common_type cannot tell which to return, and returns the first.
>
> In fact:
>
> >>> np.find_common_type([], [np.int64, np.float32])
> dtype('int64')
>
> but
>
> >>> np.find_common_type([], [np.float32, np.int64])
> dtype('float32')
>
> which is unexpected.
Ah, missed that part. Thanks, Luca.

Ralf
> Best,
> Luca

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
> That is what I thought at first, but then what is the difference between array_types and scalar_types? Function signature is: *find_common_type(array_types, scalar_types)*

As I understand it, the difference is that in the following case:

>>> np.choose(range(5), [np.arange(1,6), np.zeros(5, dtype=np.uint8), 1j*np.arange(5), 22, 1.5])

one should call:

>>> find_common_type([np.int64, np.uint8, np.complex128], [int, float])
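[The array/scalar split Luca describes can be illustrated with np.result_type, which in current NumPy applies the same two-tier rule; a sketch, with find_common_type itself assumed unavailable in recent versions:]

```python
import numpy as np

# The arrays set the precision floor; the Python scalars 22 and 1.5 only
# contribute their *kind* (integer, float) and cannot upcast the arrays.
arrays = [np.arange(1, 6), np.zeros(5, dtype=np.uint8), 1j * np.arange(5)]
common = np.result_type(*arrays, 22, 1.5)
print(common)   # complex128
```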
I had a look at the code and it looks like dtype1 < dtype2 if dtype1 can safely be cast to dtype2. As this is not the case, in either direction, for int32 and float32, neither dtype(int32) < dtype(float32) nor dtype(int32) > dtype(float32) holds, and this causes the problem you highlighted. I think in this case find_common_type should return float64.

The same problem arises with:

>>> np.find_common_type([np.int8, np.uint8], [])
dtype('int8')
>>> np.find_common_type([np.uint8, np.int8], [])
dtype('uint8')

Here too, I think find_common_type should return a third type which is the "smallest" to which both can be safely cast: int16.

Best,
Luca
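[For what it's worth, the resolution Luca proposes is what NumPy's promotion does nowadays; a quick check, assuming np.promote_types as the modern counterpart of this lookup:]

```python
import numpy as np

# int8 and uint8 are incomparable under safe casting; promotion picks the
# smallest third type that can hold both, exactly as suggested above:
print(np.promote_types(np.int8, np.uint8))    # int16
print(np.promote_types(np.uint8, np.int8))    # int16 (order no longer matters)

# The int32/float32 pair from earlier in the thread resolves the same way:
print(np.promote_types(np.int32, np.float32))  # float64
```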
On Sun, Jul 12, 2009 at 1:24 PM, Citi, Luca <lciti@essex.ac.uk> wrote:
find_common_type() was added after a problem with r_ was reported in ticket 728. r_ still has a problem as well:
>>> np.r_[1+1e-10, np.arange(2, dtype=np.float32)] - 1
array([ 0., -1.,  0.], dtype=float32)
I summarized this discussion on the ticket and reopened it.

Cheers,
Ralf
On Jul 13, 2009, at 1:54 PM, Ralf Gommers wrote:
> find_common_type() was added after a problem with r_ was reported in ticket 728. r_ still has a problem as well:
>
> >>> np.r_[1+1e-10, np.arange(2, dtype=np.float32)] - 1
> array([ 0., -1.,  0.], dtype=float32)
This is not a problem with r_. This is correct behavior. A scalar "float" will not cause an array "float32" to be upcast.

Nonetheless, the OP did point out a flaw in find_common_type that has been fixed in r7133.

Best regards,

-Travis
On Thu, Jul 16, 2009 at 12:39 AM, Travis Oliphant <oliphant@enthought.com>wrote:
> This is not a problem with r_. This is correct behavior. A scalar "float" will not cause an array "float32" to be upcast.
This was at first counter-intuitive but I found the reason for it in Guide to NumPy now:

"Mixed scalar-array operations use a different set of casting rules that ensure that a scalar cannot upcast an array unless the scalar is of a fundamentally different kind of data (i.e. under a different hierarchy in the data type hierarchy) than the array. This rule enables you to use scalar constants in your code (which as Python types are interpreted accordingly in ufuncs) without worrying about whether the precision of the scalar constant will cause upcasting on your large (small precision) array."

Makes sense.
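[The rule is easy to demonstrate; a sketch, assuming the behavior still holds in a current NumPy:]

```python
import numpy as np

a = np.arange(2, dtype=np.float32)

# A Python float is the same kind of data as float32, so it cannot
# upcast the array; the 1e-10 is simply lost in float32 precision:
print((a + 1e-10).dtype)   # float32

# A complex scalar is a fundamentally different kind, so here the
# upcast does happen:
print((a + 1j).dtype)      # complex64
```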
> Nonetheless, the OP did point out a flaw in find_common_type that has been fixed in r7133.
Great, it works for me now.

There is still one rule I do not understand the reason for. Out of curiosity, what is the reason for this:

In [16]: can_cast(int32, float32)
Out[16]: False

In [17]: can_cast(int64, float64)
Out[17]: True

Thanks,
Ralf
On Jul 16, 2009, at 12:59 AM, Ralf Gommers wrote:
> There is still one rule I do not understand the reason for. Out of curiosity, what is the reason for this:
>
> In [16]: can_cast(int32, float32)
> Out[16]: False
>
> In [17]: can_cast(int64, float64)
> Out[17]: True
To prevent proliferation of float128 or float96 (i.e. longdoubles) in a commonly used case. Not a very pretty exceptional case, but definitely useful.

Thanks,

-Travis

--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant@enthought.com
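[The trade-off Travis describes is visible with can_cast: int64 -> float64 is declared safe even though float64's 53-bit mantissa cannot represent every int64 value. A sketch, assuming the rule is unchanged in current NumPy:]

```python
import numpy as np

# float32 cannot hold every int32 exactly, and the cast is rejected:
print(np.can_cast(np.int32, np.float32))   # False

# The same reasoning would push int64 up to longdouble, so here an
# exception is made and float64 is accepted despite the precision loss:
print(np.can_cast(np.int64, np.float64))   # True

# The loss is real for large values near the int64 limit:
big = 2**63 - 1
print(int(np.float64(big)) == big)         # False
```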