find_common_type broken?
Hi,

While documenting find_common_type I found two problems. As I understand it, "common type" is a type to which all input types can be cast without loss of precision.

1. Using any array types always returns dtype('object'):

>>> np.find_common_type([np.ndarray], [])
dtype('object')
>>> np.find_common_type([np.ndarray, np.ma.MaskedArray, np.recarray], [])
dtype('object')

2. The second example below seems to be wrong, it should return dtype('float64'):

>>> np.find_common_type([], [np.int64, np.float64])
dtype('float64')
>>> np.find_common_type([], [np.int64, np.float32])
dtype('int64')

One other question: why do type comparisons for numpy types and python built-ins do the opposite?
>>> np.int32 > np.float32
False
>>> np.int64 > np.float64
False
>>> int > float
True

The numpy result makes more sense to me, what's going on with the builtins?

Ralf
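[The builtin result is a historical artifact: Python 2 ordered objects of unrelated types by comparing the type names as strings, so int > float held simply because 'int' sorts after 'float'. A sketch, assuming Python 3, which dropped that fallback entirely:]

```python
# Comparing the type *names* as strings reproduces the Python 2 result:
print('int' > 'float')   # True

# Python 3 removed cross-type ordering, so the same comparison
# now raises instead of giving an arbitrary answer:
try:
    int > float
except TypeError:
    print("builtin types are no longer orderable")
```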
Hi,

I am not very confident with types but I will try to give you my opinion. As for the first part of the question:
>>> np.find_common_type([np.ndarray, np.ma.MaskedArray, np.recarray], [])
dtype('object')

I think that the first argument of np.find_common_type should be the type of an _element_ of the array, not the type of the array. In your case you are asking np.find_common_type for the common type between an array of arrays, an array of masked arrays, and an array of record arrays. Therefore the best thing numpy can do is to find object as the common type. Correctly:

>>> np.find_common_type([np.float64, np.int32], [])
dtype('float64')

As for the second part of the question, np.find_common_type internally uses np.dtype(t) for each type in input. While the comparison between types works as expected:
>>> np.complex128 > np.complex64 > np.float64 > np.float32 > np.int64 > np.int32
True
the comparison between dtype(t) gives different results:
>>> np.dtype(np.float64) > np.dtype(np.int64)
True
>>> np.dtype(np.float32) > np.dtype(np.int64)
False
>>> np.dtype(np.float32) > np.dtype(np.int32)
False
>>> np.dtype(np.float32) > np.dtype(np.int16)
True
At first I thought the comparison was made based on the number of bits in the mantissa or the highest integer N for which N-1 was still representable. But then I could not explain the first result. What is surprising is that
>>> np.dtype(np.float32) > np.dtype(np.int32)
False
>>> np.dtype(np.float32) < np.dtype(np.int32)
False
>>> np.dtype(np.float32) == np.dtype(np.int32)
False
Therefore the max() function in np.find_common_type cannot tell which to return, and returns the first. In fact:

>>> np.find_common_type([], [np.int64, np.float32])
dtype('int64')

but

>>> np.find_common_type([], [np.float32, np.int64])
dtype('float32')

which is unexpected.

Best,
Luca
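[Luca's diagnosis can be reproduced directly. A sketch assuming a current NumPy, where the dtype ordering still behaves as described and np.promote_types implements the safe-casting resolution:]

```python
import numpy as np

# The dtype ordering is only partial: int32 and float32 are incomparable,
# because neither can be safely cast to the other.
d_i, d_f = np.dtype(np.int32), np.dtype(np.float32)
print(d_i < d_f, d_i > d_f, d_i == d_f)      # False False False

# max() keeps its current candidate whenever the next element does not
# compare greater, so over an incomparable pair it simply returns
# whichever element it saw first -- hence the order-dependent results.
print(max([d_i, d_f]))   # int32
print(max([d_f, d_i]))   # float32

# Promotion based on safe casting resolves the pair to a third type
# instead of arbitrarily picking one of the two:
print(np.promote_types(np.int32, np.float32))   # float64
```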
On Sun, Jul 12, 2009 at 6:54 AM, Citi, Luca <lciti@essex.ac.uk> wrote:
> Hi, I am not very confident with types but I will try to give you my opinion.
>
> As for the first part of the question:
>
> >>> np.find_common_type([np.ndarray, np.ma.MaskedArray, np.recarray], [])
> dtype('object')
>
> I think that the first argument of np.find_common_type should be the type of an _element_ of the array, not the type of the array. In your case you are asking np.find_common_type for the common type between an array of arrays, an array of masked arrays, and an array of record arrays. Therefore the best thing numpy can do is to find object as the common type.
That is what I thought at first, but then what is the difference between array_types and scalar_types? The function signature is: *find_common_type(array_types, scalar_types)*. np.float64, np.int32 etc. are scalar types, so I thought they should go in the second argument. Maybe something else is supposed to go into the array_types list, but I have no clue what, if not actual array types.
> Correctly:
>
> >>> np.find_common_type([np.float64, np.int32], [])
> dtype('float64')
> As for the second part of the question, np.find_common_type internally uses np.dtype(t) for each type in input. While the comparison between types works as expected:
>
> >>> np.complex128 > np.complex64 > np.float64 > np.float32 > np.int64 > np.int32
> True
Yes, this makes sense.
> the comparison between dtype(t) gives different results:
>
> >>> np.dtype(np.float64) > np.dtype(np.int64)
> True
> >>> np.dtype(np.float32) > np.dtype(np.int64)
> False
> >>> np.dtype(np.float32) > np.dtype(np.int32)
> False
> >>> np.dtype(np.float32) > np.dtype(np.int16)
> True
>
> At first I thought the comparison was made based on the number of bits in the mantissa or the highest integer N for which N-1 was still representable. But then I could not explain the first result.
>
> What is surprising is that
>
> >>> np.dtype(np.float32) > np.dtype(np.int32)
> False
> >>> np.dtype(np.float32) < np.dtype(np.int32)
> False
> >>> np.dtype(np.float32) == np.dtype(np.int32)
> False
That is confusing. So I guess the dtype(t) conversion should not happen?
> therefore the max() function in np.find_common_type cannot tell which to return, and returns the first.
>
> In fact:
>
> >>> np.find_common_type([], [np.int64, np.float32])
> dtype('int64')
>
> but
>
> >>> np.find_common_type([], [np.float32, np.int64])
> dtype('float32')
>
> which is unexpected.
Ah, missed that part. Thanks, Luca.

Ralf
> Best,
> Luca

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
> That is what I thought at first, but then what is the difference between array_types and scalar_types? Function signature is: *find_common_type(array_types, scalar_types)*

As I understand it, the difference is that in the following case:

>>> np.choose(range(5), [np.arange(1,6), np.zeros(5, dtype=np.uint8), 1j*np.arange(5), 22, 1.5])

one should call:

>>> find_common_type([np.int64, np.uint8, np.complex128], [int, float])
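[The array/scalar split Luca describes can be illustrated with np.result_type, which in current NumPy applies the same two-tier rule; a sketch, with find_common_type itself assumed unavailable in recent versions:]

```python
import numpy as np

# The arrays set the precision floor; the Python scalars 22 and 1.5 only
# contribute their *kind* (integer, float) and cannot upcast the arrays.
arrays = [np.arange(1, 6), np.zeros(5, dtype=np.uint8), 1j * np.arange(5)]
common = np.result_type(*arrays, 22, 1.5)
print(common)   # complex128
```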
I had a look at the code and it looks like dtype1 < dtype2 if dtype1 can safely be cast to dtype2. As this is not the case, in either direction, for int32 and float32, neither dtype(int32) < dtype(float32) nor dtype(int32) > dtype(float32) holds, and this causes the problem you highlighted. I think in this case find_common_type should return float64.

The same problem arises with:

>>> np.find_common_type([np.int8, np.uint8], [])
dtype('int8')
>>> np.find_common_type([np.uint8, np.int8], [])
dtype('uint8')

Here too, I think find_common_type should return a third type which is the "smallest" to which both can be safely cast: int16.

Best,
Luca
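[For what it's worth, the resolution Luca proposes is what NumPy's promotion does nowadays; a quick check, assuming np.promote_types as the modern counterpart of this lookup:]

```python
import numpy as np

# int8 and uint8 are incomparable under safe casting; promotion picks the
# smallest third type that can hold both, exactly as suggested above:
print(np.promote_types(np.int8, np.uint8))    # int16
print(np.promote_types(np.uint8, np.int8))    # int16 (order no longer matters)

# The int32/float32 pair from earlier in the thread resolves the same way:
print(np.promote_types(np.int32, np.float32))  # float64
```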
On Sun, Jul 12, 2009 at 1:24 PM, Citi, Luca <lciti@essex.ac.uk> wrote:
find_common_type() was added after a problem with r_ was reported in ticket 728. r_ still has a problem as well:
>>> np.r_[1+1e-10, np.arange(2, dtype=np.float32)] - 1
array([ 0., -1.,  0.], dtype=float32)
I summarized this discussion on the ticket and reopened it.

Cheers,
Ralf
On Jul 13, 2009, at 1:54 PM, Ralf Gommers wrote:
> find_common_type() was added after a problem with r_ was reported in ticket 728. r_ still has a problem as well:
>
> >>> np.r_[1+1e-10, np.arange(2, dtype=np.float32)] - 1
> array([ 0., -1.,  0.], dtype=float32)
This is not a problem with r_. This is correct behavior. A scalar "float" will not cause an array "float32" to be upcast.

Nonetheless, the OP did point out a flaw in find_common_type that has been fixed in r7133.

Best regards,

-Travis
On Thu, Jul 16, 2009 at 12:39 AM, Travis Oliphant <oliphant@enthought.com>wrote:
> This is not a problem with r_. This is correct behavior. A scalar "float" will not cause an array "float32" to be upcast.
This was at first counter-intuitive but I found the reason for it in Guide to NumPy now:

"Mixed scalar-array operations use a different set of casting rules that ensure that a scalar cannot upcast an array unless the scalar is of a fundamentally different kind of data (i.e. under a different hierarchy in the data type hierarchy) than the array. This rule enables you to use scalar constants in your code (which as Python types are interpreted accordingly in ufuncs) without worrying about whether the precision of the scalar constant will cause upcasting on your large (small precision) array."

Makes sense.
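[The rule is easy to demonstrate; a sketch, assuming the behavior still holds in a current NumPy:]

```python
import numpy as np

a = np.arange(2, dtype=np.float32)

# A Python float is the same kind of data as float32, so it cannot
# upcast the array; the 1e-10 is simply lost in float32 precision:
print((a + 1e-10).dtype)   # float32

# A complex scalar is a fundamentally different kind, so here the
# upcast does happen:
print((a + 1j).dtype)      # complex64
```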
> Nonetheless, the OP did point out a flaw in find_common_type that has been fixed in r7133.
Great, it works for me now.

There is still one rule I do not understand the reason for. Out of curiosity, what is the reason for this:

In [16]: can_cast(int32, float32)
Out[16]: False

In [17]: can_cast(int64, float64)
Out[17]: True

Thanks,
Ralf
On Jul 16, 2009, at 12:59 AM, Ralf Gommers wrote:
> There is still one rule I do not understand the reason for. Out of curiosity, what is the reason for this:
>
> In [16]: can_cast(int32, float32)
> Out[16]: False
>
> In [17]: can_cast(int64, float64)
> Out[17]: True
To prevent proliferation of float128 or float96 (i.e. longdoubles) in a commonly used case. Not a very pretty exceptional case, but definitely useful.

Thanks,

-Travis

--
Travis Oliphant
Enthought Inc.
1-512-536-1057
http://www.enthought.com
oliphant@enthought.com
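[The trade-off Travis describes is visible with can_cast: int64 -> float64 is declared safe even though float64's 53-bit mantissa cannot represent every int64 value. A sketch, assuming the rule is unchanged in current NumPy:]

```python
import numpy as np

# float32 cannot hold every int32 exactly, and the cast is rejected:
print(np.can_cast(np.int32, np.float32))   # False

# The same reasoning would push int64 up to longdouble, so here an
# exception is made and float64 is accepted despite the precision loss:
print(np.can_cast(np.int64, np.float64))   # True

# The loss is real for large values near the int64 limit:
big = 2**63 - 1
print(int(np.float64(big)) == big)         # False
```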