
Hi,

On Mon, Nov 12, 2012 at 1:11 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Nov 12, 2012 at 8:54 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
I wanted to check that everyone knows about and is happy with the scalar casting changes from 1.6.0.
Specifically, the rules for (array, scalar) casting have changed such that the resulting dtype depends on the _value_ of the scalar.
Mark W has documented these changes here:
http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules
http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.promote_types.html
Specifically, as of 1.6.0:
In [19]: arr = np.array([1.], dtype=np.float32)
In [20]: (arr + (2**16-1)).dtype
Out[20]: dtype('float32')

In [21]: (arr + (2**16)).dtype
Out[21]: dtype('float64')
In [25]: arr = np.array([1.], dtype=np.int8)
In [26]: (arr + 127).dtype
Out[26]: dtype('int8')

In [27]: (arr + 128).dtype
Out[27]: dtype('int16')
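For reference, the documented promotion functions show the same value dependence (a sketch, assuming the 1.6 rules; promote_types looks only at dtypes, while result_type also inspects scalar values):

import numpy as np
np.promote_types(np.float32, np.int32)   # dtype('float64') -- dtypes only
np.result_type(np.float32, 2**16 - 1)    # dtype('float32') -- value fits in 16 bits
np.result_type(np.float32, 2**16)        # dtype('float64') -- value needs 32 bits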
There's discussion about the changes here:
http://mail.scipy.org/pipermail/numpy-discussion/2011-September/058563.html
http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html
http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060381.html
It seems to me that this change is hard to explain, and does what you want only some of the time, making it a false friend.
The old behaviour was that in these cases, the scalar was always cast to the type of the array, right? So np.array([1], dtype=np.int8) + 256 returned 1? Is that the behaviour you prefer?
Right. In that case, of course, I'm getting something a bit nasty (the scalar silently wraps around). But if you're working with int8, I think you expect to have to be careful about overflow, and you may well not want an automatic, maybe surprising upcast to int16.
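To spell out the nastiness (a sketch of the pre-1.6 rule, emulated by casting the scalar to the array's dtype by hand):

import numpy as np
arr = np.array([1], dtype=np.int8)
scalar = np.array(256).astype(np.int8)   # wraps modulo 2**8 -> int8 value 0
arr + scalar                             # array([1], dtype=int8)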
I agree that the 1.6 behaviour is surprising and somewhat inconsistent. There are many places where you can get an overflow in numpy, and in all the other cases we just let the overflow happen. And in fact you can still get an overflow with arr + scalar operations, so this doesn't really fix anything.
Right - it's a half-fix, which seems to me worse than no fix.
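For example (assuming the 1.6 rules), a scalar that fits in int8 triggers no upcast, and the addition still silently wraps:

import numpy as np
arr = np.array([100], dtype=np.int8)
arr + 100    # 100 fits in int8, so the result stays int8: array([-56], dtype=int8)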
I find the specific handling of unsigned -> signed and float32 -> float64 upcasting confusing as well. (2**16 is a power of two, so float32 actually represents it exactly; the upcast happens because 65536 no longer fits in a 16-bit integer, and 32-bit integers don't cast safely to float32. In any case it doesn't *overflow*, it just gives you 2.0**16... if I'm using float32 then I presumably don't care that much about exact representability, so it's surprising that numpy is working to enforce it, and it's definitely a separate decision from what to do about overflow.)
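Concretely (a sketch, assuming the 1.6 behaviour):

import numpy as np
np.float32(2**16) == 2**16                  # True -- 65536 is a power of two
(np.array([1.], np.float32) + 2**16).dtype  # dtype('float64') under the 1.6 rules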
None of those threads seem to really get into the question of what the best behaviour here *is*, though.
Possibly the most defensible choice is to treat ufunc(arr, scalar) operations as performing an implicit cast of the scalar to arr's dtype, using the standard implicit casting rules -- which I think means raising an error if !can_cast(scalar, arr.dtype, casting="safe").
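Something like this, perhaps (a sketch only: strict_add is a hypothetical helper, and it relies on np.can_cast's 1.6-era behaviour of inspecting scalar values):

import numpy as np

def strict_add(arr, scalar):
    # Proposed rule: refuse the operation unless the scalar can be cast
    # to the array's dtype under the "safe" casting rules.
    if not np.can_cast(scalar, arr.dtype, casting="safe"):
        raise TypeError("cannot safely cast %r to %s" % (scalar, arr.dtype))
    return arr + np.asarray(scalar, dtype=arr.dtype)

arr = np.array([1], dtype=np.int8)
strict_add(arr, 127)   # array([-128], dtype=int8) -- overflow is still your problem
strict_add(arr, 128)   # raises TypeError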
You mean something like this?

In [25]: arr = np.array([1.], dtype=np.int8)

In [27]: arr + 128
ValueError: cannot safely cast 128 to array dtype int8

That would be a major change. If I really wanted to do that, would you then suggest I cast to an array?

arr + np.array([128])

It would be very good to make a well-argued long-term decision, whatever the chosen outcome. Maybe this is the place for a partly retrospective NEP?

Best,

Matthew