Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

3 Jan 2013

      2013/1/3 Andrew Collette <andrew.collette@gmail.com>:
...
Hi Dag,
...
If neither is objectively better, I think that is a very good reason to
kick it down to the user. "Explicit is better than implicit".
I agree with you, up to a point.  However, we are talking about an
extremely common operation that I think most people (myself included)
would not expect to raise an exception: namely, adding a number to an
array.
...
It's a good solution to encourage bug-free code. It may not be a good
solution to avoid typing.
Ha!  But seriously, checking every time I make an addition?  And in
the current version of numpy it's not buggy code to add 128 to an int8
array; it's documented to give you an int16 with the result of the
addition.  Maybe it shouldn't, but that's what it does.
...
I think you usually have a bug in your program when this happens, since
either the dtype is wrong, or the value one is trying to store is wrong.
I know that's true for myself, though I don't claim to know everybody
else's use cases.
I don't think it's unreasonable to add a number to an int16 array (or
int32), and rely on specific, documented behavior if the number is
outside the range.  For example, IDL will clip the value.  Up until
1.6, in NumPy it would roll over. Currently it upcasts.
I won't make the case for upcasting vs rollover again, as I think
that's dealt with extensively in the threads linked in the bug.  I am
concerned about the tests I need to add wherever I might have a
scalar, or the program blows up.
It occurs to me that, if I have "a = b + c" in my code, and "c" is
sometimes a scalar and sometimes an array, I will get different
behavior.  If I have this right: if "c" is an array of larger dtype,
including a 1-element array, it will upcast; if it's an array of the
same dtype, it will roll over regardless; but if it's a scalar and the
result won't fit, it will raise ValueError.
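A minimal sketch of the two array-array cases described above, which (as far as I know) do not depend on the NumPy version. The names b, c_wide and c_same are made up for illustration. The scalar case is deliberately left out, because its behavior is exactly what differs between releases and what this thread is debating.

```python
import numpy as np

b = np.array([100], dtype='int8')

# "c" is an array of larger dtype (even with one element): the result upcasts.
c_wide = np.array([100], dtype='int16')
upcast = b + c_wide
print(upcast, upcast.dtype)    # [200] int16

# "c" is an array of the same dtype: the result silently rolls over.
c_same = np.array([100], dtype='int8')
wrapped = b + c_same
print(wrapped, wrapped.dtype)  # [-56] int8
```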
By the way, how do I test for this?  I can't test just the scalar
because the proposed behavior (as I understand it) considers the
result of the addition.  Should I always compute amax (nanmax)? Do I
need to try adding them and look for ValueError?
And things like this suddenly become dangerous:
try:
    some_function(myarray + something)
except ValueError:
    print("Problem in some_function!")
Actually, the proposed behavior considers only the value of the
scalar, not the result of the addition.
So the correct way to do things with this proposal would be to be sure
you don't add to an array a scalar value that can't fit in the array's
dtype.
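One way such a check could look, as a rough sketch only: scalar_fits is a hypothetical helper (not a NumPy function) that compares an integer against the limits reported by np.iinfo. It only covers integer dtypes and only looks at the scalar, not at the result of the operation.

```python
import numpy as np

def scalar_fits(value, dtype):
    """Hypothetical helper: True if the integer `value` is representable
    in the integer dtype `dtype`."""
    info = np.iinfo(dtype)
    return info.min <= value <= info.max

a = np.array([2], dtype='int8')
print(scalar_fits(127, a.dtype))   # True
print(scalar_fits(128, a.dtype))   # False
print(scalar_fits(-128, a.dtype))  # True
```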

In 1.6.1, you should make this check anyway, since otherwise your
computation can be doing something completely different without
telling you (and I doubt it's what you'd want):
    In [50]: np.array([2], dtype='int8') + 127
    Out[50]: array([-127], dtype=int8)
    In [51]: np.array([2], dtype='int8') + 128
    Out[51]: array([130], dtype=int16)

If the decision is to always roll over, the first thing to decide is
whether this means downcasting the scalar or downcasting the output of
the computation. It doesn't matter for +, but for instance for the
"maximum" ufunc, I don't think it makes sense to perform the
computation at higher precision then downcast the output, as you would
otherwise have:
    np.maximum(np.ones(1, dtype='int8'), 128) == [-128]
So out of consistency (across ufuncs) I think it should always
downcast the scalar (it has the advantage of being more efficient too,
since you don't need to do an upcast to perform the computation). But
then you're in for a nasty surprise if your scalar overflows and
you didn't expect it. For instance the "maximum" example above would
return [1], which may be expected... or not (maybe you wanted to
obtain [128] instead?).
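
To make the two roll-over options concrete, here is a sketch that spells out each one with explicit astype casts instead of relying on any particular version's scalar casting rules. The 128 is held in a 1-element int16 array (wide) purely for illustration, and the names out_late and out_early are made up.

```python
import numpy as np

a = np.ones(1, dtype='int8')
wide = np.array([128], dtype='int16')  # stands in for the scalar 128

# Option 1: compute at higher precision, then downcast the output.
out_late = np.maximum(a.astype('int16'), wide).astype('int8')
print(out_late)   # [-128]

# Option 2: downcast the scalar first (128 wraps to -128), then compute.
out_early = np.maximum(a, wide.astype('int8'))
print(out_early)  # [1]
```

Neither result is [128], which only an upcast of the whole operation would give.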

Another solution is to forget about trying to be smart and always
upcast the operation. That would be my 2nd preferred solution, but it
would make it very annoying to deal with Python scalars (typically
int64 / float64) that would be upcasting lots of things, potentially
breaking a significant amount of existing code.

So, personally, I don't see a straightforward solution without
warning/error, that would be safe enough for programmers.

-=- Olivier
...
Nathaniel asked:
...
But if this is something you're running into in practice then you may have a better idea than us about the practical effects. Do you have any examples where this has come up that you can share?
The only time I really ran into the 1.5/1.6 change was some old code
ported from IDL which did odd things with the wrapping behavior.  But
what I'm really trying to get a handle on here is the proposed future
behavior.  I am coming to this from the perspective of both a user and
a library developer (h5py) trying to work out what if anything I have
to do when handling arrays and values I get from users.
Andrew

Re: [Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Olivier Delalleau