[Numpy-discussion] Do we want scalar casting to behave as it does at the moment?

Andrew Collette andrew.collette at gmail.com
Thu Jan 3 20:15:41 EST 2013


Hi Dag,

> If neither is objectively better, I think that is a very good reason to
> kick it down to the user. "Explicit is better than implicit".

I agree with you, up to a point.  However, we are talking about an
extremely common operation that I think most people (myself included)
would not expect to raise an exception: namely, adding a number to an
array.

> It's a good solution to encourage bug-free code. It may not be a good
> solution to avoid typing.

Ha!  But seriously, checking every time I make an addition?  And in
the current version of NumPy it's not buggy code to add 128 to an int8
array; it's documented to give you an int16 array holding the result of
the addition.  Maybe it shouldn't, but that's what it does.

> I think you usually have a bug in your program when this happens, since
> either the dtype is wrong, or the value one is trying to store is wrong.
> I know that's true for myself, though I don't claim to know everybody
> else's use cases.

I don't think it's unreasonable to add a number to an int16 array (or
int32) and rely on specific, documented behavior if the number is
outside the range.  For example, IDL will clip the value.  Before
1.6, NumPy would roll over; currently it upcasts.
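To make the three behaviors concrete, here is a small pure-Python model of adding a Python integer to an int8 value under each policy. It is a sketch under stated assumptions, not NumPy's or IDL's implementation: the function names are hypothetical, "clip" is read as saturating the result at the int8 bounds, and the upcast rule only approximates 1.6-style value-based casting by picking the smallest signed type that holds the scalar (so the scalar's value, not the result, decides the type).

```python
# Hypothetical pure-Python model of three overflow policies for
# "int8 value + Python int"; not NumPy's or IDL's actual code.

def fits(value, bits):
    """Return True if value fits in a signed integer of the given width."""
    return -(2 ** (bits - 1)) <= value < 2 ** (bits - 1)


def wrap(value, bits):
    """Reduce value to a signed integer of the given width (two's complement)."""
    half = 2 ** (bits - 1)
    return (value + half) % (2 * half) - half


def add_rollover(x, s):
    # Pre-1.6 style: the result stays int8 and wraps around.
    return wrap(x + s, 8), "int8"


def add_clip(x, s):
    # Assumed reading of "clip": saturate the result at the int8 bounds.
    return max(-128, min(127, x + s)), "int8"


def add_upcast(x, s):
    # Approximation of 1.6-style value-based casting: the smallest signed
    # type holding the *scalar* is chosen, so 100 + 100 still wraps in int8.
    for bits in (8, 16, 32, 64):
        if fits(s, bits):
            return wrap(x + s, bits), "int%d" % bits
    raise OverflowError("scalar %d does not fit in int64" % s)


for policy in (add_rollover, add_clip, add_upcast):
    print(policy.__name__, policy(1, 128), policy(100, 100))
```

Under this model, add_upcast(1, 128) gives (129, 'int16'), while add_upcast(100, 100) still wraps to (-56, 'int8'), because the scalar 100 fits in int8 even though the sum does not.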

I won't make the case for upcasting vs. rollover again, as I think
that's dealt with extensively in the threads linked in the bug.  What
concerns me is the checks I would need to add wherever I might have a
scalar, to keep the program from blowing up.

It occurs to me that if I have "a = b + c" in my code, and "c" is
sometimes a scalar and sometimes an array, I will get different
behavior.  If I have this right: if "c" is an array of larger dtype
(including a 1-element array), the result will upcast; if it's an array
of the same dtype, it will roll over regardless; but if it's a scalar
and the result won't fit, it will raise ValueError.
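The three cases above can be modeled in a few lines of pure Python. This is a hypothetical sketch of the behavior as described, not NumPy code: one element of an integer array is represented as a (value, bits) pair, a scalar as a plain Python int, and add_as_described is an illustrative name.

```python
# Hypothetical model of "a = b + c" for signed integers, following the
# behavior described above; not NumPy's implementation.

def fits(value, bits):
    """Return True if value fits in a signed integer of the given width."""
    return -(2 ** (bits - 1)) <= value < 2 ** (bits - 1)


def wrap(value, bits):
    """Reduce value to a signed integer of the given width (two's complement)."""
    half = 2 ** (bits - 1)
    return (value + half) % (2 * half) - half


def add_as_described(b, c):
    """b is a (value, bits) pair standing in for one element of an integer
    array; c is either another (value, bits) pair or a plain Python int."""
    b_value, b_bits = b
    if isinstance(c, tuple):
        c_value, c_bits = c
        # Array operand: the larger dtype wins; with equal dtypes the sum wraps.
        bits = max(b_bits, c_bits)
        return wrap(b_value + c_value, bits), bits
    # Scalar operand: keep b's dtype, but raise if the result does not fit.
    result = b_value + c
    if not fits(result, b_bits):
        raise ValueError("result %d does not fit in int%d" % (result, b_bits))
    return result, b_bits
```

With b = (100, 8), adding (100, 16) gives (200, 16), adding (100, 8) wraps to (-56, 8), and adding the plain int 100 raises ValueError, although all three spell the same addition.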

By the way, how do I test for this?  I can't test just the scalar
because the proposed behavior (as I understand it) considers the
result of the addition.  Should I always compute amax (nanmax)? Do I
need to try adding them and look for ValueError?
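For integer arrays, one way to answer the amax question is to check the extremes before adding: adding a constant preserves order, so arr.min() + s and arr.max() + s bound every element of the result. A minimal sketch, assuming NumPy and an integer dtype; addition_fits is a hypothetical helper, not a NumPy function, and it costs a full pass over the data on every call, which is exactly the overhead in question.

```python
import numpy as np


def addition_fits(arr, scalar):
    """Hypothetical helper: True if every element of arr + scalar would lie
    within the range of arr's integer dtype."""
    if arr.size == 0:
        return True
    info = np.iinfo(arr.dtype)
    # Convert to Python ints so the check itself cannot overflow.
    low = int(arr.min()) + int(scalar)
    high = int(arr.max()) + int(scalar)
    return info.min <= low and high <= info.max


data = np.array([1, 100, -5], dtype=np.int8)
print(addition_fits(data, 27))   # 100 + 27 = 127 still fits in int8
print(addition_fits(data, 28))   # 100 + 28 = 128 does not
```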

And things like this suddenly become dangerous:

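try:
    # A ValueError raised by "myarray + something" is also caught here
    # and misreported as a problem in some_function.
    some_function(myarray + something)
except ValueError:
    print("Problem in some_function!")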
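A common way to avoid this trap is to keep the addition outside the try block, so a ValueError raised by the arithmetic propagates instead of being reported as a failure of some_function. The sketch below uses hypothetical stand-ins: proposed_add mimics the described int8-plus-scalar behavior on a list of ints, and some_function simply rejects empty input.

```python
def proposed_add(values, scalar):
    """Hypothetical stand-in for int8 array + scalar under the proposal as
    described: raise ValueError if any result is outside [-128, 127]."""
    result = [v + scalar for v in values]
    if any(not -128 <= r <= 127 for r in result):
        raise ValueError("result out of range for int8")
    return result


def some_function(values):
    """Hypothetical stand-in that fails with ValueError on empty input."""
    if not values:
        raise ValueError("empty input")
    return sum(values)


def process(values, scalar):
    # The addition happens outside the try, so its ValueError is not caught
    # and mislabeled below.
    argument = proposed_add(values, scalar)
    try:
        return some_function(argument)
    except ValueError:
        print("Problem in some_function!")
        return None
```

The cost is one extra name per call site, but only errors from some_function itself reach the handler.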

Nathaniel asked:

> But if this is something you're running into in practice then you may have a better idea than us about the practical effects. Do you have any examples where this has come up that you can share?

The only time I really ran into the 1.5/1.6 change was some old code
ported from IDL which did odd things with the wrapping behavior.  But
what I'm really trying to get a handle on here is the proposed future
behavior.  I am coming to this from the perspective of both a user and
a library developer (h5py) trying to work out what, if anything, I have
to do when handling arrays and values I get from users.

Andrew


