2013/1/3 Andrew Collette <andrew.collette@gmail.com>:
Hi Dag,
If neither is objectively better, I think that is a very good reason to kick it down to the user. "Explicit is better than implicit".
I agree with you, up to a point. However, we are talking about an extremely common operation that I think most people (myself included) would not expect to raise an exception: namely, adding a number to an array.
It's a good solution to encourage bug-free code. It may not be a good solution to avoid typing.
Ha! But seriously, checking every time I make an addition? And in the current version of numpy it's not buggy code to add 128 to an int8 array; it's documented to give you an int16 with the result of the addition. Maybe it shouldn't, but that's what it does.
I think you usually have a bug in your program when this happens, since either the dtype is wrong, or the value one is trying to store is wrong. I know that's true for myself, though I don't claim to know everybody else's use cases.
I don't think it's unreasonable to add a number to an int16 array (or int32), and rely on specific, documented behavior if the number is outside the range. For example, IDL will clip the value. Up until 1.6, in NumPy it would roll over. Currently it upcasts.
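To make the three policies concrete, here is a plain-Python sketch of what each would give for an int8 result. The function names are hypothetical and nothing here calls NumPy. It only simulates clipping, two's-complement rollover, and keeping the exact value as a wider type would.

```python
# Hypothetical helpers simulating three ways an out-of-range int8 result
# could be handled. Plain Python ints; no NumPy involved.
INT8_MIN, INT8_MAX = -128, 127


def add_clip(a, b):
    """Saturate the sum to the int8 range (the clipping attributed to IDL above)."""
    return max(INT8_MIN, min(INT8_MAX, a + b))


def add_rollover(a, b):
    """Wrap the sum around modulo 256 into [-128, 127] (two's-complement rollover)."""
    return (a + b + 128) % 256 - 128


def add_upcast(a, b):
    """Keep the exact sum, as if the result had been promoted to int16."""
    return a + b


for policy in (add_clip, add_rollover, add_upcast):
    print(policy.__name__, policy(2, 128))
# add_clip 127
# add_rollover -126
# add_upcast 130
```

The upcast value, 130, matches the int16 result that NumPy 1.6 is shown to return for the same addition further down the thread.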
I won't make the case for upcasting vs rollover again, as I think that's dealt with extensively in the threads linked in the bug. I am concerned about the tests I need to add wherever I might have a scalar, or the program blows up.
It occurs to me that, if I have "a = b + c" in my code, and "c" is sometimes a scalar and sometimes an array, I will get different behavior. If I have this right: if "c" is an array of larger dtype (including a 1-element array), it will upcast; if it's the same dtype, it will roll over regardless; but if it's a scalar and the result won't fit, it will raise ValueError.
By the way, how do I test for this? I can't test just the scalar because the proposed behavior (as I understand it) considers the result of the addition. Should I always compute amax (nanmax)? Do I need to try adding them and look for ValueError?
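One possible check, assuming (as Olivier's reply below states) that the proposal looks only at the scalar's value, is to compare the scalar against the integer type's bounds before adding. This is a minimal plain-Python sketch. `integer_range` and `scalar_fits` are hypothetical helpers. In NumPy itself, `np.iinfo(dtype).min` and `np.iinfo(dtype).max` give the same bounds.

```python
def integer_range(bits, signed=True):
    """Return (min, max) for a fixed-width integer type of the given bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def scalar_fits(value, bits, signed=True):
    """True if the Python int `value` is representable in the given integer type."""
    lo, hi = integer_range(bits, signed)
    return lo <= value <= hi


print(scalar_fits(127, 8))                # True
print(scalar_fits(128, 8))                # False
print(scalar_fits(255, 8, signed=False))  # True
```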
And things like this suddenly become dangerous:
    try:
        some_function(myarray + something)
    except ValueError:
        print("Problem in some_function!")
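One way to keep the except clause honest is to compute the argument before entering the try block. That way an overflow in the addition can't be reported as a failure inside some_function. In this sketch, `strict_add` is a hypothetical plain-Python stand-in for the proposed raise-on-overflow behavior, not a NumPy function. The body of `some_function` is also invented for the example.

```python
INT8_MIN, INT8_MAX = -128, 127


def strict_add(values, scalar):
    """Hypothetical stand-in for the proposal: reject a scalar that does not
    fit in int8, otherwise add elementwise with int8 wraparound."""
    if not INT8_MIN <= scalar <= INT8_MAX:
        raise ValueError("scalar %d does not fit in int8" % scalar)
    return [(v + scalar + 128) % 256 - 128 for v in values]


def some_function(values):
    """Hypothetical consumer that can raise its own ValueError."""
    if not values:
        raise ValueError("empty input")
    return sum(values)


def call_safely(values, scalar):
    # Evaluate the addition on its own, so its ValueError is reported separately.
    try:
        arg = strict_add(values, scalar)
    except ValueError:
        return "Problem with the addition!"
    # Only errors raised by some_function itself reach this handler.
    try:
        return some_function(arg)
    except ValueError:
        return "Problem in some_function!"


print(call_safely([1, 2], 3))    # 9
print(call_safely([1, 2], 128))  # Problem with the addition!
print(call_safely([], 3))        # Problem in some_function!
```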
Actually, the proposed behavior considers only the value of the scalar, not the result of the addition. So the correct way to do things with this proposal would be to make sure you never add to an array a scalar value that can't fit in the array's dtype. In 1.6.1 you should make this check anyway, since otherwise your computation can silently do something completely different (and I doubt it's what you'd want):

In [50]: np.array([2], dtype='int8') + 127
Out[50]: array([-127], dtype=int8)

In [51]: np.array([2], dtype='int8') + 128
Out[51]: array([130], dtype=int16)

If the decision is to always roll over, the first thing to decide is whether this means the scalar is downcast, or the output of the computation. It doesn't matter for +, but for the "maximum" ufunc, for instance, I don't think it makes sense to perform the computation at higher precision and then downcast the output, as you would otherwise have:

np.maximum(np.ones(1, dtype='int8'), 128) == [-128]

So for consistency across ufuncs I think it should always downcast the scalar (which has the advantage of being more efficient too, since you don't need an upcast to perform the computation). But then you're in for a nasty surprise if your scalar overflows and you didn't expect it. For instance, the "maximum" example above would return [1], which may be expected... or not (maybe you wanted to obtain [128] instead?).

Another solution is to forget about trying to be smart and always upcast the operation. That would be my second preferred solution, but it would make it very annoying to deal with Python scalars (typically int64 / float64), which would upcast lots of things, potentially breaking a significant amount of existing code.

So, personally, I don't see a straightforward solution without a warning or error that would be safe enough for programmers.

-=- Olivier
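The difference between downcasting the scalar and downcasting the output can be checked by hand with a plain-Python simulation of int8 wraparound. The helper names are hypothetical, and the functions operate on Python ints rather than NumPy arrays.

```python
def wrap_int8(x):
    """Reduce a Python int to the int8 range with two's-complement wraparound."""
    return (x + 128) % 256 - 128


def maximum_downcast_output(a, scalar):
    # Compute at full precision, then wrap the result into int8.
    return wrap_int8(max(a, scalar))


def maximum_downcast_scalar(a, scalar):
    # Wrap the scalar into int8 first, then compute within int8.
    return max(a, wrap_int8(scalar))


print(maximum_downcast_output(1, 128))  # -128
print(maximum_downcast_scalar(1, 128))  # 1
```

For addition the two orders agree: `wrap_int8(2 + 128)` and `wrap_int8(2 + wrap_int8(128))` are both -126, because wraparound is arithmetic modulo 256. That is why the choice only shows up in ufuncs such as maximum.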
Nathaniel asked:
But if this is something you're running into in practice then you may have a better idea than us about the practical effects. Do you have any examples where this has come up that you can share?
The only time I really ran into the 1.5/1.6 change was some old code ported from IDL which did odd things with the wrapping behavior. But what I'm really trying to get a handle on here is the proposed future behavior. I am coming to this from the perspective of both a user and a library developer (h5py), trying to work out what, if anything, I have to do when handling arrays and values I get from users.
Andrew