2013/1/3 Andrew Collette
Another solution is to forget about trying to be smart and always upcast the operation. That would be my 2nd preferred solution, but it would make it very annoying to deal with Python scalars (typically int64 / float64), which would upcast lots of things and potentially break a significant amount of existing code.
So, personally, I don't see a straightforward solution without a warning/error that would be safe enough for programmers.
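As a rough sketch of what "always upcast" would cost, the int64 array below stands in for a Python int treated at its natural width (the names are illustrative, and explicit dtypes are used so the result is the same on any NumPy version):

```python
import numpy as np

# Sketch of the "always upcast" option: treat a Python int like an int64
# array (its natural width) and apply ordinary array promotion.
small = np.ones(4, dtype=np.int8)
scalar_as_array = np.array([3], dtype=np.int64)  # stand-in for a Python int

result = small + scalar_as_array
# int8 combined with int64 promotes to int64: an 8x wider result,
# which is why this option costs memory and breaks dtype expectations.
```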
I guess what's really confusing me here is that I had assumed that this:
result = myarray + scalar
was equivalent to this:
result = myarray + numpy.array(scalar)
where the dtype of the converted scalar was chosen to be "just big enough" for it to fit. Then you proceed using the normal rules for array addition. Yes, you can have upcasting or rollover depending on the values involved, but you have that anyway with array addition; it's just how arrays work in NumPy.
A key difference is that with arrays, the dtype is not chosen "just big enough" for your data to fit. Either you set the dtype yourself, or you're using the default inferred dtype (int/float). In both cases you should know what to expect, and it doesn't depend on the actual numeric values (except for the auto int/float distinction).
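The contrast between the two views can be seen directly: `np.min_scalar_type` reports the "just big enough" dtype for a value, while array construction ignores the values and uses the default integer dtype (exact outputs depend on platform/NumPy version, so the comments below are indicative):

```python
import numpy as np

# "Just big enough" depends on the value itself:
np.min_scalar_type(128)    # uint8 (fits in 0..255)
np.min_scalar_type(-129)   # int16 (doesn't fit in int8)

# Array construction does not shrink to fit the data; it uses the
# platform default integer dtype regardless of the values:
np.array([1, 2, 3]).dtype  # default int (int32/int64), never int8
```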
Also, have I got this (proposed behavior) right?
array([127], dtype=int8) + 128  ->  ValueError
array([127], dtype=int8) + 127  ->  -2
It seems like all this does is raise an error when the current rules would require upcasting, but still allows rollover for smaller values. What error condition, specifically, is the ValueError designed to tell me about? You can still get "unexpected" data (if you're not expecting rollover) with no exception.
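The rollover half of that table can be checked today; writing the scalar as an explicit int8 keeps the result the same on any NumPy version (the ValueError half is the proposal, not current behavior, so it is not runnable):

```python
import numpy as np

a = np.array([127], dtype=np.int8)

# In-range scalar: no upcast, arithmetic wraps in int8.
a + np.int8(127)   # 127 + 127 = 254, which wraps to -2 in int8
```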
The ValueError is there to warn you that the operation may not be doing what you want. The rollover for smaller values would be the documented (and thus hopefully expected) behavior.

Taking addition as the example may be misleading, as it makes it look like we could just "always roll over" to obtain consistent behavior, and programmers are to some extent used to integer rollover in this kind of operation. However, I gave examples with "maximum" that I believe show it's not that easy (the behavior would simply appear "wrong"). Another example is integer division, where silently casting the scalar would result in

array([-128], dtype=int8) // 128 -> [1]

which is unlikely to be something anyone would want to obtain.

To summarize the goals of the proposal (in my mind):
1. Low cognitive load (simple and consistent across ufuncs).
2. Low risk of doing something unexpected.
3. Efficient by default.
4. Most existing (non-buggy) code should not be affected.

If we always do the silent cast, we significantly break existing code relying on the 1.6 behavior and increase the risk of doing something unexpected (bad on #2 & #4).
If we always upcast, we may break existing code and lose efficiency (bad on #3 and #4).
If we keep the current behavior, we stay with something that's difficult to understand and has a high risk of doing weird things (bad on #1 and #2).

-=- Olivier
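What "silently cast the scalar" means for the division and maximum examples can be sketched by doing the cast explicitly (`astype` with the default unsafe casting wraps the value, which is exactly what makes the results look wrong; the dtypes are fixed so this runs the same on any NumPy version):

```python
import numpy as np

# Silently casting 128 into int8 wraps it to -128:
scalar = np.array(128).astype(np.int8)   # -> -128

# Integer division: the user wrote "// 128" but gets -128 // -128.
a = np.array([-128], dtype=np.int8)
a // scalar                              # -> [1], not [-1] as 128 would give

# maximum: the user asked for max(x, 128) and would expect 128,
# but the wrapped scalar -128 loses to everything.
b = np.array([-1], dtype=np.int8)
np.maximum(b, scalar)                    # -> [-1]
```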