[Numpy-discussion] Regression: in-place operations (possibly intentional)
Travis Oliphant
travis at continuum.io
Tue Sep 18 16:33:10 EDT 2012
On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote:
>
>
> On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
>
> On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
>
>
> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.root at ou.edu> wrote:
>
>
> On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
>
>
> On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.root at ou.edu> wrote:
>
>
> On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris <charlesr.harris at gmail.com> wrote:
>
>
> On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant <travis at continuum.io> wrote:
>
> On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:
>
> > Consider the following code:
> >
> > import numpy as np
> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
> > a *= float(255) / 15
> >
> > In v1.6.x, this yields:
> > array([17, 34, 51, 68, 85], dtype=int16)
> >
> > But in master, this throws an exception about failing to cast via same_kind.
> >
> > Note that numpy was smart about this operation before, consider:
> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
> > a *= float(128) / 256
>
> > yields:
> > array([0, 1, 1, 2, 2], dtype=int16)
> >
> > Of course, this is different than if one does it in a non-in-place manner:
> > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
> >
> > which yields an array with floating point dtype in both versions. I can appreciate the arguments for preventing this kind of implicit casting between non-same_kind dtypes, but I argue that because the operation is in-place, then I (as the programmer) am explicitly stating that I desire to utilize the current array to store the results of the operation, dtype and all. Obviously, we can't completely turn off this rule (for example, an in-place addition between integer array and a datetime64 makes no sense), but surely there is some sort of happy medium that would allow these sort of operations to take place?
> >
> > Lastly, if it is determined that it is desirable to allow in-place operations to continue working like they have before, I would like to see such a fix in v1.7 because if it isn't in 1.7, then other libraries (such as matplotlib, where this issue was first found) would have to change their code anyway just to be compatible with numpy.
>
> I agree that in-place operations should allow different casting rules. There are different opinions on this, of course, but generally this is how NumPy has worked in the past.
>
> We did decide to change the default casting rule to "same_kind" but making an exception for in-place seems reasonable.
>
> I think that in these cases same_kind will flag what are most likely programming errors and sloppy code. It is easy to be explicit and doing so will make the code more readable because it will be immediately obvious what the multiplicand is without the need to recall what the numpy casting rules are in this exceptional case. IISTR several mentions of this before (Gael?), and in some of those cases it turned out that bugs were being turned up. Catching bugs with minimal effort is a good thing.
>
> Chuck
>
>
> True, it is quite likely to be a programming error, but then again, there are many cases where it isn't. Is the problem strictly that we are trying to downcast the float to an int, or is it that we are trying to downcast to a lower precision? Is there a way for one to explicitly relax the same_kind restriction?
>
> I think the problem is down casting across kinds, with the result that floats are truncated and the imaginary parts of imaginaries might be discarded. That is, the value, not just the precision, of the rhs changes. So I'd favor an explicit cast in code like this, i.e., cast the rhs to an integer.
>
> It is true that this forces downstream to code up to a higher standard, but I don't see that as a bad thing, especially if it exposes bugs. And it isn't difficult to fix.
>
> Chuck
>
>
> Mind you, in my case, casting the rhs as an integer before doing the multiplication would be a bug, since our value for the rhs is usually between zero and one. Multiplying first by the integer numerator before dividing by the integer denominator would likely cause issues with overflowing the 16 bit integer.
>
>
> For the case in point I'd do
>
> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>
> In [2]: a //= 2
>
> In [3]: a
> Out[3]: array([0, 1, 1, 2, 2], dtype=int16)
>
> Although I expect you would want something different in practice. But the current code already looks fragile to me and I think it is a good thing you are taking a closer look at it. If you really intend going through a float, then it should be something like
>
> a = (a*(float(128)/256)).astype(int16)
>
> Chuck
>
>
> And thereby losing the memory benefit of an in-place multiplication?
>
> What makes you think you are getting that? I'd have to check the numpy C source, but I expect the multiplication is handled just as I wrote it out. I don't recall any loops that handle mixed types likes that. I'd like to see some, though, scaling integers is a common problem.
>
> That is sort of the point of all this. We are using 16 bit integers because we wanted to be as efficient as possible and didn't need anything larger. Note, that is what we changed the code to, I am just wondering if we are being too cautious. The casting kwarg looks to be what I might want, though it isn't as clean as just writing an "*=" statement.
>
>
> I think even there you will have an intermediate float array followed by a cast.
This is true, but it is done in chunks of a fixed size (controllable by a thread-local variable or keyword argument to the ufunc).
How difficult would it be to change in-place operations back to the "unsafe" default?
-Travis
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120918/22a1594a/attachment.html>
More information about the NumPy-Discussion
mailing list