[Numpy-discussion] Regression: in-place operations (possibly intentional)

Benjamin Root ben.root at ou.edu
Tue Sep 18 16:52:05 EDT 2012


On Tue, Sep 18, 2012 at 4:42 PM, Charles R Harris <charlesr.harris at gmail.com
> wrote:

>
>
> On Tue, Sep 18, 2012 at 2:33 PM, Travis Oliphant <travis at continuum.io>wrote:
>
>>
>> On Sep 18, 2012, at 2:44 PM, Charles R Harris wrote:
>>
>>
>>
>> On Tue, Sep 18, 2012 at 1:35 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>
>>>
>>>
>>> On Tue, Sep 18, 2012 at 3:25 PM, Charles R Harris <
>>> charlesr.harris at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Sep 18, 2012 at 1:13 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Sep 18, 2012 at 2:47 PM, Charles R Harris <
>>>>> charlesr.harris at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 18, 2012 at 11:39 AM, Benjamin Root <ben.root at ou.edu>wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Sep 17, 2012 at 9:33 PM, Charles R Harris <
>>>>>>> charlesr.harris at gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 17, 2012 at 3:40 PM, Travis Oliphant <
>>>>>>>> travis at continuum.io> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sep 17, 2012, at 8:42 AM, Benjamin Root wrote:
>>>>>>>>>
>>>>>>>>> > Consider the following code:
>>>>>>>>> >
>>>>>>>>> > import numpy as np
>>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>>> > a *= float(255) / 15
>>>>>>>>> >
>>>>>>>>> > In v1.6.x, this yields:
>>>>>>>>> > array([17, 34, 51, 68, 85], dtype=int16)
>>>>>>>>> >
>>>>>>>>> > But in master, this throws an exception about failing to cast
>>>>>>>>> via same_kind.
>>>>>>>>> >
>>>>>>>>> > Note that numpy was smart about this operation before, consider:
>>>>>>>>> > a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>>>>>> > a *= float(128) / 256
>>>>>>>>>
>>>>>>>>> > yields:
>>>>>>>>> > array([0, 1, 1, 2, 2], dtype=int16)
>>>>>>>>> >
>>>>>>>>> > Of course, this is different than if one does it in a
>>>>>>>>> non-in-place manner:
>>>>>>>>> > np.array([1, 2, 3, 4, 5], dtype=np.int16) * 0.5
>>>>>>>>> >
>>>>>>>>> > which yields an array with floating point dtype in both
>>>>>>>>> versions.  I can appreciate the arguments for preventing this kind of
>>>>>>>>> implicit casting between non-same_kind dtypes, but I argue that because the
>>>>>>>>> operation is in-place, I (as the programmer) am explicitly stating
>>>>>>>>> that I desire to utilize the current array to store the results of the
>>>>>>>>> operation, dtype and all.  Obviously, we can't completely turn off this
>>>>>>>>> rule (for example, an in-place addition between an integer array and a
>>>>>>>>> datetime64 makes no sense), but surely there is some sort of happy medium
>>>>>>>>> that would allow these sort of operations to take place?
>>>>>>>>> >
>>>>>>>>> > Lastly, if it is determined that it is desirable to allow
>>>>>>>>> in-place operations to continue working like they have before, I would like
>>>>>>>>> to see such a fix in v1.7 because if it isn't in 1.7, then other libraries
>>>>>>>>> (such as matplotlib, where this issue was first found) would have to change
>>>>>>>>> their code anyway just to be compatible with numpy.
>>>>>>>>>
>>>>>>>>> I agree that in-place operations should allow different casting
>>>>>>>>> rules.  There are different opinions on this, of course, but generally this
>>>>>>>>> is how NumPy has worked in the past.
>>>>>>>>>
>>>>>>>>> We did decide to change the default casting rule to "same_kind"
>>>>>>>>> but making an exception for in-place seems reasonable.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think that in these cases same_kind will flag what are most
>>>>>>>> likely programming errors and sloppy code. It is easy to be explicit and
>>>>>>>> doing so will make the code more readable because it will be immediately
>>>>>>>> obvious what the multiplicand is without the need to recall what the numpy
>>>>>>>> casting rules are in this exceptional case. IISTR several mentions of this
>>>>>>>> before (Gael?), and in some of those cases it turned out that bugs were
>>>>>>>> being turned up. Catching bugs with minimal effort is a good thing.
>>>>>>>>
>>>>>>>> Chuck
>>>>>>>>
>>>>>>>>
>>>>>>> True, it is quite likely to be a programming error, but then again,
>>>>>>> there are many cases where it isn't.  Is the problem strictly that we are
>>>>>>> trying to downcast the float to an int, or is it that we are trying to
>>>>>>> downcast to a lower precision?  Is there a way for one to explicitly relax
>>>>>>> the same_kind restriction?
>>>>>>>
>>>>>>
>>>>>> I think the problem is downcasting across kinds, with the result
>>>>>> that floats are truncated and the imaginary parts of complex values might be
>>>>>> discarded. That is, the value, not just the precision, of the rhs changes.
>>>>>> So I'd favor an explicit cast in code like this, i.e., cast the rhs to an
>>>>>> integer.
>>>>>>
>>>>>> It is true that this forces downstream to code up to a higher
>>>>>> standard, but I don't see that as a bad thing, especially if it exposes
>>>>>> bugs. And it isn't difficult to fix.
>>>>>>
>>>>>> Chuck
>>>>>>
>>>>>>
>>>>> Mind you, in my case, casting the rhs as an integer before doing the
>>>>> multiplication would be a bug, since our value for the rhs is usually
>>>>> between zero and one.  Multiplying first by the integer numerator before
>>>>> dividing by the integer denominator would likely cause issues with
>>>>> overflowing the 16 bit integer.
>>>>>
>>>>>
>>>> For the case in point I'd do
>>>>
>>>> In [1]: a = np.array([1, 2, 3, 4, 5], dtype=np.int16)
>>>>
>>>> In [2]: a //= 2
>>>>
>>>> In [3]: a
>>>> Out[3]: array([0, 1, 1, 2, 2], dtype=int16)
>>>>
>>>> Although I expect you would want something different in practice. But
>>>> the current code already looks fragile to me and I think it is a good thing
>>>> you are taking a closer look at it. If you really intend going through a
>>>> float, then it should be something like
>>>>
>>>> a = (a*(float(128)/256)).astype(np.int16)
>>>>
>>>> Chuck
>>>>
>>>>
>>> And thereby losing the memory benefit of an in-place multiplication?
>>>
>>
>> What makes you think you are getting that? I'd have to check the numpy C
>> source, but I expect the multiplication is handled just as I wrote it out.
>> I don't recall any loops that handle mixed types like that. I'd like to
>> see some, though; scaling integers is a common problem.
>>
>>
>>
>>
>>> That is sort of the point of all this.  We are using 16 bit integers
>>> because we wanted to be as efficient as possible and didn't need anything
>>> larger.  Note, that is what we changed the code to; I am just wondering if
>>> we are being too cautious.  The casting kwarg looks to be what I might
>>> want, though it isn't as clean as just writing an "*=" statement.
>>>
>>>
>> I think even there you will have an intermediate float array followed by
>> a cast.
>>
>>
>> This is true, but it is done in chunks of a fixed size (controllable by a
>> thread-local variable or keyword argument to the ufunc).
>>
>> How difficult would it be to change in-place operations back to the
>> "unsafe" default?
>>
>
> Probably not too difficult, but I think it would be a mistake. What
> keyword argument are you referring to? In the current case, I think what is
> wanted is a scaling function that will actually do things in place. The
> matplotlib folks would probably be happier with the result if they simply
> coded up a couple of small Cython routines to do that.
>
> Chuck
>
>
As far as matplotlib is concerned, the problem was solved when we reverted
a change.  The issue that I am raising is that switching to an in-place
operation was such an innocuous and, frankly, obvious change to make in the
first place.  I have to wonder if we are being overly cautious with
"same_kind".  You are right that we would probably benefit greatly from
creating some CXX scaling functions (contrary to popular belief, we don't
use Cython); however, I would imagine that such a general-purpose function
would fare better within NumPy.  But, ultimately, Python is about there
being one right way of doing something, so I think the goal should be a
casting rule for in-place operations that is somewhat more restrictive than
"unsafe", yet still restrictive enough to catch the sort of errors
"same_kind" was catching.  That way, I have one way of doing an in-place
operation, regardless of the types of my operands.

Cheers,
Ben Root