[Numpy-discussion] bug with assignment into an indexed array?

Mark Wiebe mwwiebe at gmail.com
Wed Aug 17 15:12:28 EDT 2011


On Wed, Aug 17, 2011 at 11:54 AM, Benjamin Root <ben.root at ou.edu> wrote:

> On Sat, Aug 13, 2011 at 7:17 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
>
>> On Thu, Aug 11, 2011 at 1:37 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>
>>> On Thu, Aug 11, 2011 at 10:33 AM, Olivier Delalleau <shish at keba.be> wrote:
>>>
>>>> 2011/8/11 Benjamin Root <ben.root at ou.edu>
>>>>
>>>>>
>>>>>
>>>>> On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau <shish at keba.be> wrote:
>>>>>
>>>>>> Maybe confusing, but working as expected.
>>>>>>
>>>>>>
>>>>>> When you write:
>>>>>>   matched_to[np.array([0, 1, 2])] = 3
>>>>>> it calls __setitem__ on matched_to, with arguments (np.array([0, 1,
>>>>>> 2]), 3). So numpy understands you want to write 3 at these indices.
>>>>>>
>>>>>>
>>>>>> When you write:
>>>>>> matched_to[:3][match] = 3
>>>>>> it first calls __getitem__ with the slice as argument, which returns a
>>>>>> view of your array, then it calls __setitem__ on this view, and it fills
>>>>>> your matched_to array at the same time.
>>>>>>
>>>>>>
>>>>>> But when you write:
>>>>>>   matched_to[np.array([0, 1, 2])][match] = 3
>>>>>> it first calls __getitem__ with the array as argument, which returns a
>>>>>> *copy* of your array, so that calling __setitem__ on this copy has no effect
>>>>>> on your original array.
>>>>>>
>>>>>> -=- Olivier
>>>>>>
>>>>>>
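
The three cases described above can be reproduced with a short sketch. The contents of `matched_to` and `match` are assumed here, since the original values are not shown in this message; only the view-versus-copy behaviour matters.

```python
import numpy as np

# Assumed data: the thread does not show the real matched_to or match.
matched_to = np.array([-1] * 5)
match = np.array([True, False, True])

# Case 1: index-array assignment is a single __setitem__ call,
# so it writes directly into the array.
a = matched_to.copy()
a[np.array([0, 1, 2])] = 3

# Case 2: the slice returns a view, so __setitem__ on that view
# also updates the original array.
b = matched_to.copy()
b[:3][match] = 3

# Case 3: the index array returns a copy, so __setitem__ only
# modifies a temporary and the original is left untouched.
c = matched_to.copy()
c[np.array([0, 1, 2])][match] = 3

print(a)  # [ 3  3  3 -1 -1]
print(b)  # [ 3 -1  3 -1 -1]
print(c)  # [-1 -1 -1 -1 -1]
```
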
>>>>> Right, but I guess my question is does it *have* to be that way?  I
>>>>> guess it makes some sense with respect to indexing with a numpy array like I
>>>>> did with the last example, because an element could be referred to multiple
>>>>> times (which explains the common surprise with '+='), but with boolean
>>>>> indexing, we are guaranteed that each element of the view will appear at
>>>>> most once.  Therefore, shouldn't boolean indexing always return a view, not
>>>>> a copy?  Is the general case of arbitrary array selection inherently
>>>>> impossible to encode in a view versus a slice with a regular spacing?
>>>>>
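
The '+=' surprise mentioned above, and the contrast with a boolean mask, can be sketched as follows. The array names are illustrative, and `np.add.at` is assumed to be available (it was added to NumPy after this thread was written).

```python
import numpy as np

idx = np.array([0, 0, 1])  # index 0 is referred to twice

# counts[idx] += 1 is evaluated as: tmp = counts[idx]; tmp += 1;
# counts[idx] = tmp. The repeated index is therefore written twice
# with the same value instead of being incremented twice.
counts = np.zeros(3, dtype=int)
counts[idx] += 1
print(counts)  # [1 1 0]

# np.add.at accumulates once per occurrence of each index.
accumulated = np.zeros(3, dtype=int)
np.add.at(accumulated, idx, 1)
print(accumulated)  # [2 1 0]

# A boolean mask selects each element at most once, so the
# ambiguity above cannot arise.
mask = np.array([True, False, True])
masked = np.zeros(3, dtype=int)
masked[mask] += 1
print(masked)  # [1 0 1]
```
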
>>>>
>>>> Yes, because the array interface only supports regular spacing
>>>> (otherwise it is more difficult to get efficient access to arbitrary array
>>>> positions).
>>>>
>>>> -=- Olivier
>>>>
>>>>
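
The "regular spacing" constraint can be seen through the `strides` attribute. This is a minimal sketch with an assumed int64 array: a slice is described by one fixed byte step per axis, whereas an arbitrary selection such as positions 0, 1 and 5 has no single step, so NumPy hands back freshly allocated memory.

```python
import numpy as np

base = np.arange(10, dtype=np.int64)  # 8 bytes per element

# Every second element: same buffer, described by a 16-byte stride.
every_other = base[::2]
print(every_other.strides)                   # (16,)
print(np.shares_memory(base, every_other))   # True

# Positions 0, 1, 5 are separated by gaps of 1 and 4 elements, which
# a single stride cannot express, so the result is a copy.
picked = base[np.array([0, 1, 5])]
print(np.shares_memory(base, picked))        # False
```
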
>>> This still bothers me, though.  I imagine that it is next to impossible
>>> to detect this situation from numpy's perspective, so it can't even emit a
>>> warning or error. Furthermore, for someone who makes a general function to
>>> modify the contents of some externally provided array, there is a
>>> possibility that the provided array is actually a copy not a view.
>>> Although, I guess it is the responsibility of the user to know the
>>> difference.
>>>
>>> I guess that is the key problem.  The key advantage we are taught about
>>> numpy arrays is the use of views for efficient access.  It would seem that
>>> most access operations would use them, but in reality, only sliced access does.
>>> Everything else is a copy (unless you are doing fancy indexing with
>>> assignment).  With some of the forthcoming changes that have been made
>>> with respect to nditer and ufuncs (in particular, I am thinking of the
>>> "where" kwarg), maybe we could consider an enhancement allowing fancy
>>> indexing (or at least boolean indexing) to produce a view?  Even if it is
>>> less efficient than a view from slicing, it would bring better consistency
>>> in behavior between the different forms of indexing.
>>>
>>> Just my 2 cents,
>>> Ben Root
>>>
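
The concern about functions that modify an externally provided array can be illustrated with a hypothetical helper, `zero_out`, which is not part of NumPy. The last lines show the ufunc "where" kwarg as one way to update masked elements in place without an intermediate copy; this is a sketch of the idea, not the enhancement proposed above.

```python
import numpy as np

def zero_out(arr):
    """Hypothetical helper that overwrites its argument in place."""
    arr[...] = 0

data = np.arange(6)
mask = data % 2 == 0          # [True, False, True, False, True, False]

# Boolean indexing passes a copy, so the caller's array is unchanged.
zero_out(data[mask])
after_copy = data.copy()
print(after_copy)   # [0 1 2 3 4 5]

# Slicing passes a view, so the caller's array is modified.
zero_out(data[:3])
after_view = data.copy()
print(after_view)   # [0 0 0 3 4 5]

# The "where" kwarg applies a ufunc only where the mask is True and
# writes the result straight into the output array.
values = np.arange(6)
np.add(values, 10, out=values, where=mask)
print(values)       # [10  1 12  3 14  5]
```
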
>>
>> I think it would be nice to evolve the NumPy indexing and array
>> representation towards the goal of indexing returning a view in all cases
>> with no exceptions. This would provide a much nicer mental model to program
>> with. Accomplishing such a transition will take a fair bit of time, though.
>>
>> -Mark
>>
>>
>
> Mark,
>
> It is good to know that there is a chance to make this possible,
> eventually.  However, I just thought of a possible barrier that might have
> to be overcome before achieving this.  Because it has always been very clear
> that non-slicing produces copies, I can easily imagine situations where
> developers have come to depend on this copying behavior.  While I think most
> copies are unintended (but unnoticed because it was read-only), it is quite
> possible that there are situations where this copy behavior is entirely
> intended.  Therefore, changing this behavior may break code in subtle ways.
>
> I am not saying that it shouldn't be done (clarity and simplicity should be
> paramount), but one should tread carefully here.
>

Absolutely. It would necessarily be very long term and the specifics of how
it could be done are nontrivial, but I figured it was worth mentioning the
idea.

-Mark


>
> My 2 cents,
> Ben Root
>