[Numpy-discussion] bug with assignment into an indexed array?

Thu Aug 11 16:37:26 EDT 2011

On Thu, Aug 11, 2011 at 10:33 AM, Olivier Delalleau <shish at keba.be> wrote:

> 2011/8/11 Benjamin Root <ben.root at ou.edu>
>
>>
>>
>> On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau <shish at keba.be> wrote:
>>
>>> Maybe confusing, but working as expected.
>>>
>>>
>>> When you write:
>>>   matched_to[np.array([0, 1, 2])] = 3
>>> it calls __setitem__ on matched_to, with arguments (np.array([0, 1, 2]),
>>> 3). So numpy understands you want to write 3 at these indices.
>>>
>>>
>>> When you write:
>>>   matched_to[:3][match] = 3
>>> it first calls __getitem__ with the slice as argument, which returns a
>>> view of your array, then it calls __setitem__ on this view, and it fills
>>> your matched_to array at the same time.
>>>
>>>
>>> But when you write:
>>>   matched_to[np.array([0, 1, 2])][match] = 3
>>> it first calls __getitem__ with the array as argument, which returns a
>>> *copy* of your array, so that calling __setitem__ on this copy has no effect
>>> on your original array.
>>>
>>> -=- Olivier
>>>
>>>
>> Right, but I guess my question is does it *have* to be that way?  I guess
>> it makes some sense with respect to indexing with a numpy array like I did
>> with the last example, because an element could be referred to multiple
>> times (which explains the common surprise with '+='), but with boolean
>> indexing, we are guaranteed that each element of the view will appear at
>> most once.  Therefore, shouldn't boolean indexing always return a view, not
>> a copy?  Is the general case of arbitrary array selection inherently
>> impossible to encode in a view versus a slice with a regular spacing?
>>
>
> Yes, because the array interface only supports regular spacing
> (otherwise it would be more difficult to get efficient access to
> arbitrary array positions).
>
> -=- Olivier
>
>
This still bothers me, though.  I imagine that it is next to impossible to
detect this situation from numpy's perspective, so it can't even emit a
warning or error.  Furthermore, someone who writes a general function to
modify the contents of an externally provided array may be handed a copy
rather than a view, in which case the caller's original array is never
modified.  Although, I guess it is the responsibility of the user to know
the difference.
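
To make the pitfall concrete, here is a minimal sketch of the three cases
quoted above, followed by a function that modifies whatever array it is
given.  The contents of `match`, the array sizes, and the helper name
`set_matches` are assumptions made for illustration (the thread does not
define them), and `np.shares_memory` is only available in reasonably recent
NumPy releases.

```python
import numpy as np

idx = np.array([0, 1, 2])
match = np.array([True, False, True])  # assumed mask; not given in the thread

# Case 1: a single __setitem__ with an index array writes into the original.
a = np.zeros(5, dtype=int)
a[idx] = 3

# Case 2: __getitem__ with a slice returns a view, so the write goes through.
b = np.zeros(5, dtype=int)
b[:3][match] = 3

# Case 3: __getitem__ with an index array returns a copy, so the write is lost.
c = np.zeros(5, dtype=int)
c[idx][match] = 3

print(a)  # [3 3 3 0 0]
print(b)  # [3 0 3 0 0]
print(c)  # [0 0 0 0 0]


def set_matches(arr, mask, value):
    """Hypothetical helper: write `value` into `arr` where `mask` is True."""
    arr[mask] = value


base = np.zeros(5, dtype=int)
set_matches(base[:3], match, 3)   # receives a view: base is modified
set_matches(base[idx], match, 7)  # receives a copy: base is left untouched
print(base)  # [3 0 3 0 0]

# The caller (not the function) can check which situation applies.
view_shares = np.shares_memory(base[:3], base)
copy_shares = np.shares_memory(base[idx], base)
print(view_shares, copy_shares)  # True False
```

Inside `set_matches` the two calls look identical; only the caller can tell
that the second argument was a temporary copy.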

I guess that is the key problem.  The key advantage we are taught about
numpy arrays is the use of views for efficient access.  It would seem that
most access operations would use them, but in reality, only sliced access
does.  Everything else is a copy (unless you are doing fancy indexing with
assignment).  With some of the forthcoming changes to nditer and ufuncs (in
particular, I am thinking of the "where" kwarg), maybe we could consider an
enhancement allowing fancy indexing (or at least boolean indexing) to
produce a view?  Even if it is less efficient than a view from slicing, it
would bring better consistency in behavior between the different forms of
indexing.
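
As a rough illustration of what is possible without such an enhancement,
the sketch below shows two ways to get an in-place masked write.  The first
composes the two indexing steps into a single __setitem__ call; the second
uses the ufunc "where" argument together with "out", assuming a NumPy
version whose ufuncs accept it.  The array contents and masks are made up
for the example.

```python
import numpy as np

matched_to = np.zeros(5, dtype=int)
idx = np.array([0, 1, 2])
match = np.array([True, False, True])  # assumed mask, for illustration only

# Apply the mask to the index array first, so that only one __setitem__
# call is made on the original array and no intermediate copy is written to.
matched_to[idx[match]] = 3
print(matched_to)  # [3 0 3 0 0]

# A ufunc with out= and where= updates only the masked positions in place.
values = np.arange(5)
mask = np.array([True, False, True, False, False])
np.add(values, 10, out=values, where=mask)
print(values)  # [10  1 12  3  4]
```

Neither form produces a view; both simply avoid the intermediate copy by
doing the selection and the write in one operation.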

Just my 2 cents,
Ben Root