[Numpy-discussion] Logical indexing and higher-dimensional arrays.

Wed Feb 8 13:25:09 EST 2012

On Wed, Feb 8, 2012 at 8:49 AM, Travis Oliphant <travis at continuum.io> wrote:

>
> On Feb 8, 2012, at 8:29 AM, Sturla Molden wrote:
>
> > On 08.02.2012 06:01, Travis Oliphant wrote:
> >
> >> Recall that the shape of the output with fancy indexing is determined
> by broadcasting together the indexing objects and using that as the shape
> of the output:
> >>
> >> x[ind1, ind2] will produce an output with the shape of "broadcast(ind1,
> ind2)" whose elements are selected by the broadcasted tuple.
> >
> >
> >
> >
> > Thank you for the explanation, Travis.
> >
> > I think my main confusion is why we broadcast indices at all.
> > Broadcasting is something I would expect to happen for binary operations
> > between ndarrays, not for indexing.
>
> We broadcast because of the "zip-based" semantics of current
> fancy-indexing which is basically an element-by-element operation:
>  x[ind1, ind2]  will choose it's elements out of x by taken elements from
> ind1 as the first index and elements out of ind2 as the second.
>
> As an essentially element-by-element operation, if the shapes of the input
> arrays are not exactly the same, it is natural to broadcast them to the
> same shape.   This is fairly intuitive if you are used to broadcasting in
> NumPy.   The problem is that it does not coincide other intuitions in
> certain cases.
>
> This behavior actually allows easy-to-access "cross-product" behavior by
> broadcasting a 1-d ind1 shaped as (4,) with a ind2 shaped as (4,1).  The
> numpy.ix_  function does the basic reshaping needed so that 0:2, 0:2
> matches [0,1],[0,1] indexing:   x[ix_([0,1],[0,1])] is the same as
> x[0:2,0:2].
>
> There are also some very nice applications where you can select out of a
> 3-d volume a depth-surface defined by indexes like so:
>
>        arr[ i[:,newaxis], j, depth]
>
> where arr is a 3-d array,  i and j are 1-d index arrays: i =
> arange(arr.shape[0]) and j = arange(arr.shape[1]), and depth is a 2-d array
> of "depths".   The selected result will be 2-d.
>
> When you understand what is happening, it is consistent and it does make
> sense and it has some very nice applications.    I recognize that people
> get confused because their expectations are different, but the current
> behavior can be understood with a fairly simple mental model.
>
> I could support, however, a move to push this style of indexing to a
> method, and make fancy-indexing use cross-product semantics if:
>
>        * we do it in a phased-manner with lot's of deprecation warnings
> and even a tool that helps you change code (the tool would have to "play"
> your code and make a note of the locations where this style of indexing was
> used --- because static code analysis wouldn't know which objects were
> arrays and which weren't).
>
>        * we also make the ndarray object general enough so that
> fancy-indexing could be returned as a view on the array (the same as
> regular indexing)
>
> This sort of thing would take time, but is not out of the question in my
> mind because I suspect the number of users and use-cases of "broadcasted"
> fancy-indexing is small.
>
> -Travis
>
>

Speaking for myself, I've used both methods quite often. For the
broadcasting method, it is very useful for image processing scenarios where
you want to extract a small subregion with an arbitrary shape. It is also
extremely useful for this to work both ways, for example if I wanted to do
some processing on that subregion and then put it back in the larger image:

sub_reg_vals = img[ ind1, ind2 ]
do_stuff(sub_reg_vals)
img[ ind1, ind2 ] = sub_reg_vals

Of course this may require awareness of the coordinates versus flat index,
but the ravel/unravel_index functions do this for you. Note that IDL works
in this way, unlike MATLAB. The "quick and dirty" way I've been doing it in
MATLAB is

sub_reg_vals = diag( img[ind1, ind2] )

Which works but is obviously a bit inefficient.

I have no convincing ideas about which method should be the "default", but
I think both methods get a fair bit of use, so as long as there are fast
and well documented methods for doing either, I would be happy. I actually
had no idea that np.ix_ did the other method in NumPy. (Thanks for
mentioning that! I had been using A[ind1,:][:,ind2]). There is no generic
terminology for these operations, so I think that generates a lot of
confusion.

Thanks,
Aronne
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120208/f92ea733/attachment.html>