[Numpy-discussion] Selecting columns of a matrix

Thu Jun 22 06:26:18 EDT 2006

'''
The following mail is a bit long and tedious to read; sorry about
that. Here is the abstract:
   "I would like boolean indexing to work like slices and not like
arrays of indices"
'''

hi,

I'm _really_ sorry to insist, but I have been thinking about it and I
don't feel that replacing <boolean> with nonzero(<boolean>) is what we
want.

To me this is a bad trick, equivalent to replacing slices with arrays of
indices via r_[<slice>]: it works only if you do it for a single axis.

Let me explain. If I have an array,

>>> from numpy import *
>>> a = arange(12).reshape(3,4)

I can slice it:

>>> a[1:3,0:3]
array([[ 4,  5,  6],
       [ 8,  9, 10]])

I can define boolean arrays 'equivalent' to these slices:

>>> b1 = array([False,True,True])             # equivalent to 1:3
>>> b2 = array([True,True,True,False])      # equivalent to 0:3
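Masks like these can also be built directly from the slices they stand for. This is a minimal sketch; `slice_to_mask` is a hypothetical helper written for illustration, not a NumPy function, and it simply marks the positions a slice selects on an axis of length n.

```python
import numpy as np

def slice_to_mask(s, n):
    # Hypothetical helper: a boolean mask of length n that is True
    # exactly at the positions selected by the slice s.
    mask = np.zeros(n, dtype=bool)
    mask[s] = True
    return mask

b1 = slice_to_mask(slice(1, 3), 3)  # mask for 1:3 on an axis of length 3
b2 = slice_to_mask(slice(0, 3), 4)  # mask for 0:3 on an axis of length 4
```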

Now, if I use one of these boolean arrays for indexing, everything works as with slices:

>>> a[b1,:]                     #same as a[1:3,:]
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> a[:,b2]                     # same as a[:,0:3]
array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])
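The single-axis equivalence can be checked directly. A short sketch, assuming a current NumPy imported as `np`; it also checks one difference that remains even in the single-axis case: the slice returns a view of `a`, while the boolean index returns a copy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, True, True, False])

# One boolean index selects the same rows (or columns) as the slice.
rows_mask = a[b1, :]
cols_mask = a[:, b2]

same_rows = np.array_equal(rows_mask, a[1:3, :])
same_cols = np.array_equal(cols_mask, a[:, 0:3])

# Slicing gives a view that shares memory with a; boolean indexing copies.
slice_is_view = np.shares_memory(a[1:3, :], a)
mask_is_view = np.shares_memory(rows_mask, a)
```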

But if I use both at the same time:

>>> a[b1,b2]                 # not equivalent to a[1:3,0:3] but to a[r_[1:3],r_[0:3]]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: shape mismatch: objects cannot be broadcast to a single shape

It doesn't work because nonzero(b1) and nonzero(b2) have different shapes.
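To make the shape problem concrete, here is a small sketch of what the conversion to integer indices implies. Several integer index arrays are broadcast together and paired element by element, so 2 row positions cannot be matched with 3 column positions. When the True counts do match, `a[b1, b3]` picks individual (row, column) pairs rather than a block; `b3` is an extra mask chosen here only for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, True, True, False])

# Boolean indices are handled as their nonzero() integer arrays.
rows = np.nonzero(b1)[0]  # positions 1, 2    (length 2)
cols = np.nonzero(b2)[0]  # positions 0, 1, 2 (length 3)

# Lengths 2 and 3 cannot be broadcast against each other.
try:
    np.broadcast(rows, cols)
    can_pair = True
except ValueError:
    can_pair = False

# With matching True counts the pairing works: this picks
# a[1, 0] and a[2, 2], not a 2x2 block.
b3 = np.array([True, False, True, False])
pairs = a[b1, b3]
```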
If I want the equivalent of a[1:3,0:3], I can do:

>>> a[ix_(b1,b2)]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
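Why `ix_` helps: it turns the two 1-D masks into an open mesh, a column of row indices and a row of column indices, which broadcast to a full 2x3 grid instead of being paired up. A short sketch, assuming a current NumPy imported as `np`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, True, True, False])

# ix_ treats each boolean sequence as a mask for its own axis and
# returns index arrays shaped (2, 1) and (1, 3).
rows, cols = np.ix_(b1, b2)
block = a[rows, cols]
```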

I cannot see when the current behaviour of a[b1,b2] would be useful.
From my (probably naive) point of view, <boolean> should not be
converted to nonzero(<boolean>), but to some kind of slicing object.
That way, boolean indexing would work like slices and not like
arrays of integers, which would be more intuitive to me.
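The slice-like semantics proposed here can be emulated today by applying each mask to its own axis, one axis at a time. This is only a sketch of the idea: `mask_index` is a hypothetical helper, not part of NumPy, and it uses `np.compress`, which keeps the entries along one axis where a condition is True. Unlike a real slice, the result is still a copy.

```python
import numpy as np

def mask_index(a, *masks):
    """Hypothetical helper: apply each boolean mask to its own axis.

    masks[i] is a 1-D boolean array for axis i, or None to keep that
    axis whole (playing the role of ':').
    """
    out = a
    for axis, m in enumerate(masks):
        if m is None:
            continue
        # Keep only the positions along `axis` where m is True.
        out = np.compress(m, out, axis=axis)
    return out

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, True, True, False])

block = mask_index(a, b1, b2)       # behaves like a[1:3, 0:3]
columns = mask_index(a, None, b2)   # behaves like a[:, 0:3]
```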

Converting slices to arrays of indices is a trick that only works for one axis:

>>> a[r_[1:3],0:3]           #same as a[1:3,0:3]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
>>> a[1:3,r_[0:3]]            #same as a[1:3,0:3]
array([[ 4,  5,  6],
       [ 8,  9, 10]])
>>> a[r_[1:3],r_[0:3]]       # NOT same as a[1:3,0:3]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: shape mismatch: objects cannot be broadcast to a single shape
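For comparison, the index-array version can be made to work on both axes by giving the row indices an extra trailing axis of length 1, so that the two arrays broadcast to a grid instead of being paired. A minimal sketch, assuming a current NumPy imported as `np`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.r_[1:3]  # row positions 1, 2
cols = np.r_[0:3]  # column positions 0, 1, 2

# rows[:, np.newaxis] has shape (2, 1); with cols of shape (3,) the
# pair broadcasts to (2, 3) and selects the whole block.
block = a[rows[:, np.newaxis], cols]
```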

Am I completely wrong?
Or maybe the current behaviour (only useful for one axis) is enough?

Sorry for asking questions without offering solutions, and thanks for everything.

pau

PS: I noticed that the following code works, even though it uses far more indices than `a` has dimensions:
>>> a[a>4,:,:,:,:,1:2:3,...,4:5:6]
array([ 5,  6,  7,  8,  9, 10, 11])