[Numpy-discussion] What is up with raw boolean indices (like a[False])?

Mon Jul 6 14:51:09 EDT 2020

On Mon, 2020-07-06 at 12:39 -0600, Aaron Meurer wrote:
> I've been trying to figure out this behavior. It doesn't seem to be
> documented at 
> https://numpy.org/doc/stable/reference/arrays.indexing.html
> 
> > > > a = np.empty((2, 3))
> > > > a.shape
> (2, 5)
> > > > a[True].shape
> (1, 2, 5)
> > > > a[False].shape
> (0, 2, 5)
> 
> It seems like indexing with a raw boolean (True or False) adds an
> axis
> with a dimension 1 or 0, resp.
> 
> Except it only works once:
> 
> > > > a[:,False]
> array([], shape=(2, 0, 3), dtype=float64)
> > > > a[:,False, False]
> array([], shape=(2, 0, 3), dtype=float64)
> > > > a[:,False,True].shape
> (2, 0, 3)
> > > > a[:,True,False].shape
> (2, 0, 3)
> 
> The docs say "A single boolean index array is practically identical
> to
> x[obj.nonzero()]". I have a hard time seeing this as an extension of
> that, since indexing by `np.nonzero(False)` or `np.nonzero(True)`
> *replaces* the given axis.
> 
>  >>> a[np.nonzero(True)].shape
> (1, 3)
> > > > a[np.nonzero(False)].shape
> (0, 3)
> 
> I think at best this behavior should be documented. I'm trying to
> understand the motivation for it, or if it's even intentional. And in
> particular, why do multiple boolean indices not insert multiple axes?
> It would actually be useful to be able to generically add length 0
> axes using an index, similar to how `newaxis` adds a length 1 axis.

Its fully intentional as it is the correct generalization from an N-D
boolean index to include a 0-D boolean index.
To be fair, there is a footnote in the "Detailed notes" saying that:
"the nonzero equivalence for Boolean arrays does not hold for zero
dimensional boolean arrays.", this is for technical reasons since
`nonzero` does not do useful things for 0-D input.

In any case, a boolean index always does the following:

1. It will *remove as many dimensions as the index has, because this
   is the number of dimensions effectively indexed by it*
2. It will add a single new dimension at the same place.  The length of
   this new dimension is the number of `True` elements.
3. If you have multiple advanced indexing you get annoying broadcasting
   of all of these. That is *always* confusing for boolean indices.
   0-D should not be too special there...

And this generalizes to 0-D just as well, even if it may be a bit
surprising at first.

I have written much of this more clearly once before in this NEP, which
may be a good read to _really_ understand it:

https://numpy.org/neps/nep-0021-advanced-indexing.html

In general, I wonder if going into much depth about how 0-D arrays are
not actually really handled very special is good.  Yes, its confusing
on its own, but it seems also a bit like overloading the user with
unnecessary knowledge?

Cheers,

Sebastian

> 
> Aaron Meurer
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200706/27556bd4/attachment.sig>