[Numpy-discussion] What is up with raw boolean indices (like a[False])?
Sebastian Berg
sebastian at sipsolutions.net
Mon Jul 6 14:51:09 EDT 2020
On Mon, 2020-07-06 at 12:39 -0600, Aaron Meurer wrote:
> I've been trying to figure out this behavior. It doesn't seem to be
> documented at
> https://numpy.org/doc/stable/reference/arrays.indexing.html
>
> >>> a = np.empty((2, 3))
> >>> a.shape
> (2, 3)
> >>> a[True].shape
> (1, 2, 3)
> >>> a[False].shape
> (0, 2, 3)
>
> It seems like indexing with a raw boolean (True or False) adds an
> axis of length 1 or 0, respectively.
>
> Except it only works once:
>
> >>> a[:,False]
> array([], shape=(2, 0, 3), dtype=float64)
> >>> a[:,False, False]
> array([], shape=(2, 0, 3), dtype=float64)
> >>> a[:,False,True].shape
> (2, 0, 3)
> >>> a[:,True,False].shape
> (2, 0, 3)
>
> The docs say "A single boolean index array is practically identical to
> x[obj.nonzero()]". I have a hard time seeing this as an extension of
> that, since indexing by `np.nonzero(False)` or `np.nonzero(True)`
> *replaces* the given axis.
>
> >>> a[np.nonzero(True)].shape
> (1, 3)
> >>> a[np.nonzero(False)].shape
> (0, 3)
>
> I think at best this behavior should be documented. I'm trying to
> understand the motivation for it, or if it's even intentional. And in
> particular, why do multiple boolean indices not insert multiple axes?
> It would actually be useful to be able to generically add length 0
> axes using an index, similar to how `newaxis` adds a length 1 axis.
It's fully intentional: it is the correct generalization of an N-D
boolean index to the 0-D case.
To be fair, there is a footnote in the "Detailed notes" saying that
"the nonzero equivalence for Boolean arrays does not hold for zero
dimensional boolean arrays." This is for technical reasons, since
`nonzero` does not do anything useful for 0-D input.
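A short illustration of that footnote (a sketch, not taken from the docs): for an N-D mask, indexing with the mask and with `mask.nonzero()` agree, but for a 0-D mask they do not. Calling `nonzero` directly on 0-D input is deprecated in recent NumPy versions, so the sketch uses `np.atleast_1d(True)` as a stand-in for what `np.nonzero(True)` produced in the quoted example.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

# N-D boolean mask: indexing with the mask matches indexing with the
# tuple of integer arrays returned by mask.nonzero().
mask = np.array([[True, False, True],
                 [False, True, False]])
assert np.array_equal(a[mask], a[mask.nonzero()])

# 0-D boolean: the equivalence breaks down. The boolean index adds a
# new leading axis of length 1 and keeps both existing axes...
assert a[np.array(True)].shape == (1, 2, 3)

# ...whereas the integer index for the 1-element version of the flag
# (a single array [0]) replaces axis 0 instead.
idx = np.nonzero(np.atleast_1d(True))
assert a[idx].shape == (1, 3)
```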
In any case, a boolean index always does the following:
1. It *removes as many dimensions as the index has, because this is
the number of dimensions effectively indexed by it*.
2. It adds a single new dimension in the same place. The length of
this new dimension is the number of `True` elements.
3. If you use multiple advanced indices, all of them are broadcast
against each other. That is *always* confusing for boolean indices,
and 0-D should not be any more special there...
And this generalizes to 0-D just as well, even if it may be a bit
surprising at first.
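A minimal sketch of points 1 and 2, checked against what NumPy actually returns. `predicted_shape` is a hypothetical helper written only for illustration; it assumes a single boolean index preceded by plain `:` slices and nothing else in the subscript.

```python
import numpy as np

def predicted_shape(shape, axis, mask):
    """Hypothetical helper: shape produced by indexing an array of
    `shape` with one boolean `mask` placed at subscript position `axis`
    (every earlier position is a plain `:`)."""
    mask = np.asarray(mask, dtype=bool)
    # Point 1: the mask consumes mask.ndim dimensions starting at `axis`
    # (zero dimensions for a 0-D mask).
    # Point 2: one new dimension, whose length is the number of True
    # elements, is inserted in their place.
    n_true = int(np.count_nonzero(mask))
    return tuple(shape[:axis]) + (n_true,) + tuple(shape[axis + mask.ndim:])

a = np.empty((2, 3))

cases = [
    (0, np.ones((2, 3), dtype=bool)),    # 2-D mask: removes 2 dims, adds 1
    (1, np.array([True, False, True])),  # 1-D mask on axis 1
    (0, np.array(True)),                 # 0-D mask: removes 0 dims, adds length 1
    (0, np.array(False)),                # 0-D mask: removes 0 dims, adds length 0
    (1, np.array(False)),                # same, after one `:` slice
]

for axis, mask in cases:
    index = (slice(None),) * axis + (mask,)
    assert a[index].shape == predicted_shape(a.shape, axis, mask)
```

The same two rules cover every case; the 0-D mask simply removes zero dimensions.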
I have written much of this more clearly once before in this NEP, which
may be a good read to _really_ understand it:
https://numpy.org/neps/nep-0021-advanced-indexing.html
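Point 3 is also why several 0-D booleans do not insert several axes: they are all advanced indices, so they are broadcast into one combined dimension. A small sketch, assuming NumPy >= 1.20 for `np.broadcast_shapes`; `combined_length` is a hypothetical helper written for illustration, modelling each 0-D boolean as an index of shape `(1,)` (True) or `(0,)` (False).

```python
import numpy as np

a = np.empty((2, 3))

def combined_length(*flags):
    """Hypothetical helper: length of the single dimension produced by
    several 0-D boolean indices in one subscript. Each flag is modelled
    as an advanced index of shape (1,) if True or (0,) if False, and all
    of them are broadcast together."""
    return np.broadcast_shapes(*[(int(bool(f)),) for f in flags])[0]

# Two 0-D booleans yield ONE new dimension, not two.
assert a[:, False, True].shape == (2, combined_length(False, True), 3) == (2, 0, 3)
assert a[:, True, True].shape == (2, combined_length(True, True), 3) == (2, 1, 3)
assert a[:, False, False].shape == (2, 0, 3)

# A 0-D True also broadcasts against an ordinary integer index array:
# shapes (2,) and (1,) combine to (2,), so no extra axis appears.
assert a[:, [0, 2], True].shape == (2, 2)
```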
In general, I wonder whether going into much depth about how 0-D arrays
are not really handled specially is a good idea. Yes, it's confusing
on its own, but it also seems a bit like overloading the user with
unnecessary knowledge?
Cheers,
Sebastian
>
> Aaron Meurer