[Numpy-discussion] What is up with raw boolean indices (like a[False])?

Thu Aug 20 17:55:40 EDT 2020

On Thu, 2020-08-20 at 16:50 -0500, Sebastian Berg wrote:
> On Thu, 2020-08-20 at 12:21 -0600, Aaron Meurer wrote:
> > You're right. I was confusing the broadcasting logic for boolean
> > arrays.
> > 
> > However, I did find this example
> > 
> > >>> np.arange(10).reshape((2, 5))[np.array([[0, 0, 0, 0, 0]], dtype=np.int64), False]
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1,5) (0,)
> > 
> > That certainly seems to imply there is some broadcasting being
> > done.
> 
> Yes, it broadcasts the array after converting it with `nonzero`, i.e.
> it's much the same as:
> 
>    indices = [[0, 0, 0, 0, 0]], *np.nonzero(False)
>    indices = np.broadcast_arrays(*indices)
> 
> will give the same result (see also `np.ix_`, which converts booleans
> as well for this reason, to give you outer indexing).
> I was half way through a mock-up/pseudo code, but thought you likely
> wasn't sure it was ending up clear. It sounds like things are
> probably
> falling into place for you (if they are not, let me know what might
> help you):

Sorry, editing error up there. In short, I hope those steps make sense
to you. Note that the broadcasting is basically part of a later
"integer only" indexing step, and the `nonzero` part is pre-processing.
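
A minimal sketch of that split, assuming a one-dimensional boolean mask
rather than a boolean scalar. The names `a`, `mask`, `rows` and
`empty_mask` are made up for illustration, and this mirrors the
observable behavior only, not NumPy's internal code:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
mask = np.array([True, False, True, False, True])
rows = np.array([[0], [1]])          # integer index, shape (2, 1)

# Pre-processing: the boolean index becomes an integer index array.
(cols,) = np.nonzero(mask)           # shape (3,)

# Integer-only step: shapes (2, 1) and (3,) broadcast to (2, 3).
direct = a[rows, mask]
converted = a[rows, cols]
print(direct.shape, np.array_equal(direct, converted))

# An all-False mask converts to an index of shape (0,), which cannot
# broadcast against an integer index of shape (1, 5).
empty_mask = np.zeros(5, dtype=bool)
try:
    a[np.array([[0, 0, 0, 0, 0]]), empty_mask]
    raised = False
except IndexError:
    raised = True
print(raised)
```

The second half reproduces the same kind of shape mismatch as the
example quoted above, but with a regular boolean array in place of the
scalar `False`.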

> 
> 1. Convert all boolean indices into a series of integer indices using
>    `np.nonzero(index)`
> 
> 2. For True/False scalars, that doesn't work, because `np.nonzero()`.
>    `nonzero` gave us an index array (which is good, we obviously want
>    one), but we need to index into `boolean_index.ndim == 0`
>    dimensions!
>    So that won't work, the approach using `nonzero` cannot generalize
>    here, although boolean indices generalize perfectly.
> 
>    The solution to the dilemma is simple: If we have to index one
>    dimension, but should be indexing zero, then we simply add that
>    dimension to the original array (or at least pretend there was
>    an additional dimension).
> 
> 3. Do normal indexing with the result *including broadcasting*,
>    we forget it was converted.
> 
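The "pretend there was an additional dimension" idea from step 2 can be
sketched as follows. `index_with_bool_scalar` is a hypothetical helper
written only for this illustration; it emulates the result of indexing
with a single Python bool and is not how NumPy implements it:

```python
import numpy as np

a = np.arange(3)

def index_with_bool_scalar(arr, flag):
    # Hypothetical helper: emulate arr[flag] for a Python bool `flag`.
    # Pretend the array has an extra leading axis of length 1 ...
    padded = arr[np.newaxis]
    # ... and index that axis with the nonzero result of a 1-element mask.
    (idx,) = np.nonzero(np.array([flag]))
    return padded[idx]

for flag in (True, False):
    emulated = index_with_bool_scalar(a, flag)
    assert emulated.shape == a[flag].shape
    assert np.array_equal(emulated, a[flag])

print(a[True].shape, a[False].shape)   # (1, 3) (0, 3)
```

Wrapping the scalar in a one-element array avoids calling `np.nonzero`
on a 0-D input, which is exactly the case where the plain `nonzero`
route does not generalize.
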
> The other way to solve it would be to always reshape the original
> array to combine all axes being indexed by a single boolean index into
> one axis and then index it using `np.flatnonzero`.  (But that would
> get a different result if you try to broadcast!)
> 
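For a plain multi-dimensional boolean index (no other index arrays
involved), the two routes give the same result. A minimal sketch,
assuming a 2-D mask over the first two axes of a 3-D array; the names
`merged`, `via_nonzero` and `via_flat` are made up for illustration:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
mask = np.array([[True, False, True],
                 [False, False, True]])   # indexes the first two axes

# nonzero route: one integer index array per masked axis.
via_nonzero = a[np.nonzero(mask)]

# flatnonzero route: merge the masked axes, then use one flat index.
merged = a.reshape((-1,) + a.shape[mask.ndim:])   # shape (6, 4)
via_flat = merged[np.flatnonzero(mask)]

print(np.array_equal(a[mask], via_nonzero))
print(np.array_equal(a[mask], via_flat))
```

This only covers the simple case; it does not show the broadcasting
difference mentioned in the parenthetical.
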
> 
> In any case, I am not sure I would bother with making sense of this,
> except for sport!
> It's pretty much nonsense, and I think the time spent understanding it
> is probably better spent deprecating it.  The only reason I did not
> deprecate it before is that I tried to be minimal in the changes
> when I rewrote advanced indexing (and generalized boolean scalars
> correctly) long ago.  That was likely the right start/choice at the
> time, since there were much bigger fish to fry, but I do not think
> anything is holding us back now.
> 
> Cheers,
> 
> Sebastian
> 
> 
> > Aaron Meurer
> > 
> > On Wed, Aug 19, 2020 at 6:55 PM Sebastian Berg
> > <sebastian at sipsolutions.net> wrote:
> > > On Wed, 2020-08-19 at 18:07 -0600, Aaron Meurer wrote:
> > > > > > > 3. If you have multiple advanced indexing you get annoying
> > > > > > >    broadcasting of all of these. That is *always* confusing
> > > > > > >    for boolean indices.
> > > > > > >    0-D should not be too special there...
> > > > 
> > > > > OK, now that I am learning more about advanced indexing, this
> > > > > statement is confusing to me. It seems that scalar boolean
> > > > > indices do not broadcast. For example:
> > > 
> > > > Well, broadcasting means you broadcast the *nonzero result*,
> > > > unless I am very confused... There is a reason I dismissed it.
> > > > We could (and arguably should) just deprecate it.  And I have
> > > > doubts anyone would even notice.
> > > 
> > > > > >>> np.arange(2)[False, np.array([True, False])]
> > > > > array([], dtype=int64)
> > > > > >>> np.arange(2)[tuple(np.broadcast_arrays(False, np.array([True, False])))]
> > > > > Traceback (most recent call last):
> > > > >   File "<stdin>", line 1, in <module>
> > > > > IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
> > > > 
> > > > > And indeed, the docs even say, as you noted, "the nonzero
> > > > > equivalence for Boolean arrays does not hold for zero
> > > > > dimensional boolean arrays," which I guess also applies to the
> > > > > broadcasting.
> > > 
> > > > I actually think that probably also holds. Nonzero just behaves
> > > > weirdly for 0-D arrays (because it returns a tuple).
> > > > But since broadcasting the nonzero result is so weird, and since
> > > > 0-D booleans require some additional logic and don't generalize
> > > > 100% (code-wise), I won't rule out that there are differences.
> > > 
> > > > From what I can tell, the logic is that all integer and boolean
> > > > arrays
> > > 
> > > > Did you try that? Because as I said above, IIRC broadcasting the
> > > > boolean array without first calling `nonzero` isn't really what's
> > > > going on. And I don't know how it could be what's going on, since
> > > > adding dimensions to a boolean index would have many more
> > > > implications?
> > > 
> > > - Sebastian
> > > 
> > > 
> > > > > (and scalar ints) are broadcast together, *except* for boolean
> > > > > scalars. Then the first boolean scalar is replaced with and(all
> > > > > boolean scalars) and the rest are removed from the index. Then
> > > > > that index adds a length 1 axis if it is True and a length 0
> > > > > axis if it is False.
> > > > 
> > > > > So they don't broadcast, but rather "fake broadcast". I still
> > > > > contend that it would be much more useful if True were a synonym
> > > > > for newaxis and False worked like newaxis but instead added a
> > > > > length 0 axis. Alternately, True and False scalars should behave
> > > > > exactly like all other boolean arrays with no exceptions (i.e.,
> > > > > work like np.nonzero(), broadcast, etc.). This would be less
> > > > > useful, but more consistent.
> > > > 
> > > > Aaron Meurer
> > > > _______________________________________________
> > > > NumPy-Discussion mailing list
> > > > NumPy-Discussion at python.org
> > > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > > 
> > > 
