[Numpy-discussion] Indexing issue with ndarrays

Joseph Fox-Rabinovitz jfoxrabinovitz at gmail.com
Fri Aug 26 09:57:22 EDT 2016


On Thu, Aug 25, 2016 at 4:37 PM, Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Thu, 2016-08-25 at 10:36 -0400, Joseph Fox-Rabinovitz wrote:
> > This issue recently came up on Stack Overflow:
> > http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean-array.
> > The poster attempted to index an ndarray with a pandas boolean Series
> > object (all False), but the result was as if he had indexed with an
> > array of integer zeros.
> >
> > Can someone explain this behavior? I can see two obvious
> > possibilities:
> > 1. ndarray checks whether the input to __getitem__ is of exactly the
> >    right type, rather than using isinstance.
> > 2. pandas actually uses a wider datatype than boolean internally, so
> >    indexing with the Series is in fact indexing with an integer array.
>
> You are overthinking it ;). The reason is quite simply that the logic
> used to be:
>
>  * Boolean array? -> think about boolean indexing.
>  * Everything "array-like" (not caught earlier) -> convert to `intp`
> array and do integer indexing.
>
> Now you might wonder why, but probably it is quite simply because
> boolean indexing was tacked on later.
>
> - Sebastian
>
>
> > In my attempt to reproduce the poster's results, I got the following
> > warning:
> > FutureWarning: in the future, boolean array-likes will be handled as
> > a boolean array index
> > This indicates that the issue is probably #1 and that a fix is
> > already on the way. Please correct me if I am wrong. Also, where does
> > the code for ndarray.__getitem__ live?
> > Thanks,
> >     -Joe
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
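Sebastian's description of the old dispatch rule can be sketched in a few lines of Python. This is a hypothetical emulation, not NumPy's code (the real logic lives in C, and the function name `legacy_index` is invented for illustration). It assumes that only a genuine boolean ndarray took the mask path and that every other array-like, such as a list or a pandas Series, was converted to an `intp` array. Under that assumption an all-False sequence becomes an array of zeros, which reproduces the Stack Overflow symptom.

```python
import numpy as np


def legacy_index(arr, idx):
    """Hypothetical emulation of the old __getitem__ dispatch (not NumPy's code).

    Only a real boolean ndarray is treated as a mask; any other array-like
    (list, pandas Series, ...) is converted to intp and used for integer
    indexing, so False -> 0 and True -> 1.
    """
    if isinstance(idx, np.ndarray) and idx.dtype == np.bool_:
        return arr[idx]  # boolean-mask indexing
    int_idx = np.asarray(idx).astype(np.intp)  # booleans become 0/1 positions
    return arr[int_idx]  # integer (fancy) indexing


a = np.array([10, 20, 30])
mask = [False, False, False]

print(legacy_index(a, np.array(mask)))  # boolean-mask path → []
print(legacy_index(a, mask))            # integer path, element 0 three times → [10 10 10]
```

The point of the sketch is that the type check happens before any conversion: a list of booleans never reaches the mask branch, so its values are silently reinterpreted as positions.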
This makes perfect sense. I would like to help fix it if a fix is desired
and has not been done already. Could you point me to where the "Boolean
array?, etc." decision happens? I have had trouble navigating to
`__getitem__` (which I assume is somewhere in the np.core.multiarray C code).

    -Joe
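The FutureWarning quoted above announced a change to the first branch of that logic. The following is a minimal sketch assuming a current NumPy install (no specific release for the switch is claimed here): a plain Python list of booleans is now treated as a mask, so the zero-index surprise no longer occurs, and code that really wants 0/1 positions has to convert explicitly. Other boolean array-likes, such as a pandas Series, presumably go through the same route.

```python
import numpy as np

a = np.array([10, 20, 30])

# Current NumPy treats a list of booleans as a boolean mask,
# which is the behaviour the FutureWarning announced.
print(a[[False, False, False]])  # → []
print(a[[True, False, True]])    # → [10 30]

# Code that really wants the booleans as 0/1 positions must now say so.
positions = np.asarray([False, True, True], dtype=np.intp)
print(a[positions])              # → [10 20 20]
```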