[Numpy-discussion] Indexing issue with ndarrays

Sebastian Berg sebastian at sipsolutions.net
Fri Aug 26 10:08:14 EDT 2016


On Fri, 2016-08-26 at 09:57 -0400, Joseph Fox-Rabinovitz wrote:
> 
> 
> On Thu, Aug 25, 2016 at 4:37 PM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> > On Thu, 2016-08-25 at 10:36 -0400, Joseph Fox-Rabinovitz wrote:
> > > This issue recently came up on Stack Overflow:
> > > http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean-array
> > > The poster attempted to index an ndarray with a pandas boolean Series
> > > object (all False), but the result was as if he had indexed with an
> > > array of integer zeros.
> > >
> > > Can someone explain this behavior? I can see two obvious
> > > possibilities:
> > >  1. ndarray checks whether the input to __getitem__ is of exactly
> > >     the right type, rather than using isinstance.
> > >  2. pandas actually uses a wider datatype than boolean internally,
> > >     so indexing with the Series is in fact indexing with an integer
> > >     array.
> > 
> > You are overthinking it ;). The reason is quite simply that the logic
> > used to be:
> >
> >  * Boolean array? -> think about boolean indexing.
> >  * Everything "array-like" (not caught earlier) -> convert to `intp`
> >    array and do integer indexing.
> > 
> > Now you might wonder why, but it is probably quite simply because
> > boolean indexing was tacked on later.
> > 
> > - Sebastian
> > 
> > 
> > > In my attempt to reproduce the poster's results, I got the following
> > > warning:
> > > FutureWarning: in the future, boolean array-likes will be handled as
> > > a boolean array index
> > > This indicates that the issue is probably #1 and that a fix is
> > > already on the way. Please correct me if I am wrong. Also, where does
> > > the code for ndarray.__getitem__ live?
> > > Thanks,
> > >     -Joe
> > >
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion
> > 
> This makes perfect sense. I would like to help fix it if a fix is
> desired and has not been done already. Could you point me to where
> the "Boolean array?, etc." decision happens? I have had trouble
> navigating to `__getitem__` (which I assume is somewhere in the
> np.core.multiarray C code).
> 

As the warning says, it is already fixed in a sense (we just have to
move forward with the deprecation, which you can maybe actually do at
this time). This is all in mapping.c. Without checking, I believe there
is a function called something like "prepare index" which goes through
all the different types of indexing objects. It should be pretty
straightforward to find the warning.

The old behaviour originated in a completely different code base,
though (you would have to check out some pre-1.9 version of numpy if
you are interested in archaeology).
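
For anyone reading along, here is a minimal pure-Python sketch of the
two-step dispatch quoted earlier in the thread (boolean array first,
everything else array-like cast to integers). It is only a toy model:
`BoolArray` and `old_style_getitem` are hypothetical names made up for
illustration, plain lists stand in for arrays, and none of this is the
actual mapping.c logic.

```python
class BoolArray:
    """Hypothetical stand-in for a genuine boolean ndarray."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]


def old_style_getitem(data, index):
    """Toy model of the old two-step dispatch; not NumPy's actual code."""
    # Step 1: only a genuine boolean array is treated as a mask.
    if isinstance(index, BoolArray):
        return [x for x, keep in zip(data, index.values) if keep]
    # Step 2: anything else array-like is cast to integers
    # (False -> 0, True -> 1) and used for integer indexing.
    return [data[int(i)] for i in index]


data = [10, 20, 30]
all_false = [False, False, False]  # array-like, but not a BoolArray

as_mask = old_style_getitem(data, BoolArray(all_false))
as_array_like = old_style_getitem(data, all_false)
print(as_mask)        # []
print(as_array_like)  # [10, 10, 10]
```

The second result mirrors what the Stack Overflow poster saw: an
all-False array-like picks element 0 three times instead of selecting
nothing.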

- Sebastian




>     -Joe
> 
> 