
This issue recently came up on Stack Overflow: http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean-.... The poster attempted to index an ndarray with a pandas boolean Series object (all False), but the result was as if he had indexed with an array of integer zeros.
Can someone explain this behavior? I can see two obvious possibilities:
1. ndarray checks if the input to __getitem__ is of exactly the right type, not using isinstance.
2. pandas actually uses a wider datatype than boolean internally, so indexing with the series is in fact indexing with an integer array.
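The second possibility can be checked by looking at the dtype that a boolean array-like converts to. The following is only a minimal sketch: `BoolLike` is a hypothetical stand-in class (it is not pandas), and the same `np.asarray(...).dtype` check could be applied to the actual Series.

```python
import numpy as np


class BoolLike:
    """Hypothetical stand-in for a boolean array-like that is not an ndarray."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when it needs an ndarray version of the object.
        return np.array(self._values, dtype=dtype)


mask = BoolLike([False, False, False])
converted = np.asarray(mask)

print(isinstance(mask, np.ndarray))  # False
print(converted.dtype)               # bool
```

If the converted dtype is already bool, a wider internal datatype cannot be what turns the mask into integer zeros.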
In my attempt to reproduce the poster's results, I got the following warning:
FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
This indicates that the issue is probably #1 and that a fix is already on the way. Please correct me if I am wrong. Also, where does the code for ndarray.__getitem__ live?
Thanks,
-Joe

On Thu, 2016-08-25 at 10:36 -0400, Joseph Fox-Rabinovitz wrote:
This issue recently came up on Stack Overflow: http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean-array. The poster attempted to index an ndarray with a pandas boolean Series object (all False), but the result was as if he had indexed with an array of integer zeros.
Can someone explain this behavior? I can see two obvious possibilities: ndarray checks if the input to __getitem__ is of exactly the right type, not using isinstance. pandas actually uses a wider datatype than boolean internally, so indexing with the series is in fact indexing with an integer array.
You are overthinking it ;). The reason is quite simply that the logic used to be:
* Boolean array? -> think about boolean indexing.
* Everything "array-like" (not caught earlier) -> convert to `intp` array and do integer indexing.
Now you might wonder why, but probably it is quite simply because boolean indexing was tacked on later.
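The two branches can be sketched by hand, independent of the NumPy version. This is only an illustration under assumptions: the array `a` and the all-False `mask` are made up, and the explicit `astype(np.intp)` imitates the conversion that the old fallback applied to array-likes.

```python
import numpy as np

a = np.array([10, 20, 30])
mask = np.array([False, False, False])

# Boolean indexing: select the elements where the mask is True (none here).
as_boolean = a[mask]

# Imitation of the old fallback for non-ndarray array-likes:
# convert to an intp array, so every False becomes index 0.
as_integer = a[mask.astype(np.intp)]

print(as_boolean)  # []
print(as_integer)  # [10 10 10]
```

The second result matches what the Stack Overflow poster saw: the first element repeated once per mask entry.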
- Sebastian
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion

On Thu, Aug 25, 2016 at 4:37 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
You are overthinking it ;). The reason is quite simply that the logic used to be:
- Boolean array? -> think about boolean indexing.
- Everything "array-like" (not caught earlier) -> convert to `intp` array and do integer indexing.
Now you might wonder why, but probably it is quite simply because boolean indexing was tacked on later.
- Sebastian
This makes perfect sense. I would like to help fix it if a fix is desired and has not been done already. Could you point me to where the "Boolean array?, etc." decision happens? I have had trouble navigating to `__getitem__` (which I assume is somewhere in the np.core.multiarray C code).
-Joe

On Fri, 2016-08-26 at 09:57 -0400, Joseph Fox-Rabinovitz wrote:
This makes perfect sense. I would like to help fix it if a fix is desired and has not been done already. Could you point me to where the "Boolean array?, etc." decision happens? I have had trouble navigating to `__getitem__` (which I assume is somewhere in the np.core.multiarray C code).
As the warning says, it is already fixed in a sense (we just have to move forward with the deprecation, which you can maybe actually do at this time). This is all in the mapping.c stuff. Without checking, there is a function called something like "prepare index" which goes through all the different types of indexing objects. It should be pretty straightforward to find the warning.
The old behaviour originated in a completely different code base, though (you would have to check out some pre-1.9 version of numpy if you are interested in archeology).
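For code that has to run on versions where the deprecation has not been completed yet, an explicit conversion avoids the ambiguity. A minimal sketch, assuming the mask is any boolean array-like (a plain list stands in for the Series here, and the names are made up):

```python
import numpy as np

a = np.array([10, 20, 30])
mask_like = [False, True, False]  # stand-in for any boolean array-like

# Converting to a real boolean ndarray first guarantees boolean indexing.
mask = np.asarray(mask_like, dtype=bool)
selected = a[mask]

print(selected)  # [20]
```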
- Sebastian
participants (2): Joseph Fox-Rabinovitz, Sebastian Berg