This issue recently came up on Stack Overflow: http://stackoverflow.com/questions/39145795/masking-a-series-with-a-boolean.... The poster attempted to index an ndarray with a pandas boolean Series object (all False), but the result was as if he had indexed with an array of integer zeros.
Can someone explain this behavior? I can see two obvious possibilities:
1. ndarray checks whether the input to __getitem__ is of exactly the right type, rather than using isinstance.
2. pandas actually uses a wider datatype than boolean internally, so indexing with the Series is in fact indexing with an integer array.
In my attempt to reproduce the poster's results, I got the following warning:
FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
This indicates that the issue is probably #1 and that a fix is already on the way. Please correct me if I am wrong. Also, where does the code for ndarray.__getitem__ live?
Thanks,
Joe
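The two interpretations in question can be compared directly in NumPy. The following is a minimal sketch, not the Stack Overflow poster's code: the array `a` and the mask values are made up for illustration, and the integer conversion is done explicitly with `astype` to mimic the behavior described, rather than relying on how any particular NumPy version treats a pandas Series.

```python
import numpy as np

a = np.array([10, 20, 30])                # illustrative data, not from the thread
mask = np.array([False, False, False])    # all-False boolean mask

# Boolean indexing: keep the elements where the mask is True (none here).
as_boolean = a[mask]

# Integer indexing: each False converts to 0, so every entry selects a[0].
as_integer = a[mask.astype(np.intp)]

print(as_boolean)   # empty array
print(as_integer)   # three copies of a[0]
```

The same all-False input therefore yields either an empty result or repeated copies of the first element, depending only on which kind of index it is taken to be.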
On Do, 2016-08-25 at 10:36 -0400, Joseph Fox-Rabinovitz wrote:
> [...]
You are overthinking it ;). The reason is quite simply that the logic used to be:
* Boolean array? -> think about boolean indexing.
* Everything "array-like" (not caught earlier) -> convert to `intp` array and do integer indexing.
Now you might wonder why, but it is probably quite simply because boolean indexing was tacked on later.
- Sebastian
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
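The old decision order Sebastian describes can be sketched in plain Python. This is an illustration only, not NumPy's actual C code: `BoolArray`, `ArrayLike`, and `old_style_getitem` are hypothetical names, with `ArrayLike` standing in for an object such as a pandas Series that holds booleans but is not itself a boolean ndarray.

```python
class BoolArray(list):
    """Hypothetical stand-in for a genuine boolean ndarray."""


class ArrayLike:
    """Hypothetical stand-in for an array-like wrapper (e.g. a Series)."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)


def old_style_getitem(data, index):
    """Mimic the old two-step decision order described above."""
    # Step 1: only a real boolean array gets boolean (mask) indexing.
    if isinstance(index, BoolArray):
        return [item for item, keep in zip(data, index) if keep]
    # Step 2: everything else is converted to integers, so False -> 0
    # and True -> 1, and is then used as a list of positions.
    positions = [int(i) for i in index]
    return [data[p] for p in positions]


data = [10, 20, 30]
print(old_style_getitem(data, BoolArray([False, False, False])))  # []
print(old_style_getitem(data, ArrayLike([False, False, False])))  # [10, 10, 10]
```

Under this ordering the wrapper never reaches the boolean branch, which matches the "array of integer zeros" result reported in the original question.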
On Thu, Aug 25, 2016 at 4:37 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
> [...]
This makes perfect sense. I would like to help fix it if a fix is desired and has not been done already. Could you point me to where the "Boolean array?, etc." decision happens? I have had trouble navigating to `__getitem__` (which I assume is somewhere in the np.core.multiarray C code).
Joe
On Fr, 2016-08-26 at 09:57 -0400, Joseph Fox-Rabinovitz wrote:
> [...]
As the warning says, it is already fixed in a sense (we just have to move forward with the deprecation, which you can maybe actually do at this time). This is all in the mapping.c code; without checking, I believe there is a function called something like "prepare index" which goes through all the different types of indexing objects. It should be pretty straightforward to find the warning.
The code in which the old behaviour originated was a completely different code base, though (you would have to check out some pre-1.9 version of numpy if you are interested in archeology).
- Sebastian
participants (2)

Joseph Fox-Rabinovitz

Sebastian Berg