[Numpy-discussion] boolean indexing of structured arrays

Wed Jun 6 09:08:43 EDT 2012

Not sure if this is a bug or not.  I am using a fairly recent master branch.

>>> # Setting up...
>>> import numpy as np
>>> a = np.zeros((10, 1), dtype=[('foo', 'f4'), ('bar', 'f4'), ('spam', 'f4')])
>>> a['foo'] = np.random.random((10, 1))
>>> a['bar'] = np.random.random((10, 1))
>>> a['spam'] = np.random.random((10, 1))
>>> a
array([[(0.8748096823692322, 0.08278043568134308, 0.2463584989309311)],
       [(0.27129432559013367, 0.9645473957061768, 0.41787904500961304)],
       [(0.4902191460132599, 0.6772263646125793, 0.07460898905992508)],
       [(0.13542482256889343, 0.8646988868713379, 0.98673015832901)],
       [(0.6527929902076721, 0.7392181754112244, 0.5919206738471985)],
       [(0.11248272657394409, 0.5818713903427124, 0.9287213087081909)],
       [(0.47561103105545044, 0.48848700523376465, 0.7108170390129089)],
       [(0.47087424993515015, 0.6080209016799927, 0.6583810448646545)],
       [(0.08447299897670746, 0.39479559659957886, 0.13520188629627228)],
       [(0.7074970006942749, 0.8426893353462219, 0.19329732656478882)]],
      dtype=[('foo', '<f4'), ('bar', '<f4'), ('spam', '<f4')])
>>> b = (a['bar'] > 0.4)
>>> b
array([[False],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [ True],
       [False],
       [ True]], dtype=bool)
>>> # ---- Boolean indexing of structured array with a (10,1) boolean array ----
>>> a[b]['foo']
array([ 0.27129433,  0.49021915,  0.13542482,  0.65279299,  0.11248273,
        0.47561103,  0.47087425,  0.707497  ], dtype=float32)
>>> # ---- Boolean indexing of structured array with a (10,) boolean array ----
>>> a[b[:,0]]
array([[(0.27129432559013367, 0.9645473957061768, 0.41787904500961304)],
       [(0.4902191460132599, 0.6772263646125793, 0.07460898905992508)],
       [(0.13542482256889343, 0.8646988868713379, 0.98673015832901)],
       [(0.6527929902076721, 0.7392181754112244, 0.5919206738471985)],
       [(0.11248272657394409, 0.5818713903427124, 0.9287213087081909)],
       [(0.47561103105545044, 0.48848700523376465, 0.7108170390129089)],
       [(0.47087424993515015, 0.6080209016799927, 0.6583810448646545)],
       [(0.7074970006942749, 0.8426893353462219, 0.19329732656478882)]],
      dtype=[('foo', '<f4'), ('bar', '<f4'), ('spam', '<f4')])

So, if I index with a (10, 1) boolean array, I get back a result of shape
(N,), regardless of whether I am accessing a field or not. But if I index
with a (10,) boolean array, I get back a result of shape (N, 1). Note that
other forms of indexing, such as slicing and fancy indexing, return (N, 1)
shaped results. Now, admittedly, this is consistent with boolean indexing
of regular numpy arrays. I just wanted to make sure that it is
intentional. This caused some confusion for me recently, when I (perhaps
wrongly) expected that boolean indexing of a structured array would return
a similarly structured array. The use case was modifying an existing
function to remove the unwanted "rows" with a simple boolean index
statement instead of a slice.
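
A minimal sketch that checks only the shapes described above, for both the
structured array and a plain float array. The fixed seed and the names n, x,
m and k are just for illustration and are not part of the session above.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable (illustrative choice)

dt = [('foo', 'f4'), ('bar', 'f4'), ('spam', 'f4')]
a = np.zeros((10, 1), dtype=dt)
for name in a.dtype.names:
    a[name] = np.random.random((10, 1))

b = a['bar'] > 0.4      # boolean mask with the same (10, 1) shape as a
n = int(b.sum())        # number of selected elements

# A mask covering every axis of the array flattens the selection to 1-D.
assert a[b].shape == (n,)
assert a[b]['foo'].shape == (n,)

# A 1-D mask only indexes the first axis, so the trailing axis is kept.
assert a[b[:, 0]].shape == (n, 1)
assert a[b[:, 0]]['foo'].shape == (n, 1)

# A plain (non-structured) array follows the same rule.
x = np.random.random((10, 1))
m = x > 0.4
k = int(m.sum())
assert x[m].shape == (k,)
assert x[m[:, 0]].shape == (k, 1)
```

For the "remove unwanted rows" use case, indexing with the 1-D mask
b[:, 0] is one way to keep the (N, 1) layout that a slice would return.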

Cheers!
Ben Root