[Numpy-discussion] Fancy Indexing of Structured Arrays is Slow

Sebastian Berg sebastian at sipsolutions.net
Fri May 16 04:09:50 EDT 2014

On Thu, 2014-05-15 at 12:31 +0000, Dave Hirschfeld wrote:
> As can be seen from the code below (or in the notebook linked beneath) fancy 
> indexing of a structured array is twice as slow as indexing both fields 
> independently - making it 4x slower?
> I found that fancy indexing was a bottleneck in my application so I was 
> hoping to reduce the overhead by combining the arrays into a structured 
> array and only doing one indexing operation. Unfortunately that doubled the 
> time that it took!
> Is there any reason for this? If not, I'm happy to open an enhancement issue 
> on GitHub - just let me know.

The vanilla (non-structured) types tend to be somewhat more efficient
with these things, and the first indexing step (selecting a field) does
not copy, so it is rather fast. I did not check the code, but fancy
indexing uses the copyswap function on individual elements (also in the
new implementation of this operation; in 1.9 and later only for
non-trivial copies, which makes the difference even larger). copyswap
is probably not specialized to the specific void type, so it probably
has to get the fields first and then call copyswap for every field. All
that work would be done for every element.
If you are interested in this, you could check the fancy indexing inner
loop and see whether replacing the copyswap with the specialized strided
transfer functions (they are used further down in a different branch of
the loop) actually makes things faster. I would expect so for some void
types anyway, but I am not sure in general.
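
To make the comparison easy to rerun, here is a minimal sketch of the
quoted benchmark as a standalone script. It is an illustration rather
than the original notebook: it assumes a recent NumPy (it uses
np.random.default_rng instead of the pylab randn/randint from the
session below), builds the structured array directly instead of using
np.rec.fromarrays, and the helper names index_structured and
index_per_field are made up for this example. It also checks that
selecting a field returns a view, which is why that first step is cheap.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
nrows, ncols = 365, 10000

# Structured array with two float64 fields, mirroring the quoted example.
items = np.zeros((nrows, ncols), dtype=[("widgets", "f8"), ("gadgets", "f8")])
items["widgets"] = rng.standard_normal((nrows, ncols))
items["gadgets"] = rng.standard_normal((nrows, ncols))

row_idx = rng.integers(0, nrows, ncols)
col_idx = np.arange(ncols)

# Selecting a field does not copy: the result shares memory with `items`.
is_view = np.shares_memory(items["widgets"], items)


def index_structured():
    # One fancy-indexing operation on the whole record (void) dtype.
    return items[row_idx, col_idx]


def index_per_field():
    # Two fancy-indexing operations, each on a plain float64 view.
    return (items["widgets"][row_idx, col_idx],
            items["gadgets"][row_idx, col_idx])


combined = index_structured()
widgets, gadgets = index_per_field()

# Both approaches must select the same values.
same_result = (np.array_equal(combined["widgets"], widgets)
               and np.array_equal(combined["gadgets"], gadgets))

# Best-of-3 timing; absolute numbers depend on the machine and NumPy version.
n = 20
t_struct = min(timeit.repeat(index_structured, number=n, repeat=3)) / n
t_fields = min(timeit.repeat(index_per_field, number=n, repeat=3)) / n
print(f"field access is a view: {is_view}")
print(f"structured: {t_struct * 1e3:.3f} ms, per field: {t_fields * 1e3:.3f} ms")
```

Whether the structured case is still slower, and by how much, will
depend on the NumPy version, so the printed timings should be read as a
measurement on your setup and not as a confirmation of the numbers
quoted below.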

- Sebastian

> Thanks,
> Dave
> In [32]: nrows, ncols = 365, 10000
> In [33]: items = np.rec.fromarrays(randn(2, nrows, ncols), names=['widgets', 'gadgets'])
> In [34]: row_idx = randint(0, nrows, ncols)
>     ...: col_idx = np.arange(ncols)
> In [35]: %timeit filtered_items = items[row_idx, col_idx]
> 100 loops, best of 3: 3.45 ms per loop
> In [36]: %%timeit 
>     ...: widgets = items['widgets'][row_idx, col_idx]
>     ...: gadgets = items['gadgets'][row_idx, col_idx]
>     ...: 
> 1000 loops, best of 3: 1.57 ms per loop
> http://nbviewer.ipython.org/urls/gist.githubusercontent.com/dhirschfeld/98b9970fb68adf23dfea/raw/10c0f968ea1489f0a24da80d3af30de7106848ac/Slow%20Structured%20Array%20Indexing.ipynb
> https://gist.github.com/dhirschfeld/98b9970fb68adf23dfea
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
