Fancy Indexing of Structured Arrays is Slow
As can be seen from the code below (or in the notebook linked beneath) fancy indexing of a structured array is twice as slow as indexing both fields independently - making it 4x slower? I found that fancy indexing was a bottleneck in my application so I was hoping to reduce the overhead by combining the arrays into a structured array and only doing one indexing operation. Unfortunately that doubled the time that it took! Is there any reason for this? If not, I'm happy to open an enhancement issue on GitHub - just let me know. Thanks, Dave In [32]: nrows, ncols = 365, 10000 In [33]: items = np.rec.fromarrays(randn(2,nrows, ncols), names= ['widgets','gadgets']) In [34]: row_idx = randint(0, nrows, ncols) ...: col_idx = np.arange(ncols) In [35]: %timeit filtered_items = items[row_idx, col_idx] 100 loops, best of 3: 3.45 ms per loop In [36]: %%timeit ...: widgets = items['widgets'][row_idx, col_idx] ...: gadgets = items['gadgets'][row_idx, col_idx] ...: 1000 loops, best of 3: 1.57 ms per loop http://nbviewer.ipython.org/urls/gist.githubusercontent.com/dhirschfeld/98b9 970fb68adf23dfea/raw/10c0f968ea1489f0a24da80d3af30de7106848ac/Slow%20Structu red%20Array%20Indexing.ipynb https://gist.github.com/dhirschfeld/98b9970fb68adf23dfea
On Do, 2014-05-15 at 12:31 +0000, Dave Hirschfeld wrote:
As can be seen from the code below (or in the notebook linked beneath) fancy indexing of a structured array is twice as slow as indexing both fields independently - making it 4x slower?
I found that fancy indexing was a bottleneck in my application so I was hoping to reduce the overhead by combining the arrays into a structured array and only doing one indexing operation. Unfortunately that doubled the time that it took!
Is there any reason for this? If not, I'm happy to open an enhancement issue on GitHub - just let me know.
The non-vanilla types tend to be somewhat less efficient with these things, and note that the first indexing step in the two-operation version (items['widgets']) does not copy - it returns a view - so it is rather fast.

I did not check the code, but for this operation we use (also in the new implementation) the copyswap function on individual elements (only for non-trivial copies in 1.9 and later, which makes the difference even larger). That function is probably not specialized to the specific void type, so it likely has to look up the fields and call copyswap once per field - and all of that work is done for every element.

If you are interested in this, you could check the fancy indexing inner loop and see whether replacing the copyswap call with the specialized strided transfer functions (they are used further down, in a different branch of the loop) actually makes things faster. I would expect so for some void types anyway, but I'm not sure in general.

- Sebastian
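[Editorial note: a minimal Python sketch, mirroring the setup from the original example, illustrating the two behaviours described above - plain field access is a no-copy view, while fancy indexing of the structured array must copy every selected element (through the generic copyswap path for void dtypes).]

```python
import numpy as np

# Setup mirroring the original example.
nrows, ncols = 365, 10000
items = np.rec.fromarrays(np.random.randn(2, nrows, ncols),
                          names=['widgets', 'gadgets'])

# Field access returns a view into the same buffer - no data is copied,
# which is why the two-step version starts out cheap.
widgets = items['widgets']
print(widgets.base is not None)      # the field array is a view, not a copy

# Fancy indexing, by contrast, must copy each selected structured element.
row_idx = np.random.randint(0, nrows, ncols)
col_idx = np.arange(ncols)
picked = items[row_idx, col_idx]     # copies ncols void-dtype elements
print(picked.shape)
```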
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Fri, May 16, 2014 at 10:08 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Do, 2014-05-15 at 12:31 +0000, Dave Hirschfeld wrote:
The non-vanilla types tend to be somewhat less efficient with these things, and note that the first indexing step in the two-operation version (items['widgets']) does not copy - it returns a view - so it is rather fast. I did not check the code, but for this operation we use (also in the new implementation) the copyswap function on individual elements (only for non-trivial copies in 1.9 and later, which makes the difference even larger). That function is probably not specialized to the specific void type, so it likely has to look up the fields and call copyswap once per field - and all of that work is done for every element. If you are interested in this, you could check the fancy indexing inner loop and see whether replacing the copyswap call with the specialized strided transfer functions (they are used further down, in a different branch of the loop) actually makes things faster. I would expect so for some void types anyway, but I'm not sure in general.
If ~50% faster is fast enough, a simple improvement would be to replace the use of PyArg_ParseTuple with manual tuple unpacking. The PyArg functions are incredibly slow, and full argument parsing is not required in VOID_copyswap, which just extracts "Oi" (an object and an int).

This 50% increase still leaves it slower than the simpler indexing variant, as those code paths have been greatly improved in 1.9 (thanks to Sebastian for this :) )
Julian Taylor <jtaylor.debian <at> googlemail.com> writes:
If ~50% faster is fast enough, a simple improvement would be to replace the use of PyArg_ParseTuple with manual tuple unpacking. The PyArg functions are incredibly slow, and full argument parsing is not required in VOID_copyswap, which just extracts "Oi".
This 50% increase still leaves it slower than the simpler indexing variant, as those code paths have been greatly improved in 1.9 (thanks to Sebastian for this :) )
Yes, I'd heard about the improvements and am very excited to try them out since indexing is one of the bottlenecks in our algorithm. -Dave
On 16.05.2014 10:59, Dave Hirschfeld wrote:
Yes, I'd heard about the improvements and am very excited to try them out since indexing is one of the bottlenecks in our algorithm.
I made a PR with the simple change: https://github.com/numpy/numpy/pull/4721

It improves things by the expected ~50%, but it's still ~40% slower than the improved normal indexing.
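[Editorial note: a self-contained script, using timeit rather than IPython magics, to reproduce the structured-vs-per-field comparison on a given NumPy build; absolute numbers will of course vary by version and machine.]

```python
import timeit

import numpy as np

# Same shapes and names as the original example.
nrows, ncols = 365, 10000
items = np.rec.fromarrays(np.random.randn(2, nrows, ncols),
                          names=['widgets', 'gadgets'])
row_idx = np.random.randint(0, nrows, ncols)
col_idx = np.arange(ncols)

# One fancy-indexing operation on the structured array...
structured = timeit.timeit(lambda: items[row_idx, col_idx], number=100)

# ...versus one fancy-indexing operation per plain-dtype field.
per_field = timeit.timeit(
    lambda: (items['widgets'][row_idx, col_idx],
             items['gadgets'][row_idx, col_idx]),
    number=100)

print(f"structured: {structured:.4f}s  per-field: {per_field:.4f}s")
```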
Julian Taylor <jtaylor.debian <at> googlemail.com> writes:
I made a PR with the simple change: https://github.com/numpy/numpy/pull/4721
improves it by the expected 50%, but it's still 40% slower than the improved normal indexing.
I'm having some problems building numpy to test this out, but assuming it does what it says on the tin I'd be very keen to get this into the impending 1.9 release if possible.

Thanks,
Dave
Sebastian Berg <sebastian <at> sipsolutions.net> writes:
Thanks for the explanation and pointers - it sounds like a good opportunity to get stuck into the internals of numpy, which I've been meaning to do. I'm not sure I've got the required skills, but I'm sure it will be a good learning experience. Unfortunately it likely won't be in the immediate future that I'll have the time to do so. In the meantime I can live with indexing the fields independently.

Thanks,
Dave
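[Editorial note: the workaround Dave settles on - indexing each field independently - can be wrapped in a small helper. A sketch; the function name is illustrative, not from the thread.]

```python
import numpy as np

def fancy_index_by_field(arr, *indices):
    """Fancy-index a structured array one field at a time, then reassemble.

    Sidesteps the slow per-element copyswap path by issuing one fast
    indexing operation per field on the plain-dtype field views.
    """
    # Allocate the result with the broadcast shape of the index arrays.
    out = np.empty(np.broadcast(*indices).shape, dtype=arr.dtype)
    for name in arr.dtype.names:
        # Field access is a no-copy view; indexing it hits the fast path.
        out[name] = arr[name][indices]
    return out
```

Usage would mirror the original example: `fancy_index_by_field(items, row_idx, col_idx)` should produce the same elements as `items[row_idx, col_idx]`, just assembled field by field.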
participants (3)
- Dave Hirschfeld
- Julian Taylor
- Sebastian Berg