
On Fri, May 16, 2014 at 10:08 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Do, 2014-05-15 at 12:31 +0000, Dave Hirschfeld wrote:
As can be seen from the code below (or in the notebook linked beneath), fancy indexing of a structured array is twice as slow as indexing both fields independently, making it 4x slower?
I found that fancy indexing was a bottleneck in my application, so I was hoping to reduce the overhead by combining the arrays into a structured array and doing only one indexing operation. Unfortunately, that doubled the time it took!
Is there any reason for this? If not, I'm happy to open an enhancement issue on GitHub - just let me know.
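A minimal sketch of the kind of comparison described above. This is not the code from the original post; the array size, the field names "a" and "b", and the repeat count are assumptions chosen for illustration. It fancy-indexes two plain arrays separately, then fancy-indexes one structured array holding both, checks that the results agree, and times each approach. The timings will vary with NumPy version and hardware.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # assumed size, for illustration only

# Two plain float arrays and a random fancy index into them.
a = rng.random(n)
b = rng.random(n)
idx = rng.integers(0, n, size=n)

# Combine both arrays into one structured array with fields "a" and "b".
s = np.empty(n, dtype=[("a", np.float64), ("b", np.float64)])
s["a"] = a
s["b"] = b


def index_separately():
    # Two fancy-indexing operations, one per plain array.
    return a[idx], b[idx]


def index_structured():
    # One fancy-indexing operation on the structured array.
    return s[idx]


# Both approaches must select the same values.
sep_a, sep_b = index_separately()
combined = index_structured()
same = np.array_equal(combined["a"], sep_a) and np.array_equal(combined["b"], sep_b)

# Time each approach over a small number of repeats.
t_sep = timeit.timeit(index_separately, number=20)
t_struct = timeit.timeit(index_structured, number=20)
print(f"separate: {t_sep:.4f}s  structured: {t_struct:.4f}s  same result: {same}")
```

Comparing the two printed timings on a given NumPy version shows whether the structured-array path is slower there.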
The non-vanilla types tend to be somewhat less efficient with these things, and the first indexing does not copy, so it is rather fast. I did not check the code, but we use (also in the new code for this operation) the copyswap function on individual elements (only for non-trivial copies in 1.9 and later, making the difference even larger), and this is probably not specialized to the specific void type, so it probably has to call the copyswap for every field (and first get the fields). All that work would be done for every element. If you are interested in this, you could check the fancy indexing inner loop and see whether replacing the copyswap with the specialized strided transfer functions (they are used further down in a different branch of the loop) actually makes things faster. I would expect so for some void types anyway, but I am not sure in general.
If ~50% faster is fast enough, a simple improvement would be to replace the use of PyArg_ParseTuple with manual tuple unpacking. The PyArg functions are incredibly slow and are not required in VOID_copyswap, which just extracts "Oi". Even with this 50% improvement it is still slower than the simpler indexing variants, as these have been greatly improved in 1.9 (thanks to Sebastian for this :) )