[Numpy-discussion] Fancy Indexing of Structured Arrays is Slow

Julian Taylor jtaylor.debian at googlemail.com
Fri May 16 04:42:11 EDT 2014


On Fri, May 16, 2014 at 10:08 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Do, 2014-05-15 at 12:31 +0000, Dave Hirschfeld wrote:
>> As can be seen from the code below (or in the notebook linked beneath) fancy
>> indexing of a structured array is twice as slow as indexing both fields
>> independently - making it 4x slower?
>>
>> I found that fancy indexing was a bottleneck in my application so I was
>> hoping to reduce the overhead by combining the arrays into a structured
>> array and only doing one indexing operation. Unfortunately that doubled the
>> time that it took!
>>
>> Is there any reason for this? If not, I'm happy to open an enhancement issue
>> on GitHub - just let me know.
>>
>
> The non-vanilla types tend to be somewhat more efficient with these
> things and the first indexing does not copy so it is rather fast. I did
> not check the code, but we use (also in the new one for this operation)
> the copyswap function on individual elements (only for non-trivial
> copies in 1.9 and later, making the difference even larger), and this is
> probably not specialized to the specific void type so it probably has to
> call the copyswap for every field (and first get the fields). All
> that work would be done for every element.
> If you are interested in this, you could check the fancy indexing inner
> loop and see if replacing the copyswap with the specialized strided
> transfer functions (it is used further down in a different branch of the
> loop) actually makes things faster. I would expect so for some void
> types anyway, but not sure in general.
>

If ~50% faster is fast enough, a simple improvement would be to replace
the use of PyArg_ParseTuple with manual tuple unpacking.
The PyArg functions are incredibly slow and are not required in
VOID_copyswap, which just extracts "Oi".

Even with this 50% speedup it is still slower than the simpler indexing
variants, as these have been greatly improved in 1.9 (thanks to
Sebastian for this :) )
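
For anyone who wants to reproduce the comparison, here is a rough sketch of
the kind of benchmark under discussion. It is not Dave's original code: the
array size, the field names "a" and "b", the dtypes and the repeat count are
all assumptions picked for illustration, and the ratio you see will depend
on your NumPy version.

```python
import timeit

import numpy as np

# Assumed sizes and field names; the original benchmark is not shown here.
rs = np.random.RandomState(0)
n = 100000

# Two plain arrays and one structured array holding the same data.
a = rs.rand(n)
b = rs.randint(0, 1000, size=n)
rec = np.empty(n, dtype=[("a", a.dtype), ("b", b.dtype)])
rec["a"] = a
rec["b"] = b

# Random integer index array, i.e. fancy indexing.
idx = rs.randint(0, n, size=n)


def index_separately():
    # Two fancy-indexing operations, one per plain array.
    return a[idx], b[idx]


def index_structured():
    # One fancy-indexing operation on the structured array.
    return rec[idx]


# Both approaches must select the same elements.
sep_a, sep_b = index_separately()
taken = index_structured()
same_result = bool(
    np.array_equal(taken["a"], sep_a) and np.array_equal(taken["b"], sep_b)
)

t_sep = timeit.timeit(index_separately, number=20)
t_rec = timeit.timeit(index_structured, number=20)
print("separate: %.4fs  structured: %.4fs  ratio: %.2f"
      % (t_sep, t_rec, t_rec / t_sep))
```

The printed ratio is the number to watch when trying out a change to
VOID_copyswap or to the fancy indexing inner loop.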


