[Numpy-discussion] Indexing changes in 1.9

Sebastian Berg sebastian at sipsolutions.net
Mon Feb 3 13:19:57 EST 2014


On Mon, 2014-02-03 at 00:41 -0800, Dinesh Vadhia wrote:
> Does the numpy indexing refactoring address the performance of fancy
> indexing highlighted in Wes McKinney's blog some years back -
> http://wesmckinney.com/blog/?p=215 - where numpy.take() was shown to
> be preferable to fancy indexing?

Well, technically there is a pretty big difference in how those two
approach the problem: the indexing code is much more general [1] and
(especially now) optimized for large subspaces [2]. So while the new
code is faster even for small subspaces, the linked test hits the
worst spot performance-wise (and of course making the array 10000x2
would widen the gap a bit further):

In [4]: arr = np.random.randn(10000, 5)
In [5]: indexer = np.arange(10000)
In [6]: np.random.shuffle(indexer)

In [7]: %timeit arr[indexer]
1000 loops, best of 3: 300 us per loop

In [8]: %timeit arr.take(indexer, axis=0)
10000 loops, best of 3: 95.4 us per loop

# With a larger subspace the result is (now) different though:
In [9]: arr = np.random.randn(10000, 100)

In [10]: %timeit arr[indexer]
100 loops, best of 3: 4.85 ms per loop

In [11]: %timeit arr.take(indexer, axis=0)
100 loops, best of 3: 5.02 ms per loop


So the performance in this use case improved (maybe by a factor of 3,
which is neither shabby nor sensational), but the subspace design and
the lack of a big special case for this operation lead to some
overhead. You could likely squeeze out a bit more, but squeezing out
a lot is non-trivial [3].

However, this should be one of the few remaining cases where take
still outperforms advanced indexing.
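
Since a single integer-array index along one axis gives identical
results either way, take() can simply be swapped in wherever it is
the faster spelling. A minimal sketch (sizes as in the timings
above):

import numpy as np

arr = np.random.randn(10000, 5)
indexer = np.random.permutation(10000)

# For one integer-array index along a single axis, fancy indexing
# and take() are interchangeable:
assert np.array_equal(arr[indexer], arr.take(indexer, axis=0))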

- Sebastian


[1] Mostly multiple integer array indices, which take does not
support (see the sketch below), and arbitrary memory order.
[2] Subspace here means the non-advanced-indexing part of the
result. In the case of arr[integer_array, :], the (or a) subspace is
arr[0, :] (also shown in the sketch below).
[3] Basically, we have the value and the index arrays to iterate,
since they are arbitrary (unlike take, which assumes contiguous
memory) and support broadcasting, etc. I did not try to squeeze them
into a single iterator when there is a subspace (it gets complicated:
basically I want buffering for the index arrays, but if you use
buffering on the value array you can't easily get the pointer into
the subspace, etc.).
This means you have two iterators, which in turn means that the
external loop optimization is not quite trivial (since the two inner
loops can have different sizes). So *if* you were to add the logic
for the different-sized inner loops, you could use the external loop
optimization of NpyIter and probably remove (most of) the overhead
causing this discrepancy. Though I would suggest timing first, to be
sure this is actually the reason ^^.
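
To make [1] and [2] concrete, here is a small sketch (the toy array
and index arrays are just made up for illustration):

import numpy as np

arr = np.arange(12).reshape(3, 4)

# [1] Multiple integer array indices: selects the element-wise
# pairs (0, 1), (2, 3), (1, 0). take() cannot express this.
rows = np.array([0, 2, 1])
cols = np.array([1, 3, 0])
print(arr[rows, cols])     # [ 1 11  4]

# [2] With arr[rows, :] each index picks a whole row, so the
# subspace (the non-advanced part of the result) looks like
# arr[0, :], i.e. 4 elements per index.
print(arr[rows, :].shape)  # (3, 4)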

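And for the external loop optimization mentioned in [3], a rough
Python-level illustration via np.nditer (the Python exposure of
NpyIter); with external_loop the iterator hands back whole 1-d
chunks instead of single elements, so the per-element dispatch cost
is paid once per chunk:

import numpy as np

arr = np.arange(12).reshape(3, 4)
it = np.nditer(arr, flags=['external_loop'], order='C')
for chunk in it:
    # For a C-contiguous array this yields one chunk of 12 elements.
    print(chunk.shape)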