[Numpy-discussion] Index Array Performance

Wes McKinney wesmckinn at gmail.com
Mon Feb 13 19:18:01 EST 2012


On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver
<m.oliver at jacobs-university.de> wrote:
> Hi,
>
> I have a short piece of code where the use of an index array "feels
> right", but incurs a severe performance penalty: It's about an order
> of magnitude slower than all other operations with arrays of that
> size.
>
> It comes up in a piece of code which is doing a large number of "on
> the fly" histograms via
>
>  hist[i,j] += 1
>
> where i is an array with the bin index to be incremented and j is
> simply enumerating the histograms.  I attach a full short sample code
> below which shows how it's being used in context, and corresponding
> timeit output from the critical code section.
>
> Questions:
>
> - Is this a matter of principle, or due to an inefficient
>  implementation?
> - Is there an equivalent way of doing it which is fast?
>
> Regards,
> Marcel
>
> =================================================================
>
> #! /usr/bin/env python
> # Plot the bifurcation diagram of the logistic map
>
> from pylab import *
>
> Nx = 800
> Ny = 600
> I = 50000
>
> rmin = 2.5
> rmax = 4.0
> ymin = 0.0
> ymax = 1.0
>
> rr = linspace (rmin, rmax, Nx)
> x = 0.5*ones(rr.shape)
> hist = zeros((Ny+1,Nx), dtype=int)
> j = arange(Nx)
>
> dy = ymax/Ny
>
> def f(x):
>    return rr*x*(1.0-x)
>
> for n in xrange(1000):
>    x = f(x)
>
> for n in xrange(I):
>    x = f(x)
>    i = array(x/dy, dtype=int)
>    hist[i,j] += 1
>
> figure()
>
> imshow(hist,
>       cmap='binary',
>       origin='lower',
>       interpolation='nearest',
>       extent=(rmin,rmax,ymin,ymax),
>       norm=matplotlib.colors.LogNorm())
>
> xlabel ('$r$')
> ylabel ('$x$')
>
> title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$')
>
> show()
>
> ====================================================================
>
> In [4]: timeit y=f(x)
> 10000 loops, best of 3: 19.4 us per loop
>
> In [5]: timeit i = array(x/dy, dtype=int)
> 10000 loops, best of 3: 22 us per loop
>
> In [6]: timeit hist[i,j] += 1
> 10000 loops, best of 3: 119 us per loop

This suggests to me that fancy-indexed incrementing could be quite a
bit faster than it currently is in this case:

In [40]: timeit hist[i,j] += 1
10000 loops, best of 3: 58.2 us per loop
In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1)
10000 loops, best of 3: 20.6 us per loop
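
Note that put overwrites rather than accumulates, so it's only a timing
proxy here. As a rough sketch, a fully vectorized way to actually
accumulate is to count the flattened indices with bincount and add the
counts back in one step:

import numpy as np

# Count how many times each flattened (i, j) index occurs, then add the
# counts into hist in a single vectorized step.
flat = np.ravel_multi_index((i, j), hist.shape)
hist += np.bincount(flat, minlength=hist.size).reshape(hist.shape)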

I wrote a simple Cython method

from numpy cimport ndarray, int64_t

def fancy_inc(ndarray[int64_t, ndim=2] values,
              ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc):
    # Increment values[iarr[k], jarr[k]] by inc, one element at a time.
    cdef:
        Py_ssize_t i, n = len(iarr)

    for i in range(n):
        values[iarr[i], jarr[i]] += inc

which is faster still:

In [8]: timeit sbx.fancy_inc(hist, i, j, 1)
100000 loops, best of 3: 4.85 us per loop

About 10% faster if bounds checking and wraparound are disabled.
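
For reference, turning those checks off is done with the standard
Cython decorators; a sketch of that variant (the function name is just
illustrative):

cimport cython
from numpy cimport ndarray, int64_t

@cython.boundscheck(False)  # skip per-access bounds checking
@cython.wraparound(False)   # skip negative-index handling
def fancy_inc_nocheck(ndarray[int64_t, ndim=2] values,
                      ndarray[int64_t] iarr, ndarray[int64_t] jarr,
                      int64_t inc):
    cdef Py_ssize_t k, n = len(iarr)
    for k in range(n):
        values[iarr[k], jarr[k]] += inc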

Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list?
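
One possible shape for such an API would be an unbuffered in-place
ufunc operation, something like

np.add.at(hist, (i, j), 1)

which would cover this case without a round trip through Cython.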

- Wes
