[Numpy-discussion] Index Array Performance

Nathaniel Smith njs at pobox.com
Mon Feb 13 19:30:29 EST 2012


How would you fix it? I shouldn't speculate without profiling, but I'll be
naughty. Presumably the problem is that python turns that into something
like

hist[i,j] = hist[i,j] + 1

which means there's no way for numpy to avoid creating a temporary array.
So maybe this could be fixed by adding a fused __inplace_add__ protocol to
the language (and similarly for all the other inplace operators), but that
seems really unlikely. Fundamentally this is just the sort of optimization
opportunity you miss when you don't have a compiler with a global view;
Fortran or c++ expression templates will win every time. Maybe pypy will
fix it someday.

Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work?

- N
On Feb 14, 2012 12:18 AM, "Wes McKinney" <wesmckinn at gmail.com> wrote:

> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver
> <m.oliver at jacobs-university.de> wrote:
> > Hi,
> >
> > I have a short piece of code where the use of an index array "feels
> > right", but incurs a severe performance penalty: It's about an order
> > of magnitude slower than all other operations with arrays of that
> > size.
> >
> > It comes up in a piece of code which is doing a large number of "on
> > the fly" histograms via
> >
> >  hist[i,j] += 1
> >
> > where i is an array with the bin index to be incremented and j is
> > simply enumerating the histograms.  I attach a full short sample code
> > below which shows how it's being used in context, and corresponding
> > timeit output from the critical code section.
> >
> > Questions:
> >
> > - Is this a matter of principle, or due to an inefficient
> >  implementation?
> > - Is there an equivalent way of doing it which is fast?
> >
> > Regards,
> > Marcel
> >
> > =================================================================
> >
> > #! /usr/bin/env python
> > # Plot the bifurcation diagram of the logistic map
> >
> > from pylab import *
> >
> > Nx = 800
> > Ny = 600
> > I = 50000
> >
> > rmin = 2.5
> > rmax = 4.0
> > ymin = 0.0
> > ymax = 1.0
> >
> > rr = linspace (rmin, rmax, Nx)
> > x = 0.5*ones(rr.shape)
> > hist = zeros((Ny+1,Nx), dtype=int)
> > j = arange(Nx)
> >
> > dy = ymax/Ny
> >
> > def f(x):
> >    return rr*x*(1.0-x)
> >
> > for n in xrange(1000):
> >    x = f(x)
> >
> > for n in xrange(I):
> >    x = f(x)
> >    i = array(x/dy, dtype=int)
> >    hist[i,j] += 1
> >
> > figure()
> >
> > imshow(hist,
> >       cmap='binary',
> >       origin='lower',
> >       interpolation='nearest',
> >       extent=(rmin,rmax,ymin,ymax),
> >       norm=matplotlib.colors.LogNorm())
> >
> > xlabel ('$r$')
> > ylabel ('$x$')
> >
> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$')
> >
> > show()
> >
> > ====================================================================
> >
> > In [4]: timeit y=f(x)
> > 10000 loops, best of 3: 19.4 us per loop
> >
> > In [5]: timeit i = array(x/dy, dtype=int)
> > 10000 loops, best of 3: 22 us per loop
> >
> > In [6]: timeit img[i,j] += 1
> > 10000 loops, best of 3: 119 us per loop
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> This suggests to me that fancy indexing could be quite a bit faster in
> this case:
>
> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop
> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1)
> 10000 loops, best of 3: 20.6 us per loop
>
> I wrote a simple Cython method
>
> def fancy_inc(ndarray[int64_t, ndim=2] values,
>              ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc):
>    cdef:
>        Py_ssize_t i, n = len(iarr)
>
>    for i in range(n):
>        values[iarr[i], jarr[i]] += inc
>
> that does even faster
>
> In [8]: timeit sbx.fancy_inc(hist, i, j, 1)
> 100000 loops, best of 3: 4.85 us per loop
>
> About 10% faster if bounds checking and wraparound are disabled.
>
> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list?
>
> - Wes
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120214/95522bc7/attachment.html>


More information about the NumPy-Discussion mailing list