[Numpy-discussion] Comparing NumPy/IDL Performance

Mon Sep 26 11:12:35 EDT 2011

On Mon, Sep 26, 2011 at 3:19 PM, Keith Hughitt <keith.hughitt at gmail.com> wrote:
> Hi all,
> Several colleagues and I have recently started work on a Python library
> for solar physics, in order to provide an alternative to the current
> mainstay for solar physics, which is written in IDL.
> One of the first steps we have taken is to create a Python port of a popular
> benchmark for IDL (time_test3) which measures performance for a variety of
> (primarily matrix) operations. In our initial attempt, however, Python
> performs significantly worse than IDL for several of the tests. I have
> attached a graph which shows the results for one machine: the x-axis is the
> test # being compared, and the y-axis is the time it took to complete the
> test, in milliseconds. While it is possible that this is simply due to
> limitations in Python/NumPy, I suspect that this is due at least in part to
> our lack of familiarity with NumPy and SciPy.
>
> So my question is, does anyone see any places where we are doing things very
> inefficiently in Python?

Looking at the plot, there are five standout tests: 1, 2, 3, 6 and 21.

Tests 1, 2 and 3 are testing Python itself (no numpy or scipy),
but they exercise things you should be avoiding when using numpy
anyway (don't use loops; use vectorised calculations, etc.).
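
As a rough illustration of the loop versus vectorised point, here is a
minimal sketch. The function names and the sum-of-squares task are
assumptions made up for this example; they are not taken from the
benchmark itself.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: one interpreter round trip per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorised(values):
    """Same calculation done in a single NumPy call."""
    return float(np.dot(values, values))


# Assumption: a small float64 array stands in for the benchmark data.
data = np.arange(1000, dtype=np.float64)
loop_result = sum_of_squares_loop(data)
vector_result = sum_of_squares_vectorised(data)
```

Both functions return the same value; the loop version pays Python
overhead on every element, which is the kind of cost tests 1 to 3 appear
to be measuring.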

This is test 6,

    #Test 6 - Shift 512 by 512 byte and store
    nrep = 300 * scale_factor
    for i in range(nrep):
        c = np.roll(np.roll(b, 10, axis=0), 10, axis=1) #pylint: disable=W0612
    timer.log('Shift 512 by 512 byte and store, %d times.' % nrep)

The precise contents of b are determined by the previous tests.
Is that deliberate? It makes testing this in isolation hard. I'm unsure
what you are trying to do, and whether this is the best way to do it.
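
One way to time this test on its own might look like the sketch below.
It assumes b is a 512 by 512 array of bytes (uint8) with arbitrary
contents, which is a guess based on the test label; the helper name
shift_and_store and the repeat count of 20 are also made up for the
example.

```python
import timeit

import numpy as np

# Assumption: b is a 512x512 uint8 array; the real benchmark builds b
# in earlier tests, so these contents are only a stand-in.
rng = np.random.default_rng(0)
b = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)


def shift_and_store(arr):
    """Circular shift by 10 along both axes, as in test 6."""
    return np.roll(np.roll(arr, 10, axis=0), 10, axis=1)


c = shift_and_store(b)

# Time the shift in isolation, independent of any earlier test.
elapsed = timeit.timeit(lambda: shift_and_store(b), number=20)
print('Shift 512 by 512 byte and store, 20 times: %.4f s' % elapsed)
```

Building b explicitly like this means the timing no longer depends on
whatever the earlier tests left behind.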

This is test 21, which is just calling a scipy function repeatedly.
Questions about this might be better directed to the scipy
mailing list - also check what version of SciPy etc you have.

    n = 2**(17 * scale_factor)
    a = np.arange(n, dtype=np.float32)
    ...
    #Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
    for i in range(nrep):
        b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
    timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)
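
Note that the label on this test says "5x5 boxcar", while the code calls
a median filter. If a boxcar (moving average) smooth is what is
intended, scipy.ndimage.uniform_filter may be the closer match. The
sketch below only illustrates the difference between the two filters;
the 9 by 9 test image is an assumption for the example, and it imports
from scipy.ndimage directly rather than the older scipy.ndimage.filters
namespace.

```python
import numpy as np
from scipy import ndimage

# Assumption: a tiny float32 image with a single bright pixel, chosen
# so the two filters give visibly different answers.
image = np.zeros((9, 9), dtype=np.float32)
image[4, 4] = 25.0

# Boxcar: each output pixel is the mean of its 5x5 neighbourhood.
boxcar = ndimage.uniform_filter(image, size=5)

# Median: each output pixel is the median of its 5x5 neighbourhood,
# which needs a sort or selection per pixel and is usually slower.
median = ndimage.median_filter(image, size=5)
```

The boxcar spreads the bright pixel over its neighbours, whereas the
median removes it entirely, so the two calls are not interchangeable
and their timings are not directly comparable either.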

After that, tests 10, 15 and 18 stand out. Test 10 is another use
of roll, so whatever advice you get on test 6 may apply. Test 10:

    #Test 10 - Shift 512 x 512 array
    nrep = 60 * scale_factor
    for i in range(nrep):
        c = np.roll(np.roll(b, 10, axis=0), 10, axis=1)
    #for i in range(nrep): c = d.rotate(
    timer.log('Shift 512 x 512 array, %d times' % nrep)

Test 15 is a loop-based version of 16, where Python wins. Test 18
is a loop-based version of 19 (log), where the difference is small.

So in terms of numpy speed, your question just seems to be
about numpy.roll and how else one might achieve this result?
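
For what it is worth, here are two possible alternatives to the nested
roll calls, as a sketch only. The helper roll2d_slices is hypothetical
(written for this example, not part of NumPy), and the tuple form of
np.roll needs a reasonably recent NumPy (1.12 or later). Whether either
is actually faster would have to be timed on your machine.

```python
import numpy as np


def roll2d_slices(arr, shift_rows, shift_cols):
    """Circularly shift a 2-D array using four block copies.

    Hypothetical helper for illustration; it writes the result once
    instead of creating an intermediate array per axis.
    """
    rows, cols = arr.shape
    r = shift_rows % rows
    c = shift_cols % cols
    out = np.empty_like(arr)
    out[r:, c:] = arr[:rows - r, :cols - c]
    out[r:, :c] = arr[:rows - r, cols - c:]
    out[:r, c:] = arr[rows - r:, :cols - c]
    out[:r, :c] = arr[rows - r:, cols - c:]
    return out


# Assumption: any 512x512 array will do for checking equivalence.
b = np.arange(512 * 512, dtype=np.int64).reshape(512, 512)

reference = np.roll(np.roll(b, 10, axis=0), 10, axis=1)
single_call = np.roll(b, (10, 10), axis=(0, 1))
sliced = roll2d_slices(b, 10, 10)
```

All three produce the same shifted array, so they can be swapped into
tests 6 and 10 and compared with the same timer.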

Peter