[SciPy-User] [ANN] Bottleneck 0.2

Keith Goodman kwgoodman at gmail.com
Mon Dec 27 15:04:04 EST 2010


Bottleneck is a collection of fast NumPy array functions written in Cython.

The second release of Bottleneck is faster, contains more functions,
and supports more dtypes.

Faster:
- All functions are faster (less overhead) when the output is not a scalar
- Faster nanmean() for 2d and 3d arrays containing NaNs when axis is not None
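nanmean() with an axis averages each slice along that axis while skipping NaNs. Here is a minimal pure-Python sketch of those semantics for axis=0 on 2d input. The name nanmean_axis0 is hypothetical, nested lists stand in for arrays, and returning NaN for an all-NaN column is an assumption of the sketch. This is not Bottleneck's Cython code:

```python
import math

def nanmean_axis0(rows):
    """Column means of a 2-D nested list, ignoring NaNs (axis=0). Hypothetical helper."""
    ncols = len(rows[0])
    totals = [0.0] * ncols  # running sum of finite values per column
    counts = [0] * ncols    # number of finite values per column
    for row in rows:
        for j, x in enumerate(row):
            if not math.isnan(x):  # skip missing values
                totals[j] += x
                counts[j] += 1
    # A column with no finite values has no mean; this sketch returns NaN for it.
    return [t / c if c else math.nan for t, c in zip(totals, counts)]


nan = float("nan")
print(nanmean_axis0([[1.0, nan], [3.0, 4.0]]))  # [2.0, 4.0]
```

The NaN in the second column is dropped rather than propagated, so that column's mean comes from the single finite value 4.0.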

New functions:
- nanargmin()
- nanargmax()
- nanmedian(), 100X faster than SciPy's nanmedian for (100,100) input, axis=0
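For reference, you can sketch the NaN-skipping semantics of the three new functions on 1d input in plain Python. The ref_* names are hypothetical. Raising ValueError for all-NaN input to the arg functions and returning NaN from the median are choices made for this sketch, not documented Bottleneck behavior:

```python
import math
import statistics

def _finite_items(seq):
    """Return (index, value) pairs for the non-NaN entries of seq."""
    return [(i, x) for i, x in enumerate(seq) if not math.isnan(x)]

def ref_nanargmin(seq):
    """Index of the smallest non-NaN value (hypothetical reference version)."""
    items = _finite_items(seq)
    if not items:
        raise ValueError("all-NaN input")  # this sketch's choice for all-NaN input
    return min(items, key=lambda p: p[1])[0]

def ref_nanargmax(seq):
    """Index of the largest non-NaN value (hypothetical reference version)."""
    items = _finite_items(seq)
    if not items:
        raise ValueError("all-NaN input")  # this sketch's choice for all-NaN input
    return max(items, key=lambda p: p[1])[0]

def ref_nanmedian(seq):
    """Median of the non-NaN values, or NaN if there are none."""
    values = [x for x in seq if not math.isnan(x)]
    return statistics.median(values) if values else math.nan


nan = float("nan")
data = [3.0, nan, 1.0, 7.0, nan, 7.0]
print(ref_nanargmin(data), ref_nanargmax(data), ref_nanmedian(data))  # 2 3 5.0
```

Ties resolve to the first index because Python's min() and max() return the first extreme item they encounter. This matches NumPy's argmin/argmax convention of returning the first occurrence.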

Enhancements:
- Added support for float32
- Falls back to slower, non-Cython functions for unaccelerated ndim/dtype combinations
- SciPy is no longer a dependency
- Added support for older versions of NumPy (1.4.1)
- All functions are now templated for dtype and axis
- Added a sandbox for prototyping of new Bottleneck functions
- Rewrote benchmarking code

Breaks from 0.1.0:
- To run the benchmark, use bn.bench() instead of bn.benchit()

download
    http://pypi.python.org/pypi/Bottleneck
docs
    http://berkeleyanalytics.com/bottleneck
code
    http://github.com/kwgoodman/bottleneck
mailing list
    http://groups.google.com/group/bottle-neck
mailing list 2
    http://mail.scipy.org/mailman/listinfo/scipy-user

Bottleneck comes with a benchmark suite that compares the performance
of each Bottleneck function that has a NumPy/SciPy equivalent against
that equivalent. To run the benchmark:

    >>> import bottleneck as bn
    >>> bn.bench(mode='fast')
    Bottleneck performance benchmark
        Bottleneck  0.2.0
        Numpy (np)  1.5.1
        Scipy (sp)  0.8.0
        Speed is NumPy or SciPy time divided by Bottleneck time
        NaN means one-third NaNs; axis=0 and float64 are used
    median vs np.median
        3.59  (10,10)
        2.43  (1001,1001)
        2.28  (1000,1000)
        2.16  (100,100)
    nanmedian vs local copy of sp.stats.nanmedian
      102.72  (10,10)      NaN
       94.34  (10,10)
       67.89  (100,100)    NaN
       28.52  (100,100)
        6.37  (1000,1000)  NaN
        4.41  (1000,1000)
    nanmax vs np.nanmax
        9.99  (100,100)    NaN
        6.12  (10,10)      NaN
        5.99  (10,10)
        5.88  (100,100)
        1.79  (1000,1000)  NaN
        1.76  (1000,1000)
    nanmean vs local copy of sp.stats.nanmean
       25.95  (100,100)    NaN
       12.85  (100,100)
       12.26  (10,10)      NaN
       11.89  (10,10)
        5.15  (1000,1000)  NaN
        3.17  (1000,1000)
    nanstd vs local copy of sp.stats.nanstd
       16.96  (100,100)    NaN
       15.75  (10,10)      NaN
       15.49  (10,10)
        9.51  (100,100)
        3.85  (1000,1000)  NaN
        2.82  (1000,1000)
    nanargmax vs np.nanargmax
        8.60  (100,100)    NaN
        5.65  (10,10)      NaN
        5.62  (100,100)
        5.44  (10,10)
        2.84  (1000,1000)  NaN
        2.58  (1000,1000)
    move_nanmean vs sp.ndimage.convolve1d based function
        window = 5
       19.52  (10,10)      NaN
       18.55  (10,10)
       10.56  (100,100)    NaN
        6.67  (100,100)
        5.19  (1000,1000)  NaN
        4.42  (1000,1000)
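The move_nanmean rows time a moving-window mean that skips NaNs (window = 5). One common O(n) approach keeps a running sum and a running count of finite values. Each step adds the value entering the window and subtracts the one leaving it, whereas a convolution does `window` multiply-adds per output. The sketch below is pure Python. The name move_nanmean_ref and its conventions are assumptions of the sketch, not a description of Bottleneck's Cython code. Those conventions are a trailing window, NaN for the first window - 1 outputs, and NaN for windows with no finite values:

```python
import math

def move_nanmean_ref(values, window):
    """Trailing moving mean that ignores NaNs (hypothetical reference version).

    out[i] averages the finite values in values[i - window + 1 : i + 1].
    The first window - 1 outputs are NaN, as is any window with no finite values.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    total = 0.0  # sum of finite values currently in the window
    count = 0    # number of finite values currently in the window
    for i, x in enumerate(values):
        if not math.isnan(x):  # value entering the window
            total += x
            count += 1
        if i >= window:
            old = values[i - window]  # value leaving the window
            if not math.isnan(old):
                total -= old
                count -= 1
        if i < window - 1:
            out.append(math.nan)  # window not yet full
        else:
            out.append(total / count if count else math.nan)
    return out


nan = float("nan")
print(move_nanmean_ref([1.0, nan, 3.0, 5.0], window=2))  # [nan, 1.0, 3.0, 4.0]
```

Running sums trade a little floating-point accuracy on long inputs for doing constant work per element.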

Under the hood, Bottleneck uses a separate Cython function for each
combination of ndim, dtype, and axis. A lot of the overhead in
bn.nanmax(), for example, is in checking that the axis is within
range, converting non-array data to an array, and selecting the
function to use to calculate the maximum. You can get rid of the
overhead by calling the underlying Cython function directly.
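The dispatch described above can be sketched in plain Python. Everything here is hypothetical and only mirrors the prose:

- nested lists stand in for arrays;
- a dict keyed by (ndim, dtype, axis) stands in for the per-combination Cython functions;
- a generic slow path stands in for the non-Cython fallback;
- nanmax_selector() runs the axis check, the conversion, and the function lookup once, so a caller can reuse the result inside a loop.

```python
import math

def _nanmax_line(values):
    # NaN-skipping max of one 1-D line; NaN if the line has no finite values.
    finite = [x for x in values if not math.isnan(x)]
    return max(finite) if finite else math.nan

def nanmax_2d_float64_axis0(rows):
    # Hypothetical specialized kernel: 2-D float64 input reduced along axis 0.
    return [_nanmax_line(col) for col in zip(*rows)]

def nanmax_2d_float64_axis1(rows):
    # Hypothetical specialized kernel: 2-D float64 input reduced along axis 1.
    return [_nanmax_line(row) for row in rows]

# One entry per (ndim, dtype, axis) combination that has a fast kernel.
_KERNELS = {
    (2, "float64", 0): nanmax_2d_float64_axis0,
    (2, "float64", 1): nanmax_2d_float64_axis1,
}

def _nanmax_fallback(rows, axis):
    # Generic slow path for combinations without a registered kernel.
    lines = zip(*rows) if axis == 0 else rows
    return [_nanmax_line(line) for line in lines]

def nanmax_selector(data, axis, dtype="float64"):
    """Do the per-call work once and return (function, converted data)."""
    rows = [list(map(float, row)) for row in data]  # stands in for "convert to array"
    ndim = 2  # this sketch only accepts 2-D nested sequences
    if not (-ndim <= axis < ndim):
        raise ValueError("axis out of range")
    axis %= ndim  # normalize negative axes
    kernel = _KERNELS.get((ndim, dtype, axis))
    if kernel is None:
        return (lambda r: _nanmax_fallback(r, axis)), rows
    return kernel, rows

def nanmax(data, axis, dtype="float64"):
    # Convenience wrapper: pays the checking and selection cost on every call.
    func, rows = nanmax_selector(data, axis, dtype)
    return func(rows)
```

For example, `func, rows = nanmax_selector([[1, 5], [float("nan"), 2]], axis=0)` returns the axis-0 kernel. Calling `func(rows)` repeatedly inside a loop then skips the checks, giving [1.0, 5.0] each time. Calling nanmax() in the loop instead would repeat the lookup on every iteration.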

Benchmarks for the low-level Cython version of each function:

    >>> bn.bench(mode='faster')
    Bottleneck performance benchmark
        Bottleneck  0.2.0
        Numpy (np)  1.5.1
        Scipy (sp)  0.8.0
        Speed is NumPy or SciPy time divided by Bottleneck time
        NaN means one-third NaNs; axis=0 and float64 are used
    median_selector vs np.median
       15.29  (10,10)
       14.19  (100,100)
        8.04  (1001,1001)
        7.32  (1000,1000)
    nanmedian_selector vs local copy of sp.stats.nanmedian
      352.08  (10,10)      NaN
      340.27  (10,10)
      185.56  (100,100)    NaN
      138.81  (100,100)
        8.21  (1000,1000)
        8.09  (1000,1000)  NaN
    nanmax_selector vs np.nanmax
       21.54  (10,10)      NaN
       19.98  (10,10)
       12.65  (100,100)    NaN
        6.82  (100,100)
        1.79  (1000,1000)  NaN
        1.76  (1000,1000)
    nanmean_selector vs local copy of sp.stats.nanmean
       41.08  (10,10)      NaN
       39.05  (10,10)
       31.74  (100,100)    NaN
       15.24  (100,100)
        5.13  (1000,1000)  NaN
        3.16  (1000,1000)
    nanstd_selector vs local copy of sp.stats.nanstd
       44.55  (10,10)      NaN
       43.49  (10,10)
       18.66  (100,100)    NaN
       10.29  (100,100)
        3.83  (1000,1000)  NaN
        2.82  (1000,1000)
    nanargmax_selector vs np.nanargmax
       17.91  (10,10)      NaN
       17.00  (10,10)
       10.56  (100,100)    NaN
        6.50  (100,100)
        2.85  (1000,1000)  NaN
        2.59  (1000,1000)
    move_nanmean_selector vs sp.ndimage.convolve1d based function
        window = 5
       55.96  (10,10)      NaN
       50.82  (10,10)
       11.77  (100,100)    NaN
        6.93  (100,100)
        5.56  (1000,1000)  NaN
        4.51  (1000,1000)
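You can combine the two listings into a rough estimate of the wrapper overhead. Speed is reported as reference time divided by Bottleneck time. For a given row, the ratio of the two speed-ups therefore equals the ratio of the two Bottleneck timings. This assumes the NumPy/SciPy reference took the same time in both runs; it was timed separately, so noise applies. The numbers below are copied from the listings, and the helper name overhead_fraction is hypothetical:

```python
# Speed-ups copied from the two benchmark listings (reference time / Bottleneck time).
SPEEDUPS = {
    # (function, shape, one-third NaNs?): (mode='fast', mode='faster')
    ("nanmax", (10, 10), True): (6.12, 21.54),
    ("nanmax", (1000, 1000), True): (1.79, 1.79),
    ("nanmean", (10, 10), True): (12.26, 41.08),
    ("median", (10, 10), False): (3.59, 15.29),
}

def overhead_fraction(fast, faster):
    """Share of the high-level call time spent outside the low-level function.

    With speed = t_ref / t_bn, t_faster / t_fast = fast / faster, so the
    overhead share is 1 - fast / faster (assumes t_ref equal in both runs).
    """
    return 1.0 - fast / faster

for key, (fast, faster) in SPEEDUPS.items():
    print(key, f"{overhead_fraction(fast, faster):.0%}")
```

For nanmax on (10,10) input with NaNs, this gives 1 - 6.12/21.54, so roughly 72% of the bn.nanmax() call time is spent outside the Cython function. At (1000,1000) both runs report 1.79, and the estimate is zero.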


