[Numpy-discussion] numexpr with the new iterator

John Salvatier jsalvati at u.washington.edu
Sun Jan 9 18:33:41 EST 2011


Is evaluate_iter basically numexpr's evaluate rebuilt on the iterator from
your NumPy branch, or are there other changes?

On Sun, Jan 9, 2011 at 2:45 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:

> As a benchmark of C-based iterator usage and to make it work properly in a
> multi-threaded context, I've updated numexpr to use the new iterator.  In
> addition to some performance improvements, this also made it easy to add
> optional out= and order= parameters to the evaluate function.  The numexpr
> repository with this update is available here:
>
> https://github.com/m-paradox/numexpr
>
> To use it, you need the new_iterator branch of NumPy from here:
>
> https://github.com/m-paradox/numpy
>
> In all cases tested, the iterator version of numexpr's evaluate function
> matches or beats the standard version.  The timing results are below, with
> some explanatory comments placed inline:
>
> -Mark
>
> In [1]: import numexpr as ne
>
> # numexpr front page example
>
> In [2]: a = np.arange(1e6)
> In [3]: b = np.arange(1e6)
>
> In [4]: timeit a**2 + b**2 + 2*a*b
> 1 loops, best of 3: 121 ms per loop
>
> In [5]: ne.set_num_threads(1)
>
> # iterator version performance matches standard version
>
> In [6]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 24.8 ms per loop
> In [7]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 24.3 ms per loop
>
> In [8]: ne.set_num_threads(2)
>
> # iterator version performance matches standard version
>
> In [9]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 21 ms per loop
> In [10]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 20.5 ms per loop
>
> # numexpr front page example with a 10x bigger array
>
> In [11]: a = np.arange(1e7)
> In [12]: b = np.arange(1e7)
>
> In [13]: ne.set_num_threads(2)
>
> # the iterator version performance improvement is due to
> # a small task scheduler tweak
>
> In [14]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 282 ms per loop
> In [15]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 255 ms per loop
>
> # numexpr front page example with a Fortran contiguous array
>
> In [16]: a = np.arange(1e7).reshape(10,100,100,100).T
> In [17]: b = np.arange(1e7).reshape(10,100,100,100).T
>
> In [18]: timeit a**2 + b**2 + 2*a*b
> 1 loops, best of 3: 3.22 s per loop
>
> In [19]: ne.set_num_threads(1)
>
> # even with a C-ordered output, the iterator version performs better
>
> In [20]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 3.74 s per loop
> In [21]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 379 ms per loop
> In [22]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C')
> 1 loops, best of 3: 2.03 s per loop
>
> In [23]: ne.set_num_threads(2)
>
> # the standard version just uses 1 thread here, I believe;
> # the iterator version performs the same as for the flat 1e7-sized array
>
> In [24]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 3.92 s per loop
> In [25]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 254 ms per loop
> In [26]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C')
> 1 loops, best of 3: 1.74 s per loop
>
>
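
For readers following the Fortran-contiguous case in the quoted benchmark, here is a minimal sketch of what that array construction produces and of what a C-ordered versus a layout-matching output looks like. It uses plain NumPy only, because evaluate_iter exists only in the branch linked above. The array size is reduced from 1e7 to 1e4 elements, and the names out_f and out_c are illustrative, not part of numexpr.

```python
import numpy as np

# Same construction as in the benchmark, but with 1e4 elements instead of 1e7.
a = np.arange(1e4).reshape(10, 10, 10, 10).T
b = np.arange(1e4).reshape(10, 10, 10, 10).T

# Transposing a C-contiguous array yields a Fortran-contiguous view.
print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])

# Plain NumPy evaluation, used here as the reference result.
expected = a**2 + b**2 + 2*a*b

# Output buffer whose memory layout matches the inputs (Fortran order).
out_f = np.empty(a.shape, dtype=a.dtype, order='F')
np.add(a**2 + b**2, 2*a*b, out=out_f)

# Output buffer forced into C order: the values are the same, but the
# inputs and the output are then traversed with mismatched strides.
out_c = np.empty(a.shape, dtype=a.dtype, order='C')
np.add(a**2 + b**2, 2*a*b, out=out_c)

print(a.strides, out_f.strides, out_c.strides)
```

Both buffers hold the same values; only the strides differ, which is the distinction that order='C' exercises in the quoted timings.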