[Numpy-discussion] NEP for faster ufuncs
Charles R Harris
charlesr.harris at gmail.com
Wed Dec 22 01:05:36 EST 2010
On Tue, Dec 21, 2010 at 5:53 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> Hello NumPy-ers,
>
> After some performance analysis, I've designed and implemented a new
> iterator designed to speed up ufuncs and allow for easier multi-dimensional
> iteration. The new code is fairly large, but works quite well already. If
> some people could read the NEP and give some feedback, that would be great!
> Here's a link:
>
>
> https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst
>
> I would also love it if someone could try building the code and play around
> with it a bit. The github branch is here:
>
> https://github.com/m-paradox/numpy/tree/new_iterator
>
> To give a taste of the iterator's functionality, below is an example from
> the NEP for how to implement a "Lambda UFunc." With just a few lines of
> code, it's possible to replicate something similar to the numexpr library
> (numexpr still gets a bigger speedup, though). In the example expression I
> chose, execution time went from 138ms to 61ms.
>
> Hopefully this is a good Christmas present for NumPy. :)
>
> Cheers,
> Mark
>
> Here is the definition of the ``luf`` function.::
>
> def luf(lamdaexpr, *args, **kwargs):
> """Lambda UFunc
>
> e.g.
> c = luf(lambda i,j:i+j, a, b, order='K',
> casting='safe', buffersize=8192)
>
> c = np.empty(...)
> luf(lambda i,j:i+j, a, b, out=c, order='K',
> casting='safe', buffersize=8192)
> """
>
> nargs = len(args)
> op = args + (kwargs.get('out',None),)
> it = np.newiter(op, ['buffered','no_inner_iteration'],
> [['readonly','nbo_aligned']]*nargs +
> [['writeonly','allocate','no_broadcast']],
> order=kwargs.get('order','K'),
> casting=kwargs.get('casting','safe'),
> buffersize=kwargs.get('buffersize',0))
> while not it.finished:
> it[-1] = lamdaexpr(*it[:-1])
> it.iternext()
>
> return it.operands[-1]
>
> Then, by using ``luf`` instead of straight Python expressions, we
> can gain some performance from better cache behavior.::
>
> In [2]: a = np.random.random((50,50,50,10))
> In [3]: b = np.random.random((50,50,1,10))
> In [4]: c = np.random.random((50,50,50,1))
>
> In [5]: timeit 3*a+b-(a/c)
> 1 loops, best of 3: 138 ms per loop
>
> In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c)
> 10 loops, best of 3: 60.9 ms per loop
>
> In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c))
> Out[7]: True
>
>
>
Wow, that's a really nice design and write up. Small typo:
/* Only allow exactly equivalent types */
NPY_NO_CASTING=0,
/* Allow casts between equivalent types of different byte orders */
NPY_EQUIV_CASTING=0,
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20101221/1760676d/attachment.html>
More information about the NumPy-Discussion
mailing list