[Numpy-discussion] NEP for faster ufuncs

Francesc Alted faltet at pytables.org
Wed Dec 22 13:41:10 EST 2010


On Wednesday 22 December 2010 18:21:28 Mark Wiebe wrote:
> On Wed, Dec 22, 2010 at 9:07 AM, Francesc Alted <faltet at pytables.org> wrote:
> > On Wednesday 22 December 2010 17:25:13 Mark Wiebe wrote:
> > > Can you print out your np.__version__, and try running the tests?
> > >  If newiter didn't build for some reason, its tests should be
> > > throwing a bunch of exceptions.

$ PYTHONPATH=numpy python -c "import numpy; numpy.test()"
Running unit tests for numpy
NumPy version 2.0.0.dev-147f817
NumPy is installed in /tmp/numpy/numpy
Python version 2.6.1 (r261:67515, Feb  3 2009, 17:34:37) [GCC 4.3.2 
[gcc-4_3-branch revision 141291]]
nose version 0.11.0
[clip]
Warning: divide by zero encountered in log
Warning: divide by zero encountered in log
[clip]
Ran 3094 tests in 16.771s
OK (KNOWNFAIL=4, SKIP=1)

IPython seems to work well too:

>>> np.__version__
'2.0.0.dev-147f817'
>>> timeit 3*a+b-(a/c)
10 loops, best of 3: 67.5 ms per loop

However, when trying your luf function:

>>> cpaste
[the luf code here]
--
>>> timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c)
[clip]
AttributeError: 'module' object has no attribute 'newiter'
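
A quick way to see which build the interpreter is really importing, and
whether it exposes the iterator, is to inspect the module directly.  This
is only a minimal sketch: describe_build is a hypothetical helper, and the
attribute name 'newiter' is taken from the traceback above.

```python
import numpy as np


def describe_build(module, attr):
    """Return (version, file path, has attr) for an imported module.

    Hypothetical helper: it only reports what the running interpreter
    sees, so a stale build earlier on the path shows up immediately.
    """
    version = getattr(module, "__version__", None)
    path = getattr(module, "__file__", None)
    return version, path, hasattr(module, attr)


version, path, has_newiter = describe_build(np, "newiter")
print("version:", version)
print("imported from:", path)
print("has newiter:", has_newiter)
```

If the last line prints False, the numpy being imported does not carry the
iterator, whatever the version string says.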

> The reason I think it might help is that with 'luf' it's
> calculating the expression on smaller sized arrays, which possibly
> just got buffered. If the memory allocator for the temporaries keeps
> giving back the same addresses, all this will be in one of the
> caches very close to the CPU. Unless this cache is still too slow to
> feed the SSE instructions, there should be a speed benefit.  The
> ufunc inner loops could also use the SSE prefetch instructions based
> on the stride to give some strong hints about where the next memory
> bytes to use will be.
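
The blockwise idea in the paragraph quoted above can be sketched in plain
NumPy.  This is not the luf function (its code is not reproduced in this
message) and it does not use the new iterator; chunked_eval and the chunk
size of 8192 elements are assumptions made only for illustration.  It
simply evaluates the expression on short slices so that the temporaries
stay small.

```python
import numpy as np


def chunked_eval(func, *arrays, chunk=8192):
    """Evaluate func over equal-length 1-D arrays one block at a time.

    Hypothetical helper: each call to func only sees `chunk` elements,
    so its temporaries are small enough to have a chance of staying in
    cache.
    """
    n = arrays[0].shape[0]
    out = None
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        block = func(*(x[start:stop] for x in arrays))
        if out is None:
            # Allocate the result once, using the dtype func produces.
            out = np.empty(n, dtype=block.dtype)
        out[start:stop] = block
    if out is None:
        # Empty input: no blocks were processed, fall back to one call.
        return func(*arrays)
    return out


a = np.linspace(1.0, 2.0, 100000)
b = np.linspace(3.0, 4.0, 100000)
c = np.linspace(5.0, 6.0, 100000)  # no zeros, so a/c is well defined

result = chunked_eval(lambda a, b, c: 3*a + b - (a/c), a, b, c)
print(np.allclose(result, 3*a + b - (a/c)))
```

Whether this is faster than the one-shot expression depends on the chunk
size and the machine; the sketch only shows the shape of the approach.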

Ah, okay.  However, Numexpr is not meant to accelerate calculations with 
small operands.  I suppose that this is where your new iterator makes 
more sense: accelerating operations where some of the operands are small 
(i.e. fit in cache) and have to be broadcast to match the 
dimensionality of the others.
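
To make the broadcasting case concrete, here is a small sketch using
ordinary NumPy broadcasting (the array names and shapes are made up for
the example): a short operand is stretched along the leading dimension of
a larger one without being copied out to full size first.

```python
import numpy as np

# A "large" 2-D operand and a small 1-D operand with 3 elements.
big = np.arange(12.0).reshape(4, 3)
small = np.array([10.0, 20.0, 30.0])

# small has shape (3,); it is broadcast against big's shape (4, 3),
# so the same three values are reused for every row.
result = big + small

print(result.shape)
print(result[0])
```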

-- 
Francesc Alted


