[Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator

Sun Jan 18 16:21:42 EST 2009

Francesc Alted wrote:

> > > Numexpr is a fast numerical expression evaluator for NumPy.  With
> > > it, expressions that operate on arrays (like "3*a+4*b") are
> > > accelerated and use less memory than doing the same calculation in
> > > Python.
>
> > Please pardon my ignorance as I know this project has been around for
> > a while.  This looks very exciting, but either it's cumbersome, or
> > I'm not understanding exactly what's being fixed.  If you can
> > accelerate evaluation, why not just integrate the faster math into
> > numpy, rather than having two packages?  Or is this something that is
> > only an advantage when the expression is given as a string (and why
> > is that the case)?  It would be helpful if you could put the answer
> > on your web page and in your standard release blurb in some compact
> > form. I guess what I'm really looking for when I read one of those is
> > a quick answer to the question "should I look into this?".

> Well, there is a link in the project page to the "Overview" section of
> the wiki, but perhaps it is a bit hidden.  I've added some blurb, as you
> suggested, to the main page and another link to the "Overview" wiki page.
> Hope that, by reading the new blurb, you can see why it accelerates
> expression evaluation relative to NumPy.  If not, tell me and I will
> try to come up with something more comprehensible.

I did see the overview.  The addition you made is great but it's so
far down that many won't get to it.  Even in its section, the meat of
it is below three paragraphs that most users won't care about and many
won't understand.  I've posted some notes on writing intros in
Developer_Zone.

In the following, I've reordered the page to address the questions of
potential users first, edited it a bit, and fixed the example to
conform to our doc standards (and 128->256; hope that was right).  See
what you think...

** Description:

The numexpr package evaluates multiple-operator array expressions many
times faster than numpy can.  It accepts the expression as a string,
analyzes it, rewrites it more efficiently, and compiles it on the fly
into a program for its own virtual machine.  It's the next best thing
to writing the expression in C and compiling it with an optimizing
compiler (as scipy.weave does), but requires no compiler at runtime.

Using it is simple:

>>> import numpy as np
>>> import numexpr as ne
>>> a = np.arange(10)
>>> b = np.arange(0, 20, 2)
>>> c = ne.evaluate("2*a+3*b")
>>> c
array([ 0,  8, 16, 24, 32, 40, 48, 56, 64, 72])

** Why does it work?

There are two extremes to array expression evaluation.  Each binary
operation can run separately over the array elements and return a
temporary array.  This is what NumPy does: 2*a + 3*b uses three
temporary arrays as large as a or b.  This strategy wastes memory (a
problem if the arrays are large).  It is also not a good use of CPU
cache memory because the results of 2*a and 3*b will not be in cache
for the final addition if the arrays are large.
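
To make the temporaries concrete, here is a small sketch that spells
out by hand what the one-line NumPy expression does, and then the same
arithmetic with explicit output buffers.  The names t1, t2, out and
scratch are mine, for illustration only; this is not numexpr code.

```python
import numpy as np

a = np.arange(10)
b = np.arange(0, 20, 2)

# What "2*a + 3*b" does in NumPy: one full-size array per operation.
t1 = 2 * a          # first temporary
t2 = 3 * b          # second temporary
c = t1 + t2         # third full-size array, holding the result

# Same arithmetic with explicit output buffers: two full-size arrays
# instead of three, because the sum is written back into "out".
out = np.empty_like(a)
scratch = np.empty_like(a)
np.multiply(a, 2, out=out)
np.multiply(b, 3, out=scratch)
np.add(out, scratch, out=out)
```

Both c and out end up holding the same values.  The second form saves
one allocation but still streams every array through memory in full.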

The other extreme is to loop over each element:

for i in xrange(len(a)):
    c[i] = 2*a[i] + 3*b[i]

This conserves memory and is good for the cache, but on each iteration
Python must check the type of each operand and select the correct
routine for each operation.  All but the first such checks are wasted,
as the input arrays are not changing.

numexpr uses an in-between approach.  Arrays are handled in chunks
(the first pass uses 256 elements).  As Python code, it looks
something like this:

for i in xrange(0, len(a), 256):
    r0 = a[i:i+256]
    r1 = b[i:i+256]
    multiply(r0, 2, r2)
    multiply(r1, 3, r3)
    add(r2, r3, r2)
    c[i:i+256] = r2

The 3-argument form of add() stores the result in the third argument,
instead of allocating a new array.  Working in chunks achieves a good
balance between cache use and branch prediction: each chunk stays in
cache, and the type checks happen once per chunk rather than once per
element.  The virtual machine is written entirely in C, which makes it
faster than the Python above.
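
For anyone who wants to run the idea, here is a self-contained sketch
of the chunked loop in plain NumPy.  It is only an illustration under
my own assumptions: chunked_eval and its chunk argument are made-up
names, both inputs are assumed to share a dtype and length, and the
short final chunk is handled by slicing the scratch buffers.

```python
import numpy as np

def chunked_eval(a, b, chunk=256):
    """Compute 2*a + 3*b chunk by chunk (hypothetical helper, not numexpr code)."""
    c = np.empty_like(a)
    r2 = np.empty(chunk, dtype=a.dtype)    # scratch buffers, allocated once
    r3 = np.empty(chunk, dtype=a.dtype)
    for i in range(0, len(a), chunk):
        r0 = a[i:i+chunk]                  # slices are views, not copies
        r1 = b[i:i+chunk]
        n = len(r0)                        # the last chunk may be short
        np.multiply(r0, 2, out=r2[:n])
        np.multiply(r1, 3, out=r3[:n])
        np.add(r2[:n], r3[:n], out=r2[:n])
        c[i:i+n] = r2[:n]
    return c

a = np.arange(1000)
b = np.arange(0, 2000, 2)
c = chunked_eval(a, b)
```

Since the loop itself still runs in Python, don't expect this sketch to
beat plain NumPy; it only shows the memory access pattern.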

** Supported Operators (unchanged)

** Supported Functions (unchanged, but capitalize 'F')

** Usage Notes (no need to repeat the example)

Numexpr's principal routine is:

evaluate(ex, local_dict=None, global_dict=None, **kwargs)

ex is a string forming an expression, like "2*a+3*b".  The values for
a and b will by default be taken from the calling function's frame
(through the use of sys._getframe()).  Alternatively, they can be
specified using the local_dict or global_dict arguments, or passed as
keyword arguments.
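
The frame lookup is the least obvious part, so here is a minimal sketch
of the mechanism using only the standard library.  resolve_names and
demo are hypothetical names of mine, and the lookup order (locals
first, then globals) is an assumption; numexpr's own code may differ.

```python
import sys

def resolve_names(names, local_dict=None, global_dict=None):
    """Hypothetical helper: find values for variable names, by default
    in the frame of whoever called this function."""
    caller = sys._getframe(1)              # frame 1 is our caller
    if local_dict is None:
        local_dict = caller.f_locals
    if global_dict is None:
        global_dict = caller.f_globals
    values = []
    for name in names:
        if name in local_dict:
            values.append(local_dict[name])
        else:
            values.append(global_dict[name])   # KeyError if undefined
    return values

def demo():
    a, b = 1, 2
    return resolve_names(["a", "b"])       # picks up demo()'s locals

print(demo())                                      # [1, 2]
print(resolve_names(["a"], local_dict={"a": 10}))  # [10]
```

Passing local_dict explicitly skips the frame inspection, which is
handy when the variables do not live in the calling function.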

Expressions are cached, so reuse is fast.  Arrays or scalars are
allowed for the variables, which must be of type 8-bit boolean (bool),
32-bit signed integer (int), 64-bit signed integer (long),
double-precision floating point number (float), 2x64-bit,
double-precision complex number (complex) or raw string of bytes
(str).  The arrays must all be the same size.
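
As a rough illustration of these constraints, a pre-check along the
following lines could be run before calling evaluate().  The mapping
from the types above to NumPy dtypes and the check_operands helper are
my assumptions, not part of numexpr.

```python
import numpy as np

# NumPy dtypes assumed to correspond to the types listed above.
SUPPORTED = [np.dtype(t) for t in
             (np.bool_, np.int32, np.int64, np.float64, np.complex128)]

def check_operands(*operands):
    """Hypothetical pre-check: supported dtypes and equal array sizes."""
    arrays = [np.asarray(x) for x in operands]
    for x in arrays:
        # kind "S" covers raw byte strings of any length
        if x.dtype.kind != "S" and x.dtype not in SUPPORTED:
            raise TypeError("unsupported dtype: %s" % x.dtype)
    # scalars (0-d) are exempt from the size rule
    sizes = set(x.size for x in arrays if x.ndim > 0)
    if len(sizes) > 1:
        raise ValueError("arrays differ in size: %s" % sorted(sizes))

# An array, another array of the same size, and a scalar pass the check.
check_operands(np.arange(3, dtype=np.int64), np.ones(3), 2.0)
```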

** Building (unchanged, but move down since it's standard and most
   users will only do this once, if ever)

** Implementation Notes (rest of current How It Works section)

** Credits

--jh--