Re: [Numpy-discussion] ANN: Numexpr 1.1, an efficient array evaluator
On Friday 16 January 2009, jh@physics.ucf.edu wrote:
Hi Francesc,
Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python.
Please pardon my ignorance as I know this project has been around for a while. This looks very exciting, but either it's cumbersome, or I'm not understanding exactly what's being fixed. If you can accelerate evaluation, why not just integrate the faster math into numpy, rather than having two packages? Or is this something that is only an advantage when the expression is given as a string (and why is that the case)? It would be helpful if you could put the answer on your web page and in your standard release blurb in some compact form. I guess what I'm really looking for when I read one of those is a quick answer to the question "should I look into this?".
Well, there is a link on the project page to the "Overview" section of the wiki, but perhaps it is a bit hidden. I've added a blurb on the main page, as you suggested, and another link to the "Overview" wiki page. I hope that, by reading the new blurb, you can see why it accelerates expression evaluation relative to NumPy. If not, tell me and I will try to come up with something more comprehensible.
Right now, I'm not quite sure whether the problem you are solving is merely the case of expressions-in-strings, and there is no advantage for expressions-in-code, or whether your expressions-in-strings are faster than numpy's expressions-in-code. In either case, it would appear this would be a good addition to the numpy core, and it's past 1.0, so why keep it separate? Even if there is value in having a non-numpy version, is there not also value in accelerating numpy by default?
Having the expression encapsulated in a string has the advantage that you know exactly which part of the code you want to parse and accelerate. Making NumPy understand which parts of the Python code can be accelerated sounds more like a true JIT for Python, and that is not trivial at all (although, with the advent of PyPy, some efforts in this direction are appearing [1]).
[1] http://www.enthought.com/~ischnell/paper.html
Cheers,
-- Francesc Alted
Hi Francesc,
this is a wonderful project! I was just wondering if you would /
could support single-precision float arrays?
In 3+D image analysis we generally don't have enough memory to afford
double precision, and we could save ourselves lots of extra C (or
Cython) coding if we could use numexpr ;-)
Thanks,
Sebastian Haase
On Friday 16 January 2009, Sebastian Haase wrote:
Hi Francesc, this is a wonderful project ! I was just wondering if you would / could support single precision float arrays ?
As I said before, it is doable, but I don't know if I will have enough time to implement this myself.
In 3+D image analysis we generally don't have enough memory to afford double precision, and we could save ourselves lots of extra C (or Cython) coding if we could use numexpr ;-)
Well, one of the ideas that I have been toying with for a long time is to give Numexpr the capability to work with PyTables disk-based objects. That way, you would be able to evaluate potentially complex expressions using data that is completely on disk. But this might be a completely different thing from what you are talking about.
Cheers,
-- Francesc Alted
Francesc Alted wrote:
On Friday 16 January 2009, jh@physics.ucf.edu wrote:
Right now, I'm not quite sure whether the problem you are solving is merely the case of expressions-in-strings, and there is no advantage for expressions-in-code, or whether your expressions-in-strings are faster than numpy's expressions-in-code. In either case, it would appear this would be a good addition to the numpy core, and it's past 1.0, so why keep it separate? Even if there is value in having a non-numpy version, is there not also value in accelerating numpy by default?
Having the expression encapsulated in a string has the advantage that you know exactly which part of the code you want to parse and accelerate. Making NumPy understand which parts of the Python code can be accelerated sounds more like a true JIT for Python, and that is not trivial at all (although, with the advent of PyPy, some efforts in this direction are appearing [1]).
A full compiler/JIT isn't needed; there's another route. One could use the Numexpr methodology together with a symbolic expression framework (like SymPy or the one in Sage), i.e. operator overloads and lazy expressions. Combining Numexpr with a symbolic manipulation engine would be very cool IMO. Unfortunately I don't have time myself (and I understand that you don't; I'm just mentioning it).
Example using pseudo-Sage-like syntax:
    a = np.arange(bignum)
    b = np.arange(bignum)
    x, y = sage.var("x, y")
    expr = sage.integrate(x + y, x)
    z = expr(x=a, y=b)  # z = a**2/2 + a*b, but Numexpr-enabled
-- Dag Sverre
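The lazy-expression idea above can be sketched without any symbolic engine. What follows is only an illustration under stated assumptions: the `Expr` class and its `evaluate` method are hypothetical names, not part of Numexpr, SymPy or Sage, and the integral of x + y is written out by hand rather than computed symbolically. Operator overloads record the expression as text; the text is handed to `numexpr.evaluate` when Numexpr is installed, and to Python's own `eval` otherwise.

```python
class Expr:
    """Hypothetical lazy expression: records its source text instead of computing."""

    def __init__(self, text):
        self.text = text

    @staticmethod
    def _wrap(value):
        # Constants become literal text; Expr operands are used as they are.
        return value if isinstance(value, Expr) else Expr(repr(value))

    def _binary(self, other, op, swap=False):
        left, right = self, Expr._wrap(other)
        if swap:
            left, right = right, left
        return Expr(f"({left.text} {op} {right.text})")

    def __add__(self, other):
        return self._binary(other, "+")

    def __radd__(self, other):
        return self._binary(other, "+", swap=True)

    def __mul__(self, other):
        return self._binary(other, "*")

    def __rmul__(self, other):
        return self._binary(other, "*", swap=True)

    def __truediv__(self, other):
        return self._binary(other, "/")

    def __pow__(self, other):
        return self._binary(other, "**")

    def evaluate(self, **operands):
        """Evaluate the recorded text with the given operands."""
        try:
            import numexpr  # optional; used only when installed
        except ImportError:
            # Fallback: plain Python evaluation with no builtins exposed.
            return eval(self.text, {"__builtins__": {}}, operands)
        return numexpr.evaluate(self.text, local_dict=operands)


x, y = Expr("x"), Expr("y")
expr = x**2 / 2 + x * y   # integral of (x + y) with respect to x, written by hand
print(expr.text)
```

Nothing is computed until `expr.evaluate(x=a, y=b)` is called, so the whole expression reaches the evaluator in one piece, which is the property Numexpr relies on.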
Francesc Alted wrote:
Well, there is a link on the project page to the "Overview" section of the wiki, but perhaps it is a bit hidden. I've added a blurb on the main page, as you suggested, and another link to the "Overview" wiki page. I hope that, by reading the new blurb, you can see why it accelerates expression evaluation relative to NumPy. If not, tell me and I will try to come up with something more comprehensible.
I did see the overview. The addition you made is great, but it's so far down that many won't get to it. Even in its section, the meat of it is below three paragraphs that most users won't care about and many won't understand. I've posted some notes on writing intros in Developer_Zone.
In the following, I've reordered the page to address the questions of potential users first, edited it a bit, and fixed the example to conform to our doc standards (and 128->256; hope that was right). See what you think...
** Description:
The numexpr package evaluates multiple-operator array expressions many times faster than numpy can. It accepts the expression as a string, analyzes it, rewrites it more efficiently, and compiles it to faster Python code on the fly. It's the next best thing to writing the expression in C and compiling it with an optimizing compiler (as scipy.weave does), but requires no compiler at runtime.
Using it is simple:
    >>> import numpy as np
    >>> import numexpr as ne
    >>> a = np.arange(10)
    >>> b = np.arange(0, 20, 2)
    >>> c = ne.evaluate("2*a+3*b")
    >>> c
    array([ 0,  8, 16, 24, 32, 40, 48, 56, 64, 72])
** Why does it work?
There are two extremes to array expression evaluation. Each binary operation can run separately over the array elements and return a temporary array. This is what NumPy does: 2*a + 3*b uses three temporary arrays as large as a or b. This strategy wastes memory (a problem if the arrays are large). It is also not a good use of CPU cache memory because the results of 2*a and 3*b will not be in cache for the final addition if the arrays are large.
The other extreme is to loop over each element:
    for i in xrange(len(a)):
        c[i] = 2*a[i] + 3*b[i]
This conserves memory and is good for the cache, but on each iteration Python must check the type of each operand and select the correct routine for each operation. All but the first such checks are wasted, as the input arrays are not changing.
numexpr uses an in-between approach. Arrays are handled in chunks (the first pass uses 256 elements). As Python code, it looks something like this:
    for i in xrange(0, len(a), 256):
        r0 = a[i:i+256]
        r1 = b[i:i+256]
        multiply(r0, 2, r2)
        multiply(r1, 3, r3)
        add(r2, r3, r2)
        c[i:i+256] = r2
The 3-argument form of add() stores the result in the third argument, instead of allocating a new array. This achieves a good balance between cache and branch prediction. The virtual machine is written entirely in C, which makes it faster than the Python above.
** Supported Operators (unchanged)
** Supported Functions (unchanged, but capitalize 'F')
** Usage Notes (no need to repeat the example)
Numexpr's principal routine is:
    evaluate(ex, local_dict=None, global_dict=None, **kwargs)
ex is a string forming an expression, like "2*a+3*b". The values for a and b will by default be taken from the calling function's frame (through the use of sys._getframe()). Alternatively, they can be specified using the local_dict or global_dict arguments, or passed as keyword arguments. Expressions are cached, so reuse is fast.
Arrays or scalars are allowed for the variables, which must be of type 8-bit boolean (bool), 32-bit signed integer (int), 64-bit signed integer (long), double-precision floating point number (float), 2x64-bit double-precision complex number (complex) or raw string of bytes (str). The arrays must all be the same size.
** Building (unchanged, but move down since it's standard and most users will only do this once, if ever)
** Implementation Notes (rest of current How It Works section)
** Credits
--jh--
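The chunked strategy described in the draft can be tried in plain NumPy. This is a rough sketch of the idea only, not Numexpr's virtual machine (which is written in C): the function name `chunked_2a_plus_3b` and its `chunk` parameter are made up for illustration, it is written for current Python (`range` rather than `xrange`), and it uses the `out=` argument of NumPy ufuncs where the draft uses the 3-argument form.

```python
import numpy as np


def chunked_2a_plus_3b(a, b, chunk=256):
    """Evaluate 2*a + 3*b block by block, reusing two small scratch buffers."""
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of the same length")
    dtype = np.result_type(a, b)
    c = np.empty(a.shape, dtype=dtype)
    # Scratch buffers are allocated once and sized to one chunk, so the
    # working set stays small no matter how large a and b are.
    r2 = np.empty(chunk, dtype=dtype)
    r3 = np.empty(chunk, dtype=dtype)
    for i in range(0, len(a), chunk):
        r0 = a[i:i + chunk]
        r1 = b[i:i + chunk]
        n = len(r0)                        # the last chunk may be shorter
        np.multiply(r0, 2, out=r2[:n])     # r2 = 2 * a[chunk]
        np.multiply(r1, 3, out=r3[:n])     # r3 = 3 * b[chunk]
        np.add(r2[:n], r3[:n], out=c[i:i + n])
    return c


a = np.arange(1000)
b = np.arange(0, 2000, 2)
c = chunked_2a_plus_3b(a, b)
```

The Python-level loop still pays interpreter overhead once per chunk, so this version is mainly useful for seeing how the temporaries shrink from full-array size to chunk size.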
On Sunday 18 January 2009, jh@physics.ucf.edu wrote:
I did see the overview. The addition you made is great but it's so far down that many won't get to it. Even in its section, the meat of it is below three paragraphs that most users won't care about and many won't understand. I've posted some notes on writing intros in Developer_Zone.
In the following, I've reordered the page to address the questions of potential users first, edited it a bit, and fixed the example to conform to our doc standards (and 128->256; hope that was right). See what you think... [clip]
That's great! I've heavily changed the docs on the project site. I've followed your advice in most places, but not always (a `Building` section always has to be high up in a manual, IMHO). Thanks a lot for your contribution!
-- Francesc Alted
Thanks! I think this will help the package attract a lot of users. A couple of housekeeping things:
on http://code.google.com/p/numexpr:
    What it is? -> What is it? or What it is (no question mark)
on http://code.google.com/p/numexpr/wiki/Overview:
    The last example got incorporated as straight text somehow.
    In firefox, the first code example runs into the pastel boxes on the right for modest-width browsers. This is a common problem with firefox, but I think it comes from improper HTML code that IE somehow deals with, rather than non-standard behavior in firefox.
One thing I'd add is a benchmark example against numpy. Make it simple, so that people can copy and modify the benchmark code to test their own performance improvements.
I added an entry for it on the Topical Software list. Please check it out and modify as you see fit.
--jh--
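A benchmark of the kind suggested here could look roughly like the following. It is a sketch, not the benchmark that ended up in the project docs: the `bench` helper, the array size and the repeat count are arbitrary choices made for illustration, and the Numexpr timing is skipped when the package is not installed. The operands are passed through `local_dict` explicitly, since the call sits inside a lambda and the default frame lookup would not see `a` and `b` there.

```python
import timeit

import numpy as np

try:
    import numexpr as ne   # optional: only the NumPy timing runs without it
except ImportError:
    ne = None


def bench(n=1_000_000, repeat=5):
    """Return the best-of-`repeat` wall time in seconds for 2*a + 3*b."""
    a = np.arange(n, dtype=np.float64)
    b = np.arange(n, dtype=np.float64)
    timings = {
        "numpy": min(timeit.repeat(lambda: 2 * a + 3 * b,
                                   number=1, repeat=repeat)),
    }
    if ne is not None:
        operands = {"a": a, "b": b}
        timings["numexpr"] = min(timeit.repeat(
            lambda: ne.evaluate("2*a + 3*b", local_dict=operands),
            number=1, repeat=repeat))
    return timings


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name:8s} {seconds * 1e3:8.3f} ms")
```

The numbers depend heavily on array size, cache size and the installed versions, so it is worth varying `n` and the expression before drawing conclusions.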
participants (4)
- Dag Sverre Seljebotn
- Francesc Alted
- jh@physics.ucf.edu
- Sebastian Haase