Anybody know of any tricks for handling something like

    z[0] = 1.0
    for i in range(100):
        out[i] = func1(z[i])
        z[i+1] = func2(out[i])

??
Gael Varoquaux wrote:
On Thu, Oct 25, 2007 at 04:16:06PM -0700, Mathew Yeates wrote:
Anybody know of any tricks for handling something like
    z[0] = 1.0
    for i in range(100):
        out[i] = func1(z[i])
        z[i+1] = func2(out[i])
Something like:
    z[0] = 1.
    out = func1(z)
    z[1:] = func2(out[:-1])

This only works if func1 has no side effect on its argument. The problem boils down to whether the above algorithm is recursive or not (does z[i+1] need z[i]?). If func1 has no side effect, then your solution is fine (but then writing the thing as out[i] = func1(z[i]); z[i+1] = func2(out[i]) is not really intuitive). If func1 has a side effect, then you cannot vectorize easily.
cheers, David

P.S.: IMHO, this is one of the main limitations of numpy (or any language using arrays for speed), and this is really difficult to optimize: you need compilation, a JIT or similar to solve those efficiently.
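For concreteness, here is the genuinely recursive case written out as a plain preallocated loop. This is only a sketch: func1 and func2 below are hypothetical stand-ins for whatever scalar functions are actually in play.

    import numpy as np

    # Hypothetical stand-ins for the real scalar functions.
    def func1(x):
        return 0.5 * x

    def func2(x):
        return x + 1.0

    # When z[i+1] truly depends on z[i], the loop cannot be vectorized
    # away; preallocating the arrays at least avoids growing them.
    n = 100
    z = np.empty(n + 1)
    out = np.empty(n)
    z[0] = 1.0
    for i in range(n):
        out[i] = func1(z[i])
        z[i + 1] = func2(out[i])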
On 10/26/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
P.S.: IMHO, this is one of the main limitations of numpy (or any language using arrays for speed), and this is really difficult to optimize: you need compilation, a JIT or similar to solve those efficiently.
This is where the scipy sandbox numexpr project comes in - if I'm not mistaken ....

http://www.scipy.org/SciPyPackages/NumExpr

Description: The scipy.sandbox.numexpr package supplies routines for the fast evaluation of array expressions elementwise by using a vector-based virtual machine. It's comparable to scipy.weave.blitz (in Weave), but doesn't require a separate compile step of C or C++ code.

I hope that more noise around this will result in more interest and subsequently in more support. I think numexpr might be one of the most powerful ideas in numpy / scipy "recently". Did you know about numexpr, David?

Cheers, Sebastian Haase
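A minimal usage sketch (assuming the scipy.sandbox import path given on that page; the exact spelling may differ between revisions):

    from scipy.sandbox import numexpr
    import numpy as np

    a = np.arange(1e6)
    b = np.arange(1e6)

    # The string is compiled once to bytecode for the vector VM, then
    # evaluated block-wise over the operands, avoiding large temporaries.
    result = numexpr.evaluate("2*a + 3*b")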
On 10/26/07, Sebastian Haase <haase@msg.ucsf.edu> wrote:
On 10/26/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
P.S.: IMHO, this is one of the main limitations of numpy (or any language using arrays for speed), and this is really difficult to optimize: you need compilation, a JIT or similar to solve those efficiently.
This is where the scipy sandbox numexpr project comes in - if I'm not mistaken ....
http://www.scipy.org/SciPyPackages/NumExpr
Description: The scipy.sandbox.numexpr package supplies routines for the fast evaluation of array expressions elementwise by using a vector-based virtual machine. It's comparable to scipy.weave.blitz (in Weave), but doesn't require a separate compile step of C or C++ code.
I hope that more noise around this will result in more interest and subsequently in more support. I think numexpr might be one of the most powerful ideas in numpy / scipy "recently". Did you know about numexpr, David?
Sadly, I don't think numexpr will help here; it basically handles the same cases as numpy, only faster because it can avoid a lot of temporaries. I think he's going to need Psyco, Pyrex, Weave or similar.

-- . __ . |-\ . . tim.hochberg@ieee.org
Hi, On 10/26/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 10/26/07, Sebastian Haase <haase@msg.ucsf.edu> wrote:
On 10/26/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
P.S.: IMHO, this is one of the main limitations of numpy (or any language using arrays for speed), and this is really difficult to optimize: you need compilation, a JIT or similar to solve those efficiently.
This is where the scipy sandbox numexpr project comes in - if I'm not mistaken ....
http://www.scipy.org/SciPyPackages/NumExpr
Description: The scipy.sandbox.numexpr package supplies routines for the fast evaluation of array expressions elementwise by using a vector-based virtual machine. It's comparable to scipy.weave.blitz (in Weave), but doesn't require a separate compile step of C or C++ code.
I hope that more noise around this will result in more interest and subsequently in more support. I think numexpr might be one of the most powerful ideas in numpy / scipy "recently". Did you know about numexpr, David?
Sadly, I don't think numexpr will help here; it basically handles the same cases as numpy, only faster because it can avoid a lot of temporaries. I think he's going to need Psyco, Pyrex, Weave or similar.
This problem of efficient recursion turns up often. The only versions we support now are the reduce and accumulate methods for ufuncs. I wonder if we can think of some way of offering support that will speed it up? The problem space is big enough that there is probably no method that will solve all problems, but maybe we can toss around some thoughts.

For 2-D sorts of things, recursion based on functions on "half squares" preceding the current value could be useful. A reverse recursion would also be useful in evaluating a lot of series. The posted problem is sort of a matrix recursion, though not necessarily linear. Some sort of accumulate method using a vector of function calls would probably do the trick. Hmmm... I think that would not be too hard to do, although the calls to Python functions would probably keep the speedup down to a factor of 2-3x if implemented in C.

In a sense, I am thinking of accumulate and reduce type methods attached to vectors of functions with more than 2 arguments, iterated multidimensional maps, if you will.

Chuck
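A rough pure-Python sketch of the interface Chuck seems to be describing (the name faccumulate is made up; a real version would move the loop into C):

    import numpy as np

    # Hypothetical generalized accumulate: z[0] = z0, z[i+1] = f(z[i]).
    # Pure Python, so it only illustrates the interface; the 2-3x figure
    # above assumes the looping itself is done in C.
    def faccumulate(f, z0, n):
        z = np.empty(n + 1)
        z[0] = z0
        for i in range(n):
            z[i + 1] = f(z[i])
        return z

    # e.g. z = faccumulate(lambda x: func2(func1(x)), 1.0, 100)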
Sebastian Haase wrote:
On 10/26/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
P.S.: IMHO, this is one of the main limitations of numpy (or any language using arrays for speed), and this is really difficult to optimize: you need compilation, a JIT or similar to solve those efficiently.
This is where the scipy sandbox numexpr project comes in - if I'm not mistaken ....
http://www.scipy.org/SciPyPackages/NumExpr
Description: The scipy.sandbox.numexpr package supplies routines for the fast evaluation of array expressions elementwise by using a vector-based virtual machine. It's comparable to scipy.weave.blitz (in Weave), but doesn't require a separate compile step of C or C++ code.
I hope that more noise around this will result in more interest and subsequently in more support. I think numexpr might be one of the most powerful ideas in numpy / scipy "recently". Did you know about numexpr, David?
I knew about it, but never really tried it. But numexpr still goes through Python for function calls, right? The big problem of Python for (recursive) numeric work is the function call cost. On my macbook (Core 2 Duo @ 2 GHz), which I already consider quite beefy, I cannot call more than 0.5-1 million functions every second (really simple ones, e.g. not member functions of some objects, which are more costly). That's why I don't see much hope outside a JIT for those (incidentally, the kind of things 'we' are doing seem like the simplest things to JIT).

cheers, David
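For anyone who wants to check that figure on their own machine, a rough measurement (numbers are obviously machine- and Python-version-dependent):

    import timeit

    # Time one million calls of a trivial one-argument function.
    t = timeit.Timer("f(1.0)", "def f(x): return x").timeit(number=1000000)
    print("%.1e calls/second" % (1000000 / t))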
(incidentally, the kind of things 'we' are doing seem like the simplest things to JIT).
Wouldn't a numpy-aware psyco be cool then?

Oh well, I'm not going to write it!

(though as I think about it, for the special case of a contiguous array, it would be awfully similar to an array.array --- hmmm.)

-CHB

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
On 10/29/07, Christopher Barker <Chris.Barker@noaa.gov> wrote:
(incidentally, the kind of things 'we' are doing seem like the simplest things to JIT).
Wouldn't a numpy-aware psyco be cool then?
Oh well, I'm not going to write it!
(though as I think about it, for the special case of a contiguous array, it would be awfully similar to an array.array --- hmmm.)
Psyco is aware of array.array and can operate on array.arrays quite fast. [In fact, somewhere I have an ndarray-like class based on Psyco that runs pretty fast for integers, but not so fast for floats]. The problem that Psyco has for numeric purposes is that it has no concept of floating point numbers. It can "natively" store only a few different things: ints, pointers, and arrays of ints or pointers. To put, for example, doubles onto the Psyco stack or into one of its registers, you need to break the bits of the double up and stuff them into a couple of different int registers. Then to operate on them you need to put them back together, since they may get separated. All of this adds a fair amount of overhead.

I've been hoping that the PyPy JIT will address this, but I haven't had time to follow that project closely enough to see if that's on the agenda.

-- . __ . |-\ . . tim.hochberg@ieee.org
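To illustrate the kind of shuffling this involves (in plain Python via the struct module, not what Psyco actually emits):

    import struct

    # Break the 64 bits of a double into two 32-bit unsigned ints...
    half1, half2 = struct.unpack("II", struct.pack("d", 3.14159))

    # ...and put them back together before operating on the value again.
    (x,) = struct.unpack("d", struct.pack("II", half1, half2))
    assert x == 3.14159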
Timothy Hochberg wrote:
On 10/29/07, Christopher Barker <Chris.Barker@noaa.gov> wrote:
(incidentally, the kind of things 'we' are doing seem like the simplest things to JIT).
Wouldn't a numpy-aware psyco be cool then?
Oh well, I'm not going to write it!
(though as I think about it, for the special case of a contiguous array, it would be awfully similar to an array.array --- hmmm.)
Psyco is aware of array.array and can operate on array.arrays quite fast. [In fact, somewhere I have an ndarray-like class based on Psyco that runs pretty fast for integers, but not so fast for floats]. The problem that Psyco has for numeric purposes is that it has no concept of floating point numbers. It can "natively" store only a few different things: ints, pointers, and arrays of ints or pointers. To put, for example, doubles onto the Psyco stack or into one of its registers, you need to break the bits of the double up and stuff them into a couple of different int registers. Then to operate on them you need to put them back together, since they may get separated. All of this adds a fair amount of overhead.
I've been hoping that the PyPy jit will address this, but I haven't had time to follow that project closely enough to see if that's on the agenda.
My impression is that it is hard to follow PyPy when you are not 'part of it', but just from the mailing list, and from a couple of questions of mine, my understanding is that it is on the agenda. They have a JIT (which maybe does not work on floats, though), and there is work on numpy arrays at the rpython level (rpython being the typed subset of Python that PyPy uses at its core).

cheers, David
Thanks all for your suggestions. I see it brought up a lot of interesting asides. But my loop was simple enough that I was able to figure it out.

I also took a look at NumExpr. While it wasn't something I needed for vectorizing, it still looks very interesting. What kinds of performance improvements would be expected using this?

Mathew

Sebastian Haase wrote:
On 10/26/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
P.S.: IMHO, this is one of the main limitations of numpy (or any language using arrays for speed), and this is really difficult to optimize: you need compilation, a JIT or similar to solve those efficiently.
This is where the scipy sandbox numexpr project comes in - if I'm not mistaken ....
http://www.scipy.org/SciPyPackages/NumExpr
Description: The scipy.sandbox.numexpr package supplies routines for the fast evaluation of array expressions elementwise by using a vector-based virtual machine. It's comparable to scipy.weave.blitz (in Weave), but doesn't require a separate compile step of C or C++ code.
I hope that more noise around this will result in more interest and subsequently in more support. I think numexpr might be one of the most powerful ideas in numpy / scipy "recently". Did you know about numexpr, David?
Cheers, Sebastian Haase
On Wednesday 31 October 2007, Mathew Yeates wrote: [...]
I also took a look at NumExpr. While it wasn't something I needed for vectorizing, it still looks very interesting. What kinds of performance improvements would be expected using this?
Well, for some speedup figures you can always check the first (and very exciting) message (and many of the interesting followups) of the original author, David Cooke, in this very same list:

http://thread.gmane.org/gmane.comp.python.numeric.general/4081/focus=4138

This was back in March 2006. Since then, David Cooke and Tim Hochberg (with some additions from Ivan Vilata and myself) have added much interesting functionality, like support for logical and comparison operators; many functions, like 'where', 'sqrt' and all the trigonometric and inverse trigonometric ones; reductions ('sum' and 'prod'); and also optimizations for some specific operations, like division (a/b) or exponentiation (a**b) with floating points, among others (a small sketch follows below).

I'm attaching a file (timing2-orig.out) with the results of the file 'timing.py' that comes with the numexpr distribution for more up-to-date figures (all the runs mentioned here have been executed on an Opteron at 2 GHz machine). On it, you can see that the speed-ups for a variety of tests range from 1.1x up to 8.7x, with a mean speedup of approximately 2x, which is pretty good.

Furthermore, due to the specific needs of PyTables (mainly, for doing 'in-core' selections in tables), Ivan and I have ended up creating an enhanced version of numexpr that has improved support for more datatypes (long integers and strings, mainly), improvements for boolean operations, and specific optimizations for unidimensional arrays being strided and/or unaligned (these cases can be quite common in PyTables). You can see the result of these improvements in the attached 'boolean_timing-tables.out' file, which is the output of the (also attached) benchmark 'boolean_timing.py' (see 'boolean_timing-orig.out' for the results using the original numexpr). The main thing to notice is that, for these kinds of 'boolean' computations, the PyTables version of numexpr can be more than 3x faster than plain NumPy and up to 1.8x faster (2.0x for the unidimensional strided case) than the original numexpr.

Incidentally, all the improvements of the PyTables flavor of numexpr have been reported to the original authors, but, for the sake of keeping numexpr simple, they decided to implement only some of them. However, people are encouraged to try out the PyTables flavor from:

http://www.pytables.org/trac/browser/trunk/tables/numexpr/

Or, by installing the PyTables 2.x series and importing numexpr as: "from tables import numexpr".

Cheers,

--
0,0<   Francesc Altet     http://www.carabos.com/
V V    Cárabos Coop. V.   Enjoy Data
"-"
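To make that feature list concrete, a small sketch (function names are the ones listed in the post; exact availability depends on which numexpr revision you have):

    from scipy.sandbox import numexpr
    import numpy as np

    a = np.random.rand(100000)
    b = np.random.rand(100000)

    # comparison operators, 'where', 'sqrt' and float a**b in one expression
    r = numexpr.evaluate("where(a > b, sqrt(a), a**b)")

    # 'sum' reduction
    s = numexpr.evaluate("sum(a*b)")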
On Oct 31, 2007 3:18 AM, Francesc Altet <faltet@carabos.com> wrote: [SNIP]
Incidentally, all the improvements of the PyTables flavor of numexpr have been reported to the original authors, but, for the sake of keeping numexpr simple, they decided to implement only some of them. However, people are encouraged to try out the PyTables flavor from:
My original concern was that we'd start overflowing the code cache and slow things down. Some experiments seemed to confirm this on some processors, but it then turned out those were in error. At least that's my recollection. Because of that, and because PyTables is, as far as I know, the major user of numexpr, I suspect I'd be fine putting those changes in now. I say "suspect" since it's been a long time since I looked at the relevant patches, and I should probably look at those again before I commit myself. I just haven't had the free time and motivation to go back and look at the patches. I can't speak for David though. -- . __ . |-\ . . tim.hochberg@ieee.org
On Wednesday 31 October 2007, Timothy Hochberg wrote:
On Oct 31, 2007 3:18 AM, Francesc Altet <faltet@carabos.com> wrote:
[SNIP]
Incidentally, all the improvements of the PyTables flavor of numexpr have been reported to the original authors, but, for the sake of keeping numexpr simple, they decided to implement only some of them. However, people are encouraged to try out the PyTables flavor from:
My original concern was that we'd start overflowing the code cache and slow things down. Some experiments seemed to confirm this on some processors, but it then turned out those were in error. At least that's my recollection. Because of that, and because PyTables is, as far as I know, the major user of numexpr, I suspect I'd be fine putting those changes in now. I say "suspect" since it's been a long time since I looked at the relevant patches, and I should probably look at those again before I commit myself. I just haven't had the free time and motivation to go back and look at the patches. I can't speak for David though.
Yes, I remember that you were mainly concerned with overflowing the code cache, and yes, I think you can be confident this is not the case, as my benchmarking on many CPUs (ranging from a 7-year-old AMD Duron, through Pentium 4s, up to relatively recent AMD Opterons) doesn't show any slowdown with the PyTables version of numexpr, but rather the contrary: it speeds things up, especially in contexts where booleans and/or strided/unaligned arrays are used, where the improvement can be up to 2x on recent CPUs (as shown in my previous post).

If I remember correctly, another point that you (especially David) were not very keen on including was support for strings, arguing that numexpr is meant mainly to deal with numerical data, not textual. However, our experience is that adding this support was not too complicated (mainly due to NumPy flexibility), and it can be helpful in some instances. For one, we use it for evaluating expressions like 'array_string == "some_string"', which can be very convenient when you are in the middle of potentially complex boolean expressions that you want to evaluate *fast* (a sketch follows below).

At any rate, we would be glad if you would like to integrate our patches into the main numexpr, as there is not much sense in having different implementations of numexpr (most especially when it seems that there are not many users out there). So, count on us for any question you may have in this regard.

Cheers,

--
0,0<   Francesc Altet     http://www.carabos.com/
V V    Cárabos Coop. V.   Enjoy Data
"-"
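A sketch of the kind of mixed string/boolean selection being described, using the PyTables flavor of numexpr mentioned above (the array names here are made up):

    from tables import numexpr  # PyTables flavor, as suggested earlier
    import numpy as np

    array_string = np.array(["foo", "bar", "foo", "baz"])
    x = np.array([1.0, 5.0, 7.0, 2.0])

    # String equality embedded in a larger boolean expression.
    mask = numexpr.evaluate('(array_string == "foo") & (x > 3)')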
On Nov 1, 2007, at 08:56 , Francesc Altet wrote:
On Wednesday 31 October 2007, Timothy Hochberg wrote:
On Oct 31, 2007 3:18 AM, Francesc Altet <faltet@carabos.com> wrote:
[SNIP]
Incidentally, all the improvements of the PyTables flavor of numexpr have been reported to the original authors, but, for the sake of keeping numexpr simple, they decided to implement only some of them. However, people are encouraged to try out the PyTables flavor from:
My original concern was that we'd start overflowing the code cache and slow things down. Some experiments seemed to confirm this on some processors, but it then turned out those were in error. At least that's my recollection. Because of that, and because PyTables is, as far as I know, the major user of numexpr, I suspect I'd be fine putting those changes in now. I say "suspect" since it's been a long time since I looked at the relevant patches, and I should probably look at those again before I commit myself. I just haven't had the free time and motivation to go back and look at the patches. I can't speak for David though.
If I remember correctly, another point that you (especially David) were not very keen on including was support for strings, arguing that numexpr is meant mainly to deal with numerical data, not textual. However, our experience is that adding this support was not too complicated (mainly due to NumPy flexibility), and it can be helpful in some instances. For one, we use it for evaluating expressions like 'array_string == "some_string"', which can be very convenient when you are in the middle of potentially complex boolean expressions that you want to evaluate *fast*.
At any rate, we would be glad if you would like to integrate our patches into the main numexpr, as there is not much sense in having different implementations of numexpr (most especially when it seems that there are not many users out there). So, count on us for any question you may have in this regard.
Well, I don't have much time to work on it, but if you make sure your patches on the scipy Trac apply cleanly, I'll have a quick look at them and apply them. Since you've had them working in production code, they should be good ;-)

Another issue is that numexpr is still in the scipy sandbox, so only those who enable it will use it (or use it through PyTables). One problem with moving it out is that Tim reports the compile times on Windows are ridiculous (20 mins!). Maybe numexpr should become a scikit? It certainly doesn't need the rest of scipy.

--
|>|\/|<
/------------------------------------------------------------------\
|David M. Cooke              http://arbutus.physics.mcmaster.ca/dmc/
|cookedm@physics.mcmaster.ca
On Nov 1, 2007 7:14 AM, David M. Cooke <cookedm@physics.mcmaster.ca> wrote:
On Nov 1, 2007, at 08:56 , Francesc Altet wrote:
On Wednesday 31 October 2007, Timothy Hochberg wrote:
On Oct 31, 2007 3:18 AM, Francesc Altet <faltet@carabos.com> wrote:
[SNIP]
Incidentally, all the improvements of the PyTables flavor of numexpr have been reported to the original authors, but, for the sake of keeping numexpr simple, they decided to implement only some of them. However, people are encouraged to try out the PyTables flavor from:
My original concern was that we'd start overflowing the code cache and slow things down. Some experiments seemed to confirm this on some processors, but it then turned out those were in error. At least that's my recollection. Because of that, and because PyTables is, as far as I know, the major user of numexpr, I suspect I'd be fine putting those changes in now. I say "suspect" since it's been a long time since I looked at the relevant patches, and I should probably look at those again before I commit myself. I just haven't had the free time and motivation to go back and look at the patches. I can't speak for David though.
If I remember correctly, another point that you (especially David) were not very keen on including was support for strings, arguing that numexpr is meant mainly to deal with numerical data, not textual. However, our experience is that adding this support was not too complicated (mainly due to NumPy flexibility), and it can be helpful in some instances. For one, we use it for evaluating expressions like 'array_string == "some_string"', which can be very convenient when you are in the middle of potentially complex boolean expressions that you want to evaluate *fast*.
At any rate, we would be glad if you would like to integrate our patches into the main numexpr, as there is not much sense in having different implementations of numexpr (most especially when it seems that there are not many users out there). So, count on us for any question you may have in this regard.
Well, I don't have much time to work on it, but if you make sure your patches on the scipy Trac apply cleanly, I'll have a quick look at them and apply them. Since you've had them working in production code, they should be good ;-)
Another issue is that numexpr is still in the scipy sandbox, so only those who enable it will use it (or use it through PyTables). One problem with moving it out is that Tim reports the compile times on Windows are ridiculous (20 mins!).
While this is true at the default optimization (O2), it compiles reasonably quickly at O1 and I've never been able to detect a speed difference between versions compiled with O1 versus O2. It would probably be sufficient to crank back the optimization on Windows.
Maybe numexpr should become a scikit? It certainly doesn't need the rest of scipy.
-- . __ . |-\ . . tim.hochberg@ieee.org
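If anyone wants to try cranking the optimization back, a hypothetical sketch of what that could look like in a setup.py (file and module names are placeholders, and the change is untested here):

    from distutils.core import setup, Extension
    import sys

    # /O1 instead of the default /O2 on MSVC, per Tim's observation that
    # it compiles far faster with no measurable runtime difference.
    extra = ['/O1'] if sys.platform == 'win32' else []

    setup(name='numexpr',
          ext_modules=[Extension('interpreter',
                                 sources=['interpreter.c'],
                                 extra_compile_args=extra)])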
On Thursday 01 November 2007, Timothy Hochberg wrote:
On Nov 1, 2007 7:14 AM, David M. Cooke <cookedm@physics.mcmaster.ca> wrote:

Another issue is that numexpr is still in the scipy sandbox, so only those who enable it will use it (or use it through PyTables). One problem with moving it out is that Tim reports the compile times on Windows are ridiculous (20 mins!).
While this is true at the default optimization (O2), it compiles reasonably quickly at O1 and I've never been able to detect a speed difference between versions compiled with O1 versus O2. It would probably be sufficient to crank back the optimization on Windows.
Yes. This has been my experience too on Windows/MSVC boxes.
Maybe numexpr should become a scikit? It certainly doesn't need the rest of scipy.
Call me intrepid, but I've always felt that numexpr belongs more to numpy itself than to scipy. However, I agree that perhaps it should be a bit more polished (but not much; perhaps just adding some functions like exp, log, log10... would be enough) before being integrated. At any rate, numexpr would be an extremely useful complement to NumPy.

My two cents,

--
0,0<   Francesc Altet     http://www.carabos.com/
V V    Cárabos Coop. V.   Enjoy Data
"-"
On Thursday 01 November 2007, David M. Cooke wrote:
At any rate, we would be glad if you would like to integrate our patches into the main numexpr, as there is not much sense in having different implementations of numexpr (most especially when it seems that there are not many users out there). So, count on us for any question you may have in this regard.
Well, I don't have much time to work on it, but if you make sure your patches on the scipy Trac apply cleanly, I'll have a quick look at them and apply them. Since you've had them working in production code, they should be good ;-)
Well, being in production and exercised by around 10000 tests (where numexpr is involved) in the PyTables package seems a good guarantee to me too ;-)

I've attached a clean patch to ticket #529 of the SciPy site:

http://scipy.org/scipy/scipy/ticket/529

However, I've committed a couple of mistakes during ticket creation, so:

- in #529, I've uploaded the patch twice, so they are exactly the same
- please mark #530 as invalid (duplicated)

Cheers,

--
0,0<   Francesc Altet     http://www.carabos.com/
V V    Cárabos Coop. V.   Enjoy Data
"-"
Gael Varoquaux wrote:
On Thu, Oct 25, 2007 at 04:16:06PM -0700, Mathew Yeates wrote:
Anybody know of any tricks for handling something like
    z[0] = 1.0
    for i in range(100):
        out[i] = func1(z[i])
        z[i+1] = func2(out[i])
Something like:
    z[0] = 1.
    out = func1(z)
    z[1:] = func2(out[:-1])
No, that doesn't work. The way you have it, for each i > 0,

    z[i] = func2(func1(0))

What Mathew wants is this:

    z[0] = 1.0
    z[1] = func2(func1(1.0))
    z[2] = func2(func1(func2(func1(1.0))))
    ...

I'm afraid that there is no fast, elegant way to do this. If you could turn func2(func1(x)) into a binary ufunc f(x, y) with an ignored y, then you could call f.accumulate(z). However, if that ufunc is not implemented in C, there's not much point.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
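For what it's worth, a sketch of the binary-ufunc trick using frompyfunc (which keeps the calls at the Python level, so, per Robert's caveat, it buys no speed; func1/func2 are hypothetical stand-ins):

    import numpy as np

    def func1(x):
        return 0.5 * x

    def func2(x):
        return x + 1.0

    # f(x, y) ignores y; accumulate then computes r[0] = a[0] and
    # r[i] = f(r[i-1], a[i]), which is exactly the desired recursion.
    f = np.frompyfunc(lambda z_prev, _ignored: func2(func1(z_prev)), 2, 1)

    a = np.ones(101, dtype=object)
    a[0] = 1.0          # the seed z[0]
    z = f.accumulate(a)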
On Fri, Oct 26, 2007 at 01:56:26AM -0500, Robert Kern wrote:
Gael Varoquaux wrote:
On Thu, Oct 25, 2007 at 04:16:06PM -0700, Mathew Yeates wrote:
Anybody know of any tricks for handling something like
    z[0] = 1.0
    for i in range(100):
        out[i] = func1(z[i])
        z[i+1] = func2(out[i])
Something like:
    z[0] = 1.
    out = func1(z)
    z[1:] = func2(out[:-1])
No, that doesn't work. The way you have it, for each i > 0,

    z[i] = func2(func1(0))

What Mathew wants is this:

    z[0] = 1.0
    z[1] = func2(func1(1.0))
    z[2] = func2(func1(func2(func1(1.0))))
Yes, obviously. Sorry for being dense. I can't see a fast way of doing this apart from doing it in Python.

Gaël
participants (10)

- Charles R Harris
- Christopher Barker
- David Cournapeau
- David M. Cooke
- Francesc Altet
- Gael Varoquaux
- Mathew Yeates
- Robert Kern
- Sebastian Haase
- Timothy Hochberg