Question about Optimization (Inline and Pyrex)
I recently made the switch from Matlab to Python and am very interested in optimizing certain routines that I find too slow in python/numpy (long loops).

I have looked at and learned about the different methods used for such problems, such as blitz, weave and pyrex, but had a question for more experienced developers.

It appears that pyrex is the fastest of the bunch with weave very close behind, but at the same time pyrex requires entirely separate modules while weave can be inserted almost painlessly into existing code. Is the speed gain and usefulness of pyrex severely limited by the extra maintenance required by having separate "fast" routines apart from the rest of the code files?

I am greatly interested in finding out what more experienced developers feel about these issues, given that I may be completely off track and missing out on a useful tool (pyrex) by thinking weave is better than it actually is. Quite frankly, I am afraid of writing routines in one format and realizing later that it creates problems and I need to rewrite them.

I have tried searching for previous similar posts but could not find any. My apologies if this is a repeat or a severely dumb question.

Regards,
Simon Berube
Hi,
You can find various suggestions to improve performance like Tim
Hochberg's list:
"""
0. Think about your algorithm.
1. Vectorize your inner loop.
2. Eliminate temporaries
3. Ask for help
4. Recode in C.
5. Accept that your code will never be fast.
Step zero should probably be repeated after every other step ;)
"""
The first item (step 0, thinking about your algorithm) is very important, because loop swapping and factorization can really help. Item 1 is probably very important for your 'long loops'. Also, NumPy may already have a suitable function for some of the calculations.
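For item 2, a minimal sketch (not from the original mail) of what eliminating a temporary looks like, using in-place ufunc operations:

import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# allocates a temporary for a*b, then a second array for the sum:
r = a*b + a

# reuses the product's buffer instead of allocating a second array:
r2 = a * b
r2 += a   # in-place add, no new temporary

assert np.allclose(r, r2)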
Bruce
On 4/17/07, Simon Berube wrote:
[cut]
You should probably look over your code and see if you can eliminate loops by using the built-in vectorization of NumPy. I've found this can really speed things up. E.g., given element-by-element multiplication of two length-n arrays x and y, replace

from numpy import zeros

z = zeros(n)
for i in xrange(n):
    z[i] = x[i]*y[i]

with

z = x*y  # NumPy will handle this in a vector fashion
Maybe you've already done that, but I thought I'd
offer it.
--- Simon Berube wrote:
[cut]

-- Lou Pecora, my views are my own.
"I knew I was going to take the wrong train, so I left early." --Yogi Berra
On 17/04/07, Lou Pecora wrote:
[cut]
It's also worth mentioning that this sort of vectorization may allow you to avoid python's global interpreter lock. Normally, python's multithreading is effectively cooperative, because the interpreter's data structures are all stored under the same lock, so only one thread can be executing python bytecode at a time. However, many of numpy's vectorized functions release the lock while running, so on a multiprocessor or multicore machine you can have several cores at once running vectorized code.

Anne M. Archibald
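As a concrete illustration (a minimal sketch, not from the original mail): each thread below runs one large vectorized operation, and since numpy releases the GIL inside its inner loops, the two calls can execute on separate cores at the same time.

import threading
import numpy as np

a = np.random.rand(5000000)
b = np.random.rand(5000000)
results = {}

def work(name, arr):
    # the heavy lifting happens in C with the GIL released
    results[name] = np.sqrt(arr) * np.sin(arr)

threads = [threading.Thread(target=work, args=('a', a)),
           threading.Thread(target=work, args=('b', b))]
for t in threads:
    t.start()
for t in threads:
    t.join()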
Now, I didn't know that. That's cool because I have a
new dual core Intel Mac Pro. I see I have some
learning to do with multithreading. Thanks.
--- Anne Archibald wrote:
[cut]

-- Lou Pecora
On 17/04/07, Lou Pecora
Now, I didn't know that. That's cool because I have a new dual core Intel Mac Pro. I see I have some learning to do with multithreading. Thanks.
No problem. I had completely forgotten about the global interpreter lock, wrote a little multithreading tool that ran my code in three different threads, and got just about a 2x speedup on a dual-core machine. Then someone reminded me about the GIL and I was puzzled... your results will certainly depend on your code, but I found it useful to have a little parallel-for-loop idiom for all those cases where parallelism is stupidly easy.

Anne
I get what you are saying, but I'm not even at the
Stupidly Easy Parallel level, yet. Eventually.
Thanks.
--- Anne Archibald wrote:
[cut]

-- Lou Pecora
On 17/04/07, Lou Pecora
I get what you are saying, but I'm not even at the Stupidly Easy Parallel level, yet. Eventually.
Well, it's hardly wonderful, but I wrote a little package to make idioms like:

d = {}
def work(f):
    d[f] = sum(exp(2.j*pi*f*times))
foreach(work, freqs, threads=3)

work fine. Of course you need to make sure your threads don't accidentally trample all over each other, but otherwise it's an easy way to get a factor-of-two speedup.

Anne
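Anne's package itself is not reproduced in the thread; a hypothetical minimal foreach() along the lines she describes might look like this (the partitioning strategy and all names are assumptions, not her actual code):

import threading

def foreach(work, items, threads=2):
    items = list(items)

    def worker(offset):
        # static partition: thread k handles items k, k+threads, ...
        for item in items[offset::threads]:
            work(item)

    ts = [threading.Thread(target=worker, args=(k,))
          for k in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

This only pays off when work() spends its time in GIL-releasing numpy calls, as in Anne's example above.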
Very nice. Thanks. Examples are welcome since they are usually the best way to get up to speed with programming concepts.
--- Anne Archibald wrote:
[cut]

-- Lou Pecora
Hi Anne,
I'm just starting to look into your code (sounds very interesting -
should probably be put onto the wiki)
-- quick note:
you are mixing tabs and spaces :-(
what editor are you using !?
-Sebastian
On 4/17/07, Anne Archibald wrote:
[cut]
On 18/04/07, Sebastian Haase wrote:
[cut]
Agh. vim is misbehaving. Sorry about that. I just took another look at that code and added a parallel_map I hadn't got around to writing before, too. I'd be happy to stick it (and a test file) on the wiki under some open license or other ("do what thou wilt shall be the whole of the law"?). It's certainly not competition for ipython1, though; it's mostly to show an example of making threads easy to use.

Anne
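The parallel_map is likewise not shown in the thread; a hypothetical version built on the foreach() sketch above could be (again, assumed names, not Anne's actual code):

def parallel_map(func, items, threads=2):
    items = list(items)
    results = [None] * len(items)

    def work(i):
        results[i] = func(items[i])

    foreach(work, range(len(items)), threads=threads)
    return results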
--- Anne Archibald wrote:
[cut]
Please put the parallel map code on the Wiki. I found your first (obvious-parallel) example very helpful.

-- Lou Pecora
Hi Anne,

Your reply to Lou raises a naive follow-up question of my own...
[cut]
Are you saying that numpy's vectorized functions will perform a single array operation in parallel on a multi-processor machine, or just that the user can explicitly write threaded code to run *multiple* array operations on different processors at the same time? I hope that's not too stupid a question, but I haven't done any threaded programming yet and the answer could be rather useful...

Thanks a lot,
James.
On 17/04/07, James Turner wrote:
[cut]
For the most part, numpy's vectorized functions don't do anything fancy in terms of computations; they are just giant for loops. What they do do (and not necessarily all of them) is release the GIL so another thread can be doing something else while they run.

That said, some of them (dot, for example) use BLAS in certain situations, and then all bets are off. At the least a decent BLAS implementation will be smart about cache behaviour; a fancy BLAS implementation might actually vectorize the operation automatically. That would be using SSE3, though, or some vector processor (Cray?), not likely SMP. Though I can't say for sure. The scipy linear algebra functions use LAPACK, which is more likely to be able to make such speedups (and in fact I'm pretty sure there is an MPI-based LAPACK, though whether it's a plug-in replacement I don't know).

Anne
I would say that if the underlying ATLAS library is multithreaded, numpy operations will be as well. Then, at the Python level, even if the operations take a lot of time, the interpreter will still be able to schedule other threads, since the lock is released during the numpy operations - as I understood from the earlier mails, only one thread can access the interpreter at any given time.
Matthieu
2007/4/17, James Turner wrote:
[cut]
Matthieu Brucher wrote:
[cut]
ATLAS doesn't *underlie* much of numpy at all. Just dot() and the functions in linalg, nothing else.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 4/17/07, Robert Kern wrote:
[cut]
Hi,
I don't know much about ATLAS -- would there be other numpy functions that *could* or *should* be implemented using ATLAS!? Any?

-Sebastian
Sebastian Haase wrote:
[cut]
Not really, no.

-- Robert Kern
On 18/04/07, Robert Kern wrote:
[cut]
ATLAS is a library designed to implement linear algebra functions efficiently on many machines. It does things like reorder the multiplications and additions in matrix multiplication to make the best possible use of your cache, as measured by empirical testing. (FFTW does something similar for the FFT.) But ATLAS is only designed for linear algebra. If what you want to do is linear algebra, look at scipy for a full selection of linear algebra routines that make fairly good use of ATLAS where applicable.

It would be perfectly possible, in principle, to implement an ATLAS-like library that handled a variety (perhaps all) of numpy's basic operations in platform-optimized fashion. But implementing ATLAS is not a simple process! And it's not clear how much gain would be available - it would almost certainly be noticeably faster only for very large numpy objects (where the python overhead is unimportant), and those objects can be very inefficient because of excessive copying. And the scope of improvement would be very limited; an expression like A*B+C*D would be much more efficient, probably, if the whole expression were evaluated at once for each element (due to memory locality and temporary allocation). But it is impossible for numpy, sitting inside python as it does, to do that.

Anne M. Archibald
Anne Archibald wrote:
And the scope of improvement would be very limited; an expression like A*B+C*D would be much more efficient, probably, if the whole expression were evaluated at once for each element (due to memory locality and temporary allocation). But it is impossible for numpy, sitting inside python as it does, to do that.
My understanding is that alternative implementations of python such as pypy make this kind of thing possible (at least in theory). I asked a question about this a few months ago: http://www.mail-archive.com/pypy-dev@codespeak.net/msg02243.html

David
Anne Archibald wrote:
[cut]
numexpr (in the scipy sandbox) does something like this: it takes an expression like A*B+C*D and constructs a small bytecode program that does that calculation in chunks, minimising temporary variables and number of passes through memory. As it is, the speed is faster than the python expression, and comparable to that of weave. I've been thinking of making a JIT for it, but I haven't had the time :)

-- David M. Cooke
http://arbutus.physics.mcmaster.ca/dmc/
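For reference, a minimal usage sketch of numexpr (assuming it is installed; evaluate() is its public entry point and picks up the array names from the calling scope):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = np.random.rand(1000000)
d = np.random.rand(1000000)

r_numpy = a*b + c*d                # several temporaries, several passes
r_ne = ne.evaluate("a*b + c*d")    # one chunked pass, fewer temporaries

assert np.allclose(r_numpy, r_ne)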
On 4/17/07, Anne Archibald wrote:
[cut]
So, this means that 'matrixmultiply' could/should be using ATLAS for the same reason as 'dot' does - right?

-Seb.
Sebastian Haase wrote:
[cut]
So, this means that 'matrixmultiply' could / should be using ATLAS for the same reason as 'dot' does - right ?
matrixmultiply() is just a long-deprecated alias to dot().

-- Robert Kern
On 4/18/07, Robert Kern wrote:
[cut]
matrixmultiply() is just a long-deprecated alias to dot().

Of course - I should have turned my brain on before hitting 'send'... Does ATLAS/BLAS do anything special for element-wise multiplication and the like - for example if the data is not aligned or not contiguous?
-Seb.
Sebastian Haase wrote:
Does ATLAS/BLAS do anything special for element wise multiplication and alike - if for example the data is not aligned or not contiguous?
Nothing that ATLAS optimizes, no. They focus (rightly) on the more complicated matrix operations (BLAS Level 3, if you are familiar with the BLAS levels).

-- Robert Kern
On Tue, 17 Apr 2007 at 16:43 +0000, Simon Berube wrote:
[cut]
Well, this is a delicate question. Let me put something clear before. Pyrex code *might* be fast (as fast as C code can be, in fact) if you write good code, which is not an easy thing in most situations, mainly because this requires mastery not only of the Pyrex language (which, due to its similarity to Python, is relatively simple to learn), but also (and especially) of the internals of how your machine architecture works (CPU bottlenecks, memory bottlenecks...).

When you compare weave (or whatever) against Pyrex in numerical computations, you should keep other features in mind as well, especially ease of use and the convenience of accessing the elements of your numerical objects. I'm not a weave user, but I know that it allows merging the weave code into your Python code and also allows multidimensional indexing (Pyrex doesn't). So, generally speaking, weave is more high level (but still fast!) than C, Fortran or Pyrex for doing this kind of computation, and depending on your needs, these factors (and not only speed) can be worth considering.

Having said that, if you need to get all the performance that your platform can offer, then Pyrex is an excellent option in that it allows getting the maximum performance (if well coded, of course) from inside the language. In addition, as it is heavily based on Python syntax, it allows object-oriented programming and excellent interaction with Python code. These factors are normally very important when you have to develop relatively large modules with high efficiency in mind. However, it must be stressed that Pyrex *doesn't* allow accessing multidimensional data in a convenient way (you need to compute the indices yourself to access the flat data array in memory). It is true that this shouldn't be a handicap for one-dimensional or two-dimensional data, but it can be a pain if most of your code has to deal with highly multidimensional objects.

Finally, don't let benchmarks fool you. If you can, it is always better to run your own benchmarks made of your own problems. A tool that can be a killer for one application can be just mediocre for another (that's somewhat extreme, but I hope you get the point).

Hope that helps,

--
Francesc Altet   | Be careful about using the following code --
Carabos Coop. V. | I've only proven that it works,
www.carabos.com  | I haven't tested it. -- Donald Knuth
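To make Francesc's point about index computation concrete, here is a plain-Python sketch (illustrative names only, not from the original mail) of the row-major arithmetic that Pyrex code has to do by hand when it walks a 2-D array's flat data buffer:

from numpy import arange

a = arange(12).reshape(3, 4)   # a 3x4 array, row-major (C order)
flat = a.ravel()               # the flat buffer Pyrex would index
nrows, ncols = a.shape

for i in range(nrows):
    for j in range(ncols):
        # flat index by hand: row * row-length + column
        assert flat[i * ncols + j] == a[i, j]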
On 17/04/07, Francesc Altet
Finally, don't let benchmarks fool you. If you can, it is always better to run your own benchmarks made of your own problems. A tool that can be killer for one application can be just mediocre for another (that's somewhat extreme, but I hope you get the point).
And, also important, don't forget that the time you usually care about is the time until you obtain a correct solution of your problem - which includes the time to write the code and the time to debug the code. I find that it's extremely rare that the extra time it takes to write highly-optimized code is worth the time it saves to run.

Anne M. Archibald
participants (11)
- Anne Archibald
- Bruce Southey
- David Cournapeau
- David M. Cooke
- Francesc Altet
- James Turner
- Lou Pecora
- Matthieu Brucher
- Robert Kern
- Sebastian Haase
- Simon Berube