Question about Optimization (Inline and Pyrex)
I recently made the switch from Matlab to Python and am very interested in optimizing certain routines that I find too slow in python/numpy (long loops).

I have looked at and learned about the different methods used for such problems, such as blitz, weave and pyrex, but had a question for more experienced developers.

It appears that pyrex is the fastest of the bunch with weave very close behind, but at the same time pyrex requires entirely separate modules while weave can be inserted almost painlessly into existing code. Is the speed gain and usefulness of pyrex severely limited by the extra maintenance required by having separate "fast" routines apart from the rest of the code files?

I am greatly interested in finding out what more experienced developers feel about these issues, given that I may be completely off track and missing out on a useful tool (pyrex) by thinking weave is better than it actually is. Quite frankly, I am afraid of writing routines in one format and realizing later that it creates problems and I need to rewrite them.

I have tried searching for previous similar posts but could not find any. My apologies if this is a repeat or a severely dumb question.

Regards,
Simon Berube
Hi,
You can find various suggestions to improve performance like Tim
Hochberg's list:
"""
0. Think about your algorithm.
1. Vectorize your inner loop.
2. Eliminate temporaries
3. Ask for help
4. Recode in C.
5. Accept that your code will never be fast.
Step zero should probably be repeated after every other step ;)
"""
The first item (step 0, thinking about your algorithm) is very important, because loop swapping and factorization can really help. Item 1 is probably very important for your 'long loops'. Also, NumPy may already have a suitable function for some of the calculations.
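For item 2, a minimal sketch (not from the original mail) of what eliminating a temporary looks like, using in-place ufunc operations:

import numpy as np

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# allocates a temporary for a*b, then a second array for the sum:
r = a*b + a

# reuses the product's buffer instead of allocating a second array:
r2 = a * b
r2 += a   # in-place add, no new temporary

assert np.allclose(r, r2)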
Bruce
On 4/17/07, Simon Berube wrote:
[cut]
You should probably look over your code and see if you can eliminate loops by using the built-in vectorization of NumPy. I've found this can really speed things up. E.g., given element-by-element multiplication of two length-n arrays x and y, replace

from numpy import zeros

z = zeros(n)
for i in xrange(n):
    z[i] = x[i]*y[i]

with

z = x*y  # NumPy will handle this in a vector fashion
Maybe you've already done that, but I thought I'd
offer it.
--- Simon Berube wrote:
[cut]

-- Lou Pecora, my views are my own.
"I knew I was going to take the wrong train, so I left early." --Yogi Berra
On 17/04/07, Lou Pecora wrote:
[cut]
It's also worth mentioning that this sort of vectorization may allow you to avoid python's global interpreter lock. Normally, python's multithreading is effectively cooperative, because the interpreter's data structures are all stored under the same lock, so only one thread can be executing python bytecode at a time. However, many of numpy's vectorized functions release the lock while running, so on a multiprocessor or multicore machine you can have several cores at once running vectorized code.

Anne M. Archibald
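As a concrete illustration (a minimal sketch, not from the original mail): each thread below runs one large vectorized operation, and since numpy releases the GIL inside its inner loops, the two calls can execute on separate cores at the same time.

import threading
import numpy as np

a = np.random.rand(5000000)
b = np.random.rand(5000000)
results = {}

def work(name, arr):
    # the heavy lifting happens in C with the GIL released
    results[name] = np.sqrt(arr) * np.sin(arr)

threads = [threading.Thread(target=work, args=('a', a)),
           threading.Thread(target=work, args=('b', b))]
for t in threads:
    t.start()
for t in threads:
    t.join()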
Now, I didn't know that. That's cool because I have a
new dual core Intel Mac Pro. I see I have some
learning to do with multithreading. Thanks.
--- Anne Archibald wrote:
[cut]

-- Lou Pecora
On 17/04/07, Lou Pecora
Now, I didn't know that. That's cool because I have a new dual core Intel Mac Pro. I see I have some learning to do with multithreading. Thanks.
No problem. I had completely forgotten about the global interpreter lock, wrote a little multithreading tool that ran my code in three different threads, and got just about a 2x speedup on a dual-core machine. Then someone reminded me about the GIL and I was puzzled... your results will certainly depend on your code, but I found it useful to have a little parallel-for-loop idiom for all those cases where parallelism is stupidly easy.

Anne
I get what you are saying, but I'm not even at the
Stupidly Easy Parallel level, yet. Eventually.
Thanks.
--- Anne Archibald wrote:
[cut]

-- Lou Pecora
On 17/04/07, Lou Pecora
I get what you are saying, but I'm not even at the Stupidly Easy Parallel level, yet. Eventually.
Well, it's hardly wonderful, but I wrote a little package to make idioms like:

d = {}
def work(f):
    d[f] = sum(exp(2.j*pi*f*times))
foreach(work, freqs, threads=3)

work fine. Of course you need to make sure your threads don't accidentally trample all over each other, but otherwise it's an easy way to get a factor-of-two speedup.

Anne
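Anne's package itself is not reproduced in the thread; a hypothetical minimal foreach() along the lines she describes might look like this (the partitioning strategy and all names are assumptions, not her actual code):

import threading

def foreach(work, items, threads=2):
    items = list(items)

    def worker(offset):
        # static partition: thread k handles items k, k+threads, ...
        for item in items[offset::threads]:
            work(item)

    ts = [threading.Thread(target=worker, args=(k,))
          for k in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()

This only pays off when work() spends its time in GIL-releasing numpy calls, as in Anne's example above.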
Very nice. Thanks. Examples are welcome since they are usually the best way to get up to speed with programming concepts.
--- Anne Archibald wrote:
[cut]

-- Lou Pecora
Hi Anne,
I'm just starting to look into your code (sounds very interesting -
should probably be put onto the wiki)
-- quick note:
you are mixing tabs and spaces :-(
what editor are you using !?
-Sebastian
On 4/17/07, Anne Archibald wrote:
[cut]
On 18/04/07, Sebastian Haase wrote:
[cut]
Agh. vim is misbehaving. Sorry about that. I just took another look at that code and added a parallel_map I hadn't got around to writing before, too. I'd be happy to stick it (and a test file) on the wiki under some open license or other ("do what thou wilt shall be the whole of the law"?). It's certainly not competition for ipython1, though; it's mostly to show an example of making threads easy to use.

Anne
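The parallel_map is likewise not shown in the thread; a hypothetical version built on the foreach() sketch above could be (again, assumed names, not Anne's actual code):

def parallel_map(func, items, threads=2):
    items = list(items)
    results = [None] * len(items)

    def work(i):
        results[i] = func(items[i])

    foreach(work, range(len(items)), threads=threads)
    return results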
--- Anne Archibald wrote:
[cut]
Please put the parallel map code on the Wiki. I found your first (obvious-parallel) example very helpful.

-- Lou Pecora
Hi Anne,

Your reply to Lou raises a naive follow-up question of my own...
[cut]
Are you saying that numpy's vectorized functions will perform a single array operation in parallel on a multi-processor machine, or just that the user can explicitly write threaded code to run *multiple* array operations on different processors at the same time? I hope that's not too stupid a question, but I haven't done any threaded programming yet and the answer could be rather useful...

Thanks a lot,
James.
On 17/04/07, James Turner wrote:
[cut]
For the most part, numpy's vectorized functions don't do anything fancy in terms of computations; they are just giant for loops. What they do do (and not necessarily all of them) is release the GIL so another thread can be doing something else while they run.

That said, some of them (dot, for example) use BLAS in certain situations, and then all bets are off. At the least a decent BLAS implementation will be smart about cache behaviour; a fancy BLAS implementation might actually vectorize the operation automatically. That would be using SSE3, though, or some vector processor (Cray?), not likely SMP. Though I can't say for sure. The scipy linear algebra functions use LAPACK, which is more likely to be able to make such speedups (and in fact I'm pretty sure there is an MPI-based LAPACK, though whether it's a plug-in replacement I don't know).

Anne
I would say that if the underlying ATLAS library is multithreaded, numpy operations will be as well. Then, at the Python level, even if the operations take a lot of time, the interpreter will still be able to schedule other threads, since the lock is released during the numpy operations - as I understood from the earlier mails, only one thread can access the interpreter at any given time.
Matthieu
2007/4/17, James Turner wrote:
[cut]
Matthieu Brucher wrote:
[cut]
ATLAS doesn't *underlie* much of numpy at all. Just dot() and the functions in linalg, nothing else.

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 4/17/07, Robert Kern wrote:
[cut]
Hi,
I don't know much about ATLAS -- would there be other numpy functions that *could* or *should* be implemented using ATLAS!? Any?

-Sebastian
Sebastian Haase wrote:
[cut]
Not really, no.

-- Robert Kern
On 18/04/07, Robert Kern wrote:
[cut]
ATLAS is a library designed to implement linear algebra functions efficiently on many machines. It does things like reorder the multiplications and additions in matrix multiplication to make the best possible use of your cache, as measured by empirical testing. (FFTW does something similar for the FFT.) But ATLAS is only designed for linear algebra. If what you want to do is linear algebra, look at scipy for a full selection of linear algebra routines that make fairly good use of ATLAS where applicable.

It would be perfectly possible, in principle, to implement an ATLAS-like library that handled a variety (perhaps all) of numpy's basic operations in platform-optimized fashion. But implementing ATLAS is not a simple process! And it's not clear how much gain would be available - it would almost certainly be noticeably faster only for very large numpy objects (where the python overhead is unimportant), and those objects can be very inefficient because of excessive copying. And the scope of improvement would be very limited; an expression like A*B+C*D would be much more efficient, probably, if the whole expression were evaluated at once for each element (due to memory locality and temporary allocation). But it is impossible for numpy, sitting inside python as it does, to do that.

Anne M. Archibald
Anne Archibald wrote:
And the scope of improvement would be very limited; an expression like A*B+C*D would be much more efficient, probably, if the whole expression were evaluated at once for each element (due to memory locality and temporary allocation). But it is impossible for numpy, sitting inside python as it does, to do that.
My understanding is that alternative implementations of python such as pypy make this kind of thing possible (at least in theory). I asked a question about this a few months ago: http://www.mail-archive.com/pypy-dev@codespeak.net/msg02243.html

David
Anne Archibald wrote:
[cut]
numexpr (in the scipy sandbox) does something like this: it takes an expression like A*B+C*D and constructs a small bytecode program that does that calculation in chunks, minimising temporary variables and number of passes through memory. As it is, the speed is faster than the python expression, and comparable to that of weave. I've been thinking of making a JIT for it, but I haven't had the time :)

-- David M. Cooke
http://arbutus.physics.mcmaster.ca/dmc/
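For reference, a minimal usage sketch of numexpr (assuming it is installed; evaluate() is its public entry point and picks up the array names from the calling scope):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = np.random.rand(1000000)
d = np.random.rand(1000000)

r_numpy = a*b + c*d                # several temporaries, several passes
r_ne = ne.evaluate("a*b + c*d")    # one chunked pass, fewer temporaries

assert np.allclose(r_numpy, r_ne)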
On 4/17/07, Anne Archibald wrote:
[cut]
So, this means that 'matrixmultiply' could/should be using ATLAS for the same reason as 'dot' does - right?

-Seb.
Sebastian Haase wrote:
[cut]
So, this means that 'matrixmultiply' could / should be using ATLAS for the same reason as 'dot' does - right ?
matrixmultiply() is just a long-deprecated alias to dot().

-- Robert Kern
On 4/18/07, Robert Kern wrote:
[cut]
matrixmultiply() is just a long-deprecated alias to dot().

Of course - I should have turned my brain on before hitting 'send'... Does ATLAS/BLAS do anything special for element-wise multiplication and the like - for example if the data is not aligned or not contiguous?
-Seb.
Sebastian Haase wrote:
Does ATLAS/BLAS do anything special for element wise multiplication and alike - if for example the data is not aligned or not contiguous?
Nothing that ATLAS optimizes, no. They focus (rightly) on the more complicated matrix operations (BLAS Level 3, if you are familiar with the BLAS levels).

-- Robert Kern
On Tue, 17 Apr 2007 at 16:43 +0000, Simon Berube wrote:
[cut]
Well, this is a delicate question. Let me put something clear before. Pyrex code *might* be fast (as fast as C code can be, in fact) if you write good code, which is not an easy thing in most situations, mainly because this requires mastery not only of the Pyrex language (which, due to its similarity to Python, is relatively simple to learn), but also (and especially) of the internals of how your machine architecture works (CPU bottlenecks, memory bottlenecks...).

When you compare weave (or whatever) against Pyrex in numerical computations, you should keep other features in mind as well, especially ease of use and the convenience of accessing the elements of your numerical objects. I'm not a weave user, but I know that it allows merging the weave code into your Python code and also allows multidimensional indexing (Pyrex doesn't). So, generally speaking, weave is more high level (but still fast!) than C, Fortran or Pyrex for doing this kind of computation, and depending on your needs, these factors (and not only speed) can be worth considering.

Having said that, if you need to get all the performance that your platform can offer, then Pyrex is an excellent option in that it allows getting the maximum performance (if well coded, of course) from inside the language. In addition, as it is heavily based on Python syntax, it allows object-oriented programming and excellent interaction with Python code. These factors are normally very important when you have to develop relatively large modules with high efficiency in mind. However, it must be stressed that Pyrex *doesn't* allow accessing multidimensional data in a convenient way (you need to compute the indices yourself to access the flat data array in memory). It is true that this shouldn't be a handicap for one-dimensional or two-dimensional data, but it can be a pain if most of your code has to deal with highly multidimensional objects.

Finally, don't let benchmarks fool you. If you can, it is always better to run your own benchmarks made of your own problems. A tool that can be a killer for one application can be just mediocre for another (that's somewhat extreme, but I hope you get the point).

Hope that helps,

--
Francesc Altet   | Be careful about using the following code --
Carabos Coop. V. | I've only proven that it works,
www.carabos.com  | I haven't tested it. -- Donald Knuth
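To make Francesc's point about index computation concrete, here is a plain-Python sketch (illustrative names only, not from the original mail) of the row-major arithmetic that Pyrex code has to do by hand when it walks a 2-D array's flat data buffer:

from numpy import arange

a = arange(12).reshape(3, 4)   # a 3x4 array, row-major (C order)
flat = a.ravel()               # the flat buffer Pyrex would index
nrows, ncols = a.shape

for i in range(nrows):
    for j in range(ncols):
        # flat index by hand: row * row-length + column
        assert flat[i * ncols + j] == a[i, j]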
On 17/04/07, Francesc Altet
Finally, don't let benchmarks fool you. If you can, it is always better to run your own benchmarks made of your own problems. A tool that can be killer for one application can be just mediocre for another (that's somewhat extreme, but I hope you get the point).
And, also important, don't forget that the time you usually care about is the time until you obtain a correct solution of your problem - which includes the time to write the code and the time to debug the code. I find that it's extremely rare that the extra time it takes to write highly-optimized code is worth the time it saves to run.

Anne M. Archibald
participants (11)
- Anne Archibald
- Bruce Southey
- David Cournapeau
- David M. Cooke
- Francesc Altet
- James Turner
- Lou Pecora
- Matthieu Brucher
- Robert Kern
- Sebastian Haase
- Simon Berube