python numpy code many times slower than c++
I tried a little experiment, implementing some code in numpy (usually I build modules in C++ to interface to Python). Since these operations are all on large vectors, I hoped it would be reasonably efficient. The code in question is simple: it is a model of an amplifier, modeled by its AM/AM and AM/PM characteristics. The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times, each time with a vector of 4096. Any ideas? The code is not too big, so I'll try to attach it.
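For readers without the attachment, a minimal sketch of the kind of model described here (all names and curve values are hypothetical, not Neal's actual code): tabulated AM/AM and AM/PM curves applied to a complex baseband vector via interpolation, with __call__ operating on whole arrays.

```python
import numpy as np

class Amplifier:
    def __init__(self, pin_db, pout_db, phase_deg):
        # Tabulated AM/AM curve (input vs. output amplitude, converted
        # from dB to linear volts) and AM/PM curve (phase shift).
        self.vin = 10 ** (0.05 * np.asarray(pin_db, dtype=float))
        self.vout = 10 ** (0.05 * np.asarray(pout_db, dtype=float))
        self.phase = np.deg2rad(np.asarray(phase_deg, dtype=float))

    def __call__(self, x):
        # x: complex baseband samples; everything below is vectorized.
        r = np.abs(x)
        gain = np.interp(r, self.vin, self.vout)   # AM/AM lookup
        dphi = np.interp(r, self.vin, self.phase)  # AM/PM lookup
        # Rescale each sample to the interpolated output amplitude,
        # guarding against division by zero for silent samples.
        scale = np.where(r > 0, gain / np.maximum(r, 1e-30), 0.0)
        return x * scale * np.exp(1j * dphi)
```

Called 1024 times on vectors of 4096, a formulation like this stays entirely in numpy's C loops; per-call Python overhead is then negligible compared to the array work.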
2009/1/20 Neal Becker
I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
Any ideas? The code is not too big, so I'll try to attach it.
Any chance you can make it self-contained? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2009/1/20 Neal Becker
I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module: http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/ That might give us a better idea in the absence of a self-contained example. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
2009/1/20 Neal Becker
: I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
That might give us a better idea in the absence of a self-contained example.
Sounds interesting, I'll give that a try. But I'm not sure how to use it. If my main script is plot_spectrum.py, and I want to profile the ampl.__call__ function (defined in ampl.py), what do I need to do? I tried running kernprof.py plot_spectrum.py having added @profile decorators in ampl.py, but that didn't work:

  File "../mod/ampl.py", line 43, in ampl
    @profile
NameError: name 'profile' is not defined
On Tue, Jan 20, 2009 at 20:44, Neal Becker
Robert Kern wrote:
2009/1/20 Neal Becker
: I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
That might give us a better idea in the absence of a self-contained example.
Sounds interesting, I'll give that a try. But, not sure how to use it.
If my main script is plot_spectrum.py, and I want to profile the ampl.__call__ function (defined in ampl.py), what do I need to do? I tried running kernprof.py plot_spectrum.py having added @profile decorators in ampl.py, but that didn't work: File "../mod/ampl.py", line 43, in ampl @profile NameError: name 'profile' is not defined
kernprof.py --line-by-line plot_spectrum.py -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
2009/1/20 Neal Becker
: I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
That might give us a better idea in the absence of a self-contained example.
I see the problem. Thanks for the great profiler! You ought to make this more widely known. It seems the big chunks of time are spent in data conversion between numpy and my own vector classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method, since there is no special code to handle numpy vectors. Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far no one else has volunteered any info on this).
On Tue, Jan 20, 2009 at 20:57, Neal Becker
I see the problem. Thanks for the great profiler! You ought to make this more widely known.
I'll be making a release shortly.
It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
Who's not volunteering information, boost::python or us? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
On Tue, Jan 20, 2009 at 20:57, Neal Becker
wrote: I see the problem. Thanks for the great profiler! You ought to make this more widely known.
I'll be making a release shortly.
It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
Who's not volunteering information, boost::python or us?
I've asked on python.c++, the home of boost::python and friends. I've spent a little time on it myself, but I think this job requires great knowledge of the Python C API as well as the mysteries of boost::python.
On Tue, Jan 20, 2009 at 6:57 PM, Neal Becker
It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
I'm not sure if I've understood everything here, but I think that pyublas provides exactly what you need. http://tiker.net/doc/pyublas/
T J wrote:
On Tue, Jan 20, 2009 at 6:57 PM, Neal Becker
wrote: It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
I'm not sure if I've understood everything here, but I think that pyublas provides exactly what you need.
It might if I had used this for all of my C++ code, but I have a big library of wrapped C++ code that doesn't use pyublas. Pyublas takes numpy objects from Python and allows the use of C++ ublas on them (without conversion). Most of my code doesn't use numpy; it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion. I'm interested in pyublas, but development seems to have been very quiet for a while.
On 1/21/2009 1:27 PM, Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
If you can get a pointer (as an integer) to your C++ data, and the shape and dtype are known, you may use this (rather unsafe) 'fromaddress' hack: http://www.mail-archive.com/numpy-discussion@scipy.org/msg04974.html

import numpy

def fromaddress(address, dtype, shape, strides=None):
    """ Create a numpy array from an integer address, a dtype
    or dtype string, a shape tuple, and possibly strides.
    """
    # Make sure dtype is a dtype, not just "f" or whatever.
    dtype = numpy.dtype(dtype)
    class Dummy(object):
        pass
    d = Dummy()
    d.__array_interface__ = dict(
        data = (address, False),
        typestr = dtype.str,
        descr = dtype.descr,
        shape = shape,
        strides = strides,
        version = 3,
        )
    return numpy.asarray(d)

Example:

>>> a = numpy.zeros(10)
>>> a
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> a.__array_interface__
{'descr': [('', '
Sturla Molden
On 1/21/2009 2:38 PM, Sturla Molden wrote:
If you can get a pointer (as integer) to your C++ data, and the shape and dtype is known, you may use this (rather unsafe) 'fromaddress' hack:
And the opposite: if you need to get the address referenced by an ndarray, you can do this:

def addressof(arr):
    return arr.__array_interface__['data'][0]

Then you will have to cast this unsigned integer to a pointer type in C++. Note that arr.data returns a buffer.

Sturla Molden
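Putting Sturla's two helpers together, a round trip might look like this. The usual caveat applies: the view built by fromaddress holds only a raw address, so the source array must be kept alive for as long as the view is used.

```python
import numpy as np

def fromaddress(address, dtype, shape, strides=None):
    # Wrap a raw memory address as an ndarray via the array interface.
    # Unsafe: the caller must keep the underlying buffer alive.
    dtype = np.dtype(dtype)
    class Dummy(object):
        pass
    d = Dummy()
    d.__array_interface__ = dict(
        data=(address, False),   # False -> writable
        typestr=dtype.str,
        shape=shape,
        strides=strides,
        version=3,
    )
    return np.asarray(d)

def addressof(arr):
    # The first element of 'data' in the array interface is the address.
    return arr.__array_interface__['data'][0]

a = np.zeros(10)
b = fromaddress(addressof(a), a.dtype, a.shape)
b[0] = 42.0    # writes through to a's memory, since no copy was made
```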
Hi Neal, On Wednesday 21 January 2009 07:27:04 Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
Most of my code doesn't use numpy, it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion.
I pointed out my code to you on c++-sig[1] a while back that solves precisely this problem. You found a bug with memory management that I fixed in the updated code. Does that still not work for you? Regards, Ravi [1] http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Ravi wrote:
Hi Neal,
On Wednesday 21 January 2009 07:27:04 Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
Most of my code doesn't use numpy, it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion.
I pointed out my code to you on c++-sig[1] a while back that solves precisely this problem. You found a bug with memory management that I fixed in the updated code. Does that still not work for you?
Regards, Ravi
[1] http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html

Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
On Wednesday 21 January 2009 10:22:36 Neal Becker wrote:
http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
That is the latest publicly posted code. Since then, there is just one minor patch (attached) which enables use of row-major (c-contiguous) arrays. This does *not* work with strided arrays, which would be a fair bit of effort to support. Further, you will have to work with the numpy iterator interface, which, while well-designed, is a great illustration of the effort required to support OO programming in a non-OO language, and is pretty tedious to map to the ublas storage iterator interface. If you do implement it, I would very much like to take a look at it. Regards, Ravi
Ravi wrote:
On Wednesday 21 January 2009 10:22:36 Neal Becker wrote:
http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
That is the latest publicly posted code. Since then, there is just one minor patch (attached) which enables use of row-major (c-contiguous) arrays.
This does *not* work with strided arrays which would be a fair bit of effort to support. Further, you will have to work with the numpy iterator interface, which, while well-designed, is a great illustration of the effort required to support OO programming in an non-OO language, and is pretty tedious to map to the ublas storage iterator interface. If you do implement it, I would very much like to take a look at it.
Regards, Ravi
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have C++ code using the iterator interface, the iterator's dereference will use (*array)[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides. Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
On Wednesday 21 January 2009 13:55:49 Neal Becker wrote:
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have c++ code using the iterator interface, the iterators dereference will use (*array )[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides.
You are right. I had forgotten that I had simple strided vectors working.
Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
I wonder why PyArray_GETPTR1 is slow. Is it because of the implied integer multiplication? Unfortunately, your approach means that iterators can become invalid if the underlying array is resized to a larger size. Hmmh, perhaps we could make this configurable at compile-time ... Thanks for the code. Could you provide some benchmarks on the relative speeds of the two approaches? Regards, Ravi
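Not a benchmark of the C++ wrapper itself, but a rough Python-level illustration of the cost pattern Ravi is asking about: paying lookup overhead on every element (as a per-element strided accessor does) versus a single bulk traversal that runs in C. Array size and timing numbers here are arbitrary.

```python
import numpy as np
import timeit

a = np.arange(100000, dtype=float)

def per_element(arr):
    # Analogue of dereferencing through an accessor per element: each
    # access pays overhead, and nothing is hoisted out of the loop.
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i]
    return total

def vectorized(arr):
    # One call; the traversal happens inside numpy's C loop.
    return arr.sum()

t_loop = timeit.timeit(lambda: per_element(a), number=3)
t_vec = timeit.timeit(lambda: vectorized(a), number=3)
```

Both give the same sum; the per-element version is typically orders of magnitude slower, which is the same shape of overhead that a GETPTR-per-dereference iterator imposes at the C++ level (though much smaller in absolute terms there).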
On Wednesday 21 January 2009 13:55:49 Neal Becker wrote:
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have c++ code using the iterator interface, the iterators dereference will use (*array )[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides.
You are right. I had forgotten that I had simple strided vectors working.
Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
I wonder why PyArray_GETPTR1 is slow. Is it because of the implied integer multiplication? Unfortunately, your approach means that iterators can become invalid if the underlying array is resized to a larger size. Hmmh, perhaps we could make this configurable at compile-time ...
Thanks for the code. Could you provide some benchmarks on the relative speeds of the two approaches?
Regards, Ravi

Do you know about pyublas? This is the same issue we ran into there. I did not benchmark the code you sent me. I was just going by my experience with pyublas. I guess a benchmark would be a good idea.
Neal Becker wrote:
Ravi wrote:
On Wednesday 21 January 2009 13:55:49 Neal Becker wrote:
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have c++ code using the iterator interface, the iterators dereference will use (*array )[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides.
You are right. I had forgotten that I had simple strided vectors working.
Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
I wonder why PyArray_GETPTR1 is slow. Is it because of the implied integer multiplication? Unfortunately, your approach means that iterators can become invalid if the underlying array is resized to a larger size. Hmmh, perhaps we could make this configurable at compile-time ...
Iterators almost always become invalid under those sorts of changes, so I don't think that's a surprise.

GETPTR1 has to do:

    PyArray_STRIDES(obj)[0]

There are several memory references there, and I don't think the compiler can assume that this value doesn't change from one access to another, so it can't be cached.

That said, I have tried a few benchmarks. Surprisingly, I'm not seeing any difference in a few quick tests.

I do have one cosmetic patch for you. This will shut up gcc giving the longest warning message ever about an unused variable:
--- numpy.new.orig/numpyregister.hpp 2009-01-21 15:59:00.000000000 -0500
+++ numpy.new/numpyregister.hpp 2009-01-21 14:11:00.000000000 -0500
@@ -257,7 +257,8 @@
storage_t *the_storage = reinterpret_cast
Ravi wrote:
On Wednesday 21 January 2009 10:22:36 Neal Becker wrote:
http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
That is the latest publicly posted code. Since then, there is just one minor patch (attached) which enables use of row-major (c-contiguous) arrays.
This does *not* work with strided arrays which would be a fair bit of effort to support. Further, you will have to work with the numpy iterator interface, which, while well-designed, is a great illustration of the effort required to support OO programming in an non-OO language, and is pretty tedious to map to the ublas storage iterator interface. If you do implement it, I would very much like to take a look at it.
Regards, Ravi
It seems your code works fine for my usual style:
ublas::vector<T> func (numpy::array_from_py<T>::type const&)
But not for a function that modifies its arg in-place (& instead of const&):
void func (numpy::array_from_py<T>::type &)
This gives:
ArgumentError: Python argument types in
test1.double(numpy.ndarray)
did not match C++ signature:
double(boost::numeric::ublas::vector
On Wednesday 21 January 2009 14:57:59 Neal Becker wrote:
ublas::vector<T> func (numpy::array_from_py<T>::type const&)
But not for a function that modifies it arg in-place (& instead of const&):
void func (numpy::array_from_py<T>::type &)

Use instead:

void func (numpy::array_from_py<T>::type)
Why does this work? It is a tradeoff I had to make; I chose to use python conventions rather than C++ conventions. Essentially, what is passed back to you is a reference to the numpy array. Any copies you make of it are actually copies of the reference, not of the actual array. This simplifies the code quite a bit while maintaining the reference semantics that python programmers use. See dump_vec in decco.cc (the example module) for an example. Regards, Ravi
Ravi wrote:
Hi Neal,
On Wednesday 21 January 2009 07:27:04 Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
Most of my code doesn't use numpy, it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion.
I pointed out my code to you on c++-sig[1] a while back that solves precisely this problem. You found a bug with memory management that I fixed in the updated code. Does that still not work for you?
Regards, Ravi
[1] [http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Do you know if this code will work with strided vectors? If I pass a slice:

    u = array (...)
    F (u[::2])

What happens?
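For reference, what numpy itself hands over for such a slice is a non-contiguous view of the original buffer, not a copy, which is why the wrapper has to honor strides to avoid either wrong results or an implicit copy:

```python
import numpy as np

u = np.arange(10, dtype=float)
v = u[::2]          # what F() would receive: a strided view, not a copy

# v shares u's memory; only the stride doubles, and the view is
# no longer C-contiguous.
stride_ratio = v.strides[0] // u.strides[0]   # 2
contiguous = v.flags['C_CONTIGUOUS']          # False

v[0] = 99.0         # writes through to u[0]
```

A binding that only reads the data pointer and length, ignoring strides, would silently read the wrong elements here; one that copies to a contiguous buffer would lose the in-place write.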
On Tue, Jan 20, 2009 at 20:57, Neal Becker
I see the problem. Thanks for the great profiler! You ought to make this more widely known.
http://pypi.python.org/pypi/line_profiler -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert -- this is a great little piece of code; I already think it will be a part of my workflow. However, I seem to be getting negative % time taken on the more time-consuming lines, perhaps getting some overflow?

Thanks a lot,
Wes
On Wed, Jan 21, 2009 at 3:23 AM, Robert Kern
On Tue, Jan 20, 2009 at 20:57, Neal Becker
wrote: I see the problem. Thanks for the great profiler! You ought to make this more widely known.
http://pypi.python.org/pypi/line_profiler
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Jan 21, 2009 at 12:13, Wes McKinney
Robert-- this is a great little piece of code, I already think it will be a part of my workflow. However, I seem to be getting negative % time taken on the more time consuming lines, perhaps getting some overflow?
That's odd. Can you send me the code (perhaps offlist) or at least the .lprof file? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
I have been using your profiler extensively and it has contributed to my achieving significant improvements in the application I work on, largely due to the line-by-line breakdown enabling me to easily select the next part of the code to optimize. So firstly, many thanks for writing it.

However, back to my point: Wes, I have also experienced timing oddities, in particular on virtual machines (MS Hyper-V has very poor processor timings; the older MS VM works fine though). I believe the negative timings arise when the CPU (be it virtual or possibly physical) deviates from its standard performance, or rather from the initial timer unit taken. Would this make sense to you, Robert?

Hanni
2009/1/21 Robert Kern
On Wed, Jan 21, 2009 at 12:13, Wes McKinney
wrote: Robert-- this is a great little piece of code, I already think it will be a part of my workflow. However, I seem to be getting negative % time taken on the more time consuming lines, perhaps getting some overflow?
That's odd. Can you send me the code (perhaps offlist) or at least the .lprof file?
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jan 22, 2009 at 01:46, Hanni Ali
I have been using your profiler extensively and it has contributed to my achieving significant improvements in the application I work on largely due to the usefulness of the line by line breakdown enabling me to easily select the next part of code to work on optimizing. So firstly many thanks for writing it.
My pleasure.
However back to my point, Wes, I have also experienced timing oddities, in particular on Virtual machines (MS Hyper-V has very poor processor timings, the older MS VM works fine though). I believe the negative timings arise when the CPU (be it virtual or possibly physical) deviates from its standard performance or rather the initial timer unit taken, would this make sense to you Robert?
Can you try using cProfile with lots of calls to empty functions? I'm using the same timer functions as cProfile. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
import cProfile

def f():
    pass

def g():
    for i in xrange(1000000):
        f()

cProfile.run("g()")
test.py 1000003 function calls in 1.225 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.225 1.225 <string>:1(<module>)
1000000 0.464 0.000 0.464 0.000 test.py:3(f)
1 0.761 0.761 1.225 1.225 test.py:6(g)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Running this with line_profiler:
Timer unit: 2.9485e-010 s
File: test.py
Function: g at line 9
Total time: 0.855075 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
9 @profiler
10 def g():
11   1000001   1844697930   1844.7   63.6       for i in xrange(1000000):
12   1000000   1055333053   1055.3   36.4           f()
Which is what I would expect. Hmm
On Thu, Jan 22, 2009 at 2:52 AM, Robert Kern
On Thu, Jan 22, 2009 at 01:46, Hanni Ali
wrote: I have been using your profiler extensively and it has contributed to my achieving significant improvements in the application I work on largely due to the usefulness of the line by line breakdown enabling me to easily select the next part of code to work on optimizing. So firstly many thanks for writing it.
My pleasure.
However back to my point, Wes, I have also experienced timing oddities, in particular on Virtual machines (MS Hyper-V has very poor processor timings, the older MS VM works fine though). I believe the negative timings arise when the CPU (be it virtual or possibly physical) deviates from its standard performance or rather the initial timer unit taken, would this make sense to you Robert?
Can you try using cProfile with lots of calls to empty functions? I'm using the same timer functions as cProfile.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jan 22, 2009 at 17:00, Wes McKinney
import cProfile
def f(): pass
def g(): for i in xrange(1000000): f()
cProfile.run("g()")
test.py 1000003 function calls in 1.225 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.225 1.225 <string>:1(<module>) 1000000 0.464 0.000 0.464 0.000 test.py:3(f) 1 0.761 0.761 1.225 1.225 test.py:6(g) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Running this with line_profiler:
Timer unit: 2.9485e-010 s
File: test.py Function: g at line 9 Total time: 0.855075 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 9 @profiler 10 def g(): 11 1000001 1844697930 1844.7 63.6 for i in xrange(1000000): 12 1000000 1055333053 1055.3 36.4 f()
Which is what I would expect. Hmm
What platform are you on? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Windows XP, Pentium D, Python 2.5.2
On Thu, Jan 22, 2009 at 6:03 PM, Robert Kern
On Thu, Jan 22, 2009 at 17:00, Wes McKinney
wrote: import cProfile
def f(): pass
def g(): for i in xrange(1000000): f()
cProfile.run("g()")
test.py 1000003 function calls in 1.225 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.225 1.225 <string>:1(<module>) 1000000 0.464 0.000 0.464 0.000 test.py:3(f) 1 0.761 0.761 1.225 1.225 test.py:6(g) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Running this with line_profiler:
Timer unit: 2.9485e-010 s
File: test.py Function: g at line 9 Total time: 0.855075 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 9 @profiler 10 def g(): 11 1000001 1844697930 1844.7 63.6 for i in xrange(1000000): 12 1000000 1055333053 1055.3 36.4 f()
Which is what I would expect. Hmm
What platform are you on?
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jan 22, 2009 at 17:09, Wes McKinney
Windows XP, Pentium D, Python 2.5.2
I can replicate the negative numbers on my Windows VM. I'll take a look at it.

Wrote profile results to foo.py.lprof
Timer unit: 4.17601e-010 s

File: foo.py
Function: f at line 1
Total time: -3.02963 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def f():
     3   1000001  -1456737621  -1456.7     20.1      for i in xrange(1000000):
     4   1000000  -1540435131  -1540.4     21.2          1+1
     5   1000000  -1522306067  -1522.3     21.0          1+1
     6   1000000  -1177199444  -1177.2     16.2          1+1
     7   1000000  -1558164209  -1558.2     21.5          1+1

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
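One plausible mechanism for the negative numbers (purely an illustration of signed 32-bit wraparound, not a diagnosis of line_profiler's actual internals): at a timer unit around 4.2e-10 s, a signed 32-bit tick count wraps after roughly 0.9 s, so a total of about 3 s of accumulated ticks comes out negative when reinterpreted as signed 32-bit.

```python
def to_int32(x):
    # Reinterpret an arbitrary non-negative tick count as a signed
    # 32-bit value (two's complement wraparound).
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

timer_unit = 4.17601e-10                # seconds per tick, as in the output
ticks = int(3.0 / timer_unit)           # ticks for ~3 s of real time
seconds = to_int32(ticks) * timer_unit  # negative: the accumulator wrapped
```

If the Windows timer path stores or sums tick counts in a 32-bit field somewhere while the POSIX path uses 64 bits, this would explain why the effect shows up only on Windows (and on VMs with erratic counters).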
Neal Becker wrote:
I tried a little experiment, implementing some code in numpy
It sounds like you've found your core issue, but a couple comments:
from numpy import *
I'm convinced that "import *" is a bad idea. I think the "standard" syntax is now "import numpy as np"
from math import pi
numpy already has pi -- I find I never need math if I'm using numpy.

def db_to_volt (db):
    return 10**(0.05*db)
...
class ampl (object):
...
    ampl_interp = linear_interp (vectorize (db_to_volt) (pin), db_to_volt (pout))

you shouldn't need to use vectorize here -- db_to_volt already takes array input. vectorize could kill performance, in fact.

    ampl_interp = linear_interp(db_to_volt(pin), db_to_volt(pout))

should work fine.

also, if you want maximum performance, you can eliminate extraneous array creation in functions like that by:

1) using numexpr (see recent posts about it)

2) writing uglier code that explicitly passes in the output arrays:

    def db_to_volt (db):
        a = 0.05*db
        np.power(10, a, a)

This will only help for large arrays, and help more for more complex functions.

A minor style nit: I found it remarkably hard to read your code because of the spaces before the open parens for function calls: func (arg1, arg2). It's not just me; PEP 8 makes it very clear:

"""
Whitespace in Expressions and Statements

Pet Peeves

Avoid extraneous whitespace in the following situations:

- Immediately before the open parenthesis that starts the argument list of a function call:

  Yes: spam(1)
  No: spam (1)
"""

http://www.python.org/dev/peps/pep-0008/

I imagine you've used that style for years for lots of code, but I couldn't help myself!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker@noaa.gov
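Chris's two suggestions can be sketched together like this (db_to_volt is the name from the quoted code; the in-place variant adds the return statement the quoted sketch omitted):

```python
import numpy as np

def db_to_volt(db):
    # Plain ufunc arithmetic already broadcasts over arrays, so no
    # np.vectorize wrapper is needed -- and vectorize would loop in
    # Python, killing performance on large inputs.
    return 10 ** (0.05 * np.asarray(db, dtype=float))

def db_to_volt_inplace(db):
    # The second suggestion: reuse one buffer via the ufunc 'out'
    # argument to avoid allocating a second temporary array.
    a = 0.05 * np.asarray(db, dtype=float)
    np.power(10.0, a, out=a)
    return a
```

Both accept scalars, lists, or arrays; the in-place form only pays off for large arrays, where the saved temporary allocation matters.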
participants (8)

- Christopher Barker
- Hanni Ali
- Neal Becker
- Ravi
- Robert Kern
- Sturla Molden
- T J
- Wes McKinney