python numpy code many times slower than c++
I tried a little experiment, implementing some code in numpy (usually I build modules in C++ to interface to Python). Since these operations are all on large vectors, I hoped it would be reasonably efficient. The code in question is simple: it is a model of an amplifier, modeled by its AM/AM and AM/PM characteristics. The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times, each time with a vector of 4096. Any ideas? The code is not too big, so I'll try to attach it.
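For readers without the attachment, a minimal sketch of the kind of model described here (all names and curve values are hypothetical, not Neal's actual code): tabulated AM/AM and AM/PM curves applied to a complex baseband vector via interpolation, with __call__ operating on whole arrays.

```python
import numpy as np

class Amplifier:
    def __init__(self, pin_db, pout_db, phase_deg):
        # Tabulated AM/AM curve (input vs. output amplitude, converted
        # from dB to linear volts) and AM/PM curve (phase shift).
        self.vin = 10 ** (0.05 * np.asarray(pin_db, dtype=float))
        self.vout = 10 ** (0.05 * np.asarray(pout_db, dtype=float))
        self.phase = np.deg2rad(np.asarray(phase_deg, dtype=float))

    def __call__(self, x):
        # x: complex baseband samples; everything below is vectorized.
        r = np.abs(x)
        gain = np.interp(r, self.vin, self.vout)   # AM/AM lookup
        dphi = np.interp(r, self.vin, self.phase)  # AM/PM lookup
        # Rescale each sample to the interpolated output amplitude,
        # guarding against division by zero for silent samples.
        scale = np.where(r > 0, gain / np.maximum(r, 1e-30), 0.0)
        return x * scale * np.exp(1j * dphi)
```

Called 1024 times on vectors of 4096, a formulation like this stays entirely in numpy's C loops; per-call Python overhead is then negligible compared to the array work.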
2009/1/20 Neal Becker
I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
Any ideas? The code is not too big, so I'll try to attach it.
Any chance you can make it self-contained? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2009/1/20 Neal Becker
I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module: http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/ That might give us a better idea in the absence of a self-contained example. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
2009/1/20 Neal Becker
: I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
That might give us a better idea in the absence of a self-contained example.
Sounds interesting, I'll give that a try. But I'm not sure how to use it. If my main script is plot_spectrum.py, and I want to profile the ampl.__call__ function (defined in ampl.py), what do I need to do? I tried running kernprof.py plot_spectrum.py having added @profile decorators in ampl.py, but that didn't work:

  File "../mod/ampl.py", line 43, in ampl
    @profile
NameError: name 'profile' is not defined
On Tue, Jan 20, 2009 at 20:44, Neal Becker
Robert Kern wrote:
2009/1/20 Neal Becker
: I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
That might give us a better idea in the absence of a self-contained example.
Sounds interesting, I'll give that a try. But, not sure how to use it.
If my main script is plot_spectrum.py, and I want to profile the ampl.__call__ function (defined in ampl.py), what do I need to do? I tried running kernprof.py plot_spectrum.py having added @profile decorators in ampl.py, but that didn't work: File "../mod/ampl.py", line 43, in ampl @profile NameError: name 'profile' is not defined
kernprof.py --line-by-line plot_spectrum.py -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
2009/1/20 Neal Becker
: I tried a little experiment, implementing some code in numpy (usually I build modules in c++ to interface to python). Since these operations are all large vectors, I hoped it would be reasonably efficient.
The code in question is simple. It is a model of an amplifier, modeled by it's AM/AM and AM/PM characteristics.
The function in question is the __call__ operator. The test program plots a spectrum, calling this operator 1024 times each time with a vector of 4096.
If you want to find out what lines in that function are taking the most time, you can try my line_profiler module:
http://www.enthought.com/~rkern/cgi-bin/hgwebdir.cgi/line_profiler/
That might give us a better idea in the absence of a self-contained example.
I see the problem. Thanks for the great profiler! You ought to make this more widely known. It seems the big chunks of time are spent in data conversion between numpy and my own vector classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method, since there is no special code to handle numpy vectors. Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far no one else has volunteered any info on this).
On Tue, Jan 20, 2009 at 20:57, Neal Becker
I see the problem. Thanks for the great profiler! You ought to make this more widely known.
I'll be making a release shortly.
It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
Who's not volunteering information, boost::python or us? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
On Tue, Jan 20, 2009 at 20:57, Neal Becker
wrote: I see the problem. Thanks for the great profiler! You ought to make this more widely known.
I'll be making a release shortly.
It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
Who's not volunteering information, boost::python or us?
I've asked on python.c++, the home of boost::python and friends. I've spent a little time on it myself, but I think this job requires great knowledge of the Python C API as well as the mysteries of boost::python.
On Tue, Jan 20, 2009 at 6:57 PM, Neal Becker
It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
I'm not sure if I've understood everything here, but I think that pyublas provides exactly what you need. http://tiker.net/doc/pyublas/
T J wrote:
On Tue, Jan 20, 2009 at 6:57 PM, Neal Becker
wrote: It seems the big chunks of time are used in data conversion between numpy and my own vectors classes. Mine are wrappers around boost::ublas. The conversion must be falling back on a very inefficient method since there is no special code to handle numpy vectors.
Not sure what is the best solution. It would be _great_ if I could make boost::python objects that export a buffer interface, but I have absolutely no idea how to do this (and so far noone else has volunteered any info on this).
I'm not sure if I've understood everything here, but I think that pyublas provides exactly what you need.
It might if I had used this for all of my C++ code, but I have a big library of wrapped C++ code that doesn't use pyublas. Pyublas takes numpy objects from Python and allows the use of C++ ublas on them (without conversion). Most of my code doesn't use numpy; it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion. I'm interested in pyublas, but development seems to have been very quiet for a while.
On 1/21/2009 1:27 PM, Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
If you can get a pointer (as an integer) to your C++ data, and the shape and dtype are known, you may use this (rather unsafe) 'fromaddress' hack: http://www.mail-archive.com/numpy-discussion@scipy.org/msg04974.html

import numpy

def fromaddress(address, dtype, shape, strides=None):
    """ Create a numpy array from an integer address, a dtype
    or dtype string, a shape tuple, and possibly strides.
    """
    # Make sure dtype is a dtype, not just "f" or whatever.
    dtype = numpy.dtype(dtype)
    class Dummy(object):
        pass
    d = Dummy()
    d.__array_interface__ = dict(
        data = (address, False),
        typestr = dtype.str,
        descr = dtype.descr,
        shape = shape,
        strides = strides,
        version = 3,
        )
    return numpy.asarray(d)

Example:

>>> a = numpy.zeros(10)
>>> a
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> a.__array_interface__
{'descr': [('', '
Sturla Molden
On 1/21/2009 2:38 PM, Sturla Molden wrote:
If you can get a pointer (as integer) to your C++ data, and the shape and dtype is known, you may use this (rather unsafe) 'fromaddress' hack:
And the opposite: if you need to get the address referenced by an ndarray, you can do this:

def addressof(arr):
    return arr.__array_interface__['data'][0]

Then you will have to cast this unsigned integer to a pointer type in C++. Note that arr.data returns a buffer.

Sturla Molden
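Putting Sturla's two helpers together, a round trip might look like this. The usual caveat applies: the view built by fromaddress holds only a raw address, so the source array must be kept alive for as long as the view is used.

```python
import numpy as np

def fromaddress(address, dtype, shape, strides=None):
    # Wrap a raw memory address as an ndarray via the array interface.
    # Unsafe: the caller must keep the underlying buffer alive.
    dtype = np.dtype(dtype)
    class Dummy(object):
        pass
    d = Dummy()
    d.__array_interface__ = dict(
        data=(address, False),   # False -> writable
        typestr=dtype.str,
        shape=shape,
        strides=strides,
        version=3,
    )
    return np.asarray(d)

def addressof(arr):
    # The first element of 'data' in the array interface is the address.
    return arr.__array_interface__['data'][0]

a = np.zeros(10)
b = fromaddress(addressof(a), a.dtype, a.shape)
b[0] = 42.0    # writes through to a's memory, since no copy was made
```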
Hi Neal, On Wednesday 21 January 2009 07:27:04 Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
Most of my code doesn't use numpy, it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion.
I pointed out my code to you on c++-sig[1] a while back that solves precisely this problem. You found a bug with memory management that I fixed in the updated code. Does that still not work for you? Regards, Ravi [1] http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Ravi wrote:
Hi Neal,
On Wednesday 21 January 2009 07:27:04 Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
Most of my code doesn't use numpy, it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion.
I pointed out my code to you on c++-sig[1] a while back that solves precisely this problem. You found a bug with memory management that I fixed in the updated code. Does that still not work for you?
Regards, Ravi
[1] http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html

Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
On Wednesday 21 January 2009 10:22:36 Neal Becker wrote:
http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
That is the latest publicly posted code. Since then, there is just one minor patch (attached) which enables use of row-major (c-contiguous) arrays. This does *not* work with strided arrays, which would be a fair bit of effort to support. Further, you will have to work with the numpy iterator interface, which, while well-designed, is a great illustration of the effort required to support OO programming in a non-OO language, and is pretty tedious to map to the ublas storage iterator interface. If you do implement it, I would very much like to take a look at it. Regards, Ravi
Ravi wrote:
On Wednesday 21 January 2009 10:22:36 Neal Becker wrote:
http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
That is the latest publicly posted code. Since then, there is just one minor patch (attached) which enables use of row-major (c-contiguous) arrays.
This does *not* work with strided arrays which would be a fair bit of effort to support. Further, you will have to work with the numpy iterator interface, which, while well-designed, is a great illustration of the effort required to support OO programming in an non-OO language, and is pretty tedious to map to the ublas storage iterator interface. If you do implement it, I would very much like to take a look at it.
Regards, Ravi
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have C++ code using the iterator interface, the iterator's dereference will use (*array)[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides. Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
On Wednesday 21 January 2009 13:55:49 Neal Becker wrote:
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have c++ code using the iterator interface, the iterators dereference will use (*array )[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides.
You are right. I had forgotten that I had simple strided vectors working.
Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
I wonder why PyArray_GETPTR1 is slow. Is it because of the implied integer multiplication? Unfortunately, your approach means that iterators can become invalid if the underlying array is resized to a larger size. Hmmh, perhaps we could make this configurable at compile-time ... Thanks for the code. Could you provide some benchmarks on the relative speeds of the two approaches? Regards, Ravi
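Not a benchmark of the C++ wrapper itself, but a rough Python-level illustration of the cost pattern Ravi is asking about: paying lookup overhead on every element (as a per-element strided accessor does) versus a single bulk traversal that runs in C. Array size and timing numbers here are arbitrary.

```python
import numpy as np
import timeit

a = np.arange(100000, dtype=float)

def per_element(arr):
    # Analogue of dereferencing through an accessor per element: each
    # access pays overhead, and nothing is hoisted out of the loop.
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i]
    return total

def vectorized(arr):
    # One call; the traversal happens inside numpy's C loop.
    return arr.sum()

t_loop = timeit.timeit(lambda: per_element(a), number=3)
t_vec = timeit.timeit(lambda: vectorized(a), number=3)
```

Both give the same sum; the per-element version is typically orders of magnitude slower, which is the same shape of overhead that a GETPTR-per-dereference iterator imposes at the C++ level (though much smaller in absolute terms there).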
On Wednesday 21 January 2009 13:55:49 Neal Becker wrote:
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have c++ code using the iterator interface, the iterators dereference will use (*array )[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides.
You are right. I had forgotten that I had simple strided vectors working.
Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
I wonder why PyArray_GETPTR1 is slow. Is it because of the implied integer multiplication? Unfortunately, your approach means that iterators can become invalid if the underlying array is resized to a larger size. Hmmh, perhaps we could make this configurable at compile-time ...
Thanks for the code. Could you provide some benchmarks on the relative speeds of the two approaches?
Regards, Ravi

Do you know about pyublas? This is the same issue we ran into there. I did not benchmark the code you sent me. I was just going by my experience with pyublas. I guess a benchmark would be a good idea.
Neal Becker wrote:
Ravi wrote:
On Wednesday 21 January 2009 13:55:49 Neal Becker wrote:
I'm only interested in simple strided 1-d vectors. In that case, I think your code already works. If you have c++ code using the iterator interface, the iterators dereference will use (*array )[index]. This will use operator[], which will call PyArray_GETPTR. So I think this will obey strides.
You are right. I had forgotten that I had simple strided vectors working.
Unfortunately, it will also be slow. I suggest something like the enclosed. I have done some simple tests, and it seems to work.
I wonder why PyArray_GETPTR1 is slow. Is it because of the implied integer multiplication? Unfortunately, your approach means that iterators can become invalid if the underlying array is resized to a larger size. Hmmh, perhaps we could make this configurable at compile-time ...
Iterators almost always become invalid under those sorts of changes, so I don't think that's a surprise.

GETPTR1 has to do:

    PyArray_STRIDES(obj)[0]

There are several memory references there, and I don't think the compiler can assume that this value doesn't change from one access to another, so it can't be cached.

That said, I have tried a few benchmarks. Surprisingly, I'm not seeing any difference in a few quick tests.

I do have one cosmetic patch for you. This will shut up gcc giving the longest warning message ever about an unused variable:
--- numpy.new.orig/numpyregister.hpp 2009-01-21 15:59:00.000000000 -0500
+++ numpy.new/numpyregister.hpp 2009-01-21 14:11:00.000000000 -0500
@@ -257,7 +257,8 @@
storage_t *the_storage = reinterpret_cast
Ravi wrote:
On Wednesday 21 January 2009 10:22:36 Neal Becker wrote:
http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Thanks for reminding me about this!
Do you have a current version of the code? I grabbed the files from the above message, but I see some additional subsequent messages with more patches.
That is the latest publicly posted code. Since then, there is just one minor patch (attached) which enables use of row-major (c-contiguous) arrays.
This does *not* work with strided arrays which would be a fair bit of effort to support. Further, you will have to work with the numpy iterator interface, which, while well-designed, is a great illustration of the effort required to support OO programming in an non-OO language, and is pretty tedious to map to the ublas storage iterator interface. If you do implement it, I would very much like to take a look at it.
Regards, Ravi
It seems your code works fine for my usual style:
ublas::vector<T> func (numpy::array_from_py<T>::type const&)
But not for a function that modifies its arg in-place (& instead of const&):
void func (numpy::array_from_py<T>::type &)
This gives:
ArgumentError: Python argument types in
test1.double(numpy.ndarray)
did not match C++ signature:
double(boost::numeric::ublas::vector
On Wednesday 21 January 2009 14:57:59 Neal Becker wrote:
ublas::vector<T> func (numpy::array_from_py<T>::type const&)
But not for a function that modifies it arg in-place (& instead of const&):
void func (numpy::array_from_py<T>::type &)

Use instead:

void func (numpy::array_from_py<T>::type)
Why does this work? It is a tradeoff I had to make; I chose to use python conventions rather than C++ conventions. Essentially, what is passed back to you is a reference to the numpy array. Any copies you make of it are actually copies of the reference, not of the actual array. This simplifies the code quite a bit while maintaining the reference semantics that python programmers use. See dump_vec in decco.cc (the example module) for an example. Regards, Ravi
Ravi wrote:
Hi Neal,
On Wednesday 21 January 2009 07:27:04 Neal Becker wrote:
It might if I had used this for all of my c++ code, but I have a big library of c++ wrapped code that doesn't use pyublas. Pyublas takes numpy objects from python and allows the use of c++ ublas on it (without conversion).
Most of my code doesn't use numpy, it uses plain ublas to represent vectors, and ublas handles storage. I can only interface to/from numpy with conversion.
I pointed out my code to you on c++-sig[1] a while back that solves precisely this problem. You found a bug with memory management that I fixed in the updated code. Does that still not work for you?
Regards, Ravi
[1] [http://mail.python.org/pipermail/cplusplus-sig/2008-October/013825.html
Do you know if this code will work with strided vectors? If I pass a slice:

    u = array (...)
    F (u[::2])

What happens?
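For reference, what numpy itself hands over for such a slice is a non-contiguous view of the original buffer, not a copy, which is why the wrapper has to honor strides to avoid either wrong results or an implicit copy:

```python
import numpy as np

u = np.arange(10, dtype=float)
v = u[::2]          # what F() would receive: a strided view, not a copy

# v shares u's memory; only the stride doubles, and the view is
# no longer C-contiguous.
stride_ratio = v.strides[0] // u.strides[0]   # 2
contiguous = v.flags['C_CONTIGUOUS']          # False

v[0] = 99.0         # writes through to u[0]
```

A binding that only reads the data pointer and length, ignoring strides, would silently read the wrong elements here; one that copies to a contiguous buffer would lose the in-place write.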
On Tue, Jan 20, 2009 at 20:57, Neal Becker
I see the problem. Thanks for the great profiler! You ought to make this more widely known.
http://pypi.python.org/pypi/line_profiler -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert -- this is a great little piece of code; I already think it will be a part of my workflow. However, I seem to be getting negative % time taken on the more time-consuming lines, perhaps getting some overflow?

Thanks a lot,
Wes
On Wed, Jan 21, 2009 at 3:23 AM, Robert Kern
On Tue, Jan 20, 2009 at 20:57, Neal Becker
wrote: I see the problem. Thanks for the great profiler! You ought to make this more widely known.
http://pypi.python.org/pypi/line_profiler
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Jan 21, 2009 at 12:13, Wes McKinney
Robert-- this is a great little piece of code, I already think it will be a part of my workflow. However, I seem to be getting negative % time taken on the more time consuming lines, perhaps getting some overflow?
That's odd. Can you send me the code (perhaps offlist) or at least the .lprof file? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
I have been using your profiler extensively and it has contributed to my achieving significant improvements in the application I work on, largely due to the line-by-line breakdown enabling me to easily select the next part of the code to optimize. So firstly, many thanks for writing it.

However, back to my point: Wes, I have also experienced timing oddities, in particular on virtual machines (MS Hyper-V has very poor processor timings; the older MS VM works fine though). I believe the negative timings arise when the CPU (be it virtual or possibly physical) deviates from its standard performance, or rather from the initial timer unit taken. Would this make sense to you, Robert?

Hanni
2009/1/21 Robert Kern
On Wed, Jan 21, 2009 at 12:13, Wes McKinney
wrote: Robert-- this is a great little piece of code, I already think it will be a part of my workflow. However, I seem to be getting negative % time taken on the more time consuming lines, perhaps getting some overflow?
That's odd. Can you send me the code (perhaps offlist) or at least the .lprof file?
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jan 22, 2009 at 01:46, Hanni Ali
I have been using your profiler extensively and it has contributed to my achieving significant improvements in the application I work on largely due to the usefulness of the line by line breakdown enabling me to easily select the next part of code to work on optimizing. So firstly many thanks for writing it.
My pleasure.
However back to my point, Wes, I have also experienced timing oddities, in particular on Virtual machines (MS Hyper-V has very poor processor timings, the older MS VM works fine though). I believe the negative timings arise when the CPU (be it virtual or possibly physical) deviates from its standard performance or rather the initial timer unit taken, would this make sense to you Robert?
Can you try using cProfile with lots of calls to empty functions? I'm using the same timer functions as cProfile. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
import cProfile

def f():
    pass

def g():
    for i in xrange(1000000):
        f()

cProfile.run("g()")
test.py 1000003 function calls in 1.225 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.225 1.225 <string>:1(<module>)
1000000 0.464 0.000 0.464 0.000 test.py:3(f)
1 0.761 0.761 1.225 1.225 test.py:6(g)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Running this with line_profiler:
Timer unit: 2.9485e-010 s
File: test.py
Function: g at line 9
Total time: 0.855075 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
9 @profiler
10 def g():
11   1000001   1844697930   1844.7   63.6       for i in xrange(1000000):
12   1000000   1055333053   1055.3   36.4           f()
Which is what I would expect. Hmm
On Thu, Jan 22, 2009 at 2:52 AM, Robert Kern
On Thu, Jan 22, 2009 at 01:46, Hanni Ali
wrote: I have been using your profiler extensively and it has contributed to my achieving significant improvements in the application I work on largely due to the usefulness of the line by line breakdown enabling me to easily select the next part of code to work on optimizing. So firstly many thanks for writing it.
My pleasure.
However back to my point, Wes, I have also experienced timing oddities, in particular on Virtual machines (MS Hyper-V has very poor processor timings, the older MS VM works fine though). I believe the negative timings arise when the CPU (be it virtual or possibly physical) deviates from its standard performance or rather the initial timer unit taken, would this make sense to you Robert?
Can you try using cProfile with lots of calls to empty functions? I'm using the same timer functions as cProfile.
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jan 22, 2009 at 17:00, Wes McKinney
import cProfile
def f(): pass
def g(): for i in xrange(1000000): f()
cProfile.run("g()")
test.py 1000003 function calls in 1.225 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.225 1.225 <string>:1(<module>) 1000000 0.464 0.000 0.464 0.000 test.py:3(f) 1 0.761 0.761 1.225 1.225 test.py:6(g) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Running this with line_profiler:
Timer unit: 2.9485e-010 s
File: test.py Function: g at line 9 Total time: 0.855075 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 9 @profiler 10 def g(): 11 1000001 1844697930 1844.7 63.6 for i in xrange(1000000): 12 1000000 1055333053 1055.3 36.4 f()
Which is what I would expect. Hmm
What platform are you on? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Windows XP, Pentium D, Python 2.5.2
On Thu, Jan 22, 2009 at 6:03 PM, Robert Kern
On Thu, Jan 22, 2009 at 17:00, Wes McKinney
wrote: import cProfile
def f(): pass
def g(): for i in xrange(1000000): f()
cProfile.run("g()")
test.py 1000003 function calls in 1.225 CPU seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 1.225 1.225 <string>:1(<module>) 1000000 0.464 0.000 0.464 0.000 test.py:3(f) 1 0.761 0.761 1.225 1.225 test.py:6(g) 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Running this with line_profiler:
Timer unit: 2.9485e-010 s
File: test.py Function: g at line 9 Total time: 0.855075 s
Line # Hits Time Per Hit % Time Line Contents ============================================================== 9 @profiler 10 def g(): 11 1000001 1844697930 1844.7 63.6 for i in xrange(1000000): 12 1000000 1055333053 1055.3 36.4 f()
Which is what I would expect. Hmm
What platform are you on?
-- Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On Thu, Jan 22, 2009 at 17:09, Wes McKinney
Windows XP, Pentium D, Python 2.5.2
I can replicate the negative numbers on my Windows VM. I'll take a look at it.

Wrote profile results to foo.py.lprof
Timer unit: 4.17601e-010 s

File: foo.py
Function: f at line 1
Total time: -3.02963 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def f():
     3   1000001  -1456737621  -1456.7     20.1      for i in xrange(1000000):
     4   1000000  -1540435131  -1540.4     21.2          1+1
     5   1000000  -1522306067  -1522.3     21.0          1+1
     6   1000000  -1177199444  -1177.2     16.2          1+1
     7   1000000  -1558164209  -1558.2     21.5          1+1

-- Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
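One plausible mechanism for the negative numbers (purely an illustration of signed 32-bit wraparound, not a diagnosis of line_profiler's actual internals): at a timer unit around 4.2e-10 s, a signed 32-bit tick count wraps after roughly 0.9 s, so a total of about 3 s of accumulated ticks comes out negative when reinterpreted as signed 32-bit.

```python
def to_int32(x):
    # Reinterpret an arbitrary non-negative tick count as a signed
    # 32-bit value (two's complement wraparound).
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x

timer_unit = 4.17601e-10                # seconds per tick, as in the output
ticks = int(3.0 / timer_unit)           # ticks for ~3 s of real time
seconds = to_int32(ticks) * timer_unit  # negative: the accumulator wrapped
```

If the Windows timer path stores or sums tick counts in a 32-bit field somewhere while the POSIX path uses 64 bits, this would explain why the effect shows up only on Windows (and on VMs with erratic counters).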
Neal Becker wrote:
I tried a little experiment, implementing some code in numpy
It sounds like you've found your core issue, but a couple comments:
from numpy import *
I'm convinced that "import *" is a bad idea. I think the "standard" syntax is now "import numpy as np"
from math import pi
numpy already has pi -- I find I never need math if I'm using numpy.

def db_to_volt (db):
    return 10**(0.05*db)
...
class ampl (object):
...
    ampl_interp = linear_interp (vectorize (db_to_volt) (pin), db_to_volt (pout))

you shouldn't need to use vectorize here -- db_to_volt already takes array input. vectorize could kill performance, in fact.

    ampl_interp = linear_interp(db_to_volt(pin), db_to_volt(pout))

should work fine.

also, if you want maximum performance, you can eliminate extraneous array creation in functions like that by:

1) using numexpr (see recent posts about it)

2) writing uglier code that explicitly passes in the output arrays:

    def db_to_volt (db):
        a = 0.05*db
        np.power(10, a, a)

This will only help for large arrays, and help more for more complex functions.

A minor style nit: I found it remarkably hard to read your code because of the spaces before the open parens for function calls: func (arg1, arg2). It's not just me; PEP 8 makes it very clear:

"""
Whitespace in Expressions and Statements

Pet Peeves

Avoid extraneous whitespace in the following situations:

- Immediately before the open parenthesis that starts the argument list of a function call:

  Yes: spam(1)
  No: spam (1)
"""

http://www.python.org/dev/peps/pep-0008/

I imagine you've used that style for years for lots of code, but I couldn't help myself!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker@noaa.gov
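Chris's two suggestions can be sketched together like this (db_to_volt is the name from the quoted code; the in-place variant adds the return statement the quoted sketch omitted):

```python
import numpy as np

def db_to_volt(db):
    # Plain ufunc arithmetic already broadcasts over arrays, so no
    # np.vectorize wrapper is needed -- and vectorize would loop in
    # Python, killing performance on large inputs.
    return 10 ** (0.05 * np.asarray(db, dtype=float))

def db_to_volt_inplace(db):
    # The second suggestion: reuse one buffer via the ufunc 'out'
    # argument to avoid allocating a second temporary array.
    a = 0.05 * np.asarray(db, dtype=float)
    np.power(10.0, a, out=a)
    return a
```

Both accept scalars, lists, or arrays; the in-place form only pays off for large arrays, where the saved temporary allocation matters.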
participants (8)

- Christopher Barker
- Hanni Ali
- Neal Becker
- Ravi
- Robert Kern
- Sturla Molden
- T J
- Wes McKinney