Speed bottlenecks on simple tasks - suggested improvement
Hello,

First a quick summary of my problem; at the end I include the basic changes I am suggesting to the source (they may benefit others).

I am ages behind the times and am still using Numeric with Python 2.2.3. The main reason it has taken so long to upgrade is that NumPy kills performance on several of my tests.

I am sorry if this topic has been discussed before. I tried searching the mailing list and Google, and all I found were comments to the effect that such is life when you use NumPy for small arrays.

In my case I have several thousand lines of code whose data structures rely heavily on Numeric arrays, but it is unpredictable whether the problem at hand will result in large or small arrays. Furthermore, once the vectorized operations complete, the values may be assigned to scalars and used in simple math or loops. I am fairly sure the core of my problem is that 'float64' objects start propagating all over the program's data structures (not in arrays), and they are considerably slower for just about everything when compared to the native Python float.

Conclusion: it is not practical for me to do a massive restructuring of code to improve speed on simple things like "a[0] < 4" (assuming "a" is an array), which is about 10 times slower than "b < 4" (assuming "b" is a float).

I finally decided to track down the problem, and I started by getting Python 2.6 from source and profiling it on one of my cases. By far the biggest bottleneck came out to be PyString_FromFormatV, which is the function that assembles the message string for the Python error raised when "multiarray" calls PyObject_GetAttrString and fails to find the attribute. This function seems to get called way too often from NumPy. The real cost of looking up an attribute that does not exist is not the failed lookup itself, but building a string to set a Python error. In other words, something as simple as "a[0] < 3.5" internally results in a call to set a Python error.

I downloaded the NumPy code (for Python 2.6) and tracked down all the calls like this,

ret = PyObject_GetAttrString(obj, "__array_priority__");

and changed them to

if (PyList_CheckExact(obj) || (Py_None == obj) ||
    PyTuple_CheckExact(obj) || PyFloat_CheckExact(obj) ||
    PyInt_CheckExact(obj) || PyString_CheckExact(obj) ||
    PyUnicode_CheckExact(obj)) {
    //Avoid expensive calls when I am sure the attribute
    //does not exist
    ret = NULL;
}
else {
    ret = PyObject_GetAttrString(obj, "__array_priority__");
}

(I think I found about 7 spots.)

I also noticed (not as bad in my case) that calls to PyObject_GetBuffer also resulted in Python errors being set, making code unnecessarily slower.

With this change, something like this,

for i in xrange(1000000):
    if a[1] < 35.0:
        pass

went down from 0.8 seconds to 0.38 seconds.

A bogus test like this,

for i in xrange(1000000):
    a = array([1., 2., 3.])

went down from 8.5 seconds to 2.5 seconds.

Altogether, these simple changes got me halfway to the speed I used to get with Numeric, and I could not see any slowdown in any of my cases that benefit from heavy array manipulation. I am out of ideas on how to improve further, though.

A few questions:

- Is there any interest in me providing the exact details of the code I changed?

- I managed to compile NumPy through setup.py, but I am not sure how to force it to generate pdb files with my Visual Studio compiler. I need the pdb files so that I can run my profiler on NumPy. Does anybody have experience with this? (Visual Studio)

- The core of my problems, I think, boils down to things like s = a[0] assigning a float64 to s as opposed to a native float. Is there any way to hack the code so that it extracts a native float instead? (Probably crazy talk, but I thought I'd ask :) .) I'd prefer not to use s = a.item(0) because I would have to change too much code and it is not even that much faster. For example,

for i in xrange(1000000):
    if a.item(1) < 35.0:
        pass

takes 0.23 seconds (as opposed to 0.38 seconds with my suggested changes).

I apologize again if this topic has already been discussed.

Regards,

Raul
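[For reference, a minimal sketch of this kind of micro-benchmark using the timeit module. It is illustrative only: the array contents and iteration count are assumptions, not the exact tests quoted above, but it shows the float64-scalar vs native-float comparison gap being described.]

import timeit

setup = "from numpy import array; a = array([1., 2., 3.]); b = 2.0"

# comparison that pulls a float64 scalar out of the array each iteration
t_array = timeit.timeit("a[1] < 35.0", setup=setup, number=1000000)

# the same comparison on a native Python float
t_float = timeit.timeit("b < 35.0", setup=setup, number=1000000)

print("a[1] < 35.0 :", t_array)
print("b < 35.0    :", t_float)
print("ratio       :", t_array / t_float)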
On 12/2/2012 5:28 PM, Raul Cota wrote:
- I managed to compile NumPy through setup.py, but I am not sure how to force it to generate pdb files with my Visual Studio compiler. I need the pdb files so that I can run my profiler on NumPy. Does anybody have experience with this? (Visual Studio)
Change the compiler and linker flags in Python\Lib\distutils\msvc9compiler.py to:

self.compile_options = ['/nologo', '/Ox', '/MD', '/W3', '/DNDEBUG', '/Zi']
self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:YES', '/DEBUG']

Then rebuild numpy.

Christoph
Thanks Christoph. It seemed to work. Will do profile runs today/tomorrow and see what comes out.

Raul
Raul,

This is *fantastic work*. While many optimizations were done 6 years ago as people started to convert their code, that kind of report has trailed off in the last few years. I have not seen this kind of speed comparison for some time --- but I think it's definitely beneficial. NumPy still has quite a bit that can be optimized.

I think your example is really great. Perhaps it's worth making a C-API macro out of the short-cut to the attribute string so it can be used by others. It would be interesting to see where your other slow-downs are. I would be interested to see if the slow math of float64 is hurting you. It would be possible, for example, to do a simple subclass of the ndarray that overloads a[<integer>] to be the same as array.item(<integer>). The latter syntax returns Python objects (i.e. floats) instead of array scalars. Also, it would not be too difficult to add fast-math paths for int64, float32, and float64 scalars (so they don't go through ufuncs but do scalar math like the float and int objects in Python).

A related thing we've been working on lately which might help you is Numba, which can speed up functions that have code like "a[0] < 4": http://numba.pydata.org. Numba will translate the expression a[0] < 4 to a machine-code address lookup and math operation, which is *much* faster when a is a NumPy array. Presently this requires you to wrap your function in a decorator:

from numba import autojit

@autojit
def function_to_speed_up(...):
    pass

In the near future (2-4 weeks), numba will grow the experimental ability to basically replace all your function calls with @autojit versions in a Python function. I would love to see something like this work:

python -m numba filename.py

to get an effective autojit on all the filename.py functions (and optionally on all Python modules it imports). The autojit works out of the box today --- you can get Numba from PyPI (or inside of the completely free Anaconda CE) to try it out.

Best,

-Travis
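[A minimal sketch of the subclass idea described above. It assumes only that integer indexing can defer to item(); the class name and the handling of non-integer indices are made up for illustration and are not an agreed-upon design.]

import numpy as np

class FastScalarArray(np.ndarray):
    def __getitem__(self, index):
        if isinstance(index, int):
            # item() returns a native Python float/int instead of a
            # NumPy scalar such as float64
            return self.item(index)
        return np.ndarray.__getitem__(self, index)

a = np.array([1., 2., 3.]).view(FastScalarArray)
print(type(a[0]))   # built-in float rather than numpy.float64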
On 02/12/2012 8:31 PM, Travis Oliphant wrote:
Raul,
This is *fantastic work*. While many optimizations were done 6 years ago as people started to convert their code, that kind of report has trailed off in the last few years. I have not seen this kind of speed-comparison for some time --- but I think it's definitely beneficial.
I'll clean it up a bit as a macro and comment it.
NumPy still has quite a bit that can be optimized. I think your example is really great. Perhaps it's worth making a C-API macro out of the short-cut to the attribute string so it can be used by others. It would be interesting to see where your other slow-downs are. I would be interested to see if the slow-math of float64 is hurting you. It would be possible, for example, to do a simple subclass of the ndarray that overloads a[<integer>] to be the same as array.item(<integer>). The latter syntax returns python objects (i.e. floats) instead of array scalars.
Also, it would not be too difficult to add fast-math paths for int64, float32, and float64 scalars (so they don't go through ufuncs but do scalar-math like the float and int objects in Python).
Thanks. I'll dig a bit more into the code.
A related thing we've been working on lately which might help you is Numba which might help speed up functions that have code like: "a[0] < 4" : http://numba.pydata.org.
Numba will translate the expression a[0] < 4 to a machine-code address-lookup and math operation which is *much* faster when a is a NumPy array. Presently this requires you to wrap your function call in a decorator:
from numba import autojit
@autojit
def function_to_speed_up(...):
    pass
In the near future (2-4 weeks), numba will grow the experimental ability to basically replace all your function calls with @autojit versions in a Python function. I would love to see something like this work:
python -m numba filename.py
To get an effective autojit on all the filename.py functions (and optionally on all python modules it imports). The autojit works out of the box today --- you can get Numba from PyPI (or inside of the completely free Anaconda CE) to try it out.
This looks very interesting. Will check it out.
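[For concreteness, a self-contained example of the decorator usage quoted above. It assumes the numba autojit API of that era as described in Travis's message (later numba releases use numba.jit instead); the function and values below are made up purely for illustration.]

import numpy as np
from numba import autojit

@autojit
def count_below(a, threshold):
    # plain Python loop over array elements; numba attempts to compile it
    # to machine code so the per-element scalar overhead disappears
    n = 0
    for i in range(a.shape[0]):
        if a[i] < threshold:
            n += 1
    return n

a = np.array([1., 2., 3.])
print(count_below(a, 35.0))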
On Mon, Dec 3, 2012 at 1:28 AM, Raul Cota wrote:
I finally decided to track down the problem, and I started by getting Python 2.6 from source and profiling it on one of my cases. By far the biggest bottleneck came out to be PyString_FromFormatV, which is the function that assembles the message string for the Python error raised when "multiarray" calls PyObject_GetAttrString and fails to find the attribute. This function seems to get called way too often from NumPy. The real cost of looking up an attribute that does not exist is not the failed lookup itself, but building a string to set a Python error. In other words, something as simple as "a[0] < 3.5" internally results in a call to set a Python error.
I downloaded NumPy code (for Python 2.6) and tracked down all the calls like this,
ret = PyObject_GetAttrString(obj, "__array_priority__");
and changed to

if (PyList_CheckExact(obj) || (Py_None == obj) ||
    PyTuple_CheckExact(obj) || PyFloat_CheckExact(obj) ||
    PyInt_CheckExact(obj) || PyString_CheckExact(obj) ||
    PyUnicode_CheckExact(obj)) {
    //Avoid expensive calls when I am sure the attribute
    //does not exist
    ret = NULL;
}
else {
    ret = PyObject_GetAttrString(obj, "__array_priority__");
}

(I think I found about 7 spots.)
If the problem is the exception construction, then maybe this would work about as well?

if (PyObject_HasAttrString(obj, "__array_priority__")) {
    ret = PyObject_GetAttrString(obj, "__array_priority__");
}
else {
    ret = NULL;
}

If so then it would be an easier and more reliable way to accomplish this.
I also noticed (not as bad in my case) that calls to PyObject_GetBuffer also resulted in Python errors being set, making code unnecessarily slower.

With this change, something like this,

for i in xrange(1000000):
    if a[1] < 35.0:
        pass

went down from 0.8 seconds to 0.38 seconds.
Huh, why is PyObject_GetBuffer even getting called in this case?
A bogus test like this,

for i in xrange(1000000):
    a = array([1., 2., 3.])

went down from 8.5 seconds to 2.5 seconds.
I can see why we'd call PyObject_GetBuffer in this case, but not why it would take 2/3rds of the total run-time...
- The core of my problems, I think, boils down to things like s = a[0] assigning a float64 to s as opposed to a native float. Is there any way to hack the code so that it extracts a native float instead? (Probably crazy talk, but I thought I'd ask :) .) I'd prefer not to use s = a.item(0) because I would have to change too much code and it is not even that much faster. For example,

for i in xrange(1000000):
    if a.item(1) < 35.0:
        pass

takes 0.23 seconds (as opposed to 0.38 seconds with my suggested changes).
I'm confused here -- first you say that your problems would be fixed if a[0] gave you a native float, but then you say that a.item(0) (which is basically a[0] that gives a native float) is still too slow? (OTOH a 40% speedup is pretty good, even if it is just a microbenchmark :-).) Array scalars are definitely pretty slow:

In [9]: timeit a[0]
1000000 loops, best of 3: 151 ns per loop

In [10]: timeit a.item(0)
10000000 loops, best of 3: 169 ns per loop

In [11]: timeit a[0] < 35.0
1000000 loops, best of 3: 989 ns per loop

In [12]: timeit a.item(0) < 35.0
1000000 loops, best of 3: 233 ns per loop

It is probably possible to make numpy scalars faster... I'm not even sure why they go through the ufunc machinery, like Travis said, since they don't even follow the ufunc rules:

In [3]: np.array(2) * [1, 2, 3]  # 0-dim array coerces and broadcasts
Out[3]: array([2, 4, 6])

In [4]: np.array(2)[()] * [1, 2, 3]  # scalar acts like python integer
Out[4]: [1, 2, 3, 1, 2, 3]

But you may want to experiment a bit more to make sure this is actually the problem. IME guesses about speed problems are almost always wrong (even when I take this rule into account and only guess when I'm *really* sure).

-n
On Mon, Dec 3, 2012 at 6:14 AM, Nathaniel Smith wrote:
It is probably possible to make numpy scalars faster... I'm not even sure why they go through the ufunc machinery, like Travis said, since they don't even follow the ufunc rules:
In [3]: np.array(2) * [1, 2, 3]  # 0-dim array coerces and broadcasts
Out[3]: array([2, 4, 6])

In [4]: np.array(2)[()] * [1, 2, 3]  # scalar acts like python integer
Out[4]: [1, 2, 3, 1, 2, 3]
I thought it still behaves like a numpy "animal"
>>> np.array(-2)[()] ** [1, 2, 3]
array([-2, 4, -8])
>>> np.array(-2)[()] ** 0.5
nan

>>> np.array(-2).item() ** [1, 2, 3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ** or pow(): 'int' and 'list'
>>> np.array(-2).item() ** 0.5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative number cannot be raised to a fractional power

>>> np.array(0)[()] ** (-1)
inf
>>> np.array(0).item() ** (-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: 0.0 cannot be raised to a negative power
and similar. I often try to avoid Python scalars to avoid "surprising" behavior, and try to work defensively, or have fixed bugs by switching to np.power(...) (for example in the distributions).

Josef
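[A small sketch of the contrast Josef describes, with illustrative values only: the NumPy scalar (and np.power) quietly produces nan where, on Python 2 as in the session above, the native float raises.]

import numpy as np

print(np.float64(-2.0) ** 0.5)    # nan (NumPy scalar; possibly a RuntimeWarning, no exception)
print(np.power(-2.0, 0.5))        # nan as well; np.power gives the NumPy behavior
                                  # regardless of input type
# (-2.0) ** 0.5 raises ValueError on Python 2 (as quoted above);
# on Python 3 it returns a complex number instead.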
On 03/12/2012 4:14 AM, Nathaniel Smith wrote:
If the problem is the exception construction, then maybe this would work about as well?
if (PyObject_HasAttrString(obj, "__array_priority__")) {
    ret = PyObject_GetAttrString(obj, "__array_priority__");
}
else {
    ret = NULL;
}
If so then it would be an easier and more reliable way to accomplish this.
I did think of that one, but at least in Python 2.6 the implementation is just a wrapper around PyObject_GetAttrString that clears the error:

"""
PyObject_HasAttrString(PyObject *v, const char *name)
{
    PyObject *res = PyObject_GetAttrString(v, name);
    if (res != NULL) {
        Py_DECREF(res);
        return 1;
    }
    PyErr_Clear();
    return 0;
}
"""

so it is just as bad when it fails, and a waste when it succeeds (it will end up finding the attribute twice).

In my opinion, Python's source code should offer a version of PyObject_GetAttrString that does not raise an error, but that is a completely different topic.
I also noticed (not as bad in my case) that calls to PyObject_GetBuffer also resulted in Python errors being set, making code unnecessarily slower.

With this change, something like this,

for i in xrange(1000000):
    if a[1] < 35.0:
        pass

went down from 0.8 seconds to 0.38 seconds.

Huh, why is PyObject_GetBuffer even getting called in this case?
Sorry for being misleading in an already long and confusing email. PyObject_GetBuffer is not getting called when doing an "if" comparison. This call showed up in my profiler as a time-consuming task that raised Python errors unnecessarily (not nearly as often as PyObject_GetAttrString), but since I was already there I decided to look into it as well. The point I was trying to make was that I did both changes (avoiding PyObject_GetBuffer and PyObject_GetAttrString) when I came up with the times.
A bogus test like this,

for i in xrange(1000000):
    a = array([1., 2., 3.])

went down from 8.5 seconds to 2.5 seconds.

I can see why we'd call PyObject_GetBuffer in this case, but not why it would take 2/3rds of the total run-time...
Same scenario. This total time includes both changes (avoiding PyObject_GetBuffer and PyObject_GetAttrString). If memory serves, PyObject_GetBuffer gets called once for every 9 calls to PyObject_GetAttrString in this scenario.
- The core of my problems, I think, boils down to things like s = a[0] assigning a float64 to s as opposed to a native float. Is there any way to hack the code so that it extracts a native float instead? (Probably crazy talk, but I thought I'd ask :) .) I'd prefer not to use s = a.item(0) because I would have to change too much code and it is not even that much faster. For example,

for i in xrange(1000000):
    if a.item(1) < 35.0:
        pass

takes 0.23 seconds (as opposed to 0.38 seconds with my suggested changes).

I'm confused here -- first you say that your problems would be fixed if a[0] gave you a native float, but then you say that a.item(0) (which is basically a[0] that gives a native float) is still too slow?
Don't get me wrong, I am confused too when it gets beyond my suggested changes :) . My "theory" for saying that a.item(1) is not the same as a[1] returning a float was that perhaps the overhead of the dot operator is too big.

At the end of the day, I do want to profile NumPy and find out if there is anything I can do to speed things up. To bring things more into context, I don't really care about speeding up a bogus loop with if statements. My bottom line is:

- I am focusing on two cases from our software that take 141.8 seconds and 40 seconds respectively using Numeric and Python 2.2.3.

- These cases now take 229 seconds and 62 seconds respectively using NumPy and Python 2.6. This is quite a slowdown, taking into account that Python code using only native objects is quite a bit faster in Python 2.6 vs Python 2.2.

Both cases (like most of our software) use array operations as much as possible and revert to scalar operations when it is not practical to do otherwise. I am not saying it is impossible to optimize even more, it is just not practical.

I ran the profiler on Python 2.6 and found the bottlenecks I reported in this email. Both of my cases are now running at 170 and 50 seconds respectively. In other words, I am "almost" back to where I want to be. The improvement is huge, but in my opinion it is still uncomfortably far from what it used to be with Numeric, and I worry that there may be other spots in our software that are affected in a more meaningful way that I just have not noticed.
But you may want to experiment a bit more to make sure this is actually the problem. IME guesses about speed problems are almost always wrong (even when I take this rule into account and only guess when I'm *really* sure).
I agree 100% about the pitfalls of guessing. Thanks to Christoph's suggestion I should be able to profile NumPy now. Thanks for your comments, Raul
Raul,

Thanks for doing this work -- both the profiling and actual suggestions for how to improve the code -- whoo hoo!

In general, it seems that numpy performance for scalars and very small arrays (i.e. (2,), (3,), maybe (3,3) -- the kind of thing that you'd use to hold a coordinate point or the like, not small as in "fits in cache") is pretty slow. In principle, a basic array scalar operation could be as fast as a numpy native numeric type, and it would be great if small array operations were, too.

It may be that the route to those performance improvements is special-case code, which is ugly, but I think it could really be worth it for the common types and operations.

I'm really out of my depth for suggesting (or contributing) actual solutions, but +1 for the idea!

-Chris

NOTE: Here's an example of what I'm talking about -- say you are scaling an (x,y) point by a (s_x, s_y) scale factor:

def numpy_version(point, scale):
    return point * scale

def tuple_version(point, scale):
    return (point[0] * scale[0], point[1] * scale[1])

In [36]: point_arr, scale_arr
Out[36]: (array([ 3., 5.]), array([ 2., 3.]))

In [37]: timeit tuple_version(point, scale)
1000000 loops, best of 3: 397 ns per loop

In [38]: timeit numpy_version(point_arr, scale_arr)
100000 loops, best of 3: 2.32 us per loop

It would be great if numpy could get closer to tuple performance for this sort of thing...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker@noaa.gov
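[A script form of the same comparison, for anyone who wants to rerun it outside IPython. This is only a sketch: the tuple values for point and scale are assumed to mirror point_arr and scale_arr, and absolute numbers will of course differ by machine.]

import timeit

setup = """
import numpy as np

point = (3.0, 5.0)
scale = (2.0, 3.0)
point_arr = np.array([3., 5.])
scale_arr = np.array([2., 3.])

def numpy_version(point, scale):
    return point * scale

def tuple_version(point, scale):
    return (point[0] * scale[0], point[1] * scale[1])
"""

print("tuple :", timeit.timeit("tuple_version(point, scale)", setup=setup, number=100000))
print("numpy :", timeit.timeit("numpy_version(point_arr, scale_arr)", setup=setup, number=100000))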
Chris,

Thanks for the feedback. FYI, the minor changes I talked about give different performance enhancements depending on the scenario, e.g.,

1) Array * Array

point = array([2.0, 3.0])
scale = array([2.4, 0.9])
retVal = point * scale
# The line above runs 1.1 times faster with my new code
# (but it runs 3 times faster in Numeric in Python 2.2)
# i.e. pretty meaningless, but still far from old Numeric

2) Array * Tuple (item by item)

point = array([2.0, 3.0])
scale = (2.4, 0.9)
retVal = point[0] < scale[0], point[1] < scale[1]
# The line above runs 1.8 times faster with my new code
# (but it runs 6.8 times faster in Numeric in Python 2.2)
# i.e. a pretty decent speed up, but quite far from old Numeric

I am not saying that I would ever do something exactly like (2) in my code, nor am I saying that the changes in NumPy vs Numeric are not beneficial. My point is that performance on small problems is fairly far from what it used to be in Numeric, particularly when dealing with scalars, and it is problematic, at least to me.

I am currently looking around to see if there are practical ways to speed things up without slowing anything else down. Will keep you posted.

regards,

Raul
participants (6)

- Chris Barker - NOAA Federal
- Christoph Gohlke
- josef.pktd@gmail.com
- Nathaniel Smith
- Raul Cota
- Travis Oliphant