Why is weave.inline()/blitz++ code 3 times slower than innerproduct()?
Hi all,

I think one of the strongest points in favor of python for scientific computing is the ability to write low-level code, when necessary, which can perform on par with hand-rolled Fortran. In the past, I've been very pleased using weave's inline() tool, which relies on blitz for manipulating Numpy arrays with a very clean and convenient syntax. This is important, because manipulating multidimensional Numeric arrays in C is rather messy, and the resulting code isn't exactly an example of readability. Blitz arrays end up looking just like regular arrays, using (i,j,k) instead of [i][j][k] for indexing.

Recently, I needed to do an operation which turned out to be pretty much what Numpy's innerproduct() does. I'd forgotten about innerproduct(), so I just wrote my own using inline(). Later I saw innerproduct(), and decided to compare the results. I'm a little worried by what I found, and I'd like to hear some input from the experts on this problem. I've attached all the necessary code to run my tests, in case someone is willing to run them and take a look.

In summary, I found some things which concern me (a README is included in the .tgz with more info):

- The blitz code, whether via inline() or a purely hand-written extension, is ~2.5 to 3 times slower than innerproduct(). Considering that this code is specialized to a few sizes and data types, this comes as a big surprise. If the only way to get maximum performance with Numpy arrays is to write by hand against the full low-level API, I know that many people will shy away from python for a certain class of projects. I truly hope I'm missing something here.

- There is a significant numerical discrepancy between the two approaches (blitz vs numpy). In an innerproduct operation over 7000 entries, the discrepancy is O(1e-10) (in l2 norm). This is more than I'm comfortable with, but perhaps I'm being naive or optimistic.
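To give a feel for how much of a discrepancy can come from summation order alone, here is a minimal pure-Python sketch. It is not the test code from the attached .tgz, and it uses neither weave nor innerproduct(); the function names and the random test data are made up for illustration. It computes the same 7000-term inner product in two different orders and compares both against math.fsum as a reference. Whether summation order actually accounts for the O(1e-10) difference reported above is an assumption, not something this sketch establishes.

```python
import math
import random


def dot_forward(a, b):
    """Accumulate the products left to right (hypothetical helper)."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_reversed(a, b):
    """Accumulate the same products right to left (hypothetical helper)."""
    total = 0.0
    for x, y in zip(reversed(a), reversed(b)):
        total += x * y
    return total


def dot_reference(a, b):
    """Sum the products with math.fsum, which avoids accumulated rounding error."""
    return math.fsum(x * y for x, y in zip(a, b))


# Made-up test data: 7000 entries, matching the size mentioned in the post.
random.seed(0)
n = 7000
a = [random.uniform(-1.0, 1.0) for _ in range(n)]
b = [random.uniform(-1.0, 1.0) for _ in range(n)]

fwd = dot_forward(a, b)
rev = dot_reversed(a, b)
ref = dot_reference(a, b)

# Scale errors by the sum of |a_i * b_i|, the quantity that appears in the
# usual rounding-error bound for recursive summation.
scale = math.fsum(abs(x * y) for x, y in zip(a, b))
rel_fwd = abs(fwd - ref) / scale
rel_rev = abs(rev - ref) / scale

print("forward  vs reference (relative):", rel_fwd)
print("reversed vs reference (relative):", rel_rev)
print("forward  vs reversed  (absolute):", abs(fwd - rev))
```

For double precision and n = 7000, the standard bound on the relative error of a plain accumulation loop is roughly n times the unit roundoff, i.e. below 1e-12 with this scaling. A difference well above that between two implementations would suggest something other than summation order, such as a different accumulator precision or a difference in the data actually being multiplied.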
I view the ability to get blitzed code which performs on par with Fortran as a very important aspect of python's suitability for large-scale projects where every last bit of performance matters, but where one still wants to be able to work with a reasonably clean syntax. I hope I'm just misusing some tools and not faced with a fundamental limitation.

By the way, I'll come to Scipy'03 with many more questions/concerns along these lines, and I think it would be great to have some discussions on these issues there with the experts.

Thanks in advance.

Cheers,

f.
participants (1): Fernando Perez