
On Tue, 2005-03-15 at 11:03, Perry Greenfield wrote:
> On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote:
>> konrad.hinsen@laposte.net wrote:
>>> How much do you expect to gain compared to a Python loop in such a case?
>> I'd expect a factor of 5 to 10.
> How did you come to that conclusion? It's not at all clear to me that the overhead of the Python operation (i.e., calling the appropriate Python method or function from C) will add appreciably to the time it takes to call it from C.
Good question. Through experiment and profiling I found that I could speed up the code by redefining a few functions. E.g., by setting dot = multiarray.matrixmultiply and sum = add.reduce, and by rewriting outerproduct as an array multiplication (using appropriately reshaped arrays; outerproduct occurs not in forward but in another HMM function), I got a speedup close to 3 over my prototype implementation of the Baum-Welch algorithm (which calls forward). The idea is to specialize a function and avoid the dispatching code in the loop. I guess a factor of 5 to 10 is achievable by specializing the other functions in the loop, too.
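To make this concrete, here is a minimal sketch of the kind of specialization meant here (Numeric-era names; forward() and outer() are simplified stand-ins for the real HMM code, not the actual implementation):

    import Numeric
    from Numeric import add

    # Bind the specialized entry points once, outside any loop:
    dot = Numeric.matrixmultiply   # bypass the generic dot() wrapper
    sum = add.reduce               # bypass Numeric.sum's argument handling

    def outer(u, v):
        # outerproduct(u, v) as an element-wise multiply of reshaped arrays
        return Numeric.reshape(u, (len(u), 1)) * Numeric.reshape(v, (1, len(v)))

    def forward(A, B, pi, obs):
        # Unscaled alpha recursion of a discrete HMM:
        # alpha_t(j) = (sum_i alpha_{t-1}(i) * A[i,j]) * B[j, obs[t]]
        alpha = pi * B[:, obs[0]]
        for t in range(1, len(obs)):
            alpha = dot(alpha, A) * B[:, obs[t]]
        return sum(alpha)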
> Remember, the C version of the Python function may have much more overhead than what you envision for an equivalent C function that you would write.
Yes, because of argument checking and dispatching code. I have not studied the implementation of Numeric, but I assume that, for performance reasons, there are several specialized implementations of each array function. As an example, suppose there are three implementations of '*', for the special cases

a) both arguments contiguous and of the same shape,
b) both arguments contiguous but of different shapes,
c) everything else (the generic case).

The __mul__ method then has to examine its arguments and dispatch to one of the specialized implementations a), b), or fall back to the generic one c). If I know in advance that both arguments are contiguous and of the same shape, then, in a C implementation, I could call a) directly and avoid running the dispatching code 10000 times in a row. Since the specialized implementations are (presumably) already there, the real work in extending the C API is design, i.e., exposing them in a principled way. Please don't get me wrong, I'm not saying that this is an easy thing to do.

If you think that this idea is too far off, consider Pyrex. The idea behind Pyrex is essentially the same: you take advantage of special cases by annotating variables. So far this concerns only the type of an object, but it is conceivable to extend it to array properties like contiguity.
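In Python terms the pattern looks like the sketch below; the three specialized kernels are hypothetical stand-ins for entry points a C API might expose, not actual Numeric internals:

    def mul_contig_same_shape(a, b):   # case a): no checks needed
        return a * b                   # stand-in for a tight C loop
    def mul_contig(a, b):              # case b)
        return a * b
    def mul_generic(a, b):             # case c): strided/generic access
        return a * b

    def dispatching_mul(a, b):
        # Roughly what __mul__ must do on every single call:
        if a.iscontiguous() and b.iscontiguous():
            if a.shape == b.shape:
                return mul_contig_same_shape(a, b)
            return mul_contig(a, b)
        return mul_generic(a, b)

    # A caller that knows case a) holds can hoist the checks out of the loop:
    fast_mul = mul_contig_same_shape
    # for t in range(10000):
    #     c = fast_mul(x, y)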
> Adding these to the API is extra work and, worse, it perhaps risks making it harder to change the internals, since so much more of what is in C is exposed.
That's a good point.
> The current API is essentially centered around exposing the data and means of converting and copying the data, and to a lesser extent, building new UFuncs (for use at the Python level).
Yes. The question is whether it should be more than just that. I believe that, currently, when somebody decides to move a significant portion of numerical code from Python to C, he or she will likely end up writing (specialized versions of) things like 'sum' and 'dot'. But shouldn't those things be provided by a programming environment for scientific computing? Does Scipy have, for instance, a documented C interface to the blas and lapack functions? You might answer, "Well, there is CBLAS and CLAPACK already." Yes, but by the same argument that pushes Travis to reconsider what should go into scipy_core: it would be nice to be able to use the blas_lite and lapack_lite functions if they cover my needs, and to be able to tell my client, "All else you need to have installed is Python and scipy_core."
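From Python, by contrast, those routines are already within easy reach; a hedged sketch of the existing path through Numeric's LinearAlgebra module (the matrix values are made up):

    import Numeric
    import LinearAlgebra   # Numeric's wrapper around lapack_lite

    A = Numeric.array([[3.0, 1.0],
                       [1.0, 2.0]])
    b = Numeric.array([9.0, 8.0])

    # Solves A x = b; internally this reaches LAPACK's dgesv via lapack_lite.
    # There is no comparably documented way for C extension code to call
    # the same routine directly -- which is the gap described above.
    x = LinearAlgebra.solve_linear_equations(A, b)

Ralf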