[Numpy-discussion] Half baked C API?

Tue Mar 15 14:34:55 EST 2005

On Tue, 2005-03-15 at 11:03, Perry Greenfield wrote:
> On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote:
> > konrad.hinsen at laposte.net wrote:
> > >
> > > How much do you expect to gain compared to a Python loop in such
> > > a case?
> >
> > I'd expect a factor 5 to 10.
> >
> 
> How did you come to that conclusion? It's not at all clear to me that 
> the overhead of the Python operation (i.e., calling the appropriate 
> Python method or function from C) will add appreciably to the time it 
> takes to call it from C.

Good question.
Through experiment and profiling I found that I could speed up the code 
by redefining a few functions. E.g., by setting

dot = multiarray.matrixmultiply
sum = add.reduce
and rewriting outerproduct as an array multiplication (using
appropriately reshaped arrays; outerproduct does not occur in 
forward but in another HMM function)

I got a speedup close to 3 over my prototype implementation for the
Baum-Welch algorithm (which calls forward). The idea is to specialize 
a function and avoid dispatching code in the loop. I guess that a 
factor of 5 to 10 is reasonable to achieve by specializing other
functions in the loop, too. 
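
To illustrate the kind of rewrite described above, here is a minimal
sketch. It assumes NumPy rather than Numeric (the post's original
library), and the helper name outer_via_broadcast is made up for this
example. It shows an outer product written as a plain elementwise
multiplication of reshaped arrays, and a reduction bound once to a
local name so that a loop calls it directly. It makes no claim about
the speedup you would actually measure.

```python
import numpy as np

def outer_via_broadcast(a, b):
    """Outer product of two 1-D arrays, written as an elementwise
    multiplication of a column vector by a row vector."""
    return a.reshape(-1, 1) * b.reshape(1, -1)

# Bind the reduction once, outside any hot loop
# (analogous to `sum = add.reduce` in the post).
sum_reduce = np.add.reduce

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])

outer = outer_via_broadcast(a, b)   # shape (3, 2)
total = sum_reduce(a)               # sum of the elements of a
```

The reshaped multiplication gives the same result as np.outer(a, b);
whether it is faster depends on the library version and array sizes,
so profile before adopting it.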

>  Remember, the speed of the C version of the 
> Python function may have much more overhead than what you envision for 
> an equivalent C function that you would write. 

Yes, because of argument checking and dispatching code. I have not
studied the implementation of Numeric, but I assume that there are
different specialized implementations (for performance reasons) of
array functions. To have an example, let's say that there are three
special implementations for '*', for the special cases
a) both arguments contiguous and of same shape
b) both arguments contiguous but of different shape
c) otherwise
The __mul__ method then has to examine its arguments and dispatch
to one of the specialized implementations a), b) or use the 
generic one c).

If I know in advance that both arguments are contiguous and of 
same shape, then, in a C implementation, I could call a) directly
and avoid calling the dispatching code 10000 times in a row.  Since
the specialized implementations are already there (presumably), 
the real work in extending the C API is design, i.e., to expose 
them in a principled way. Please don't get me wrong, I'm not
saying that this is an easy thing to do.
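
The dispatch scheme I have in mind can be sketched as follows. This is
an illustration only, using NumPy and Python instead of Numeric and C.
The three mul_* functions are hypothetical stand-ins for the
specialized implementations a), b) and c); they are not how Numeric or
NumPy is actually organized internally.

```python
import numpy as np

def mul_same_shape_contiguous(a, b):
    """Case a): both arguments contiguous and of the same shape
    (hypothetical fast path; does no checking of its own)."""
    out = np.empty_like(a)
    np.multiply(a, b, out=out)
    return out

def mul_contiguous_broadcast(a, b):
    """Case b): both arguments contiguous but of different shape."""
    return np.multiply(a, b)

def mul_generic(a, b):
    """Case c): anything else; copies to contiguous storage first."""
    return np.multiply(np.ascontiguousarray(a), np.ascontiguousarray(b))

def mul_dispatch(a, b):
    """What a __mul__-like entry point has to do on every call:
    examine the arguments, then pick an implementation."""
    both_contiguous = a.flags.c_contiguous and b.flags.c_contiguous
    if both_contiguous and a.shape == b.shape:
        return mul_same_shape_contiguous(a, b)
    if both_contiguous:
        return mul_contiguous_broadcast(a, b)
    return mul_generic(a, b)

x = np.arange(6.0).reshape(2, 3)
y = np.full((2, 3), 2.0)

via_dispatch = mul_dispatch(x, y)           # checks, then calls case a)
direct = mul_same_shape_contiguous(x, y)    # caller already knows it is case a)
```

If the caller can guarantee the preconditions of case a), the second
form skips the argument checks on every iteration. Exposing such entry
points in a C API would follow the same pattern, with the caller
taking responsibility for the preconditions.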

If you think that this idea is too far off, consider Pyrex. The 
idea behind Pyrex is essentially the same: You take advantage
of special cases by annotating variables. So far this only 
concerns the type of object, but it is conceivable to extend it
to array properties like contiguity.

> Adding these to the API is extra work, and worse, 
> it perhaps risks making it harder to change the internals since so much 
> more of what is in C is exposed. 

That's a good point.

> The current API is essentially 
> centered around exposing the data and means of converting and copying 
> the data, and to a lesser extent, building new UFuncs (for use at the 
> Python level).

Yes. The question is whether it should be more than just that. 

I believe that, currently, when somebody decides to move a 
significant portion of numerical code from Python to C, he or 
she will likely end up writing (specialized versions of) things 
like 'sum', and 'dot'. But shouldn't those things be provided by
a programming environment for scientific computing? 

Does Scipy have, for instance, a documented C interface to blas 
and lapack functions? You might answer, "Well, there is CBLAS and
CLAPACK already." Yes, but by the same argument that pushes 
Travis to reconsider what should go into scipy_core: it would be
nice to be able to use the blas_lite and lapack_lite functions
if they cover my needs, and to tell my client, "All else you
need to have installed is Python and scipy_core."

Ralf