Re: [Numpy-discussion] Half baked C API?

konrad.hinsen@laposte.net wrote:
How much do you expect to gain compared to a Python loop in such a case?
I'd expect a factor 5 to 10.
Did you consider Pyrex? It lets you move from pure Python to pure C with Python syntax, mixing both within a single function.
I looked at it, but haven't tried it out yet. As far as I understand it, if I gave Pyrex the example code in my previous posting to translate to C, the result would contain calls to the Python interpreter to have it evaluate unknown functions like 'dot', 'sum', etc. That would be quite slow. So besides having counterparts in the C API, the tool that does the translation also needs to know about those. Ralf
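[Editor's note: the example code from the earlier posting is not quoted in this thread, so the snippet below is a hypothetical, forward-algorithm-style loop with placeholder names, added only to make the concern concrete. Even if Pyrex compiled the loop itself to C (e.g., with a "cdef int t" declaration for the counter in a .pyx file), 'dot' and 'sum' are opaque to it, so every call would still go through the Python interpreter.]

    # Hypothetical forward-style inner loop (placeholder names, not the
    # actual code from the earlier posting).  Pyrex has no C-level
    # knowledge of Numeric's functions, so each call to dot and sum below
    # would be compiled into a generic Python call.
    from Numeric import dot, sum

    def forward(A, B, obs, alpha, scale):
        for t in range(1, len(obs)):
            alpha[t] = dot(A, alpha[t - 1]) * B[:, obs[t]]   # Python call to dot
            scale[t] = sum(alpha[t])                         # Python call to sum
            alpha[t] = alpha[t] / scale[t]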

On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote:
konrad.hinsen@laposte.net wrote:
How much do you expect to gain compared to a Python loop in such a case?
I'd expect a factor 5 to 10.
How did you come to that conclusion? It's not at all clear to me that the overhead of the Python operation (i.e., calling the appropriate Python method or function from C) will add appreciably to the time it takes to call it from C. Remember, the C implementation behind the Python function may have much more overhead than what you envision for an equivalent C function that you would write yourself. So it isn't enough to compare the speed of a Python loop to the speed of the C code for sum and dot that you would write.

Adding these to the API is extra work, and worse, it perhaps risks making it harder to change the internals, since so much more of what is in C is exposed. The current API is essentially centered around exposing the data and the means of converting and copying the data, and to a lesser extent, building new UFuncs (for use at the Python level).

Perry

On Tue, 2005-03-15 at 11:03, Perry Greenfield wrote:
On Mar 15, 2005, at 1:18 PM, Ralf Juengling wrote:
konrad.hinsen@laposte.net wrote:
How much do you expect to gain compared to a Python loop in such a case?
I'd expect a factor 5 to 10.
How did you come to that conclusion? It's not at all clear to me that the overhead of the Python operation (i.e., calling the appropriate Python method or function from C) will add appreciably to the time it takes to call it from C.
Good question. Per experiment and profiling I found that I could speed up the code by redefining a few functions, e.g., by setting

  dot = multiarray.matrixmultiply
  sum = add.reduce

and rewriting outerproduct as an array multiplication (using appropriately reshaped arrays; outerproduct does not occur in forward but in another HMM function). With that I got a speedup close to 3 over my prototype implementation of the Baum-Welch algorithm (which calls forward). The idea is to specialize a function and avoid dispatching code in the loop. I guess that a factor of 5 to 10 is reasonable to achieve by specializing other functions in the loop, too.
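[Editor's note: a minimal sketch of the specialization described above, for readers following along. The names follow the posting (typo corrected); depending on the Numeric version the C-level routine may be spelled multiarray.matrixproduct instead, so treat the exact attribute as an assumption.]

    # Bind the low-level functions once, bypassing the Python-level
    # argument checking and dispatching done by Numeric.dot and Numeric.sum.
    import multiarray                  # Numeric's C extension module
    from Numeric import add, reshape

    dot = multiarray.matrixmultiply    # assumed name; bypasses Numeric.dot's wrapper
    sum = add.reduce                   # bypasses Numeric.sum's argument handling

    def outerproduct(u, v):
        # Outer product written as an ordinary multiplication of reshaped
        # arrays, avoiding the more general Numeric.outerproduct.
        return reshape(u, (len(u), 1)) * reshape(v, (1, len(v)))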
Remember, the speed of the C version of the Python function may have much more overhead than what you envision for an equivalent C function that you would write.
Yes, because of argument checking and dispatching code. I have not studied the implementation of Numeric, but I assume that there are different specialized implementations (for performance reasons) of array functions. To have an example, let's say that there are three implementations of '*', for the special cases

  a) both arguments contiguous and of the same shape
  b) both arguments contiguous but of different shape
  c) otherwise (the generic case)

The __mul__ method then has to examine its arguments and dispatch to one of the specialized implementations a), b), or fall back to the generic one c). If I know in advance that both arguments are contiguous and of the same shape, then, in a C implementation, I could call a) directly and avoid calling the dispatching code 10000 times in a row. Since the specialized implementations are presumably already there, the real work in extending the C API is design, i.e., to expose them in a principled way. Please don't get me wrong, I'm not saying that this is an easy thing to do.

If you think that this idea is too far off, consider Pyrex. The idea behind Pyrex is essentially the same: you take advantage of special cases by annotating variables. So far this only concerns the type of an object, but it is conceivable to extend it to array properties like contiguity.
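[Editor's note: a small Python-level sketch of the dispatch-once idea described above. The three specialized functions are hypothetical stand-ins for the internal cases a) to c); here they all fall back to Numeric.multiply so the sketch runs as plain Python, and only the structure matters.]

    from Numeric import multiply

    # Placeholders: in the proposal these would be the exposed C entry
    # points for cases a), b) and c).
    multiply_contiguous_same_shape = multiply   # case a) stand-in
    multiply_contiguous = multiply              # case b) stand-in
    multiply_generic = multiply                 # case c) stand-in

    def choose_multiply(a, b):
        # Examine the arguments once and return the matching specialized
        # path, instead of letting __mul__ redo this test on every one of
        # the 10000 loop iterations.
        if a.iscontiguous() and b.iscontiguous():
            if a.shape == b.shape:
                return multiply_contiguous_same_shape
            return multiply_contiguous
        return multiply_generic

    # usage: mul = choose_multiply(x, y), then call mul(x, y) inside the loop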
Adding these to the API is extra work, and worse, it perhaps risks making it harder to change the internals since so much more of what is in C is exposed.
That's a good point.
The current API is essentially centered around exposing the data and means of converting and copying the data, and to a lesser extent, building new UFuncs (for use at the Python level).
Yes. The question is whether it should be more than just that. I believe that, currently, when somebody decides to move a significant portion of numerical code from Python to C, he or she will likely end up writing (specialized versions of) things like 'sum' and 'dot'. But shouldn't those things be provided by a programming environment for scientific computing?

Does Scipy have, for instance, a documented C interface to blas and lapack functions? You answer, "Well, there is CBLAS and CLAPACK already." Yes, but by the same argument that pushes Travis to reconsider what should go into scipy_core: it would be nice to be able to use the blas_lite and lapack_lite functions if they cover my needs, and to tell my client, "All else you need to have installed is Python and scipy_core."

Ralf

Ralf Juengling wrote:
I believe that, currently, when somebody decides to move a significant portion of numerical code from Python to C, he or she will likely end up writing (specialized versions of) things like 'sum' and 'dot'. But shouldn't those things be provided by a programming environment for scientific computing?
Does Scipy have, for instance, a documented C interface to blas and lapack functions? You answer, "Well, there is CBLAS and CLAPACK already." Yes, but by the same argument that pushes Travis to reconsider what should go into scipy_core: it would be nice to be able to use the blas_lite and lapack_lite functions if they cover my needs, and to tell my client, "All else you need to have installed is Python and scipy_core."
I am not sure about the particular case Ralf is considering, but in the past I have been in the situation that I wanted to access algorithms in Numerical Python (such as blas or lapack) at the C level and I couldn't find a way to do it. Note that for ranlib, the header files are actually installed as Numeric/ranlib.h, but as far as I know it is not possible to link a C extension module to Numerical Python's ranlib at the C level. So I would welcome what Ralf is suggesting. --Michiel

On 16.03.2005, at 02:34, Michiel Jan Laurens de Hoon wrote:
do it. Note that for ranlib, the header files are actually installed as Numeric/ranlib.h, but as far as I know it is not possible to link a C extension module to Numerical Python's ranlib at the C level. So I would welcome what Ralf is suggesting.
That's not possible in a portable way, right. For those reasons I usually provide a C API in my C extension modules (Scientific.IO.NetCDF and Scientific.MPI, for example) that is accessible through C pointer objects in Python.

Konrad.
--
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen@cea.fr

konrad.hinsen@laposte.net wrote:
On 16.03.2005, at 02:34, Michiel Jan Laurens de Hoon wrote:
do it. Note that for ranlib, the header files are actually installed as Numeric/ranlib.h, but as far as I know it is not possible to link a C extension module to Numerical Python's ranlib at the C level. So I would welcome what Ralf is suggesting.
That's not possible in a portable way, right.
I'm not sure why that wouldn't be portable, since we wouldn't be distributing binaries. The idea is that both a ranlib/blas/lapack library and the extension module are compiled when installing Numerical Python, installing the library in /usr/local/lib/python2.4/Numeric (and the module, as usual, in /usr/local/lib/python2.4/site-packages/Numeric). Extension modules that want to use ranlib/blas/lapack at the C level can then use the include file from /usr/local/include/python2.4/Numeric and link to the library in /usr/local/lib/python2.4/Numeric. Well, maybe I'm missing something basic here ... --Michiel.
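[Editor's note: a hypothetical setup.py fragment for an extension module that would consume such an installation. The paths mirror those mentioned above and are illustrative only; Numerical Python does not actually install a linkable library like this.]

    # Hypothetical setup.py for an extension linking against a ranlib
    # library installed under Numeric's own directories, as proposed above.
    from distutils.core import setup, Extension

    numeric_include = '/usr/local/include/python2.4/Numeric'   # illustrative path
    numeric_libdir  = '/usr/local/lib/python2.4/Numeric'       # illustrative path

    setup(name='myranlib_ext',
          ext_modules=[Extension('myranlib_ext',
                                 sources=['myranlib_ext.c'],
                                 include_dirs=[numeric_include],
                                 library_dirs=[numeric_libdir],
                                 libraries=['ranlib'])])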

On 17.03.2005, at 01:51, Michiel Jan Laurens de Hoon wrote:
I'm not sure why that wouldn't be portable, since we wouldn't be distributing binaries. The idea is that both a ranlib/blas/lapack library and the extension
In general, shared library A cannot rely on having access to the symbols of shared library B. So if shared library A (NumPy) wants to make symbols that it got from ranlib or BLAS available to other modules, it must make them available through C objects.
ranlib/blas/lapack at the C level can then use the include file from /usr/local/include/python2.4/Numeric and link to the library in /usr/local/lib/python2.4/Numeric.
If it is placed there as a standard linkable library, that would of course work, but that would be an additional step in the NumPy installation. I am not sure it's a good idea in the long run. I'd rather have libraries of general interest in /usr/local/lib or /usr/lib.

Konrad.
--
Konrad Hinsen
Laboratoire Leon Brillouin, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France
Tel.: +33-1 69 08 79 25
Fax: +33-1 69 08 82 61
E-Mail: khinsen@cea.fr

On Tue, Mar 15, 2005 at 02:32:06PM -0800, Ralf Juengling wrote:
dot = multiarray.matrixmultiply
sum = add.reduce
and rewriting outerproduct as an array multiplication (using appropriately reshaped arrays; outerproduct does not occur in forward but in another HMM function)
I got a speedup close to 3 over my prototype implementation for the Baum-Welch algorithm (which calls forward). The idea is to specialize a function and avoid dispatching code in the loop. I guess that a factor of 5 to 10 is reasonable to achieve by specializing other functions in the loop, too.
Hi, this is only tangentially related to your problem, but are you aware of the existence of http://www.logilab.org/projects/hmm/ ? It may not be very fast (we mainly aimed for clarity in the code, and ended up with something "fast enough" for our needs), but maybe it will match yours. Or it may provide a starting point for your implementation. -- Alexandre Fayolle LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org
participants (5)
- Alexandre Fayolle
- konrad.hinsen@laposte.net
- Michiel Jan Laurens de Hoon
- Perry Greenfield
- Ralf Juengling