[Numpy-discussion] Exported symbols and code reorganization.

Wed Jan 10 17:41:33 EST 2007

On Jan 10, 2007, at 13:52 , Charles R Harris wrote:
> On 1/10/07, David M. Cooke <cookedm at physics.mcmaster.ca> wrote:
> On Jan 7, 2007, at 00:16 , Charles R Harris wrote:
> >
> > That brings up the main question I have about how to break up the C
> > files. I note that most of the functions in multiarraymodule.c, for
> > instance, are part of the C-API, and are tagged as belonging to
> > either the MULTIARRAY_API or the OBJECT_API. Apparently the build
> > system scans for these tags and extracts the files somewhere. So,
> > what is this API, is it available somewhere or is the code just
> > copied somewhere convenient. As to breaking up the files, the scan
> > only covers the code in the two current files, included code from
> > broken out parts is not seen. This strikes me as a bit of a kludge,
> > but I am sure there is a reason for it. Anyway, I assume the build
> > system can be fixed, so that brings up the question of how to break
> > up the files. The maximal strategy is to make every API function,
> > with its helper functions, a separate file. This adds a *lot* of
> > files, but it is straightforward and modular. A less drastic
> > approach is to start by breaking multiarraymodule into four files:
> > the converters, the two apis, and the module functions. My own
> > preference is for the bunch of files, but I suspect some will
> > object.
>
> The code for pulling out the ``MULTIARRAY_API`` and ``OBJECT_API``
> (also ``UFUNC_API``) is in ``numpy/core/code_generators``. Taking
> ``MULTIARRAY_API`` as an example, the ``generate_array_api.py`` is
> run by the ``numpy/core/setup.py`` file to generate the multiarray
> (and object) API. The file
> ``numpy/core/code_generators/array_api_order.txt`` gives the order in
> which the API functions are added to the ``PyArray_API`` array; this
> is our guarantee that the binary API doesn't change when functions are
> added. The files scanned are listed in
> ``numpy/core/code_generators/genapi.py``, which is also
> the module that does the heavy lifting in extracting the tagged
> functions.
>
> Looked to me like the order could change without causing problems.  
> The include file was also written by the code generator and for  
> extension modules was just a bunch of macros assigning the proper  
> function pointer to the correct name. That brings up another bit,  
> however. At some point I would like to break the include file into  
> two parts, one for inclusion in the other numpy modules and another  
> for inclusion in extension modules, the big #ifdef in the current  
> file offends my sense of esthetics. It should also be possible to  
> attach the function pointers to real function prototype like  
> declarations, which would help extension modules check the code at  
> compile time.

No, the order is necessary for binary compatibility. If
PyArray_API[3] points to function 'A', and PyArray_API[4] points to
function 'B', then, if A and B are reversed in a newer version, any
extension module compiled with the previous version will now call
function 'B' instead of 'A', and vice versa. Adding functions to the
end is ok, though.
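
To make the index problem concrete, here is a minimal sketch in plain C.
Nothing in it is numpy code: func_a, func_b, func_c, the three tables and
EXT_SLOT_A are hypothetical stand-ins, and the table is an array of typed
function pointers only to keep the example short.

```c
/* All names below are hypothetical stand-ins, not numpy code. */
typedef int (*api_func)(int);

static int func_a(int x) { return x + 1; }   /* plays the role of 'A' */
static int func_b(int x) { return x * 10; }  /* plays the role of 'B' */
static int func_c(int x) { return -x; }      /* added in a later version */

/* Table exported by the version the extension was compiled against. */
static api_func table_v1[] = { func_a, func_b };

/* A newer version that swapped A and B: old binaries now call B. */
static api_func table_swapped[] = { func_b, func_a };

/* A newer version that only appended: old binaries are unaffected. */
static api_func table_appended[] = { func_a, func_b, func_c };

/* The slot number is fixed when the extension is compiled. */
#define EXT_SLOT_A 0

/* What an already compiled extension does when it "calls A". */
static int extension_calls_a(api_func *table, int x)
{
    return table[EXT_SLOT_A](x);
}
```

Calling extension_calls_a() with table_v1 or table_appended reaches
func_a; with table_swapped the same compiled code silently reaches
func_b instead.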

Instead of using an array, we could use a large struct whose members
have the same types as the functions assigned to them, as in

struct PyArray_API_t {
	PyObject *(*transpose)(PyArrayObject *, PyArray_Dims *);
	PyObject *(*take_from)(PyArrayObject *, PyObject *, int,
	                       PyArrayObject *, NPY_CLIPMODE);
};

struct PyArray_API_t PyArray_API = {
	PyArray_Transpose,
	PyArray_TakeFrom,
};

/* in an extension module, where PyArray_API would be a pointer to the struct */
#define PyArray_Transpose (PyArray_API->transpose)

This would give us better type-checking when compiling, and make
debugging under gdb easier (at present, when your extension crashes
while calling into numpy, gdb reports the function as something like
PyArray_API[31], because that's all the information it has). We would
still have to guarantee the order for binary compatibility. One problem
is that you'll have to make sure that the alignment of the fields
doesn't change either (something that's not a problem for an array of
pointers).
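
A rough sketch of the struct variant, with a check on the field layout,
could look like the following. The struct api_table, its two members
and layout_is_unchanged() are made up for illustration; the expected
offsets assume a platform where all function pointers have the same
size and alignment, which is common but not something the C standard
promises.

```c
#include <stddef.h>

/* Hypothetical stand-ins for two API functions with different types. */
static int add_one(int x) { return x + 1; }
static double halve(double x) { return x / 2.0; }

/* Each member carries the full prototype, so the compiler can check
 * both the initializer below and every call made through the table. */
struct api_table {
    int (*add_one)(int);
    double (*halve)(double);
};

static const struct api_table api = { add_one, halve };

/* Returns 1 if the members sit at the offsets an older binary would
 * expect: first member at 0, second directly after one pointer.
 * Assumes function pointers share one size and alignment. */
static int layout_is_unchanged(void)
{
    return offsetof(struct api_table, add_one) == 0
        && offsetof(struct api_table, halve) == sizeof(int (*)(int));
}
```

Swapping the two initializers in api would draw a diagnostic for
incompatible pointer types from the compiler, which is the kind of
checking the array of untyped pointers cannot give.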

Now, I was going to try to remove the order requirement, but never
got around to it (you can see some of the initial work in
numpy/core/code_generators/genapi.py in the api_hash() routines). The
idea is to have a unique identifier for each function (I use a hash of
the name and the arguments, but for this example, let's just use the
function name). An extension module, when compiled, would have a list
of function names in the order it expects. In import_array(), it would
call numpy to give it the addresses corresponding to those names.

As Python code, the above would look something like this:

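import numpy

COMPILED_WITH_NUMPY_VERSION = "1.2"
API_names = ["PyArray_Transpose", "PyArray_TakeFrom"]

def import_array():
    global API
    # numpy.get_c_api is the proposed lookup function; it does not exist yet
    API = numpy.get_c_api(API_names, COMPILED_WITH_NUMPY_VERSION)

def a_routine():
    # index 0 is "PyArray_Transpose", the first entry of API_names
    API[0](an_array)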

(Of course, that'd have to be translated to C.) numpy.get_c_api would  
be responsible for putting things in the order that the module  
expects. One advantage of this method is that numpy.get_c_api can  
worry about how to be compatible with previous versions, if things  
change more than just adding functions. For instance, supplying  
different versions of functions becomes possible.
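
For what it's worth, a C translation of the idea might go roughly like
this. It is only a sketch: get_c_api(), struct api_entry,
provider_table, the impl_* functions and the ext_* names are all
hypothetical, the functions take an int instead of real array
arguments, and the version argument is left out to keep it short.

```c
#include <stddef.h>
#include <string.h>

typedef void (*generic_func)(void);

/* Provider side: hypothetical stand-ins for the real API functions. */
static int impl_transpose(int x) { return x + 100; }
static int impl_take_from(int x) { return x + 200; }

struct api_entry {
    const char *name;
    generic_func func;
};

/* The provider may keep its entries in any order it likes. */
static const struct api_entry provider_table[] = {
    { "PyArray_TakeFrom",  (generic_func)impl_take_from },
    { "PyArray_Transpose", (generic_func)impl_transpose },
};

/* Hypothetical get_c_api(): fill out[] in the order given by names[].
 * Returns 0 on success, -1 if some name is unknown. */
static int get_c_api(const char *const *names, size_t n, generic_func *out)
{
    size_t n_entries = sizeof provider_table / sizeof provider_table[0];
    for (size_t i = 0; i < n; i++) {
        out[i] = NULL;
        for (size_t j = 0; j < n_entries; j++) {
            if (strcmp(names[i], provider_table[j].name) == 0) {
                out[i] = provider_table[j].func;
                break;
            }
        }
        if (out[i] == NULL)
            return -1;
    }
    return 0;
}

/* Extension side: the order it was compiled with. */
static const char *const ext_names[] = {
    "PyArray_Transpose", "PyArray_TakeFrom"
};
static generic_func ext_api[2];

static int ext_import_array(void)
{
    return get_c_api(ext_names, 2, ext_api);
}

/* Slot 0 is whatever the extension listed first, here the transpose. */
static int ext_transpose(int x)
{
    return ((int (*)(int))ext_api[0])(x);
}
```

The extension only ever depends on its own ordering of names, so the
provider is free to reorder or extend provider_table between versions.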

-- 
|>|\/|<
/------------------------------------------------------------------\
|David M. Cooke              http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca