[Numpy-discussion] Exported symbols and code reorganization.

Wed Jan 10 14:57:45 EST 2007

On Jan 7, 2007, at 00:16 , Charles R Harris wrote:
>
> That brings up the main question I have about how to break up the C  
> files. I note that most of the functions in multiarraymodule.c, for  
> instance, are part of the C-API, and are tagged as belonging to  
> either the MULTIARRAY_API or the OBJECT_API. Apparently the build  
> system scans for these tags and extracts the files somewhere. So,  
> what is this API, is it available somewhere or is the code just  
> copied somewhere convenient. As to breaking up the files, the scan  
> only covers the code in the two current files, included code from  
> broken out parts is not seen. This strikes me as a bit of a kludge,  
> but I am sure there is a reason for it. Anyway, I assume the build  
> system can be fixed, so that brings up the question of how to break  
> up the files. The maximal strategy is to make every API function,
> with its helper functions, a separate file. This adds a *lot* of
> files, but it is straightforward and modular. A less drastic
> approach is to start by breaking multiarraymodule into four files:  
> the converters, the two apis, and the module functions. My own  
> preference is for the bunch of files, but I suspect some will object.

The code for pulling out the ``MULTIARRAY_API`` and ``OBJECT_API``
(also ``UFUNC_API``) functions is in ``numpy/core/code_generators``.
Taking ``MULTIARRAY_API`` as an example, ``generate_array_api.py`` is
run by ``numpy/core/setup.py`` to generate the multiarray (and object)
API. The file ``numpy/core/code_generators/array_api_order.txt`` gives
the order in which the API functions are added to the ``PyArray_API``
array; this is our guarantee that the binary API doesn't change when
functions are added, since existing entries keep their positions. The
files scanned are listed in ``numpy/core/code_generators/genapi.py``,
which is also the module that does the heavy lifting in extracting
the tagged functions.
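
As a rough illustration of the scan-and-order step described above,
here is a minimal Python sketch. It is not the actual ``genapi.py``
code: the layout of the tag comment, the ``Demo_*`` function names, and
the helpers ``scan_tagged`` and ``assign_slots`` are all assumptions
made up for the example.

```python
import re

# Hypothetical C source; the tag-comment layout is assumed for this sketch.
SAMPLE_SOURCE = """
/*MULTIARRAY_API
 Return the number of items.
*/
static long
Demo_Size(PyObject *op)
{
    return 0;
}

static int
untagged_helper(int x)
{
    return x;
}

/*MULTIARRAY_API
 Return a new copy.
*/
static PyObject *
Demo_Copy(PyObject *op, int flags)
{
    return NULL;
}
"""

# A tag comment, then the return type, then "name(args) {".
TAGGED = re.compile(
    r"/\*(?P<tag>[A-Z]+_API)\s*\n(?P<doc>.*?)\*/\s*"
    r"(?P<rtype>[^;{}()]*?)\s*"
    r"(?P<name>\w+)\s*\((?P<args>[^)]*)\)\s*\{",
    re.DOTALL,
)


def scan_tagged(source, tag):
    """Return {name: (return_type, args, doc)} for functions carrying `tag`."""
    found = {}
    for m in TAGGED.finditer(source):
        if m.group("tag") == tag:
            found[m.group("name")] = (
                " ".join(m.group("rtype").split()),  # collapse whitespace
                m.group("args").strip(),
                m.group("doc").strip(),
            )
    return found


def assign_slots(order, found):
    """Map each tagged function to its index in the order list."""
    unlisted = sorted(set(found) - set(order))
    if unlisted:
        raise ValueError("tagged but missing from order list: %r" % unlisted)
    return {name: i for i, name in enumerate(order) if name in found}


if __name__ == "__main__":
    found = scan_tagged(SAMPLE_SOURCE, "MULTIARRAY_API")
    slots = assign_slots(["Demo_Size", "Demo_Copy"], found)
    for name in sorted(slots, key=slots.get):
        print(slots[name], name, found[name][0])
```

Because a slot is just a position in the order list, appending a new
name at the end leaves every existing index untouched, which is the
property the order file is there to protect.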

The parameters and a leading comment for tagged functions are  
extracted from the API source files; code is generated that sets  
``PyArray_API`` up correctly (this is put into ``__multiarray.c`` in  
the build ``src/`` directory), and a header file is created  
(``__multiarray.h`` in the build ``include/`` directory) that  
contains the ``#define`` macros for calling the API functions as function
pointers into ``PyArray_API``. Also, a file ``multiarray_api.txt`` is  
created that contains the leading comments for the tagged functions,  
along with the function signatures, in reStructuredText format.
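
To make the three generated outputs more concrete, here is a minimal
Python sketch that emits text of roughly that kind. The exact text the
real generator writes is not shown in this message, so the macro shape,
the table initializer, the documentation layout, the helper names
(``make_define``, ``make_table``, ``make_doc``) and the ``Demo_*``
functions are all assumptions for illustration.

```python
def make_define(index, name, rtype, args, table="PyArray_API"):
    """Macro that calls `name` through slot `index` of the pointer table."""
    return "#define %s (*(%s (*)(%s)) %s[%d])" % (name, rtype, args, table, index)


def make_table(names, table="PyArray_API"):
    """C initializer that stores each function's address at its slot."""
    lines = ["void *%s[] = {" % table]
    lines += ["    (void *) %s," % name for name in names]
    lines.append("};")
    return "\n".join(lines)


def make_doc(name, rtype, args, comment):
    """Signature followed by the leading comment, in a reST-like layout."""
    return "%s\n\n::\n\n  %s %s(%s)\n\n%s\n" % (name, rtype, name, args, comment)


# Hypothetical API functions, already in their fixed order:
# (name, return type, argument types, leading comment).
FUNCS = [
    ("Demo_Size", "long", "PyObject *", "Return the number of items."),
    ("Demo_Copy", "PyObject *", "PyObject *, int", "Return a new copy."),
]

if __name__ == "__main__":
    # Source-file side: the table that sets up the pointer array.
    print(make_table([f[0] for f in FUNCS]))
    # Header side: one macro per function, indexed by position.
    for i, (name, rtype, args, _) in enumerate(FUNCS):
        print(make_define(i, name, rtype, args))
    # Documentation side: signature plus comment.
    for name, rtype, args, comment in FUNCS:
        print(make_doc(name, rtype, args, comment))
```

The point of the macro form is that client code keeps calling the
function by name while the call actually goes through a fixed slot in
the pointer array.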

As to how to break up the files, I would prefer the four-file
approach. I find a bunch of files (one for each function)
difficult to work with, as I'm continually opening different files to
find functions. To me, it's also taking modularity to the extreme,
where your first level of division into chunks (functions) is now
the same size as your second level (files).

Some of the functions, however, are quite large, and could do with  
breaking up into smaller functions.

-- 
|>|\/|<
/------------------------------------------------------------------\
|David M. Cooke              http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca