PEP: XXX Title: numerical array headers Version: $Revision: 1.3 $ Last-Modified: $Date: 2002/08/30 04:11:20 $ Author: Todd Miller , Perry Greenfield Discussions-To: numpy-discussion@lists.sf.net Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 02-Jun-2004 Python-Version: 2.4 Post-History: 30-Aug-2002 Abstract ======== We propose the inclusion of three numarray header files within the CPython distribution to facilitate use of numarray array objects as an optional data data format for 3rd party modules. The PEP illustrates a simple technique by which a 3rd party extension may support numarrays as input or output values if numarray is installed, and yet the 3rd party extension does not require numarray to be installed to be built. Nothing needs to be changed in the setup.py or makefile for installing with or without numarray, and a subsequent installation of numarray will allow its use without rebuilding the 3rd party extension. Specification ============= This PEP applies only to the CPython platform and only to numarray. Analogous PEPs could be written for Jython and Python.NET and Numeric, but what is discussed here is a speed optimization that is tightly coupled to CPython and numarray. Three header files to support the numarray C-API should be included in the CPython distribution within a numarray subdirectory of the Python include directory: * numarray/arraybase.h * numarray/libnumeric.h * numarray/arrayobject.h The files are shown prefixed with "numarray" to leave the door open for doing similar PEPs with other packages, such as Numeric. If a plethora of such header contributions is anticipated, a further refinement would be to locate the headers under something like "third_party/numarray". In order to provide enhanced performance for array objects, an extension writer would start by including the numarray C-API in addition to any other Python headers: :: #include "numarray/arrayobject.h" Not shown in this PEP are the API calls which operate on numarrays. These are documented in the numarray manual. What is shown here are two calls which are guaranteed to be safe even when numarray is not installed: * PyArray_Present() * PyArray_isArray() In an extension function that wants to access the numarray API, a test needs to be performed to determine if the API functions are safely callable: :: PyObject * some_array_returning_function(PyObject *m, PyObject *args) { int param; PyObject *result; if (!PyArg_ParseTuple(args, "i", ¶m)) return NULL; if (PyArray_Present()) { result = numarray_returning_function(param); } else { result = list_returning_function(param); } return result; } Within **numarray_returning_function**, a subset of the numarray C-API (the Numeric compatible API) is available for use so it is possible to create and return numarrays. Within **list_returning_function**, only the standard Python C-API can be used because numarray is assumed to be unavailable in that particular Python installation. In an extension function that wants to accept numarrays as inputs and provide improved performance over the Python sequence protocol, an additional convenience function exists which diverts arrays to specialized code when numarray is present and the input is an array: :: PyObject * some_array_accepting_function(PyObject *m, PyObject *args) { PyObject *sequence, *result; if (!PyArg_ParseTuple(args, "O", &sequence)) return NULL; if (PyArray_isArray(sequence)) { result = numarray_input_function(sequence); } else { result = sequence_input_function(sequence); } return result; } During module initialization, a numarray enhanced extension must call **import_array()**, a macro which imports numarray and assigns a value to a static API pointer: PyArray_API. Since the API pointer starts with the value NULL and remains so if the numarray import fails, the API pointer serves as a flag that indicates that numarray was sucessfully imported whenever it is non-NULL. :: static void initfoo(void) { PyObject *m = Py_InitModule3( "foo", _foo_functions, _foo__doc__); if (m == NULL) return; import_array(); } **PyArray_Present()** indicates that numarray was successfully imported. It is defined in terms of the API function pointer as: :: #define PyArray_Present() (PyArray_API != NULL) **PyArray_isArray(s)** indicates that numarray was successfully imported and the given parameter is a numarray instance. It is defined as: :: #define PyArray_isArray(s) (PyArray_Present() && PyArray_Check(s)) Motivation ========== The use of numeric arrays as an interchange format is eminently sensible for many kinds of modules. For example, image, graphics, and audio modules all can accept or generate large amounts of numerical data that could easily use the numarray format. But since numarray is not part of the standard distribution, some authors of 3rd party extensions may be reluctant to add a dependency on a different 3rd party extension that isn't absolutely essential for its use fearing dissuading users who may be put off by extra installation requirements. Yet, not allowing easy interchange with numarray introduces annoyances that need not be present. Normally, in the absence of an explicit ability to generate or use numarray objects, one must write conversion utilities to convert from the data representation used to that for numarray. This typically involves excess copying of data (usually from internal to string to numarray). In cases where the 3rd party uses buffer objects, the data may not need copying at all. Either many users may have to develop their own conversion routines or numarray will have to include adapters for many other 3rd party packages. Since numarray is used by many projects, it makes more sense to put the conversion logic on the other side of the fence. There is a clear need for a mechanism that allows 3rd party software to use numarray objects if it is available without requiring numarray's presence to build and install properly. Rationale ========= One solution is to make numarray part of the standard distribution. That may be a good long-term solution, but at the moment, the numeric community is in transition period between the Numeric and numarray packages which may take years to complete. It is not likely that numarray will be considered for adoption until the transition is complete. Numarray is also a large package, and there is legitimate concern about its inclusion as regards the long-term commitment to support. We can solve that problem by making a few include files part of the Python Standard Distribution and demonstrating how extension writers can write code that uses numarray conditionally. The API submitted in this PEP is the subset of the numarray API which is most source compatible with Numeric. The headers consist of two handwritten files (arraybase.h and arrayobject.h) and one generated file (libnumeric.h). arraybase.h contains typedefs and enumerations which are important to both the API presented here and to the larger numarray specific API. arrayobject.h glues together arraybase and libnumeric and is needed for Numeric compatibility. libnumeric.h consists of macros generated from a template and a list of function prototypes. The macros themselves are somewhat intricate in order to provide the compile time checking effect of function prototypes. Further, the interface takes two forms: one form is used to compile numarray and defines static function prototypes. The other form is used to compile extensions which use the API and defines macros which execute function calls through pointers which are found in a table located using a single public API pointer. These macros also test the value of the API pointer in order to deliver a fatal error should a developer forget to initialize by calling import_array(). The interface chosen here is the subset of numarray most useful for porting existing Numeric code or creating new extensions which can be compiled for either numarray or Numeric. There are a number of other numarray API functions which are omitted here for the sake of simplicity. By choosing to support only the Numeric compatible subset of the numarray C-API, concerns about interface stability are minimized because the Numeric API is well established. However, it should be made clear that the numarray API subset proposed here is source compatible, not binary compatible, with Numeric. Resources ========= * numarray/arraybase.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arraybase.h) * numarray/libnumeric.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/libnumeric.h) * numarray/arrayobject.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arrayobject.h) * numarray-1.0 manual PDF * numarray-1.0 source distribution * numarray website at STSCI (http://www.stsci.edu/resources/software_hardware/numarray) * example numarray enhanced extension References ========== . [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton (http://www.python.org/peps/pep-0001.html) . [2] PEP 9, Sample Plaintext PEP Template, Warsaw (http://www.python.org/peps/pep-0009.html) Copyright ========= This document has been placed in the public domain. . Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: