[Numpy-discussion] Numarray header PEP

Todd Miller jmiller at stsci.edu
Mon Jun 28 12:42:02 EDT 2004


Perry and I have written a draft PEP for the inclusion of some of the
numarray headers within the Python distribution.  The headers enable
extension writers to provide better support for numarray without adding
any additional dependencies (from a user's perspective) to their
packages.  For users who have numarray, an "improved extension" could
provide better performance, and for those without numarray, the
extension could work normally using the Sequence protocol.

The PEP is attached.  It is formatted using the docutils package which
can be used to generate HTML or PDF.  Comments and corrections would be
appreciated.

Regards,
Todd
-------------- next part --------------
PEP: XXX
Title: numerical array headers
Version: $Revision: 1.3 $
Last-Modified: $Date: 2002/08/30 04:11:20 $
Author: Todd Miller <jmiller at stsci.edu>, Perry Greenfield <perry at stsci.edu>
Discussions-To:  numpy-discussion at lists.sf.net
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Jun-2004
Python-Version: 2.4
Post-History: 30-Aug-2002


Abstract
========

We propose the inclusion of three numarray header files within the
CPython distribution to facilitate use of numarray array objects as an
optional data data format for 3rd party modules. The PEP illustrates a
simple technique by which a 3rd party extension may support numarrays
as input or output values if numarray is installed, and yet the 3rd
party extension does not require numarray to be installed to be
built. Nothing needs to be changed in the setup.py or makefile for
installing with or without numarray, and a subsequent installation of
numarray will allow its use without rebuilding the 3rd party
extension.

Specification
=============

This PEP applies only to the CPython platform and only to numarray.
Analogous PEPs could be written for Jython and Python.NET and Numeric,
but what is discussed here is a speed optimization that is tightly
coupled to CPython and numarray.  Three header files to support the
numarray C-API should be included in the CPython distribution within a
numarray subdirectory of the Python include directory:

*    numarray/arraybase.h
*    numarray/libnumeric.h
*    numarray/arrayobject.h


The files are shown prefixed with "numarray" to leave the door open
for doing similar PEPs with other packages, such as Numeric.  If a
plethora of such header contributions is anticipated, a further
refinement would be to locate the headers under something like
"third_party/numarray".

In order to provide enhanced performance for array objects, an
extension writer would start by including the numarray C-API in
addition to any other Python headers:

::

    #include "numarray/arrayobject.h"

Not shown in this PEP are the API calls which operate on numarrays.
These are documented in the numarray manual.  What is shown here are
two calls which are guaranteed to be safe even when numarray is not
installed:

* PyArray_Present()
* PyArray_isArray()

In an extension function that wants to access the numarray API,
a test needs to be performed to determine if the API functions
are safely callable:

::

    PyObject *
    some_array_returning_function(PyObject *m, PyObject *args)
    {
            int param;
            PyObject *result;

            if (!PyArg_ParseTuple(args, "i", &param))
               return NULL;

            if (PyArray_Present()) {
               result = numarray_returning_function(param);
            } else {
               result = list_returning_function(param);
            }
            return result;
    }

Within **numarray_returning_function**, a subset of the numarray C-API
(the Numeric compatible API) is available for use so it is possible to
create and return numarrays.

Within **list_returning_function**, only the standard Python C-API can
be used because numarray is assumed to be unavailable in that
particular Python installation.

In an extension function that wants to accept numarrays as inputs and
provide improved performance over the Python sequence protocol, an
additional convenience function exists which diverts arrays to
specialized code when numarray is present and the input is an array:

::

    PyObject *
    some_array_accepting_function(PyObject *m, PyObject *args)
    {
            PyObject *sequence, *result;

            if (!PyArg_ParseTuple(args, "O", &sequence))
               return NULL;

            if (PyArray_isArray(sequence)) {
               result = numarray_input_function(sequence);
            } else {
               result = sequence_input_function(sequence);
            }
            return result;
    }

During module initialization, a numarray enhanced extension must call
**import_array()**, a macro which imports numarray and assigns a value
to a static API pointer: PyArray_API.  Since the API pointer starts
with the value NULL and remains so if the numarray import fails, the
API pointer serves as a flag that indicates that numarray was
sucessfully imported whenever it is non-NULL.

::

    static void
    initfoo(void)
    {
	PyObject *m = Py_InitModule3(
                 "foo",
                  _foo_functions,
                  _foo__doc__);
	if (m == NULL) return;
        import_array();
    }       

**PyArray_Present()** indicates that numarray was successfully
imported.  It is defined in terms of the API function pointer as:

::

    #define PyArray_Present()  (PyArray_API != NULL)


**PyArray_isArray(s)** indicates that numarray was successfully
imported and the given parameter is a numarray instance.  It is
defined as:

::

    #define PyArray_isArray(s)  (PyArray_Present() && PyArray_Check(s))

Motivation
==========

The use of numeric arrays as an interchange format is eminently 
sensible for many kinds of modules. For example, image, graphics,
and audio modules all can accept or generate large amounts of
numerical data that could easily use the numarray format. But since
numarray is not part of the standard distribution, some authors
of 3rd party extensions may be reluctant to add a dependency
on a different 3rd party extension that isn't absolutely essential
for its use fearing dissuading users who may be put off by extra
installation requirements. Yet, not allowing easy interchange with
numarray introduces annoyances that need not be present. Normally,
in the absence of an explicit ability to generate or use numarray
objects, one must write conversion utilities to convert from the
data representation used to that for numarray. This typically involves
excess copying of data (usually from internal to string to numarray).
In cases where the 3rd party uses buffer objects, the data may not
need copying at all.

Either many users may have to develop their own conversion routines
or numarray will have to include adapters for many other 3rd party
packages. Since numarray is used by many projects, it makes more
sense to put the conversion logic on the other side of the
fence.

There is a clear need for a mechanism that allows 3rd party software
to use numarray objects if it is available without requiring
numarray's presence to build and install properly.

Rationale
=========

One solution is to make numarray part of the standard distribution.
That may be a good long-term solution, but at the moment, the numeric
community is in transition period between the Numeric and numarray
packages which may take years to complete. It is not likely that
numarray will be considered for adoption until the transition is
complete. Numarray is also a large package, and there is legitimate
concern about its inclusion as regards the long-term commitment to
support.

We can solve that problem by making a few include files part of the
Python Standard Distribution and demonstrating how extension writers
can write code that uses numarray conditionally.

The API submitted in this PEP is the subset of the numarray API which
is most source compatible with Numeric.  The headers consist of two
handwritten files (arraybase.h and arrayobject.h) and one generated
file (libnumeric.h).  

arraybase.h contains typedefs and enumerations which are important to
both the API presented here and to the larger numarray specific API.

arrayobject.h glues together arraybase and libnumeric and is needed
for Numeric compatibility.  

libnumeric.h consists of macros generated from a template and a list
of function prototypes.  The macros themselves are somewhat intricate
in order to provide the compile time checking effect of function
prototypes.  Further, the interface takes two forms: one form is used
to compile numarray and defines static function prototypes.  The other
form is used to compile extensions which use the API and defines
macros which execute function calls through pointers which are found
in a table located using a single public API pointer.  These macros
also test the value of the API pointer in order to deliver a fatal
error should a developer forget to initialize by calling
import_array().

The interface chosen here is the subset of numarray most useful for
porting existing Numeric code or creating new extensions which can be
compiled for either numarray or Numeric.  There are a number of other
numarray API functions which are omitted here for the sake of
simplicity.

By choosing to support only the Numeric compatible subset of the
numarray C-API, concerns about interface stability are minimized
because the Numeric API is well established.  However, it should be
made clear that the numarray API subset proposed here is source
compatible, not binary compatible, with Numeric.

Resources
=========

* numarray/arraybase.h      (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arraybase.h)

* numarray/libnumeric.h     (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/libnumeric.h)

* numarray/arrayobject.h    (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arrayobject.h)

* numarray-1.0 manual PDF

* numarray-1.0 source distribution

* numarray website at STSCI (http://www.stsci.edu/resources/software_hardware/numarray)

* example numarray enhanced extension

References
==========

. [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
   (http://www.python.org/peps/pep-0001.html)

. [2] PEP 9, Sample Plaintext PEP Template, Warsaw
   (http://www.python.org/peps/pep-0009.html)


Copyright
=========

This document has been placed in the public domain.



.
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:


More information about the NumPy-Discussion mailing list