[Matrix-SIG] coding examples/advice for building Numeric Python extensions?
Konrad Hinsen
hinsen@cnrs-orleans.fr
Mon, 25 Jan 1999 19:35:02 +0100
> 1) When (if ever) I should be concerned about "non-contiguous" arrays.
Whenever your input arrays might be non-contiguous, which typically
means that they are the result of slicing operations. In many
situations you can be sure that the arrays are contiguous, e.g. if
the array has just been created from scratch (in the C extension or
in the Python code that calls it).
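To make "contiguous" concrete, here is a minimal sketch in plain C of
the test involved: an array is contiguous in the C sense when its
strides are exactly those of a densely packed row-major array. The
function name and its argument list are made up for this illustration
and are not part of the NumPy C API; only the idea of per-dimension
lengths and byte strides is taken from the array header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT part of the NumPy C API.
 * Returns 1 if an array with "nd" dimensions, lengths "dims" and byte
 * strides "strides" is densely packed in C (row-major) order, else 0. */
int is_c_contiguous(int nd, const int *dims, const int *strides, int elsize)
{
    int expected = elsize;  /* the last dimension must step by one element */
    int i;

    assert(nd >= 0 && elsize > 0);
    for (i = nd - 1; i >= 0; i--) {
        if (strides[i] != expected)
            return 0;
        /* the next outer dimension must step over one full inner block */
        expected *= dims[i];
    }
    return 1;
}
```

For a freshly created 3x4 array of doubles the strides are (32, 8)
bytes on a machine with 8-byte doubles and the test succeeds; taking
every second column of it gives strides (32, 16) and the test fails.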
> 2) If I want to avoid the overhead of copying megabytes of data to insure
> that it is contiguous, how do I access the data in the PyArrayObject?
All you have to do is take into account the stride information: for
each dimension, the array header stores the distance in bytes between
consecutive elements along that dimension. It is not at all difficult
to access a non-contiguous array if you are writing NumPy-specific
code; the only real problem is that the array data does not have the
layout of standard C-style arrays, so most C library routines that
are not NumPy-aware won't work on it. In that case you must either
copy the data or rewrite the code.
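As an illustration of stride-based access, the following sketch sums
all elements of a two-dimensional array of doubles without assuming
contiguity. To keep it compilable without the NumPy headers it uses a
small stand-in struct; the struct name is invented, and its fields are
meant to mirror the data, nd, dimensions and strides members of
PyArrayObject, which is an assumption you should check against your
arrayobject.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the parts of PyArrayObject used below (assumed layout,
 * defined here only so that the example is self-contained). */
typedef struct {
    char *data;       /* pointer to the first element */
    int nd;           /* number of dimensions */
    int *dimensions;  /* length of each dimension */
    int *strides;     /* byte offset between neighbours, per dimension */
} strided_array;

/* Sum of a 2-D array of doubles, contiguous or not. */
double sum_2d(const strided_array *a)
{
    double total = 0.0;
    int i, j;

    assert(a->nd == 2);
    for (i = 0; i < a->dimensions[0]; i++) {
        for (j = 0; j < a->dimensions[1]; j++) {
            /* element (i, j) lives at data + i*strides[0] + j*strides[1] */
            const char *p = a->data + (ptrdiff_t)i * a->strides[0]
                                    + (ptrdiff_t)j * a->strides[1];
            total += *(const double *)p;
        }
    }
    return total;
}
```

The same address computation works for a slice or a transposed view;
only the values in dimensions and strides differ.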
> 3) Has someone who is familiar with NumPy written a brief extension writer's guide?
Not me :-(
> 4) At the very least, could someone write some brief descriptions of the
> functions that an extension writer should be using?
Only a few functions are needed for most applications:
PyArray_FromDims(ndim, dim, datatype)
    creates a new array of "ndim" dimensions and type "datatype",
    with the length of each dimension (i.e. the shape) defined by
    the int array "dim".
PyArray_FromDimsAndData(ndim, dim, datatype, data)
    same as above, but uses the memory pointed to by "data"
    as the data space of the array; no new memory is allocated
    other than the small amount needed for the array header.
    Useful for passing arrays created in C/Fortran code to Python,
    but also dangerous: the user must ensure that the data space
    is never freed before the end of the program.
PyArray_ContiguousFromObject(object, datatype, min_dimensions, max_dimensions)
    used in C functions that want to accept any Python sequence object
    as input (like most NumPy functions do). "object" is the input
    object, "datatype" the array data type. In general,
    "min_dimensions" and "max_dimensions" are set to zero, which
    places no restriction on the number of dimensions.
    The result is guaranteed to be a contiguous array. If the
    input is already a contiguous array of the requested type, the
    output is just the input and nothing is copied. In all other
    circumstances, a new array object is allocated and the data is
    copied (and perhaps converted).
PyArray_Return(arrayobject)
    typically used for returning data at the end of a C function.
    Returns the array object itself if its number of dimensions is
    larger than zero; for a zero-dimensional array it returns the
    equivalent scalar object (int/float...) instead.
In my experience, these are the only functions that are needed
for most C extension modules.
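To picture the copy that PyArray_ContiguousFromObject has to make
when it is handed a non-contiguous array, here is a rough sketch in
plain C that packs a strided two-dimensional array into a newly
allocated contiguous buffer. It is only an illustration of the idea,
not the library's implementation; the function name and signature are
invented, and the type conversion step is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, NOT part of the NumPy C API.
 * Copies a 2-D strided array with elements of "elsize" bytes into a
 * malloc'ed row-major buffer. Returns NULL if allocation fails; the
 * caller must free() the result. */
void *copy_contiguous_2d(const char *data, const int *dims,
                         const int *strides, size_t elsize)
{
    size_t n, k = 0;
    char *out;
    int i, j;

    assert(elsize > 0);
    n = (size_t)dims[0] * (size_t)dims[1];
    out = (char *)malloc(n > 0 ? n * elsize : 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < dims[0]; i++) {
        for (j = 0; j < dims[1]; j++) {
            /* read element (i, j) via the strides, write it densely */
            const char *src = data + (ptrdiff_t)i * strides[0]
                                   + (ptrdiff_t)j * strides[1];
            memcpy(out + k * elsize, src, elsize);
            k++;
        }
    }
    return out;
}
```

This per-element copy is the overhead you avoid by working with the
strides directly, at the price of not being able to hand the data to
routines that expect an ordinary C array.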
> 5) Can someone recommend a short NumPy extension that would serve as an
> example?
The shortest one that I am aware of is the random number module from
the LLNL distribution, but it doesn't use all the functions listed
above. Other examples are the LAPACK and FFTPACK interfaces that
come with NumPy itself, and my netCDF module. Then there are
the FFTW and Cephes modules by Travis Oliphant, which must also
use NumPy calls, although I haven't looked at the source. Plus
probably many more...
--
-------------------------------------------------------------------------------
Konrad Hinsen | E-Mail: hinsen@cnrs-orleans.fr
Centre de Biophysique Moleculaire (CNRS) | Tel.: +33-2.38.25.55.69
Rue Charles Sadron | Fax: +33-2.38.63.15.17
45071 Orleans Cedex 2 | Deutsch/Esperanto/English/
France | Nederlands/Francais
-------------------------------------------------------------------------------