C-API for non-contiguous arrays
Hi,

I am working on a Python extension module that uses the NumPy C-API. The extension module is an interface to an image processing and analysis library written in C++. The C++ functions are exported with boost::python. Currently I am implementing support for three-dimensional data sets, which can consume a huge amount of memory. The 3D data is stored in a numpy.ndarray. This array is passed to C++ functions which do the calculations.

In general, multi-dimensional arrays can be organized in memory in four different ways:

1. C order contiguous
2. Fortran order contiguous
3. C order non-contiguous
4. Fortran order non-contiguous

Am I right that the NumPy C-API can only distinguish between three ways the array is organized in memory? These are:

1. C order contiguous, e.g. with PyArray_ISCONTIGUOUS(arr)
2. Fortran order contiguous, e.g. with PyArray_ISFORTRAN(arr)
3. non-contiguous, e.g. with !PyArray_ISCONTIGUOUS(arr) && !PyArray_ISFORTRAN(arr)

So there is no way to find out whether a non-contiguous array has C order or Fortran order. The same holds for Python code, e.g. by use of the flags:

    a.flags.contiguous
    a.flags.fortran

This is very important for me because I want to avoid copying every non-contiguous array into a contiguous array. This would be very inefficient. But I can't find another solution than copying the array.

Also, the iterator provided by the C-API only loops over the array in C order, even if the array is in Fortran non-contiguous order.

Or are there just no Fortran order non-contiguous arrays? I think I can construct one:

    a = numpy.ndarray((3,4,5), order="F")
    b = a[:,1:2,:]

Now, I think b's elements are organized in memory in Fortran non-contiguous order. But the flags only tell me that it is non-contiguous, not whether it is in Fortran order or in C order. And if b were passed to a C++ function, it would not be possible to find out with the C-API whether it is in Fortran order or in C order, either.

Any ideas? Or do I always have to create contiguous arrays?

Cheers,
Oliver
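The construction described in this message can be reproduced with a minimal Python sketch. It assumes the default float64 dtype (8-byte items) and uses numpy.zeros instead of numpy.ndarray so that the contents are initialized; it uses the current flag names c_contiguous and f_contiguous. Both flags are False for the slice, while the strides still show the Fortran-like layout inherited from the parent array.

```python
import numpy as np

# Fortran-ordered 3D array (float64 assumed), as in the message above.
a = np.zeros((3, 4, 5), order="F")

# Slicing the middle axis returns a view on a's memory, not a copy.
b = a[:, 1:2, :]

print(a.flags.f_contiguous)   # the parent array is Fortran contiguous
print(b.flags.c_contiguous)   # the slice is not C contiguous
print(b.flags.f_contiguous)   # ... and not Fortran contiguous either
print(b.strides)              # byte strides, inherited unchanged from a
print(np.shares_memory(a, b)) # confirms that no data was copied
```

The flags alone report only "non-contiguous"; the layout information survives in the strides.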
On 10/25/07, Oliver Kranz <o.kranz@gmx.de> wrote:
I believe that this is incorrect. Consider the following:
I believe that the last transpose doesn't fit any of these four categories and is simply discontiguous.

> Am I right that the NumPy C-API can only distinguish between three ways
By Fortran and C-Order discontiguous, do you simply mean that the strides are in increasing and decreasing order respectively? If so, you could check for that without too much trouble.

--
tim.hochberg@ieee.org
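The stride check suggested here could look roughly like the following sketch. The helper name stride_order and its return values are hypothetical, not part of NumPy. It assumes that "C order" means non-increasing strides and "Fortran order" means non-decreasing strides, compares stride magnitudes (so reversed views are classified by their layout), and ignores axes of length 1.

```python
import numpy as np

def stride_order(arr):
    """Hypothetical helper: classify an array by the ordering of its strides.

    Returns "C" for non-increasing strides, "F" for non-decreasing strides,
    and "mixed" if the strides are in neither order.
    """
    # Axes of length 1 never advance through memory, so skip their strides.
    strides = [abs(s) for s, n in zip(arr.strides, arr.shape) if n > 1]
    pairs = list(zip(strides, strides[1:]))
    if all(x >= y for x, y in pairs):
        return "C"
    if all(x <= y for x, y in pairs):
        return "F"
    return "mixed"

a_c = np.zeros((3, 4, 5), order="C")
a_f = np.zeros((3, 4, 5), order="F")

print(stride_order(a_c[:, 1:3, :]))         # non-contiguous, C-like strides
print(stride_order(a_f[:, 1:3, :]))         # non-contiguous, Fortran-like strides
print(stride_order(a_c.transpose(1, 0, 2))) # strides in neither order
```

The transposed array illustrates the point made earlier in this reply: some views fit neither ordering and are simply discontiguous. In C code the same comparison could be made on the array's strides and dimensions.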
Oliver Kranz wrote:

the array into a contiguous array in C order is faster in most if not all cases, because of memory access times. You may want to read the following article from Ulrich Drepper on memory and cache: http://lwn.net/Articles/252125/

cheers,
David
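For reference, copying a non-contiguous view into a C-contiguous array, as discussed here, can be sketched from Python with numpy.ascontiguousarray. The array shapes are taken from the example earlier in the thread; whether the copy pays off depends on the data size and access pattern, which this sketch does not measure.

```python
import numpy as np

a = np.zeros((3, 4, 5), order="F")
b = a[:, 1:2, :]               # non-contiguous view on a

# Copy into a new C-contiguous buffer; the values are unchanged.
c = np.ascontiguousarray(b)

print(c.flags.c_contiguous)    # the copy is C contiguous
print(np.shares_memory(b, c))  # the copy owns separate memory
print(np.array_equal(b, c))    # same values as the view
```

The cost is one additional buffer of the size of the view, which is the trade-off for large data sets.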
David Cournapeau wrote:
That's an interesting note. We have already thought about this. For the moment, we have decided to consistently avoid copying in our special case. It's not unusual to work with data sets consuming about 1 GB of memory. For arrays that are not in contiguous C order, we have to live with the loss of speed.

Cheers,
Oliver
participants (3)
- David Cournapeau
- Oliver Kranz
- Timothy Hochberg