C-API for noncontiguous arrays
Hi,

I am working on a Python extension module using the NumPy C-API. The extension module is an interface to an image processing and analysis library written in C++. The C++ functions are exported with boost::python. Currently I am implementing support for three-dimensional data sets, which can consume a huge amount of memory. The 3D data is stored in a numpy.ndarray. This array is passed to C++ functions which do the calculations.

In general, multidimensional arrays can be organized in memory in four different ways:

1. C order contiguous
2. Fortran order contiguous
3. C order noncontiguous
4. Fortran order noncontiguous

Am I right that the NumPy C-API can only distinguish between three ways the array is organized in memory? These are:

1. C order contiguous, e.g. with PyArray_ISCONTIGUOUS(arr)
2. Fortran order contiguous, e.g. with PyArray_ISFORTRAN(arr)
3. noncontiguous, e.g. with !PyArray_ISCONTIGUOUS(arr) && !PyArray_ISFORTRAN(arr)

So there is no way to find out whether a noncontiguous array has C order or Fortran order. The same holds for Python code, e.g. by use of the flags:

    a.flags.contiguous
    a.flags.fortran

This is very important for me because I want to avoid copying every noncontiguous array into a contiguous array. This would be very inefficient. But I can't find another solution than copying the array. Also, the iterator provided by the C-API only loops over the array in C order, even if the array is in Fortran noncontiguous order.

Or are there just no Fortran order noncontiguous arrays? I think I can construct one:

    a = numpy.ndarray((3,4,5), order="F")
    b = a[:,1:2,:]

Now, I think b's elements are organized in memory in Fortran noncontiguous order. But the flags only tell me that it is noncontiguous, not whether it is in Fortran order or in C order. And if b were passed to a C++ function, it would not be possible to find out with the C-API whether it is in Fortran order or in C order, either.

Any ideas? Or do I always have to create contiguous arrays?

Cheers,
Oliver
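The situation can be reproduced from Python. This is a minimal sketch, assuming a current NumPy and the default float64 element type of `np.ndarray` (8 bytes per element, which is where the stride values below come from):

```python
import numpy as np

# Oliver's example: a Fortran-ordered 3-D array (contents uninitialized;
# only the memory layout matters here) and a slice of it.
a = np.ndarray((3, 4, 5), order="F")
b = a[:, 1:2, :]  # a view into a's buffer; nothing is copied

# Neither contiguity flag is set for the slice ...
print(b.flags.c_contiguous, b.flags.f_contiguous)  # False False

# ... but the byte strides still record the layout: they grow from the
# first axis to the last, which is the Fortran pattern.
print(b.strides)  # (8, 24, 96)
```

The same stride values are what a C++ routine would read through `PyArray_STRIDES(arr)`, so the layout information is not lost even though the flags do not report it.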
On 10/25/07, Oliver Kranz wrote:
I believe that this is incorrect. Consider the following:
>>> import numpy as np
>>> a = np.arange(27).reshape(3,3,3)
>>> a.strides
(36, 12, 4)
>>> a.transpose(2,1,0).strides
(4, 12, 36)
>>> a.transpose(0,2,1).strides
(36, 4, 12)
I believe that the last transpose doesn't fit any of these four categories and is simply discontiguous.
By Fortran- and C-order discontiguous, do you simply mean that the strides are in increasing and decreasing order, respectively? If so, you could check for that without too much trouble.

tim.hochberg@ieee.org
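Tim's suggestion can be sketched in a few lines of Python. The helper name `stride_order` and its three labels are hypothetical, chosen only for illustration; the same comparison loop could be written in C over `PyArray_STRIDES(arr)` and `PyArray_DIMS(arr)`.

```python
import numpy as np

def stride_order(arr):
    """Classify an array's layout by the order of its byte strides.

    Returns "C" if the stride magnitudes never increase from the first axis
    to the last, "F" if they never decrease, and "mixed" otherwise. Axes of
    length 1 are skipped because their stride never affects which bytes are
    visited. Magnitudes are compared, so a reversed axis (negative stride)
    is ranked by step size only. Arrays with at most one non-trivial axis
    satisfy both patterns and are reported as "C".
    """
    strides = [abs(s) for s, n in zip(arr.strides, arr.shape) if n > 1]
    pairs = list(zip(strides, strides[1:]))
    if all(x >= y for x, y in pairs):
        return "C"
    if all(x <= y for x, y in pairs):
        return "F"
    return "mixed"


a = np.arange(27).reshape(3, 3, 3)
print(stride_order(a))                     # C
print(stride_order(a.transpose(2, 1, 0)))  # F
print(stride_order(a.transpose(0, 2, 1)))  # mixed
b = np.ndarray((3, 4, 5), order="F")[:, 1:2, :]
print(stride_order(b))                     # F
```

Note that this only reports the ordering of the strides, not whether there are gaps; combined with the contiguity flags it separates all the cases discussed here.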
Timothy Hochberg wrote:
> I believe that this is incorrect. Consider the following:
>
> >>> import numpy as np
> >>> a = np.arange(27).reshape(3,3,3)
> >>> a.strides
> (36, 12, 4)
> >>> a.transpose(2,1,0).strides
> (4, 12, 36)
> >>> a.transpose(0,2,1).strides
> (36, 4, 12)
>
> I believe that the last transpose doesn't fit any of these four categories and is simply discontiguous.
Yes, you are right. I did not consider this case.
> By Fortran- and C-order discontiguous, do you simply mean that the strides are in increasing and decreasing order, respectively? If so, you could check for that without too much trouble.
Since I want to support all the different contiguous and noncontiguous arrays, the best solution for me is to always check the strides if the array is not C-order contiguous.

Thanks,
Oliver
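One way to act on the strides without copying (an approach assumed here, not one spelled out in the thread) is to permute the axes of a view so that its strides decrease, and then loop in ordinary C order. `memory_order_view` is a hypothetical helper name; this is a minimal sketch, not a definitive implementation.

```python
import numpy as np

def memory_order_view(arr):
    """Permute the axes of `arr` so that stride magnitudes decrease.

    Returns (view, perm). transpose() only rearranges the shape and strides
    metadata, so no element data is copied. A plain C-order loop over `view`
    (last index fastest) then steps through memory in the most sequential
    order the layout allows. `perm` maps view axes back to original axes.
    """
    # Largest stride first; sorted() is stable, so equal strides keep
    # their original relative order.
    perm = sorted(range(arr.ndim), key=lambda ax: abs(arr.strides[ax]), reverse=True)
    return arr.transpose(perm), perm


f = np.zeros((3, 4, 5), order="F")
view, perm = memory_order_view(f)
print(perm)                       # [2, 1, 0]
print(view.flags.c_contiguous)    # True
print(np.shares_memory(view, f))  # True

# np.argsort(perm) is the inverse permutation and restores the original axes.
restored = view.transpose(np.argsort(perm).tolist())
print(restored.shape)             # (3, 4, 5)
```

Applied to the "mixed" array from Tim's example, `a.transpose(0,2,1)`, the same helper returns a C-contiguous view. NumPy releases from 1.6 onward also provide `np.nditer` (and `NpyIter` in the C-API), whose default `order='K'` iterates as close to memory order as possible, which covers the C-order-only iterator concern raised in the original question.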
Oliver Kranz wrote:
> This is very important for me because I want to avoid copying every noncontiguous array into a contiguous array. This would be very inefficient. But I can't find another solution than copying the array.

Whether it is inefficient depends on what you mean by inefficient. Memory-wise, copying is obviously inefficient. But speed-wise, copying the array into a contiguous array in C order is faster in most if not all cases, because of memory access times.

You may want to read the following article from Ulrich Drepper on memory and cache: http://lwn.net/Articles/252125/

cheers,
David
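David's alternative can be sketched in Python as follows. `np.ascontiguousarray` copies only when the input is not already C-contiguous, and the size of the copy is the `nbytes` of the result. The wrapper name `as_c_contiguous` is hypothetical; on the C side, the `PyArray_GETCONTIGUOUS` macro plays a similar role.

```python
import numpy as np

def as_c_contiguous(arr):
    """Return (out, copied): a C-contiguous array equal to `arr`.

    np.ascontiguousarray reuses the input's memory when it is already
    C-contiguous and otherwise copies the elements into a newly allocated
    C-ordered buffer; `copied` reports which case occurred.
    """
    out = np.ascontiguousarray(arr)
    copied = not np.shares_memory(out, arr)
    return out, copied


f = np.arange(60, dtype=np.float64).reshape(3, 4, 5).copy(order="F")
b = f[:, 1:2, :]                        # noncontiguous view
out, copied = as_c_contiguous(b)
print(copied, out.flags.c_contiguous)   # True True
print(out.nbytes)                       # 120 (extra bytes for the copy)

c = np.zeros((3, 4, 5))                 # already C-contiguous
print(as_c_contiguous(c)[1])            # False
```

The copy costs `out.nbytes` extra bytes while it exists, so a 1 GB data set that is not C-contiguous needs roughly another 1 GB for the contiguous version.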
David Cournapeau wrote:
> Oliver Kranz wrote:
>> This is very important for me because I want to avoid copying every noncontiguous array into a contiguous array. This would be very inefficient. But I can't find another solution than copying the array.
>
> Whether it is inefficient depends on what you mean by inefficient. Memory-wise, copying is obviously inefficient. But speed-wise, copying the array into a contiguous array in C order is faster in most if not all cases, because of memory access times.
>
> You may want to read the following article from Ulrich Drepper on memory and cache:
That's an interesting note. We already thought about this. At the moment, we have decided to consistently avoid copying in our special case. It's not unusual to work with data sets consuming about 1 GB of memory. For arrays that are not C-order contiguous, we have to live with the loss of speed.

Cheers,
Oliver
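Not discussed in the thread, but a possible middle ground between copying everything and copying nothing is to copy one slab at a time. This sketch assumes the computation can be split into independent slabs along one axis (true for per-slice operations, not for arbitrary 3-D filters). `process_in_slabs` is a hypothetical helper, and `func` stands in for the C++ routine.

```python
import numpy as np

def process_in_slabs(arr, func, axis=0):
    """Apply `func` to one C-contiguous slab of `arr` at a time.

    np.moveaxis returns a view, and iterating over it yields views of one
    index along `axis`. Each slab is copied (only if necessary) by
    np.ascontiguousarray just before `func` sees it, so roughly one slab's
    worth of extra memory, arr.nbytes // arr.shape[axis] bytes, is in use
    at a time instead of a full copy of the array.
    """
    results = []
    for slab in np.moveaxis(arr, axis, 0):
        results.append(func(np.ascontiguousarray(slab)))
    return results


f = np.arange(60, dtype=np.float64).reshape(3, 4, 5).copy(order="F")
sums = process_in_slabs(f, np.sum, axis=2)  # np.sum stands in for the C++ call
print(len(sums))             # 5
print(sum(sums) == f.sum())  # True
```

The trade-off is between peak memory (one slab instead of the whole array) and the number of calls into the extension module.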
