C-API for non-contiguous arrays

Hi,

I am working on a Python extension module using the NumPy C-API. The extension module is an interface to an image processing and analysis library written in C++. The C++ functions are exported with boost::python. Currently I am implementing support for three-dimensional data sets, which can consume a huge amount of memory. The 3D data is stored in a numpy.ndarray. This array is passed to C++ functions which do the calculations.

In general, multi-dimensional arrays can be organized in memory in four different ways:

1. C order contiguous
2. Fortran order contiguous
3. C order non-contiguous
4. Fortran order non-contiguous

Am I right that the NumPy C-API can only distinguish between three ways the array is organized in memory? These are:

1. C order contiguous, e.g. with PyArray_ISCONTIGUOUS(arr)
2. Fortran order contiguous, e.g. with PyArray_ISFORTRAN(arr)
3. non-contiguous, e.g. with !PyArray_ISCONTIGUOUS(arr) && !PyArray_ISFORTRAN(arr)

So there is no way to find out whether a non-contiguous array has C order or Fortran order. The same holds for Python code, e.g. by use of the flags:

    a.flags.contiguous
    a.flags.fortran

This is very important for me because I want to avoid copying every non-contiguous array into a contiguous array. This would be very inefficient. But I can't find any solution other than copying the array.

Also, the iterator provided by the C-API only loops over the array in C order, even if the array is in Fortran non-contiguous order.

Or are there just no Fortran order non-contiguous arrays? I think I can construct one:

    a = numpy.ndarray((3,4,5), order="F")
    b = a[:,1:2,:]

Now, I think b's elements are organized in memory in Fortran non-contiguous order. But the flags only tell me that it is non-contiguous, not whether it is in Fortran order or in C order. And if b were passed to a C++ function, it would not be possible to find out with the C-API whether it is in Fortran order or in C order either.

Any ideas? Or do I always have to create contiguous arrays?

Cheers,
Oliver
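The situation described in the message can be reproduced from Python. The following is a minimal sketch, assuming a default float64 array and using the `c_contiguous` and `f_contiguous` flag attributes; it shows that both flags are false for the slice, while the strides of the slice still carry the layout of the parent array.

```python
import numpy as np

# The array from the message: Fortran-ordered, then sliced along the middle axis.
a = np.ndarray((3, 4, 5), order="F")
b = a[:, 1:2, :]

# The flags only report contiguity, not the ordering of the strides.
print(a.flags.f_contiguous)
print(b.flags.c_contiguous, b.flags.f_contiguous)

# The strides are still available on the view.
print(a.strides, b.strides)
```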

On 10/25/07, Oliver Kranz <o.kranz@gmx.de> wrote:
> In general, multi-dimensional arrays can be organized in memory in four different ways:
> 1. C order contiguous
> 2. Fortran order contiguous
> 3. C order non-contiguous
> 4. Fortran order non-contiguous

I believe that this is incorrect. Consider the following:

    >>> import numpy as np
    >>> a = np.arange(27).reshape(3,3,3)
    >>> a.strides
    (36, 12, 4)
    >>> a.transpose(2,1,0).strides
    (4, 12, 36)
    >>> a.transpose(0,2,1).strides
    (36, 4, 12)

I believe that the last transpose doesn't fit any of these four categories and is simply discontiguous.

> Am I right that the NumPy C-API can only distinguish between three ways the array is organized in memory? These are:
> 1. C order contiguous, e.g. with PyArray_ISCONTIGUOUS(arr)
> 2. Fortran order contiguous, e.g. with PyArray_ISFORTRAN(arr)
> 3. non-contiguous, e.g. with !PyArray_ISCONTIGUOUS(arr) && !PyArray_ISFORTRAN(arr)
>
> So there is no way to find out whether a non-contiguous array has C order or Fortran order. The same holds for Python code, e.g. by use of the flags:
>
>     a.flags.contiguous
>     a.flags.fortran
>
> This is very important for me because I want to avoid copying every non-contiguous array into a contiguous array. This would be very inefficient. But I can't find any solution other than copying the array.
>
> Also, the iterator provided by the C-API only loops over the array in C order, even if the array is in Fortran non-contiguous order.
>
> Or are there just no Fortran order non-contiguous arrays? I think I can construct one:
>
>     a = numpy.ndarray((3,4,5), order="F")
>     b = a[:,1:2,:]
>
> Now, I think b's elements are organized in memory in Fortran non-contiguous order. But the flags only tell me that it is non-contiguous, not whether it is in Fortran order or in C order. And if b were passed to a C++ function, it would not be possible to find out with the C-API whether it is in Fortran order or in C order either.
>
> Any ideas? Or do I always have to create contiguous arrays?

By Fortran and C-Order discontiguous, do you simply mean that the strides are in increasing and decreasing order respectively? If so, you could check for that without too much trouble.

--
tim.hochberg@ieee.org
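The stride check suggested above could look roughly like the following from Python. This is only a sketch: `stride_order` is a hypothetical helper, not a NumPy function, and it assumes that "C order" means the absolute strides never increase from the first axis to the last and "Fortran order" means they never decrease. An array with one or zero dimensions satisfies both conditions and is reported as "C" here.

```python
import numpy as np

def stride_order(arr):
    """Classify an array by the ordering of its strides (hypothetical helper).

    Returns "C" if the absolute strides never increase from the first axis
    to the last, "F" if they never decrease, and "mixed" otherwise.
    """
    strides = [abs(s) for s in arr.strides]
    pairs = list(zip(strides, strides[1:]))
    if all(x >= y for x, y in pairs):
        return "C"
    if all(x <= y for x, y in pairs):
        return "F"
    return "mixed"

a = np.arange(27).reshape(3, 3, 3)
print(stride_order(a))                      # strides decrease
print(stride_order(a.transpose(2, 1, 0)))   # strides increase
print(stride_order(a.transpose(0, 2, 1)))   # neither ordering

# The sliced Fortran-ordered array from the original question.
b = np.ndarray((3, 4, 5), order="F")[:, 1:2, :]
print(stride_order(b))
```

The same comparison can be done in C on the values returned by PyArray_STRIDES(arr), looping over PyArray_NDIM(arr) axes.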

Timothy Hochberg wrote:
> On 10/25/07, Oliver Kranz <o.kranz@gmx.de> wrote:
>> In general, multi-dimensional arrays can be organized in memory in four different ways:
>> 1. C order contiguous
>> 2. Fortran order contiguous
>> 3. C order non-contiguous
>> 4. Fortran order non-contiguous
>
> I believe that this is incorrect. Consider the following:
>
>     >>> import numpy as np
>     >>> a = np.arange(27).reshape(3,3,3)
>     >>> a.strides
>     (36, 12, 4)
>     >>> a.transpose(2,1,0).strides
>     (4, 12, 36)
>     >>> a.transpose(0,2,1).strides
>     (36, 4, 12)
>
> I believe that the last transpose doesn't fit any of these four categories and is simply discontiguous.

Yes, you are right. I did not consider this case.

> By Fortran and C-Order discontiguous, do you simply mean that the strides are in increasing and decreasing order respectively? If so, you could check for that without too much trouble.

Since I want to support all the different contiguous and non-contiguous arrays, the best solution for me is to always check the strides whenever the array is not C order contiguous.

Thanks,
Oliver
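One way the strides might be put to use without copying is to reorder the axes so that the largest stride comes first. The sketch below is an illustration only, under the assumption that the consuming routine can work with permuted axes; `memory_order_view` is a hypothetical helper, not part of NumPy.

```python
import numpy as np

def memory_order_view(arr):
    """Return (view, axes): a transposed view of arr whose axes run from the
    largest absolute stride to the smallest, and the permutation used.

    Hypothetical helper: no data is copied, only the axis order changes.
    """
    axes = tuple(sorted(range(arr.ndim), key=lambda i: -abs(arr.strides[i])))
    return arr.transpose(axes), axes

a = np.arange(27).reshape(3, 3, 3)
t = a.transpose(0, 2, 1)           # the "simply discontiguous" case from the thread
view, axes = memory_order_view(t)

print(axes)                         # permutation applied to t's axes
print(np.shares_memory(view, t))    # the view reuses t's buffer
print(view.flags.c_contiguous)
```

For a sliced array the reordered view is generally still non-contiguous, so the strides have to be honoured when stepping through the data.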

Oliver Kranz wrote:
> This is very important for me because I want to avoid copying every non-contiguous array into a contiguous array. This would be very inefficient. But I can't find any solution other than copying the array.

It is inefficient depending on what you mean by inefficient. Memory-wise, copying is obviously inefficient. But speed-wise, copying the array into a contiguous array in C order is faster in most if not all cases, because of memory access times.

You may want to read the following article from Ulrich Drepper on memory and cache:

http://lwn.net/Articles/252125/

cheers,
David
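The memory side of this trade-off can be seen from Python with `np.ascontiguousarray`. The snippet is a small sketch, assuming the sliced array from the original question; it makes no claim about timings, which depend on the access pattern and the hardware.

```python
import numpy as np

a = np.zeros((3, 4, 5), order="F")
b = a[:, 1:2, :]                    # non-contiguous view, no extra memory

c = np.ascontiguousarray(b)         # C-contiguous copy of the same values

print(c.flags.c_contiguous)
print(np.shares_memory(b, c))       # the copy owns a separate buffer
print(c.nbytes)                     # extra memory taken by the copy

# An array that is already C-contiguous is passed through without copying.
d = np.ascontiguousarray(c)
print(np.shares_memory(c, d))
```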

David Cournapeau wrote:
>> This is very important for me because I want to avoid copying every non-contiguous array into a contiguous array. This would be very inefficient. But I can't find any solution other than copying the array.
>
> It is inefficient depending on what you mean by inefficient. Memory-wise, copying is obviously inefficient. But speed-wise, copying the array into a contiguous array in C order is faster in most if not all cases, because of memory access times.
>
> You may want to read the following article from Ulrich Drepper on memory and cache:
>
> http://lwn.net/Articles/252125/

That's an interesting note. We already thought about this. For now, we have decided to consistently avoid copying in our special case. It's not unusual to work with data sets consuming about 1 GB of memory. For arrays that are not in contiguous C order, we have to live with the inefficiency in speed.

Cheers,
Oliver

participants (3)
- David Cournapeau
- Oliver Kranz
- Timothy Hochberg