Accessing irregular sized array data from C
Hi,

If you set up an M x N array like this

    a = 1.0*numpy.arange(24).reshape(8,3)

you can access the data from a C function like this

    void foo(PyObject * numpy_data)
    {
        // Get dimension and data pointer
        int const m = static_cast<int>(PyArray_DIMS(numpy_data)[0]);
        int const n = static_cast<int>(PyArray_DIMS(numpy_data)[1]);
        double * const data = (double *) PyArray_DATA(numpy_data);

        // Access data
        ...
    }

Now, suppose I have an irregular shaped numpy array like this

    a1 = numpy.array([ 1.0, 2.0, 3.0])
    a2 = numpy.array([-2.0, 4.0])
    a3 = numpy.array([5.0])
    b = numpy.array([a1,a2,a3])

How can I open up the doors to the array data of b on the C-side?

Best regards,

Mads

--
+---------------------------------------------------------+
| Mads Ipsen                                              |
+----------------------+----------------------------------+
| Gåsebæksvej 7, 4. tv | phone: +45-29716388              |
| DK-2500 Valby        | email: mads.ipsen@gmail.com      |
| Denmark              | map  : www.tinyurl.com/ns52fpa   |
+----------------------+----------------------------------+
On Wed, Jul 2, 2014 at 12:15 PM, Mads Ipsen <mads.ipsen@gmail.com> wrote:
How can I open up the doors to the array data of b on the C-side?
numpy does not directly support irregular shaped arrays (or ragged arrays). If you look at the result of your example you will see this:

    In [5]: b
    Out[5]: array([array([ 1., 2., 3.]), array([-2., 4.]), array([ 5.])], dtype=object)

b has datatype object, which means it is a 1d array containing more array objects. Numpy does not directly know about the shapes or types of the sub arrays. It is not necessarily homogeneous anymore, but compared to a regular python list you still have elementwise operations (if the contained python objects support them) and it can have multiple dimensions.

In C you would access such an array like this (the data buffer of an object array holds one PyObject pointer per element):

    PyArrayObject * const arr = (PyArrayObject *) numpy_data;
    PyObject ** const data = (PyObject **) PyArray_DATA(arr);
    npy_intp i;
    for (i = 0; i < PyArray_DIMS(arr)[0]; i++) {
        assert(PyArray_Check(data[i]));
        double * const sub_data = (double *) PyArray_DATA((PyArrayObject *) data[i]);
    }
On 02/07/14 12:46, Julian Taylor wrote:
In C you would access such an array like this:

    PyArrayObject * const arr = (PyArrayObject *) numpy_data;
    PyObject ** const data = (PyObject **) PyArray_DATA(arr);
    npy_intp i;
    for (i = 0; i < PyArray_DIMS(arr)[0]; i++) {
        assert(PyArray_Check(data[i]));
        double * const sub_data = (double *) PyArray_DATA((PyArrayObject *) data[i]);
    }
Thanks - that'll get me going!

Best,

Mads
On 02.07.2014 13:44, Mads Ipsen wrote:
Thanks - that'll get me going!
Another thing: don't use int as the index into the array. Use npy_intp, which is large enough to also index arrays > 4GB if the platform supports it. Also note that object arrays are not very well optimized in numpy, so numerous operations can be slow.
Julian Taylor <jtaylor.debian@googlemail.com> wrote:
another thing, don't use int as the index to the array, use npy_intp which is large enough to also index arrays > 4GB if the platform supports it.
With double* a 32-bit int can index 16 GB, a 32-bit unsigned int can index 32 GB. With char* a 32-bit int can only index 2 GB.

Sturla
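The figures above follow from multiplying the number of non-negative index values by the element size. A minimal sketch of that arithmetic, assuming an 8-byte double and reading "GB" as 2^30 bytes (the macro and function names are made up for illustration, not part of any numpy API):

```c
#include <stdint.h>

/* Number of non-negative index values offered by 32-bit index types. */
#define SIGNED32_COUNT   ((uint64_t)INT32_MAX + 1)    /* 2^31 */
#define UNSIGNED32_COUNT ((uint64_t)UINT32_MAX + 1)   /* 2^32 */

/* Bytes reachable from a base pointer when the index can take
 * `index_count` values and each element is `elem_size` bytes.
 * Hypothetical helper, for illustration only. */
static uint64_t addressable_bytes(uint64_t index_count, uint64_t elem_size)
{
    return index_count * elem_size;
}
```

With an element size of 8 this gives 16 GB for a signed and 32 GB for an unsigned 32-bit index, and with an element size of 1 (char*) a signed index reaches 2 GB.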
On 2 Jul 2014 20:12, "Sturla Molden" <sturla.molden@gmail.com> wrote:
With double* a 32-bit int can index 16 GB, a 32-bit unsigned int can index 32 GB.
With char* a 32-bit int can only index 2 GB.
Per dimension, if we're talking about addressing. Numpy internally does all index/stride calculations in units of bytes, though, so if accessing the data array directly and using strides, the only reliable approach is to use intp or equivalent.

-n
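To illustrate the point about byte strides, here is a small standalone sketch that addresses elements from a char* base pointer using strides given in bytes. It does not use the numpy headers: ptrdiff_t stands in for npy_intp (an assumption that holds on common platforms, where both are pointer-sized signed integers), and the function names are hypothetical.

```c
#include <stddef.h>

/* Address element (i, j) of a 2-D array of doubles described by a base
 * pointer and per-dimension strides in bytes. ptrdiff_t is used here as
 * a stand-in for npy_intp (assumption for this standalone sketch). */
static double *element_at(char *base, const ptrdiff_t strides[2],
                          ptrdiff_t i, ptrdiff_t j)
{
    return (double *)(base + i * strides[0] + j * strides[1]);
}

/* Sum column j of an m-row array by stepping with the row stride. */
static double column_sum(char *base, const ptrdiff_t strides[2],
                         ptrdiff_t m, ptrdiff_t j)
{
    double total = 0.0;
    for (ptrdiff_t i = 0; i < m; i++) {
        total += *element_at(base, strides, i, j);
    }
    return total;
}
```

In real extension code the base pointer and strides would come from PyArray_DATA and PyArray_STRIDES; because the offsets are in bytes, the index type has to cover the byte size of the whole buffer, not just the element count.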
Nathaniel Smith <njs@pobox.com> wrote:
Numpy internally does all index/stride calculations in units of bytes, though, so if accessing the data array directly and using strides, the only reliable approach is to use intp or equivalent.
If we use PyArray_STRIDES we should use npy_intp, yes, because we are computing the address directly from a char*. It depends on how much we know about the array in advance. Also, a C standard pedant would point out that we can only assume an int will be at least 16 bits, and that we should use long to make sure it is at least 32 bits.

Sturla
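The minimum widths mentioned above can be checked at compile time. The following is only a sketch using C11 _Static_assert and <limits.h>; the helper fits_in_int is a hypothetical name introduced for illustration.

```c
#include <limits.h>

/* Minimum ranges the C standard guarantees for int and long. */
_Static_assert(INT_MAX  >= 32767,      "int has at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647, "long has at least 32 bits");

/* Returns 1 if a byte offset of `nbytes` can be represented as an int
 * on this platform, 0 otherwise. Hypothetical helper. */
static int fits_in_int(unsigned long long nbytes)
{
    return nbytes <= (unsigned long long)INT_MAX;
}
```

A check like this makes the portability concern explicit, although using npy_intp for offsets avoids the question altogether.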
On Wed, Jul 2, 2014 at 3:46 AM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
numpy does not directly support irregular shaped arrays (or ragged arrays). If you look at the result of your example you will see this:

    In [5]: b
    Out[5]: array([array([ 1., 2., 3.]), array([-2., 4.]), array([ 5.])], dtype=object)

b has datatype object, which means it is a 1d array containing more array objects. Numpy does not directly know about the shapes or types of the sub arrays. It is not necessarily homogeneous anymore, but compared to a regular python list you still have elementwise operations (if the contained python objects support them) and it can have multiple dimensions.
All true, but a few notes:

1) you probably want to look at Cython for making this sort of thing easier.

2) a numpy-based ragged array implementation might make sense as well. You essentially store the data in a rank-1 numpy array, and provide custom indexing to get the "rows" out. This would allow you to have all the data in a single memory block available to C (or Cython), so that you could fully optimize indexing and access, and have a data structure that makes sense in pure C.

I've enclosed a start of such a class (I honestly can't remember how far I got with it, but it was at least useful for one project of mine).

HTH,

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
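The class attached to the original message is not reproduced in this archive. As a rough illustration of the layout described in note 2, here is a sketch of how the C side could look, assuming the rows are stored back to back in one flat block with a separate array of row offsets. The struct and function names are hypothetical and are not taken from the enclosed class.

```c
#include <stddef.h>

/* Hypothetical ragged-array layout: all values in one contiguous block,
 * plus n_rows + 1 offsets marking where each row starts and ends. */
typedef struct {
    const double *data;     /* flat values, row after row */
    const size_t *offsets;  /* row r spans offsets[r] .. offsets[r + 1] */
    size_t n_rows;
} RaggedArray;

/* Number of elements in row r. */
static size_t ragged_row_length(const RaggedArray *ra, size_t r)
{
    return ra->offsets[r + 1] - ra->offsets[r];
}

/* Pointer to the first element of row r. */
static const double *ragged_row(const RaggedArray *ra, size_t r)
{
    return ra->data + ra->offsets[r];
}

/* Sum of the elements in row r. */
static double ragged_row_sum(const RaggedArray *ra, size_t r)
{
    const double *row = ragged_row(ra, r);
    size_t n = ragged_row_length(ra, r);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += row[i];
    }
    return total;
}
```

For the example from the first message, the flat block would hold 1, 2, 3, -2, 4, 5 and the offsets would be 0, 3, 5, 6. Both arrays map directly onto ordinary rank-1 numpy arrays, so the C side needs only two data pointers and a row count.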
Chris Barker <chris.barker@noaa.gov> wrote:
2) a numpy-based ragged array implementation might make sense as well. You essentially store the data in a rank-1 numpy array, and provide custom indexing to get the "rows" out. This would allow you to have all the data in a single memory block available to C (or Cython), so that you could fully optimize indexing and access, and have a data structure that makes sense in pure C.
If the sub-arrays are contiguous, an ndarray of ndarrays is not inherently slower in C than the common double** idiom. As with double** the performance depends on iterating along the contiguous sub-arrays in the innermost loop.
From the Python side it will hurt more, yes, but not when working with the NumPy C API.
Sturla
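For comparison, the double** idiom referred to above can be sketched in plain C as follows. This is an illustration only, with a hypothetical function name; in an extension the row pointers and lengths would be obtained from the sub-arrays via the NumPy C API.

```c
#include <stddef.h>

/* Sum every element of a ragged structure given as an array of row
 * pointers (the double** idiom) and a parallel array of row lengths. */
static double ragged_total(double *const *rows, const size_t *lengths,
                           size_t n_rows)
{
    double total = 0.0;
    for (size_t r = 0; r < n_rows; r++) {
        const double *row = rows[r];      /* one pointer load per row */
        for (size_t i = 0; i < lengths[r]; i++) {
            total += row[i];              /* contiguous, unit-stride access */
        }
    }
    return total;
}
```

The outer loop pays one pointer load per row, while the inner loop runs over contiguous memory, which is the access pattern the message describes.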
participants (5)
- Chris Barker
- Julian Taylor
- Mads Ipsen
- Nathaniel Smith
- Sturla Molden