Initializing array from buffer
Hello.

Using the numpy.frombuffer function [1] one can initialize a numpy array from an existing Python object that implements the buffer protocol [2]. This is great, but currently this function supports only 1D buffers, even if the provided buffer is multidimensional and exposes all information about its structure (shape, strides, data type).

Apparently, one can extract every kind of buffer information from the buffer of a numpy array (pointer, number of dimensions, shape, strides, suboffsets, ...), but the other way around is only partially implemented: providing a multidimensional buffer does not mean being able to create a numpy array that uses that buffer with the desired structure.

My use case is the following: we have some 3D arrays in our C++ framework. The ordering of the elements in these arrays is not necessarily C or Fortran style: it might be IJK (i.e. C style, third dimension contiguous in memory), KJI (i.e. Fortran style, first dimension contiguous) or, e.g., IKJ. Moreover, we add some padding to optimize aligned access. This kind of memory structure cannot be expressed simply as 'C' or 'Fortran', but it can be expressed perfectly with the Python buffer protocol by providing the shape and the strides. We would like to export this structure to a numpy array that can access the same memory locations in a consistent way and perform operations such as initializing the content or plotting it.

Is this currently possible? If not, is it planned to implement such a feature?

==========

Maybe just to clarify, I could show an example entirely in Python. Assume a is a 2D numpy array:

    a = np.ones((10,20))

It contains information about its structure, which can be portably accessed through its data member:

    a.data.format == 'd'
    a.data.ndim == 2
    a.data.shape == (10,20)
    a.data.strides == (160,8)

Unfortunately, when initializing an array b from this buffer, the structure of the buffer is "downgraded" to a one-dimensional shape:

    b = np.frombuffer(a.data)

    b.ndim == 1
    b.shape == (200,)
    b.strides == (8,)

I wish b had the same multidimensional structure as a.

(This is of course a very simple example. In my use case I would initialize b with my own buffer instead of that of another numpy array.)

Best regards

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.frombuffer.html
[2] https://docs.python.org/3/c-api/buffer.html
Andrea Arteaga <andyspiros@gmail.com> wrote:
> My use case is the following: we have some 3D arrays in our C++ framework. The ordering of the elements in these arrays is not necessarily C or Fortran style: it might be IJK (i.e. C style, third dimension contiguous in memory), KJI (i.e. Fortran style, first dimension contiguous) or, e.g., IKJ. Moreover, we add some padding to optimize aligned access. This kind of memory structure cannot be expressed simply as 'C' or 'Fortran', but it can be expressed perfectly with the Python buffer protocol by providing the shape and the strides. We would like to export this structure to a numpy array that can access the same memory locations in a consistent way and perform operations such as initializing the content or plotting it.
>
> Is this currently possible? If not, is it planned to implement such a feature?
If you are already coding in C++, just use PyArray_New or PyArray_NewFromDescr:

http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_New
http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_NewFrom...

Apart from that, numpy.array and numpy.asarray can also accept a PEP 3118 buffer.

Sturla
The np.ndarray constructor <http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html> takes a strides argument and a buffer. Is it not sufficiently flexible?

-Robert

On Sun, Nov 16, 2014 at 4:27 PM, Sturla Molden <sturla.molden@gmail.com> wrote:
> If you are already coding in C++, just use PyArray_New or PyArray_NewFromDescr:
>
> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_New
> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_NewFrom...
>
> Apart from that, numpy.array and numpy.asarray can also accept a PEP 3118 buffer.
>
> Sturla
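As a rough illustration of the np.ndarray constructor suggestion, the sketch below wraps a padded IKJ-ordered block of memory. The sizes, the padding and the bytearray standing in for the C++ allocation are all assumptions made up for this example; only the shape, dtype, buffer and strides arguments of np.ndarray come from NumPy itself.

```python
import numpy as np

# Hypothetical logical shape (I, J, K) of the 3D field.
ni, nj, nk = 2, 3, 4
# Hypothetical padded length of each contiguous J row (3 values + 2 padding).
padded_nj = 5
itemsize = np.dtype(np.float64).itemsize

# Stand-in for the C++ allocation: zero-filled raw memory in IKJ order.
raw = bytearray(ni * nk * padded_nj * itemsize)

# Byte strides for the logical index (i, j, k): J is contiguous, K jumps
# over one padded row, I jumps over a whole K-by-padded-J slab.
strides = (nk * padded_nj * itemsize, itemsize, padded_nj * itemsize)

field = np.ndarray(shape=(ni, nj, nk), dtype=np.float64,
                   buffer=raw, strides=strides)

field[...] = 1.0       # writes only the 24 logical elements, not the padding
field[1, 2, 3] = 7.0

# 1D view of every slot in the allocation, padding included.
flat = np.frombuffer(raw, dtype=np.float64)
print(field.shape, field.strides)
print(flat[(1 * nk + 3) * padded_nj + 2])   # same memory as field[1, 2, 3]
```

The padding slots stay untouched because the strides never address them, which is the property the original question asks for.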
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On 18/11/14 04:21, Robert McGibbon wrote:
> The np.ndarray constructor <http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html> takes a strides argument and a buffer. Is it not sufficiently flexible?
>
> -Robert
AFAIK the buffer argument is not a memory address but an object exporting the old buffer protocol. We can abuse the __array_interface__ to do this though, but I prefer the C API functions.

Wrapping a C pointer with __array_interface__ then becomes something like this (not tested, but should work):

    import numpy as np

    cdef class wrapper_array(object):

        cdef:
            object readonly __array_interface__

        def __init__(wrapper_array self, addr, shape, dtype, order, strides, offset):
            if strides is None:
                if order == 'C':
                    strides = None
                else:
                    strides = _get_fortran_strides(shape, dtype)
            self.__array_interface__ = dict(
                data = (addr + offset, False),
                descr = dtype.descr,
                shape = shape,
                strides = strides,
                typestr = dtype.str,
                version = 3,
            )

    cdef object _get_fortran_strides(shape, dtype):
        strides = tuple(dtype.itemsize * np.cumprod((1,) + shape[:-1]))
        return strides

    def wrap_pointer(void *addr, shape, dtype, order, strides, offset):
        """Wraps a C pointer with an ndarray"""
        return np.asarray(wrapper_array(<Py_uintptr_t> addr, shape, dtype, order, strides, offset))

https://github.com/sturlamolden/sharedmem-numpy/blob/master/sharedmem/array....

Sturla
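For readers without Cython at hand, the same idea can be sketched in pure Python. This is not the code from the message above: the PointerWrapper class and the backing array that supplies an address are hypothetical stand-ins, and in real use the address would come from the C++ side, which must keep the memory alive.

```python
import numpy as np


class PointerWrapper:
    """Hypothetical helper exposing raw memory via __array_interface__."""

    def __init__(self, addr, shape, dtype, strides=None, owner=None):
        dtype = np.dtype(dtype)
        # Keep a reference to whatever owns the memory so it is not freed
        # while arrays created from this wrapper are still alive.
        self._owner = owner
        self.__array_interface__ = {
            "data": (addr, False),      # (address, read-only flag)
            "shape": tuple(shape),
            "strides": strides,         # None means C-contiguous
            "typestr": dtype.str,
            "version": 3,
        }


# Stand-in for memory owned by another library: 24 doubles.
backing = np.arange(24, dtype=np.float64)
addr = backing.__array_interface__["data"][0]

# KJI (Fortran-style) layout for a logical (2, 3, 4) array:
# the first index is contiguous, so byte strides grow left to right.
shape = (2, 3, 4)
strides = (8, 16, 48)

view = np.asarray(PointerWrapper(addr, shape, np.float64, strides, owner=backing))

print(view.shape, view.strides)
view[0, 0, 1] = -1.0                    # writes through to backing[6]
print(backing[6])
```

np.asarray keeps the wrapper object as the base of the resulting array, so holding the owner on the wrapper is one simple way to tie the lifetimes together.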
Thanks everybody for suggesting many different ways to achieve this result. While all of them seem valid methods, I decided to use the constructor, as proposed by Robert:
> The np.ndarray constructor <http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html> takes a strides argument and a buffer.
I could easily do this from within C++ in a clean way and without making use of NumPy-specific C code. It seems a bit redundant to me that you have to provide the strides and the shape even though the buffer object already contains this information, but I suppose this is done to support an older buffer protocol, where only 1D arrays could be defined.

I have it working. Thanks once more.

All the best
Andrea

2014-11-22 4:04 GMT+01:00 Sturla Molden <sturla.molden@gmail.com>:
> AFAIK the buffer argument is not a memory address but an object exporting the old buffer protocol. We can abuse the __array_interface__ to do this though, but I prefer the C API functions.
> [...]
Hi Andrea

On 2014-11-16 19:42:09, Andrea Arteaga <andyspiros@gmail.com> wrote:
> My use case is the following: we have some 3D arrays in our C++ framework. The ordering of the elements in these arrays is not necessarily C or Fortran style: it might be IJK (i.e. C style, third dimension contiguous in memory), KJI (i.e. Fortran style, first dimension contiguous) or, e.g., IKJ. Moreover, we add some padding to optimize aligned access. This kind of memory structure cannot be expressed simply as 'C' or 'Fortran', but it can be expressed perfectly with the Python buffer protocol by providing the shape and the strides. We would like to export this structure to a numpy array that can access the same memory locations in a consistent way and perform operations such as initializing the content or plotting it.
>
> Is this currently possible? If not, is it planned to implement such a feature?
This looks like something that should be accomplished fairly easily using the ``__array_interface__`` dictionary, as described here:

http://docs.scipy.org/doc/numpy/reference/arrays.interface.html

Any object that exposes a suitable dictionary named ``__array_interface__`` may be converted to a NumPy array. It has the following important keys:

  shape
  typestr
  data: (20495857, True); 2-tuple: pointer to data and boolean to indicate whether memory is read-only
  strides
  version: 3

Regards
Stéfan
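To see what these keys look like in practice, one can inspect the dictionary that NumPy arrays themselves expose. The snippet below is only a small exploratory sketch; the array a and its transpose are arbitrary examples.

```python
import numpy as np

a = np.ones((10, 20))
iface = a.__array_interface__

print(iface["shape"])       # (10, 20)
print(iface["typestr"])     # byte order, kind and item size of the dtype
print(iface["strides"])     # None, because a is C-contiguous
print(iface["version"])     # 3

addr, readonly = iface["data"]
print(readonly)             # False: the memory is writable

# A transposed view shares the same address but reports explicit strides.
t_iface = a.T.__array_interface__
print(t_iface["strides"])   # (8, 160)
```

An exporter for a custom layout would fill in the same keys by hand, with strides set explicitly whenever the memory is not C-contiguous.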
Have you tried using the C-API to create the array? This link might be of help:

http://docs.scipy.org/doc/numpy/reference/c-api.array.html#creating-arrays

I know that Boost.Python can handle this.

On Sun, Nov 16, 2014 at 3:42 PM, Andrea Arteaga <andyspiros@gmail.com> wrote:
> Hello. Using the numpy.frombuffer function [1] one can initialize a numpy array from an existing Python object that implements the buffer protocol [2]. This is great, but currently this function supports only 1D buffers, even if the provided buffer is multidimensional and exposes all information about its structure (shape, strides, data type).
> [...]
Hi.

Yesterday I tried to make use of the C API, but I did not manage to get anything useful out of it. The reference is very well done, but I miss a tutorial that guides the reader through some examples. Do you know of any?

The array interface sounds like a very good solution. In a sense it is a NumPy-specific version of the buffer protocol, a bit simpler but less generic. It looks clean and very easy to implement. I will try this way.

Thank you so much for the useful links.

Andrea Arteaga

2014-11-17 18:08 GMT+01:00 Edison Gustavo Muenz <edisongustavo@gmail.com>:
> Have you tried using the C-API to create the array? This link might be of help: http://docs.scipy.org/doc/numpy/reference/c-api.array.html#creating-arrays
>
> I know that Boost.Python can handle this.
On Sun, Nov 16, 2014 at 5:42 PM, Andrea Arteaga <andyspiros@gmail.com> wrote:
> Hello. Using the numpy.frombuffer function [1] one can initialize a numpy array using an existing python object that implements the buffer protocol [2]. This is great, but currently this function supports only 1D buffers, even if the provided buffer is multidimensional and it exposes all information about its structure (shape, strides, data type).
np.frombuffer is not often used, and I'm not sure if it's been updated for the new py3 buffer protocol. (The old buffer protocol only supported 1d buffers.)

Have you tried just using np.asarray? It seems to work fine with multidimensional memoryview objects at least, which use the new buffer protocol:

    In [12]: a = np.ones((2, 3))

    In [13]: a_buf = memoryview(a)

    In [14]: a_buf
    Out[14]: <memory at 0x7ffd071a2d60>

    In [15]: np.asarray(a_buf)
    Out[15]:
    array([[ 1.,  1.,  1.],
           [ 1.,  1.,  1.]])

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
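The difference between the two functions can be checked side by side. The following is a small sketch using the 10 x 20 array from the original question; the variable names are arbitrary.

```python
import numpy as np

a = np.ones((10, 20))

# frombuffer flattens the exported buffer to one dimension.
flat = np.frombuffer(a.data)

# asarray honours the shape and strides exported by the buffer.
same = np.asarray(memoryview(a))
transposed = np.asarray(memoryview(a.T))   # a non-C-contiguous exporter

print(flat.shape)                          # (200,)
print(same.shape, same.strides)            # (10, 20) (160, 8)
print(transposed.shape, transposed.strides)

# The result is a view on the same memory, not a copy.
same[0, 0] = 5.0
print(a[0, 0])                             # 5.0
```

Any object exporting a multidimensional PEP 3118 buffer should behave like the memoryview here, so an exporter written in C++ can rely on np.asarray in the same way.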
participants (6)

- Andrea Arteaga
- Edison Gustavo Muenz
- Nathaniel Smith
- Robert McGibbon
- Stefan van der Walt
- Sturla Molden