Hello,

I would like to expose an existing (C++) object as a NumPy array to Python. Right now I'm using PyArray_New, passing a pointer to my object's storage. It turns out that the storage pointer of my object may change over its lifetime, so I'd like to change the pointer used by the PyArrayObject. Is there any API to do this? (I'd like to avoid allocating a new PyArrayObject, as that is presumably a costly operation.) If not, may I modify the "data" member of the array object directly, or would I risk corrupting the application state by doing that?

Many thanks,
Stefan

--
...ich hab' noch einen Koffer in Berlin...
Hi Stefan,

Allocating a new PyArrayObject isn't terribly expensive (compared to all the other allocations that Python programs are constantly doing), but I'm afraid you have a more fundamental problem. The reason there is no supported API to change the storage pointer of a PyArrayObject is that the semantics of PyArrayObject require the data to remain allocated, and in the same place, until the PyArrayObject is freed (and when that happens is in general up to the garbage collector, not you). You could make a copy, but you can't free the original buffer until Python tells you you can.

The problem is that many simple operations on arrays return views, which are implemented as independent PyArrayObjects whose data field points directly into your memory buffer. These views hold a reference to your PyArrayObject, but there's no supported way to reverse this mapping and find all the views that might be pointing into your buffer. If you're very determined there are probably hacks you could use (be very careful never to allocate views, or maybe gc.get_referrers() will work to let you run around and fix up all the views), but at that point you're kind of on your own anyway, and violating PyArrayObject's encapsulation boundary is the least of your worries :-).

Hope things are well with you,
-n

On Thu, May 22, 2014 at 12:03 AM, Stefan Seefeld <stefan@seefeld.name> wrote:
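[Editor's note: the view semantics described above can be illustrated from pure Python. A minimal sketch -- slicing returns a view that shares the buffer and keeps the original array alive through its 'base' attribute:]

```python
import numpy as np

a = np.zeros(10)
v = a[2:5]              # a view: no copy, it points into a's buffer

# The view holds a reference to the original array via 'base'...
assert v.base is a

# ...and both objects see the same memory.
v[0] = 7.0
assert a[2] == 7.0

# There is, however, no supported way to go the other direction and
# enumerate all views pointing into a's buffer.
```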
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
Hi Nathaniel,

thanks for the prompt and thorough answer. You are entirely right, I hadn't thought things through properly, so let me back up a bit.

I want to provide Python bindings to a C++ library I'm writing, which is based on vector/matrix/tensor data types. In my naive view I would expose these data types as NumPy arrays, creating PyArrayObject instances as "wrappers", i.e. ones that borrow raw pointers to the storage managed by the C++ objects. To make things slightly more interesting, those C++ objects have their own storage management mechanism, which allows data to migrate across address spaces (such as from host to GPU-device memory); whether the host storage is valid (i.e., contains up-to-date data) thus depends on where the last operation was performed (which is controlled by an operation dispatcher that is also part of the library).

It seems that if I let Python control the data's lifetime and borrow the data temporarily from C++, I may be fine. However, I may also want to expose pre-existing C++ objects to Python, and it sounds like that might be dangerous unless I am willing to clone the data, so that the Python runtime can hold on to its copy even after the C++ runtime has released the original. But that changes the semantics: the Python runtime no longer sees the same data as the C++ runtime, unless I keep the two in sync each time I cross the language boundary, which may be quite a costly operation...

Does all that sound sensible? It seems I have some more design to do.

Thanks,
Stefan

--
...ich hab' noch einen Koffer in Berlin...
Hi Stefan,

One possibility that comes to mind: you may in any case want some way to temporarily "pin" an object's memory in place (e.g., to prevent one thread from migrating it while another thread is working on it). If so, the Python wrapper could acquire a pin when the ndarray is allocated and release it when the ndarray is freed. (The canonical way to do this is to create a little opaque Python class that knows how to do the acquire/release, and then assign an instance of it to the 'base' attribute of your array -- the semantics of 'base' are simply that ndarray deallocation will decref whatever object is in 'base'.)

-n

On Thu, May 22, 2014 at 12:44 AM, Stefan Seefeld <stefan@seefeld.name> wrote:
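[Editor's note: a pure-Python sketch of the pinning idea. `HostStorage` is a hypothetical stand-in for the C++ object; from C one would attach the pin holder with PyArray_SetBaseObject, while here `weakref.finalize` models the same release-on-free semantics:]

```python
import weakref
import numpy as np

class HostStorage:
    """Hypothetical stand-in for a C++ object with a pin/unpin API."""
    def __init__(self, nbytes):
        self.data = bytearray(nbytes)   # stands in for pinned host memory
        self.pin_count = 0
    def pin(self):
        self.pin_count += 1
    def unpin(self):
        self.pin_count -= 1

def as_ndarray(storage):
    """Zero-copy view whose lifetime holds a pin on the storage."""
    storage.pin()
    arr = np.frombuffer(storage.data, dtype=np.uint8)
    # Release the pin once the array (and every view of it) is gone;
    # in a C extension the 'base' object plays this role.
    weakref.finalize(arr, storage.unpin)
    return arr

s = HostStorage(16)
a = as_ndarray(s)
assert s.pin_count == 1
v = a[:4]                 # a view holds a reference to 'a'...
del a
assert s.pin_count == 1   # ...so the pin outlives the original handle
del v
assert s.pin_count == 0   # last reference gone: pin released
```

Note that the views problem from earlier in the thread is handled automatically here: any view keeps the wrapping array alive through its 'base' chain, so the pin is only released once every view is gone.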
Hi Nathaniel, On 2014-05-21 20:15, Nathaniel Smith wrote:
Hi Stefan,
One possibility that comes to mind: you may want in any case some way to temporarily "pin" an object's memory in place (e.g., to prevent one thread trying to migrate it while some other thread is working on it). If so then the Python wrapper could acquire a pin when the ndarray is allocated, and release it when it is released. (The canonical way to do this is to create a little opaque Python class that knows how to do the acquire/release, and then assign it to the 'base' attribute of your array -- the semantics of 'base' are simply that ndarray.__del__ will decref whatever object is in 'base'.)
That's an interesting thought. So instead of creating an ndarray whose lifetime matches that of the wrapped C++ object, I would create an ndarray only temporarily, as a view into my C++ object, over whose lifetime the storage is pinned to host memory.

The (Python) API needs to make it clear that, while it is OK to hold references to vector and matrix objects, access to their "array" members should be confined to small scopes, since within those scopes the underlying memory is pinned, and no operation that would involve relocating the data (such as launching OpenCL kernels) may be called. Not following such rules may result in deadlocks...

I think I like that approach. Explicit is better than implicit. :-)

Thanks!
Stefan
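[Editor's note: the scoped-access discipline described above maps naturally onto a context manager. A sketch, again with a hypothetical pin/unpin API on the C++-backed object -- the yielded array must not be allowed to escape the `with` block:]

```python
from contextlib import contextmanager
import numpy as np

class Vector:
    """Hypothetical wrapper around a C++ vector with migratable storage."""
    def __init__(self, nbytes):
        self._host = bytearray(nbytes)
        self.pinned = False
    def pin(self):    # would also migrate data back to host memory
        self.pinned = True
    def unpin(self):
        self.pinned = False

@contextmanager
def array(vec):
    """Expose vec's host storage as an ndarray for the duration of a
    'with' block, during which the storage is pinned in place."""
    vec.pin()
    try:
        yield np.frombuffer(vec._host, dtype=np.uint8)
    finally:
        vec.unpin()

v = Vector(8)
with array(v) as a:
    assert v.pinned
    a[:] = 1              # operate on the borrowed memory in place
assert not v.pinned       # pin released on scope exit
```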
-- ...ich hab' noch einen Koffer in Berlin...
Participants (2): Nathaniel Smith, Stefan Seefeld