[Numpy-discussion] Unexpected behavior with numpy array

Robert Kern robert.kern at gmail.com
Mon Feb 4 00:47:28 EST 2008


Damian Eads wrote:
> Robert Kern wrote:
>> Damian Eads wrote:
>>> Here's another question: is there any way to construct a numpy array and 
>>> specify the buffer address where it should store its values? I ask 
>>> because I would like to construct numpy arrays that work on buffers that 
>>> come from mmap.
>> Can you clarify that a little? By "buffer" do you mean a Python buffer() object? 
> 
> Yes, I mean the .data field of a numpy array, which is a buffer object, 
> and points to the memory where an array's values are stored.

Actually, the .data field is always constructed by ndarray; it is never provided 
*to* ndarray even if you construct the ndarray from a buffer object. The buffer 
object's information is interpreted to construct the ndarray object and then the 
original buffer object is ignored. The .data attribute will be constructed 
"on-the-fly" when it is requested.


In [9]: from numpy import *

In [10]: s = 'aaaa'

In [11]: b = buffer(s)

In [12]: a = frombuffer(b, dtype=int32)

In [13]: a.data is b
Out[13]: False

In [14]: d1 = a.data

In [15]: d2 = a.data

In [16]: d1 is d2
Out[16]: False


>> By "mmap" do you mean Python's mmap in the standard library?
> 
> I actually was referring to the C Standard Library's mmap. My intention 
> was to use a pointer returned by C-mmap as the ".data" buffer to store 
> array values.
> 
>> numpy has a memmap class which subclasses ndarray to wrap a mmapped file. It 
>> handles the opening and mmapping of the file itself, but it could be subclassed 
>> to override this behavior to take an already opened mmap object.
> 
> This may satisfy my needs. I'm going to look into it and get back to you.
> 
>> In general, if you have a buffer() object, you can make an array from it using 
>> numpy.frombuffer(). This will be a standard ndarray and won't have the 
>> conveniences of syncing to disk that the memmap class provides.
> 
> This is good to know because there have been a few situations when this 
> would have been very useful.
> 
> Suppose I do something like (in Python):
> 
>    import ctypes
>    mylib = ctypes.CDLL('libmylib.so')
>    y = mylib.get_float_array_from_c_function()
> 
> which returns a float* as a Python int, and then I do
> 
>    nelems = mylib.get_float_array_num_elems()
>    x = numpy.frombuffer(ctypes.c_buffer(y), 'float', nelems)
> 
> This gives me an ndarray x with its (.data) buffer pointing to the 
> memory address given by y. When the ndarray x is no longer referenced 
> (even as another array's base), does numpy attempt to free the memory 
> pointed to by y? In other words, does numpy always deallocate the 
> (.data) buffer in the __del__ method? Or, does frombuffer set a flag 
> telling it not to?

By default, frombuffer() creates an array that is flagged as not owning the 
data. That means it will not delete the data memory when the ndarray object is 
destroyed.


In [69]: import ctypes

In [70]: ca = (ctypes.c_int*8)()

In [71]: a = frombuffer(ca, int)

In [72]: a
Out[72]: array([0, 0, 0, 0, 0, 0, 0, 0])

In [73]: a.flags
Out[73]:
   C_CONTIGUOUS : True
   F_CONTIGUOUS : True
   OWNDATA : False
   WRITEABLE : True
   ALIGNED : True
   UPDATEIFCOPY : False

>> If you don't have a buffer() object, but just have a pointer allocated from some 
>> C code, then you *could* fake an object which exposes the __array_interface__() 
>> method to describe the memory. The numpy.asarray() constructor will use that to 
>> make an ndarray object that uses the specified memory. This is advanced stuff 
>> and difficult to get right because of memory ownership and object lifetime 
>> issues.
> 
> Allocating memory in C code would be very useful for me. If I were to 
> use such a numpy.asarray() function (seems the frombuffer you mentioned 
> would also work as described above),

Yes, if you can create the buffer object or something that obeys the buffer 
protocol. ctypes arrays work fine; ctypes pointers don't.
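
For what it's worth, here is a minimal sketch of the __array_interface__ approach. The PointerWrapper name is just illustrative, and a ctypes array stands in for memory that would really be allocated in C; the wrapper holds a reference to whatever actually owns the memory, which is the part that is easy to get wrong:

import ctypes
import numpy

class PointerWrapper(object):
    """Expose a raw address to numpy via the array interface.
    Keeps a reference to the real owner of the memory so it
    stays alive as long as any array built on top of it."""
    def __init__(self, addr, nelems, owner=None):
        self._owner = owner
        self.__array_interface__ = {
            'shape': (nelems,),
            'typestr': numpy.dtype(numpy.float32).str,
            'data': (addr, False),   # (address, read-only flag)
            'version': 3,
        }

owner = (ctypes.c_float * 8)()       # stands in for memory allocated in C
addr = ctypes.addressof(owner)
x = numpy.asarray(PointerWrapper(addr, 8, owner))
x[:] = 1.0                           # writes go straight into the C buffer

numpy.asarray() should keep a reference to the wrapper (via x.base), so as long as the wrapper holds the owner, the memory won't disappear underneath the array.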

> it makes sense for the C code to be 
> responsible for deallocating the memory, not numpy. I understand that I 
> would need to ensure that the deallocation happens only when the 
> containing ndarray is no longer referenced anywhere in Python 
> (hopefully, ndarray's finalization code does not need access to the 
> .data buffer).

My experience has been that this is fairly difficult to do. If you have 
*complete* control of the ndarray object over its entire lifetime, then this is 
reasonable. If you don't, then you are going to run into (nondeterministic!) 
segfaulting bugs eventually. For example, if you are only using it as a 
temporary inside a function and never return it, this is fine. You will also 
need to be very careful about constructing views from the ndarray; these will 
need to be controlled, too. You will have a bug if you delete myarray but return 
reversed_array=myarray[::-1], for example.
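
As a quick illustration of the view issue (just a sketch, reusing the frombuffer array 'a' from above):

v = a[::-1]            # a view: shares the same memory as 'a'
v.flags.owndata        # -> False; the view does not own the buffer
v.base is a            # -> True; 'a' (and its buffer) must outlive 'v'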

I see that you are using ctypes. Be sure to take a look at the .ctypes attribute 
on ndarrays. This allows you to get a ctypes pointer object from an array. This 
might help you use numpy to allocate the memory and pass that in to your C 
functions.


In [47]: a.ctypes.data_as(ctypes.POINTER(ctypes.c_int))
Out[47]: <ctypes.LP_c_long object at 0x1c7c800>
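
And, roughly, what letting numpy allocate the memory and passing it into C might look like. The 'scale' function and its signature are made up for the sake of the example; libmylib.so is just the library name from your snippet above:

import ctypes
import numpy

mylib = ctypes.CDLL('libmylib.so')
# hypothetical C function: void scale(float *data, int n, float factor)
mylib.scale.argtypes = [ctypes.POINTER(ctypes.c_float),
                        ctypes.c_int, ctypes.c_float]
mylib.scale.restype = None

a = numpy.zeros(8, dtype=numpy.float32)   # numpy allocates and owns the memory
mylib.scale(a.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), len(a), 2.0)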


   http://www.scipy.org/Cookbook/Ctypes

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco
