Re: [Numpy-discussion] Setting contents of buffer for array object

Matthew Brett wrote:
    import numpy as np
    a = np.arange(10)
    b = np.arange(10) + 1
    a.data = b.data  # raises error, but I hope you see what I mean
Not really, no. Can you describe your use case in more detail?
Yes - I am just writing the new median implementation. To allow future optimization, I would like to have the same signature as mean():
def median(a, axis=0, dtype=None, out=None)
(axis=0 to change to axis=None default at some point).
To do this, I need to copy the results of the median calculation in the routine into the array object given by 'out' - when passed.
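A minimal sketch of that requirement, assuming element-wise assignment into 'out' is acceptable in place of swapping buffers. The name median_sketch and its sort-based body are illustrative only, not the implementation discussed in this thread, and the dtype argument is left out for brevity:

```python
import numpy as np

def median_sketch(a, axis=0, out=None):
    """Hypothetical helper for illustration; not NumPy's median."""
    a = np.asarray(a)
    s = np.sort(a, axis=axis)          # sorted copy; the input is untouched
    n = s.shape[axis]
    mid = n // 2
    if n % 2:
        result = np.take(s, mid, axis=axis)
    else:
        # Even length: average the two middle values along the axis.
        result = (np.take(s, mid - 1, axis=axis) + np.take(s, mid, axis=axis)) / 2.0
    if out is None:
        return result
    # Copy the values into the caller's existing buffer. The array object,
    # its data pointer, shape and strides all stay as they were.
    out[...] = result
    return out

a = np.arange(12.0).reshape(3, 4)
out = np.empty(4)
res = median_sketch(a, axis=0, out=out)
```

Here res is the very same object as out, so any other reference the caller holds to that array sees the result too.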
My understanding of numerical routines that accept an "out" parameter is that this is a convention for in-place algorithms. When None is passed as the out parameter, it's the caller's way of indicating that in-place operation is not needed, and a new array is allocated to store the result; otherwise, the result is stored in the 'out' array. Either way, the result is returned. One can break from this convention by allocating more memory than provided by the out array, but that's a performance issue that may or may not be avoidable.

Remember that A[:] = <expr> sets the values of the elements in A to the values of the array elements in the expression expr, and this copying is done in-place. To copy an array C, and make the copy contiguous, use the .copy() method on C.

Assigning the .data buffers is not something I have seen before outside of constructor (or pseudo-constructor, like frombuffer) code. I think it might even be dangerous if you don't do it right. If one does not properly recalculate the strides of A, slicing operations on A may not behave as expected.

If this is library code, reassigning the .data buffer can confuse the user, since it messes up array view semantics. Suppose I'm an ignorant user and I write the following code:

    A = numpy.random.rand(10, 20)
    dummy_input = numpy.random.rand(10, 20)
    B = A.T
    C = B[0::-1, :]

Then I use a library function foo (suppose foo accepts an input array inp and an output array out, and assigns out.data to something else):

    foo(inp=dummy_input, out=B)

Now A and B point to two different .data buffers, B's base points to A, and C's base points to B, but A and C share the same .data buffer. As a user, I may expect B and C to be views of A (B certainly isn't any more), and C to be a view of B (which is verified by checking 'C.base is B'), but changing C's values changes A's and not B's. That's confusing.

Also, suppose B's new data buffer has fewer elements than its original data buffer. I may be clever and set B's size and strides attributes accordingly, but changing C's values might then manipulate undefined memory.

Damian
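For comparison, here is a small sketch of the in-place alternative described above, reusing the names A, B and C from the example. The array new_values is a made-up stand-in for whatever foo would compute. Because B[:] = ... writes into the existing buffer, all three arrays keep sharing memory and stay consistent:

```python
import numpy as np

A = np.random.rand(10, 20)
B = A.T                  # view of A, shape (20, 10)
C = B[0::-1, :]          # view of the first row of B, shape (1, 10)

# All three arrays share one underlying buffer.
shared_before = np.shares_memory(A, B) and np.shares_memory(A, C)

new_values = np.random.rand(20, 10)   # hypothetical result to store in B
B[:] = new_values                     # element-wise copy into B's existing buffer

# The views still agree with each other after the assignment.
a_follows_b = np.array_equal(A, new_values.T)
c_follows_b = np.array_equal(C, new_values[0::-1, :])
shared_after = np.shares_memory(A, B) and np.shares_memory(A, C)

# .copy() gives an independent, C-contiguous array instead of a view.
D = B.copy()
```

Writing through C afterwards changes A and B as well, which is the behaviour a user of views would expect.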
participants (1)
-
Damian R. Eads