Setting contents of buffer for array object
Hi,

I am sorry if I have missed something obvious, but is there any way in python of doing this:

    import numpy as np
    a = np.arange(10)
    b = np.arange(10)+1
    a.data = b.data # raises error, but I hope you see what I mean

?

Thanks a lot for any pointers.

Matthew
On Feb 10, 2008 5:15 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi,
> I am sorry if I have missed something obvious, but is there any way in python of doing this:
>
>     import numpy as np
>     a = np.arange(10)
>     b = np.arange(10)+1
>     a.data = b.data # raises error, but I hope you see what I mean
>
> ?
Not really, no. Can you describe your use case in more detail?

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
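For readers who only need the values of `b` to end up in `a`, the following is a minimal sketch of copying into the existing array instead of touching `.data`. It is not something proposed in the thread at this point; it assumes `a` and `b` have matching shapes and compatible dtypes, and the `addr_before`/`addr_after` names are illustrative only.

```python
import numpy as np

a = np.arange(10)
b = np.arange(10) + 1

# Remember where a's buffer lives so we can check that it is reused.
addr_before = a.__array_interface__["data"][0]

# Copy the values of b into a's existing memory; the name a is not rebound.
a[:] = b          # slice assignment
np.copyto(a, b)   # explicit form doing the same copy

addr_after = a.__array_interface__["data"][0]
same_buffer = addr_before == addr_after
values_match = bool(np.array_equal(a, b))
```

Both forms copy element by element, so `a` keeps its own buffer and `b` remains an independent array.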
Yes - I am just writing the new median implementation. To allow future optimization, I would like to have the same signature as mean():

    def median(a, axis=0, dtype=None, out=None)

(axis=0 to change to axis=None default at some point).

To do this, I need to copy the results of the median calculation in the routine into the array object given by 'out' - when passed.

Matthew
On Feb 10, 2008 6:48 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Yes - I am just writing the new median implementation. To allow future optimization, I would like to have the same signature as mean():
>
>     def median(a, axis=0, dtype=None, out=None)
>
> (axis=0 to change to axis=None default at some point).
>
> To do this, I need to copy the results of the median calculation in the routine into the array object given by 'out' - when passed.
Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't "copy" anything at all; otherwise, "median(x, out=out)" is no better than "out[:] = median(x)". Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency.

Can you show us the current implementation that you have?

--
Robert Kern
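The distinction drawn above can be illustrated with `np.mean`, which the thread uses as the model signature. This is a small sketch, not code from the thread; the array contents and the names `out`, `out2`, and `addr_before` are made up for the example.

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(3, 4)

# Preallocated result array; note where its buffer lives.
out = np.empty(4)
addr_before = out.__array_interface__["data"][0]

# With out=, the result is written into memory that out already owns.
result = np.mean(x, axis=0, out=out)
returned_same_object = result is out
same_buffer = out.__array_interface__["data"][0] == addr_before

# Without out=, a temporary result array is allocated first and its
# values are then copied into out2 by the slice assignment.
out2 = np.empty(4)
out2[:] = np.mean(x, axis=0)
```

In both cases the array object keeps its buffer; the difference is only whether an intermediate result array is allocated along the way.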
> Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't "copy" anything at all; otherwise, "median(x, out=out)" is no better than "out[:] = median(x)". Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency.
I agree - but there are more efficient median algorithms out there which can make use of the memory efficiently. I wanted to establish the call signature to allow that. I don't feel strongly about it though.
> Can you show us the current implementation that you have?
is attached, comments welcome... Matthew
On Feb 10, 2008 7:17 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't "copy" anything at all; otherwise, "median(x, out=out)" is no better than "out[:] = median(x)". Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency.
>
> I agree - but there are more efficient median algorithms out there which can make use of the memory efficiently. I wanted to establish the call signature to allow that. I don't feel strongly about it though.
I say add the out= parameter when you use such an algorithm. But if you like, just use slice assignment for now.

--
Robert Kern
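The slice-assignment fallback mentioned above might look like the sketch below. The function name `median_with_out` is hypothetical and this is not the implementation attached to the thread; it simply delegates to `np.median` and assumes `out`, when given, already has the right shape and a compatible dtype.

```python
import numpy as np

def median_with_out(a, axis=0, out=None):
    """Hypothetical wrapper: median along `axis`, optionally stored in `out`."""
    result = np.median(np.asarray(a), axis=axis)
    if out is None:
        return result
    # Slice assignment copies the values into out's existing buffer.
    # It saves no memory: `result` was still allocated above.
    out[...] = result
    return out

x = np.array([[1.0, 9.0], [3.0, 5.0], [2.0, 7.0]])
out = np.empty(2)
res = median_with_out(x, axis=0, out=out)
```

This establishes the call signature but, as noted in the thread, gives no memory benefit until the underlying algorithm writes into `out` directly.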
On 10/02/2008, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Ah, I see. You definitely do not want to reassign the .data buffer in this case. An out= parameter does not reassign the memory location that the array object points to. It should use the allocated memory that was already there. It shouldn't "copy" anything at all; otherwise, "median(x, out=out)" is no better than "out[:] = median(x)". Personally, I don't think that a function should expose an out= parameter unless it can make good on that promise of memory efficiency.
>
> I agree - but there are more efficient median algorithms out there which can make use of the memory efficiently. I wanted to establish the call signature to allow that. I don't feel strongly about it though.
This is a startling claim! Are there really median algorithms that are faster for having the use of a single float as storage space? If it were permissible to mutilate the original array in-place, I can certainly see a good median algorithm (based on quicksort, perhaps) being faster, but modifying the input array is a different question from using an output array.

I can also see that this could possibly be improved by using a for loop to iterate over the output elements, so that there was no need to duplicate the large input array, or perhaps a "blocked" iteration that duplicated arrays of modest size would be better. But how can a single float per data set whose median is being taken help?

Anne
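The "for loop over the output elements" idea can be sketched as follows. This is only an illustration under stated assumptions: the helper name `median_axis0_loop` is hypothetical, the input is assumed to be a non-empty 2-D array, and the median is taken along axis 0 by sorting one column at a time.

```python
import numpy as np

def median_axis0_loop(a):
    """Hypothetical helper: median of each column of a non-empty 2-D array."""
    a = np.asarray(a)
    n = a.shape[0]
    mid = n // 2
    result = np.empty(a.shape[1], dtype=float)
    for j in range(a.shape[1]):
        # Only one column is copied and sorted at a time,
        # so the full input array is never duplicated.
        col = np.sort(a[:, j])
        if n % 2:
            result[j] = col[mid]
        else:
            result[j] = 0.5 * (col[mid - 1] + col[mid])
    return result

x = np.array([[4, 1], [2, 8], [6, 3], [0, 5]])
med = median_axis0_loop(x)
```

The temporary storage is one column rather than a copy of the whole input, at the cost of a Python-level loop; a blocked variant would process several columns per iteration.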
Hi,
> I can also see that this could possibly be improved by using a for loop to iterate over the output elements, so that there was no need to duplicate the large input array, or perhaps a "blocked" iteration that duplicated arrays of modest size would be better. But how can a single float per data set whose median is being taken help?
Sorry, you are right to call me on this very sloppy late-night phrasing - I only meant that it would be useful in due course to use a C implementation for median such as the ones you're describing, and that this could write the result directly into the in-place memory - in the same way that mean() does. It's quite true that it's difficult to imagine the algorithm itself benefiting from the memory buffer.

Thanks,

Matthew
On 11/02/2008, Matthew Brett <matthew.brett@gmail.com> wrote:
>> I can also see that this could possibly be improved by using a for loop to iterate over the output elements, so that there was no need to duplicate the large input array, or perhaps a "blocked" iteration that duplicated arrays of modest size would be better. But how can a single float per data set whose median is being taken help?
>
> Sorry, you are right to call me on this very sloppy late-night phrasing - I only meant that it would be useful in due course to use a C implementation for median such as the ones you're describing, and that this could write the result directly into the in-place memory - in the same way that mean() does. It's quite true that it's difficult to imagine the algorithm itself benefiting from the memory buffer.
My point was not to catch you in an error - goodness knows I make enough of those, and not only late at night! - but to point out that there may not really be much need for an output argument. Even with C code, for the median to be of much use, the output array can be at most half the size of the input array. The extra storage space required is not that big a concern, unlike a ufunc, and including an output argument forces you to deal with all sorts of data conversion issues.

On the other hand, there is something to be said for allowing the code to destroy the input array. Perhaps *that* should be an optional argument (defaulting to zero)?

Anne
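An optional "destroy the input" flag could be sketched like this. The function `median_1d` is hypothetical and restricted to non-empty 1-D arrays; the flag name `overwrite_input` is borrowed from the keyword that NumPy's `np.median` exposes today, and the selection step uses `ndarray.partition` as a stand-in for the quicksort-style algorithm mentioned earlier in the thread.

```python
import numpy as np

def median_1d(a, overwrite_input=False):
    """Hypothetical 1-D median that may reorder `a` in place."""
    a = np.asarray(a)
    # Work on the caller's array only when explicitly allowed to.
    work = a if overwrite_input else a.copy()
    n = work.size
    mid = n // 2
    if n % 2:
        # Partial sort: element `mid` ends up in its sorted position.
        work.partition(mid)
        return float(work[mid])
    # Even length: both middle elements must be in sorted position.
    work.partition([mid - 1, mid])
    return 0.5 * (work[mid - 1] + work[mid])

x = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
m_odd = median_1d(x, overwrite_input=True)   # x may be reordered

y = np.array([4.0, 1.0, 3.0, 2.0])
y_before = y.copy()
m_even = median_1d(y)                        # y is left untouched
```

With the flag off (the default), the only extra storage is one copy of the input; with it on, no copy is made but the caller's array is left in partially sorted order.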
participants (3)
- Anne Archibald
- Matthew Brett
- Robert Kern