Re: [Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)
On Wed, 2012-05-23 at 17:31 -0500, Nathaniel Smith wrote:
On Wed, May 23, 2012 at 10:53 PM, Travis Oliphant <travis@continuum.io> wrote:
To be clear, I'm not opposed to the change, and it looks like we should go forward.
In my mind it's not about developers vs. users as satisfying users is the whole point. The purpose of NumPy is not to make its developers happy :-). But, users also want there to *be* developers on NumPy so developer happiness is not irrelevant.
In this case, though, there are consequences for users because of the double copy if a user wants to make their code future proof. We are always trading off predicted user-experiences. I hope that we all don't have the same perspective on every issue or more than likely there aren't enough voices being heard from real users.
I'm not really worried about users who have a problem with the double-copy. It's a totally legitimate concern, but anyone who has that concern has already understood the issues well enough to be able to take care of themselves, and decided that it's worth the effort to special-case this. They can check whether the returned array has .base set to tell whether it's an array or a view, use a temporary hack to check for the secret warning flag in arr.flags.num, check the numpy version, all sorts of things to get them through the one version where this matters. The suggestion in the docs to make a copy is not exactly binding :-).
-- Nathaniel
As a "real user", if I care about whether an array arr2 is a copy or a view, I usually either check arr2.flags.owndata or append copy() to the statement that created arr2, e.g., arr2 = arr.diagonal().copy(). Numpy rules on views vs. copies sometimes require a bit of thought, and so I'll frequently just check the flags or make a copy instead of thinking. (More foolproof :).)
Kathy
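A minimal sketch of the two habits described above, assuming a small 3x3 float array purely for illustration. Whether diagonal() itself owns its data depends on the NumPy version, so that value is only printed here, not assumed.

```python
import numpy as np

arr = np.arange(9.0).reshape(3, 3)

# Habit 1: inspect the flag. The result for diagonal() is version-dependent,
# so it is printed rather than relied upon.
print("diagonal() owns its data:", arr.diagonal().flags.owndata)

# Habit 2: append .copy(). The explicit copy always owns its data,
# so writing to it cannot touch `arr`.
diag = arr.diagonal().copy()
diag[0] = 100.0
print("copy owns its data:", diag.flags.owndata)
print("arr[0, 0] is still:", arr[0, 0])
```

The explicit copy costs one extra allocation, which is the trade-off behind the double-copy concern raised earlier in the thread.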
On Wed, May 23, 2012 at 4:16 PM, Kathleen M Tacina < Kathleen.M.Tacina@nasa.gov> wrote:
As a "real user", if I care about whether an array arr2 is a copy or a view, I usually either check arr2.flags.owndata or append copy() to the statement that created arr2, e.g., arr2 = arr.diagonal().copy().
Numpy rules on views vs. copies sometimes require a bit of thought, and so I'll frequently just check the flags or make a copy instead of thinking. (More foolproof :).)
It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"?
On 05/23/2012 05:31 PM, T J wrote:
It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"?
We've settled on checking array.base, which I think was the outcome of a stackoverflow thread that I can't dig up. (I'll check with the guy who wrote the code.) -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof@lanl.gov Correspondence / Technical data or Software Publicly Available
On Thu, May 24, 2012 at 10:56 AM, Jonathan T. Niehof <jniehof@lanl.gov> wrote:
On 05/23/2012 05:31 PM, T J wrote:
It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"?
We've settled on checking array.base, which I think was the outcome of a stackoverflow thread that I can't dig up. (I'll check with the guy who wrote the code.)
Just as a quick word to the wise. I think I can recall a situation where this could be misleading. In particular, I think it had to do with boolean/fancy indexing of an array. In some cases, what you get is a view of the copy of the original data. So, if you simply check to see if it is a view, and then assume that because it is a view, it must be a view of the original data, then that assumption can come back and bite you in strange ways. Cheers! Ben Root
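One way to reproduce the kind of situation described above is to slice the result of a boolean index. This is a sketch under the assumption that this is the sort of case meant; the array and mask are made up for illustration.

```python
import numpy as np

a = np.arange(10)
mask = a % 2 == 0

# Boolean indexing copies the selected elements into a new array.
# Slicing that temporary then gives a view of the copy, not of `a`.
b = a[mask][::2]

# `b` reports a base, so a naive "is it a view?" check says yes...
print("b.base is not None:", b.base is not None)
print("b.base is a:", b.base is a)

# ...but writes to `b` only reach the temporary copy.
b[:] = -1
print("a after writing to b:", a)
print("may_share_memory(a, b):", np.may_share_memory(a, b))
```

So "b has a base" and "b is a view of a" are different statements, which is the trap being warned about.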
On Thu, May 24, 2012 at 3:56 PM, Jonathan T. Niehof <jniehof@lanl.gov> wrote:
On 05/23/2012 05:31 PM, T J wrote:
It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"?
We've settled on checking array.base, which I think was the outcome of a stackoverflow thread that I can't dig up. (I'll check with the guy who wrote the code.)
The problem is that "is a view" isn't a very meaningful concept... checking .base will tell you whether writes to an array are likely to affect some object that existed before that array was created. But it doesn't tell you whether writes to that array can affect any *particular* other object (at least without a fair amount of groveling around the innards of both objects), and it can happen that an object has base == None yet writes to it will affect another object, and it can happen that an object has base != None and yet writes to it won't affect any object that was ever accessible to your code. AFAICT it's really these other questions that one would like to answer, and checking .base won't answer them. -- Nathaniel
On Thu, May 24, 2012 at 4:10 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, May 24, 2012 at 3:56 PM, Jonathan T. Niehof <jniehof@lanl.gov> wrote:
On 05/23/2012 05:31 PM, T J wrote:
It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"?
We've settled on checking array.base, which I think was the outcome of a stackoverflow thread that I can't dig up. (I'll check with the guy who wrote the code.)
The problem is that "is a view" isn't a very meaningful concept... checking .base will tell you whether writes to an array are likely to affect some object that existed before that array was created. But it doesn't tell you whether writes to that array can affect any *particular* other object (at least without a fair amount of groveling around the innards of both objects), and it can happen that an object has base == None yet writes to it will affect another object, and it can happen that an object has base != None and yet writes to it won't affect any object that was ever accessible to your code. AFAICT it's really these other questions that one would like to answer, and checking .base won't answer them.
numpy.may_share_memory() gets closer, but it can be defeated by certain striding patterns. At least, it is conservative and reports false positives but not false negatives. Implementing numpy.does_share_memory() correctly involves some number theory and hairy edge cases. (Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...) -- Robert Kern
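A small sketch of the false positive mentioned above, using interleaved slices of one array. The exact check used for comparison, np.shares_memory, was added to NumPy after this thread; its availability in the reader's NumPy is an assumption here.

```python
import numpy as np

a = np.arange(10)
evens = a[::2]   # elements 0, 2, 4, 6, 8
odds = a[1::2]   # elements 1, 3, 5, 7, 9

# The memory extents of the two slices overlap, so the conservative
# bounds check reports a possible overlap.
print("may_share_memory:", np.may_share_memory(evens, odds))

# The exact check (assumed available, see above) finds that no
# element is actually shared.
print("shares_memory:   ", np.shares_memory(evens, odds))
```

The conservative answer is safe for deciding whether a temporary copy is needed, but it cannot prove that two arrays are independent.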
On Thu, May 24, 2012 at 5:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
(Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...)
Except that it's still NP-complete. -- Robert Kern
On 05/25/2012 03:17 PM, Robert Kern wrote:
On Thu, May 24, 2012 at 5:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
(Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...)
Except that it's still NP-complete.
Well, I guess N would be the number of dimensions, so that by itself doesn't tell us all that much. Question is if the worst case is no better than the trivial O(number of elements in the matrices), which would be bad. Dag
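For reference, the trivial baseline mentioned above can be sketched as a brute-force enumeration of element offsets. The (offset, shape, strides) description and the function names are hypothetical, chosen for illustration; offsets and strides are assumed to be in units of elements, with both arrays having the same dtype and non-empty shapes.

```python
import itertools


def element_offsets(offset, shape, strides):
    """Return the set of element offsets touched by a strided array."""
    indices = itertools.product(*(range(n) for n in shape))
    return {
        offset + sum(i * s for i, s in zip(idx, strides))
        for idx in indices
    }


def shares_elements_bruteforce(spec_a, spec_b):
    """Exact overlap test, O(number of elements) in time and memory."""
    return not element_offsets(*spec_a).isdisjoint(element_offsets(*spec_b))


# a[::2] and a[1::2] of a 10-element array: interleaved, never equal.
print(shares_elements_bruteforce((0, (5,), (2,)), (1, (5,), (2,))))

# a[::2] and a[::3]: both contain elements 0 and 6.
print(shares_elements_bruteforce((0, (5,), (2,)), (0, (4,), (3,))))
```

This is exact but scales with the array sizes, which is the cost any cleverer algorithm would need to beat.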
On May 25, 2012 2:21 PM, "Robert Kern" <robert.kern@gmail.com> wrote:
On Thu, May 24, 2012 at 5:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
(Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...)
Except that it's still NP-complete.
Huh, is it really? I'm pretty sure checking the existence of a solution to a linear Diophantine equation is cheap, but I guess figuring out whether it falls within the "shape" bounds is less obvious... -- Nathaniel
On Fri, May 25, 2012 at 3:55 PM, Nathaniel Smith <njs@pobox.com> wrote:
On May 25, 2012 2:21 PM, "Robert Kern" <robert.kern@gmail.com> wrote:
On Thu, May 24, 2012 at 5:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
(Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...)
Except that it's still NP-complete.
Huh, is it really? I'm pretty sure checking the existence of a solution to a linear Diophantine equation is cheap, but I guess figuring out whether it falls within the "shape" bounds is less obvious...
I believe that's what this is telling me: http://permalink.gmane.org/gmane.comp.gcc.fortran/11797 -- Robert Kern
On Fri, May 25, 2012 at 3:55 PM, Nathaniel Smith <njs@pobox.com> wrote:
On May 25, 2012 2:21 PM, "Robert Kern" <robert.kern@gmail.com> wrote:
On Thu, May 24, 2012 at 5:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
(Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...)
Except that it's still NP-complete.
Huh, is it really? I'm pretty sure checking the existence of a solution to a linear Diophantine equation is cheap, but I guess figuring out whether it falls within the "shape" bounds is less obvious...
If both positive and negative values are allowed, then there is a polynomial-time algorithm to solve the linear Diophantine equation, but bounding the possible values renders it NP-complete. When you go down to {0,1} as the only allowable values, it becomes the SUBSET-SUM problem. -- Robert Kern
On Fri, May 25, 2012 at 4:59 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Fri, May 25, 2012 at 3:55 PM, Nathaniel Smith <njs@pobox.com> wrote:
On May 25, 2012 2:21 PM, "Robert Kern" <robert.kern@gmail.com> wrote:
On Thu, May 24, 2012 at 5:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
(Hmm, now that I think about it, the edge cases are when the strides are 0 or negative. 0-stride axes can simply be removed, and I think we should be able to work back to a first item and flip the sign on the negative strides. The typical positive-stride solution can be found in an open source C++ global array code, IIRC. Double-hmmm...)
Except that it's still NP-complete.
Huh, is it really? I'm pretty sure checking the existence of a solution to a linear Diophantine equation is cheap, but I guess figuring out whether it falls within the "shape" bounds is less obvious...
If both positive and negative values are allowed, then there is a polynomial-time algorithm to solve the linear Diophantine equation, but bounding the possible values renders it NP-complete. When you go down to {0,1} as the only allowable values, it becomes the SUBSET-SUM problem.
Right. I suspect it's still pretty practical to solve for many of the arrays we care about (strides that are multiples of each other, etc.); many NP-hard problems are easy in the typical case, and for a lot of the cases we care about (e.g. disjoint slices of a contiguous array) solving the Diophantine equation will show that the bounds are irrelevant and collisions can never occur. Oh well, fortunately nothing depends on this :-). - N
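The easy case described above can be sketched with a gcd test on the unbounded equation. This is an illustration under stated assumptions, not an implementation of a full does_share_memory(): the function name is hypothetical, and offsets and strides are taken in units of elements for arrays of the same dtype. If the unbounded equation has no integer solution, the index bounds are irrelevant and the arrays cannot collide; if it has one, the bounded (hard) problem still has to be solved.

```python
from functools import reduce
from math import gcd


def unbounded_solution_exists(offset_a, strides_a, offset_b, strides_b):
    """Check offset_a + sum(i_k * sa_k) == offset_b + sum(j_k * sb_k)
    for solvability over ALL integers, ignoring the shape bounds."""
    g = reduce(gcd, (abs(s) for s in (*strides_a, *strides_b)), 0)
    diff = offset_b - offset_a
    if g == 0:
        # Every stride is zero: each array touches a single element.
        return diff == 0
    # A linear Diophantine equation is solvable iff the gcd of the
    # coefficients divides the constant term.
    return diff % g == 0


# a[::2] vs a[1::2]: gcd is 2, offset difference is 1, so no collision
# is possible whatever the lengths are.
print(unbounded_solution_exists(0, [2], 1, [2]))

# a[::2] vs a[::3]: a solution exists, so the bounds must be examined.
print(unbounded_solution_exists(0, [2], 0, [3]))
```

A False result is a proof of disjointness; a True result is inconclusive.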
This is the stack overflow discussion mentioned.
http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-vie...
I basically implemented the answer from SO. I feel like the "is" gives you a good handle on things since to be true they are actually the same location in memory.
Brian
On May 24, 2012, at 8:56 AM, Jonathan T. Niehof wrote:
On 05/23/2012 05:31 PM, T J wrote:
It seems that there are a number of ways to check if an array is a view. Do we have a preferred way in the API that is guaranteed to stay available? Or are all of the various methods "here to stay"?
We've settled on checking array.base, which I think was the outcome of a stackoverflow thread that I can't dig up. (I'll check with the guy who wrote the code.)
-- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof@lanl.gov Correspondence / Technical data or Software Publicly Available
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Brian A. Larsen ISR-1 Space Science and Applications Los Alamos National Laboratory PO Box 1663, MS-D466 Los Alamos, NM 87545 USA (For overnight add: SM-30, Bikini Atoll Road) Phone: 505-665-7691 Fax: 505-665-7395 email: balarsen@lanl.gov Correspondence / Technical data or Software Publicly Available
On Thu, May 24, 2012 at 6:07 PM, Larsen, Brian A <balarsen@lanl.gov> wrote:
This is the stack overflow discussion mentioned.
http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-vie...
I basically implemented the answer from SO. I feel like the "is" gives you a good handle on things since to be true they are actually the same location in memory.
If using the current development version of numpy, that answer is actually wrong... if you do
  a = np.arange(10)
  b = a.view()
  c = b.view()
then in the development version, c.base is a, not b. This is the source of some contention and confusion right now...: https://github.com/numpy/numpy/pull/280#issuecomment-5888154
In any case, if "b.base is a" is True, then you can be pretty certain that b and a share memory, but if it is False, it doesn't tell you much at all. AFAICT np.may_share_memory would be strictly more useful. -- Nathaniel
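The three-array example can be run as a script to see what a given NumPy installation does. Which object c.base refers to depends on the NumPy version, so the identity checks are only printed; the may_share_memory result does not depend on that detail.

```python
import numpy as np

a = np.arange(10)
b = a.view()
c = b.view()

# Version-dependent: either the intermediate view `b` or the
# original owner `a` may be reported as the base of `c`.
print("c.base is a:", c.base is a)
print("c.base is b:", c.base is b)

# Independent of how the base chain is recorded.
print("may_share_memory(a, c):", np.may_share_memory(a, c))

# Writes through `c` are visible in `a` either way.
c[0] = 42
print("a[0] after c[0] = 42:", a[0])
```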
On Thu, May 24, 2012 at 1:59 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, May 24, 2012 at 6:07 PM, Larsen, Brian A <balarsen@lanl.gov> wrote:
This is the stack overflow discussion mentioned.
http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-vie...
I basically implemented the answer from SO. I feel like the "is" gives you a good handle on things since to be true they are actually the same location in memory.
If using the current development version of numpy, that answer is actually wrong... if you do
  a = np.arange(10)
  b = a.view()
  c = b.view()
then in the development version, c.base is a, not b. This is the source of some contention and confusion right now...: https://github.com/numpy/numpy/pull/280#issuecomment-5888154
In any case, if "b.base is a" is True, then you can be pretty certain that b and a share memory, but if it is False, it doesn't tell you much at all. AFAICT np.may_share_memory would be strictly more useful.
as example: I checked pandas recently and IIRC, I needed three .base to get a True
>>> import numpy as np
>>> import pandas as pa
>>> x = np.random.randn(4,5)
>>> xdf = pa.DataFrame(data=x)
>>> type(xdf[1])
<class 'pandas.core.series.Series'>
>>> xdf[1].base is x
False
>>> xdf[1].base.base is x
False
>>> xdf[1].base.base.base is x
True
>>> np.may_share_memory(xdf[1], x)
True
Josef
participants (9)
- Benjamin Root
- Dag Sverre Seljebotn
- Jonathan T. Niehof
- josef.pktd@gmail.com
- Kathleen M Tacina
- Larsen, Brian A
- Nathaniel Smith
- Robert Kern
- T J