Memory order of array copies
There are three basic Python APIs to copy an array in numpy: a.copy(): has always returned a C-contiguous array by default. has always taken an order= argument, which defaults to "C". np.array(a, copy=True): by default, produces an array with whatever memory ordering that 'a' had. Can also specify order="C", "F" to get C or Fortran contiguity instead. np.copy(a): has always been a simple alias for np.array(a, copy=True), which means that it also preserves memory ordering. BUT in current master and the 1.7 betas, an extra argument order= has been added, and this has been set to default to "C" ordering. The extra argument and behavioural change occurred in 0e1a4e95. It's not clear why; the change isn't mentioned in the commit message. The change has to be reverted for 1.7, at least, because it caused regressions in scikit-learn (and presumably other packages too). So the question is, what *should* these interface look like. Then we can figure out what kind of transition scheme is needed, if any. My gut reaction is that if we want to change this at all from it's pre-1.7 status quo, it would be the opposite of the change that was made in master... I'd expect np.copy(a) and a.copy() to return arrays that are as nearly-identical to 'a' as possible, unless I explicitly requested something different by passing an order= argument. But, I bet there is code out there that's doing something like: my_fragile_C_function_wrapper(a.copy()) when it really should be doing something more explicit like my_fragile_C_function_wrapper(np.array(a, copy=False, order="C", dtype=float)) i.e., they're depending on the current behaviour where a.copy() normalizes order. I don't see any way to detect these cases and issue a proper warning, so we may not be able to change this at all. Any ideas? Is there anything better to do than simply revert np.copy() to its traditional behaviour and accept that np.copy(a) and a.copy() will continue to have different semantics indefinitely? -n
On Sun, Sep 30, 2012 at 07:17:42PM +0100, Nathaniel Smith wrote:
Is there anything better to do than simply revert np.copy() to its traditional behaviour and accept that np.copy(a) and a.copy() will continue to have different semantics indefinitely?
Have np.copy take an 'order=None', which would translate to 'K'. Detect 'None' as a sentinel that order as not been specified. If the order is not specified, raise a FutureWarning that np.copy will change semantics in 2 releases. In two releases, do the change. That's how I would deal with it.
As always, I think it is better to don't change the default behaviour. There is many people that don't update frequently and 2 releases is not enough. This will lead to many hard to find bug. This will also give the impression what we can't rely on numpy default behaviour and numpy is not stable. As a rule of thumb, we need to compare the benefit and consequence of changing default behaviour. In this case I see only a marginal speed gain (marginal in the sense that in the global user script, this won't matter, but locally it could be significant) vs silent and hard to find bug. If speed in that case is important, i think it would be much better to write an optimizer version that will take stride and cache line length into account. Even if we hard code the cache line lenght, this will probably bring most of the local speed up, without the inconvenient. If people still want to do this change, I think only a big release like numpy 2.0 make this acceptable but with the warning as Gael told. But I still prefer it not done and if people matter about the speed, they can write optimized code. Fred On Sun, Sep 30, 2012 at 2:22 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Sun, Sep 30, 2012 at 07:17:42PM +0100, Nathaniel Smith wrote:
Is there anything better to do than simply revert np.copy() to its traditional behaviour and accept that np.copy(a) and a.copy() will continue to have different semantics indefinitely?
Have np.copy take an 'order=None', which would translate to 'K'. Detect 'None' as a sentinel that order as not been specified. If the order is not specified, raise a FutureWarning that np.copy will change semantics in 2 releases. In two releases, do the change.
That's how I would deal with it. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sun, Sep 30, 2012 at 7:22 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Sun, Sep 30, 2012 at 07:17:42PM +0100, Nathaniel Smith wrote:
Is there anything better to do than simply revert np.copy() to its traditional behaviour and accept that np.copy(a) and a.copy() will continue to have different semantics indefinitely?
Have np.copy take an 'order=None', which would translate to 'K'. Detect 'None' as a sentinel that order as not been specified. If the order is not specified, raise a FutureWarning that np.copy will change semantics in 2 releases. In two releases, do the change.
I'm actually suggesting that arr.copy() should change, rather than np.copy() (the opposite of the change currently in master), but that aside: the problem with this approach is that the vast majority of calls to these functions don't care at all about this order thing, so spewing warnings all over the place is pretty useless at helping people actually detect and fix their code. Compare to the np.diagonal() change in 1.7, where we don't issue a warning when np.diagonal() is called, but wait until people write to the array. -n
On Mon, Oct 1, 2012 at 8:35 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Sep 30, 2012 at 7:22 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Sun, Sep 30, 2012 at 07:17:42PM +0100, Nathaniel Smith wrote:
Is there anything better to do than simply revert np.copy() to its traditional behaviour and accept that np.copy(a) and a.copy() will continue to have different semantics indefinitely?
Have np.copy take an 'order=None', which would translate to 'K'. Detect 'None' as a sentinel that order as not been specified. If the order is not specified, raise a FutureWarning that np.copy will change semantics in 2 releases. In two releases, do the change.
I'm actually suggesting that arr.copy() should change, rather than np.copy() (the opposite of the change currently in master), but that aside: the problem with this approach is that the vast majority of calls to these functions don't care at all about this order thing, so spewing warnings all over the place is pretty useless at helping people actually detect and fix their code. Compare to the np.diagonal() change in 1.7, where we don't issue a warning when np.diagonal() is called, but wait until people write to the array.
I'm +1 on changing a.copy() to be more consistent with np.copy()/np.array(copy=True), but would push it to 2.0 (or whatever release API changes are next expected). I would not worry about a FutureWarning, for this reason.
participants (4)
-
Frédéric Bastien
-
Gael Varoquaux
-
Nathaniel Smith
-
Thouis (Ray) Jones