[Numpy-discussion] Should arr.diagonal() return a copy or a view? (1.7 compatibility issue)

Ralf Gommers ralf.gommers at googlemail.com
Wed May 16 16:10:58 EDT 2012

On Wed, May 16, 2012 at 3:55 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Tue, May 15, 2012 at 2:49 PM, Frédéric Bastien <nouiz at nouiz.org> wrote:
> > Hi,
> >
> > In fact, I would arg to never change the current behavior, but add the
> > flag for people that want to use it.
> >
> > Why?
> >
> > 1) There is probably >10k script that use it that will need to be
> > checked for correctness. There won't be easy to see crash or error
> > that allow user to see it.
> My suggestion is that we follow the scheme, which I think gives ample
> opportunity for people to notice problems:
> 1.7: works like 1.6, except that a DeprecationWarning is produced if
> (and only if) someone writes to an array returned by np.diagonal (or
> friends). This gives a pleasant heads-up for those who pay attention
> to DeprecationWarnings.
> 1.8: return a view, but mark this view read-only. This causes crashes
> for anyone who ignored the DeprecationWarnings, guaranteeing that
> they'll notice the issue.
> 1.9: return a writeable view, transition complete.
> I've written a pull request implementing the first part of this; I
> hope everyone interested will take a look:
>  https://github.com/numpy/numpy/pull/280

Thanks for doing that. Seems like a good way forward.

When the PR gets merged, can you please also open a ticket for this with
Milestone 1.8? Then we won't forget to make the required changes for that


> > 2) This is a globally not significant speed up by this change. Due to
> > 1), i think it is not work it. Why this is not a significant speed up?
> > First, the user already create and use the original tensor. Suppose a
> > matrix of size n x n. If it don't fit in the cache, creating it will
> > cost n * n. But coping it will cost cst * n. The cst is the price of
> > loading a full cache line. But if you return a view, you will pay this
> > cst price later when you do the computation. But it all case, this is
> > cheap compared to the cost of creating the matrix. Also, you will do
> > work on the matrix and this work will be much more costly then the
> > price of the copy.
> >
> > In the case the matrix fix in the cache, the price of the copy is even
> lower.
> >
> > So in conclusion, optimizing the diagonal won't give speed up in the
> > global user script, but will break many of them.
> I agree that the speed difference is small. I'm more worried about the
> cost to users of having to remember odd inconsistencies like this, and
> to think about whether there actually is a speed difference or not,
> etc. (If we do add a copy=False option, then I guarantee many people
> will use it religiously "just in case" the speed difference is enough
> to matter! And that would suck for them.)
> Returning a view makes the API slightly nicer, cleaner, more
> consistent, more useful. (I believe the reason this was implemented in
> the first place was that providing a convenient way to *write* to the
> diagonal of an arbitrary array made it easier to implement numpy.eye
> for masked arrays.) And the whole point of numpy is to trade off a
> little speed in favor of having a simple, easy-to-work with high-level
> API :-).
> -- Nathaniel
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120516/840b40ad/attachment.html>

More information about the NumPy-Discussion mailing list