[Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+

Allan Haldane allanhaldane at gmail.com
Sun Jan 21 21:48:37 EST 2018


Hello all,

We are making a decision (again) about what to do about the
behavior of multiple-field indexing of structured arrays: Should
it return a view or a copy, and on what release schedule?

As a reminder, this refers to operations like (1.13 behavior):

     >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
     >>> a[['a', 'c']]
     array([(0, 0.), (0, 0.), (0, 0.)],
           dtype=[('a', '<i4'), ('c', '<f4')])

In numpy 1.14.0 we made this return a view instead of a copy, but
downstream test failures suggest we reconsider. In our current
implementation for 1.14.1, we have reverted this change, but
still plan to go through with it in 1.15.
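
For anyone unsure which behavior their installed numpy has, here
is a minimal sketch of a check. It only relies on
np.shares_memory and on the itemsize of the result, and the
variable names are illustrative. Under the view behavior the
result keeps the parent's itemsize (12 bytes here); under the copy
behavior the fields are packed (8 bytes).

```python
import numpy as np

a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
sub = a[['a', 'c']]

# A view shares its buffer with `a`; a copy owns separate memory.
is_view = bool(np.shares_memory(sub, a))

# View behavior keeps the original field offsets, so the itemsize
# stays at 12; copy behavior packs the two 4-byte fields into 8.
print("view" if is_view else "copy", "itemsize:", sub.dtype.itemsize)
```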

See here for our discussion of the problem and solutions:
https://github.com/numpy/numpy/pull/10411

The two main options we have discussed are either to try to make
the change in 1.15, or never make the change at all and always
return a copy.

Here are some pros and cons:

Pros (change to view in 1.15)
=============================

  * Views are useful and convenient. Other forms of indexing also
    often return views so this is more consistent.
  * This change has been planned since numpy 1.7 in 2009,
    and there have been visible FutureWarnings about it since
    then. Anyone whose code will break should have seen the
    warnings. It has been extensively warned about in recent
    release notes.
  * Past discussions have supported the change. See my comment in
    the PR with many links to them and to other history.
  * Users have requested the change on the list.
  * Possibly a majority of the reported code failures were not
    actually caused by the change, but by another bug (#8100)
    involving np.load/np.save which this change exposed. If we
    push it off to 1.15, we will have time to fix this other bug.
    (There were no FutureWarnings for this breakage, of course).
  * The code that really will break is of the form
          a[['a', 'c']].view('i8')
    because the returned itemsize is different. This has
    raised FutureWarnings since numpy 1.7, and no users reported
    failures due to this change. In the PR we still try to
    mitigate this breakage by introducing a new function
    `pack_fields`, which converts the result into the 1.13 form,
    so that
          np.pack_fields(a[['a', 'c']]).view('i8')
    will work.
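
To make the itemsize point concrete, here is a rough sketch. It is
not the `pack_fields` from the PR: `packed_copy` is a made-up,
hand-written stand-in that copies the selected fields into a dtype
with no padding. It is meant to behave the same whether the
indexing returned a view or a copy, so the later .view('i8') sees
an 8-byte itemsize either way.

```python
import numpy as np

def packed_copy(arr):
    """Hypothetical helper: copy `arr` into a padding-free structured array."""
    names = arr.dtype.names
    # Rebuild the dtype from (name, field dtype) pairs only, dropping
    # any offsets, so the fields are laid out back to back.
    packed_dtype = np.dtype([(n, arr.dtype.fields[n][0]) for n in names])
    out = np.empty(arr.shape, dtype=packed_dtype)
    for n in names:
        out[n] = arr[n]
    return out

a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
sub = a[['a', 'c']]            # itemsize 8 (copy) or 12 (view)

packed = packed_copy(sub)      # itemsize 8 in both cases
as_int64 = packed.view('i8')   # one 8-byte integer per record
```

Calling sub.view('i8') directly is the pattern that depends on the
itemsize of the indexing result, which is why it is the case at
risk.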


Cons (keep returning a copy)
============================

  * The extra convenience is not really that much, and fancy
    indexing also returns a copy instead of a view, so there is
    a precedent there.
  * We want to minimize compatibility breaks with old behavior.
    We've had a fair amount of discussion and complaints about
    how we break things in general.
  * We have lived with a "copy" for 8 years now. At some point the
    behavior gets set in stone for compatibility reasons.
  * Users have written to the list and github about their code
    breaking in 1.14.0. As far as I am aware, they all refer
    to the #8100 problem.
  * If a new function `pack_fields` is needed to guard against
    mishaps with the view behavior, that seems like a sign that
    keeping the copy behavior is the best option from an API
    perspective.
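
Both lists appeal to consistency with other kinds of indexing, so
here is a short sketch of the existing precedents on each side:
single-field indexing and basic slicing return views, while
integer-array ("fancy") indexing returns a copy. The array and
variable names are only for illustration.

```python
import numpy as np

a = np.zeros(5, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])

single_field = a['a']    # single-field indexing: a view
sliced = a[1:4]          # basic slicing: a view
fancy = a[[0, 2, 4]]     # integer-array indexing: a copy

single_field[0] = 7      # writes through to `a`
fancy['a'][1] = 99       # changes only the copy, not a[2]

print(a['a'])
```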

My initial vote is to go with the change in 1.15: The "view" code
that will ultimately break (not the code related to #8100) has
been sending FutureWarnings for many years, and I am not aware of
any user complaints involving it: All the complaints so far
would be fixed with #8100 in 1.15.

Feel free to also discuss the related proposed change, to make
np.diag return a view instead of a copy. That change has
not been implemented yet, only proposed.

Cheers,
Allan

