NEP 21: Simplified and explicit advanced indexing

Sebastian and I have revised a NumPy Enhancement Proposal that he started three years ago for overhauling NumPy's advanced indexing. We'd now like to present it for official consideration. Minor inline comments (e.g., typos) can be added to the latest pull request (https://github.com/numpy/numpy/pull/11414/files), but otherwise let's keep discussion on the mailing list. The NumPy website should update shortly with a rendered version (http://www.numpy.org/neps/nep-0021-advanced-indexing.html), but until then please see the full text below.

Cheers,
Stephan

=========================================
Simplified and explicit advanced indexing
=========================================

:Author: Sebastian Berg
:Author: Stephan Hoyer <shoyer@google.com>
:Status: Draft
:Type: Standards Track
:Created: 2015-08-27

Abstract
--------

NumPy's "advanced" indexing support for indexing arrays with other arrays is one of its most powerful and popular features. Unfortunately, the existing rules for advanced indexing with multiple array indices are typically confusing to both new, and in many cases even old, users of NumPy. Here we propose an overhaul and simplification of advanced indexing, including two new "indexer" attributes ``oindex`` and ``vindex`` to facilitate explicit indexing.

Background
----------

Existing indexing operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NumPy arrays currently support a flexible range of indexing operations:

- "Basic" indexing involving only slices, integers, ``np.newaxis`` and ellipsis (``...``), e.g., ``x[0, :3, np.newaxis]`` for selecting the first element from the 0th axis, the first three elements from the 1st axis and inserting a new axis of size 1 at the end. Basic indexing always returns a view of the indexed array's data.
- "Advanced" indexing, also called "fancy" indexing, includes all cases where arrays are indexed by other arrays.
Advanced indexing always makes a copy:

- "Boolean" indexing by boolean arrays, e.g., ``x[x > 0]`` for selecting positive elements.
- "Vectorized" indexing by one or more integer arrays, e.g., ``x[[0, 1]]`` for selecting the first two elements along the first axis. With multiple arrays, vectorized indexing uses broadcasting rules to combine indices along multiple dimensions. This allows for producing a result of arbitrary shape with arbitrary elements from the original arrays.
- "Mixed" indexing involving any combination of the other advanced indexing types. This is no more powerful than vectorized indexing, but is sometimes more convenient.

For clarity, we will refer to these existing rules as "legacy indexing". This is only a high-level summary; for more details, see NumPy's documentation and `Examples` below.

Outer indexing
~~~~~~~~~~~~~~

One broadly useful class of indexing operations is not supported:

- "Outer" or orthogonal indexing treats one-dimensional arrays equivalently to slices for determining output shapes. The rule for outer indexing is that the result should be equivalent to independently indexing along each dimension with integer or boolean arrays, as if both the indexed and indexing arrays were one-dimensional. This form of indexing is familiar to many users of other programming languages such as MATLAB, Fortran and R.

The reason why NumPy omits support for outer indexing is that the rules for outer and vectorized indexing conflict. Consider indexing a 2D array by two 1D integer arrays, e.g., ``x[[0, 1], [0, 1]]``:

- Outer indexing is equivalent to combining multiple integer indices with ``itertools.product()``. The result in this case is another 2D array with all combinations of indexed elements, e.g., ``np.array([[x[0, 0], x[0, 1]], [x[1, 0], x[1, 1]]])``
- Vectorized indexing is equivalent to combining multiple integer indices with ``zip()``. The result in this case is a 1D array containing the diagonal elements, e.g., ``np.array([x[0, 0], x[1, 1]])``.
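The conflict can be checked directly with current NumPy, since ``np.ix_`` already gives the outer result today (a minimal sketch using only the public API):

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# Vectorized indexing: the index arrays are zipped together,
# selecting the "diagonal" elements x[0, 0] and x[1, 1].
vectorized = x[[0, 1], [0, 1]]

# Outer indexing: emulated today with np.ix_, selecting all
# combinations of rows [0, 1] and columns [0, 1].
outer = x[np.ix_([0, 1], [0, 1])]

assert vectorized.shape == (2,)
assert (vectorized == np.array([x[0, 0], x[1, 1]])).all()
assert outer.shape == (2, 2)
assert (outer == np.array([[x[0, 0], x[0, 1]],
                           [x[1, 0], x[1, 1]]])).all()
```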
This difference is a frequent stumbling block for new NumPy users. The outer indexing model is easier to understand, and is a natural generalization of slicing rules. But NumPy instead chose to support vectorized indexing, because it is strictly more powerful.

It is always possible to emulate outer indexing by vectorized indexing with the right indices. To make this easier, NumPy includes utility objects and functions such as ``np.ogrid`` and ``np.ix_``, e.g., ``x[np.ix_([0, 1], [0, 1])]``. However, there are no utilities for emulating fully general/mixed outer indexing, which could unambiguously allow for slices, integers, and 1D boolean and integer arrays.

Mixed indexing
~~~~~~~~~~~~~~

NumPy's existing rules for combining multiple types of indexing in the same operation are quite complex, involving a number of edge cases.

One reason why mixed indexing is particularly confusing is that at first glance the result works deceptively like outer indexing. Returning to our example of a 2D array, both ``x[:2, [0, 1]]`` and ``x[[0, 1], :2]`` return 2D arrays with axes in the same order as the original array.

However, as soon as two or more non-slice objects (including integers) are introduced, vectorized indexing rules apply. The axes introduced by the array indices are at the front, unless all array indices are consecutive, in which case NumPy deduces where the user "expects" them to be. Consider indexing a 3D array ``arr`` with shape ``(X, Y, Z)``:

1. ``arr[:, [0, 1], 0]`` has shape ``(X, 2)``.
2. ``arr[[0, 1], 0, :]`` has shape ``(2, Z)``.
3. ``arr[0, :, [0, 1]]`` has shape ``(2, Y)``, not ``(Y, 2)``!

The first two cases are intuitive and consistent with outer indexing, but this last case is quite surprising, even to many highly experienced NumPy users.

Mixed cases involving multiple array indices are also surprising, and only less problematic because the current behavior is so useless that it is rarely encountered in practice.
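The three cases above can be verified directly; the concrete shapes below are chosen arbitrarily for illustration:

```python
import numpy as np

# Arbitrary concrete shapes standing in for (X, Y, Z): X=4, Y=5, Z=6.
arr = np.zeros((4, 5, 6))

assert arr[:, [0, 1], 0].shape == (4, 2)   # case 1: like outer indexing
assert arr[[0, 1], 0, :].shape == (2, 6)   # case 2: like outer indexing
assert arr[0, :, [0, 1]].shape == (2, 5)   # case 3: array axis moved to front!
```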
When a boolean array index is mixed with another boolean or integer array, the boolean array is converted to integer array indices (equivalent to ``np.nonzero()``) and then broadcast. For example, indexing a 2D array of size ``(2, 2)`` like ``x[[True, False], [True, False]]`` produces a 1D vector with shape ``(1,)``, not a 2D sub-matrix with shape ``(1, 1)``.

Mixed indexing seems so tricky that it is tempting to say that it should never be used. However, it is not easy to avoid, because NumPy implicitly adds full slices if there are fewer indices than the full dimensionality of the indexed array. This means that indexing a 2D array like ``x[[0, 1]]`` is equivalent to ``x[[0, 1], :]``. These cases are not surprising, but they constrain the behavior of mixed indexing.

Indexing in other Python array libraries
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Indexing is a useful and widely recognized mechanism for accessing multi-dimensional array data, so it is no surprise that many other libraries in the scientific Python ecosystem also support array indexing.

Unfortunately, the full complexity of NumPy's indexing rules means that it is both challenging and undesirable for other libraries to copy its behavior in all of its nuance. The only full implementation of NumPy-style indexing is NumPy itself. This includes projects like dask.array and h5py, which support *most* types of array indexing in some form, and otherwise attempt to copy NumPy's API exactly.

Vectorized indexing in particular can be challenging to implement with array storage backends not based on NumPy. In contrast, indexing by 1D arrays along at least one dimension in the style of outer indexing is much more achievable. This has led many libraries (including dask and h5py) to attempt to define a safe subset of NumPy-style indexing that is equivalent to outer indexing, e.g., by only allowing indexing with an array along at most one dimension.
However, this is quite challenging to do correctly in a general enough way to be useful. For example, the current versions of dask and h5py both handle mixed indexing in case 3 above inconsistently with NumPy. This is quite likely to lead to bugs.

These inconsistencies, in addition to the broader challenge of implementing every type of indexing logic, make it challenging to write high-level array libraries like xarray or dask.array that can interchangeably index many types of array storage. In contrast, explicit APIs for outer and vectorized indexing in NumPy would provide a model that external libraries could reliably emulate, even if they don't support every type of indexing.

High level changes
------------------

Inspired by multiple "indexer" attributes for controlling different types of indexing behavior in pandas, we propose to:

1. Introduce ``arr.oindex[indices]``, which allows array indices but uses outer indexing logic.

2. Introduce ``arr.vindex[indices]``, which uses the current "vectorized"/broadcasted logic but with two differences from legacy indexing:

   * Boolean indices are not supported. All indices must be integers, integer arrays or slices.
   * The integer index result dimensions are always the first axes of the result array. No transpose is done, even for a single integer array index.

3. Plain indexing on arrays will start to give warnings and eventually errors in cases where one of the explicit indexers should be preferred:

   * First, in all cases where legacy and outer indexing would give different results.
   * Later, potentially in all cases involving an integer array.

These constraints are sufficient for making indexing generally consistent with expectations and providing a less surprising learning curve with ``oindex``.

Note that all things mentioned here apply both for assignment as well as subscription.

Understanding these details is *not* easy. The `Examples` section in the discussion gives code examples.
And the hopefully easier `Motivational Example` provides some motivational use-cases for the general ideas and is likely a good start for anyone not intimately familiar with advanced indexing.

Detailed Description
--------------------

Proposed rules
~~~~~~~~~~~~~~
From the three problems noted above, some expectations for NumPy can be deduced:
1. There should be a prominent outer/orthogonal indexing method such as ``arr.oindex[indices]``.

2. Considering how confusing vectorized/fancy indexing can be, it should be possible to make it explicit (e.g. ``arr.vindex[indices]``).

3. A new ``arr.vindex[indices]`` method would not be tied to the confusing transpose rules of fancy indexing, which are for example needed for the simple case of a single advanced index. Thus, no transposing should be done. The axes created by the integer array indices are always inserted at the front, even for a single index.

4. Boolean indexing is conceptually outer indexing. Broadcasting together with other advanced indices in the manner of legacy indexing is generally not helpful or well defined. A user who wishes the "``nonzero``" plus broadcast behaviour can thus be expected to do this manually. Thus, ``vindex`` does not need to support boolean index arrays.

5. An ``arr.legacy_index`` attribute should be implemented to support legacy indexing. This gives a simple way to update existing codebases using legacy indexing, which will make the deprecation of plain indexing behavior easier. The longer name ``legacy_index`` is intentionally chosen to be explicit and to discourage its use in new code.

6. Plain indexing ``arr[...]`` should return an error for ambiguous cases. To begin with, this probably means that cases where ``arr[ind]`` and ``arr.oindex[ind]`` return different results give deprecation warnings. This includes every use of vectorized indexing with multiple integer arrays. Due to the transposing behaviour, this means that ``arr[0, :, index_arr]`` will be deprecated, but ``arr[:, 0, index_arr]`` will not for the time being.

7. To ensure that existing subclasses of `ndarray` that override indexing do not inadvertently revert to default behavior for indexing attributes, these attributes should have explicit checks that disable them if ``__getitem__`` or ``__setitem__`` has been overridden.
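To make rules 1-4 concrete, here is a rough pure-Python emulation of the proposed ``oindex`` semantics built on today's ``np.ix_``. The class name and approach are ours for illustration only, not part of the proposed implementation, and boolean arrays are omitted for brevity:

```python
import numpy as np

class OuterIndexer:
    """Sketch of the proposed ``arr.oindex`` semantics for slices,
    integers, and 1D integer arrays (hypothetical helper, not NumPy API)."""

    def __init__(self, arr):
        self.arr = arr

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) != self.arr.ndim:  # proposed rule: no implicit ellipsis
            raise IndexError("oindex requires one index per dimension")
        converted = []
        scalar_axes = []
        for axis, k in enumerate(key):
            if isinstance(k, slice):
                # Slices become explicit 1D index arrays.
                converted.append(np.arange(self.arr.shape[axis])[k])
            elif np.isscalar(k):
                # Plain integers select one element and drop the axis.
                converted.append(np.array([k]))
                scalar_axes.append(axis)
            else:
                converted.append(np.asarray(k))
        # np.ix_ already implements outer indexing for 1D integer arrays.
        result = self.arr[np.ix_(*converted)]
        return result.squeeze(axis=tuple(scalar_axes)) if scalar_axes else result

arr = np.ones((5, 6, 7, 8))
assert OuterIndexer(arr)[:, [0], [0, 1], :].shape == (5, 1, 2, 8)
assert OuterIndexer(arr)[:, [0], 0, :].shape == (5, 1, 8)
```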
Since, unlike plain indexing, the new indexing attributes are explicitly aimed at higher dimensional indexing, several additional changes should be implemented:

* The indexing attributes will enforce an exact dimension and indexing match. This means that no implicit ellipsis (``...``) will be added. Unless an ellipsis is present, the indexing expression will thus only work for an array with a specific number of dimensions. This makes the expression more explicit and safeguards against the wrong dimensionality of arrays. There should be no implications for "duck typing" compatibility with builtin Python sequences, because Python sequences only support a limited form of "basic indexing" with integers and slices.

* The current plain indexing allows for the use of non-tuples for multi-dimensional indexing such as ``arr[[slice(None), 2]]``. This creates some inconsistencies, and thus the indexing attributes should only allow plain Python tuples for this purpose. (Whether or not this should be the case for plain indexing is a different issue.)

* The new attributes should not use getitem to implement setitem, since it is a kludge and not useful for vectorized indexing. (Not implemented yet.)

Open Questions
~~~~~~~~~~~~~~

* The names ``oindex``, ``vindex`` and ``legacy_index`` are just suggestions at the time of writing this; another name NumPy has used for something like ``oindex`` is ``np.ix_``. See also below.

* ``oindex`` and ``vindex`` could always return copies, even when no array operation occurs. One argument for allowing a view return is that this way ``oindex`` can be used as a general index replacement. However, there is one argument for returning copies. It is possible for ``arr.vindex[array_scalar, ...]``, where ``array_scalar`` should be a 0-D array but is not, since 0-D arrays tend to be converted. Copying always "fixes" this possible inconsistency.

* The final state into which plain indexing will morph is not fixed in this NEP.
It is for example possible that ``arr[index]`` will be equivalent to ``arr.oindex`` at some point in the future. Since such a change will take years, it seems unnecessary to make specific decisions at this time.

* The proposed changes to plain indexing could be postponed indefinitely or not taken in order to not break or force major fixes to existing code bases.

Alternative Names
~~~~~~~~~~~~~~~~~

Possible names suggested (more suggestions will be added).

============== ============ ========
**Orthogonal** oindex       oix
**Vectorized** vindex       vix
**Legacy**     legacy_index l/findex
============== ============ ========

Subclasses
~~~~~~~~~~

Subclasses are a bit problematic in the light of these changes. There are some possible solutions for this. For most subclasses (those which do not provide ``__getitem__`` or ``__setitem__``) the special attributes should just work. Subclasses that *do* provide them must be updated accordingly and should preferably not subclass working versions of these attributes.

All subclasses will inherit the attributes; however, the implementation of ``__getitem__`` on these attributes should test ``subclass.__getitem__ is ndarray.__getitem__``. If not, the subclass has special handling for indexing and ``NotImplementedError`` should be raised, requiring that the indexing attributes are also explicitly overwritten. Likewise, implementations of ``__setitem__`` should check to see if ``__setitem__`` is overridden.

A further question is how to facilitate implementing the special attributes. Also, there is the weird functionality where ``__setitem__`` calls ``__getitem__`` for non-advanced indices. It might be good to avoid it for the new attributes, but on the other hand, that may make it even more confusing. To facilitate implementations, we could provide functions similar to ``operator.itemgetter`` and ``operator.setitem`` for the attributes. Possibly a mixin could be provided to help implementation.
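The proposed subclass guard could look roughly like this. All names here are hypothetical, and the outer indexing itself is emulated with ``np.ix_`` only to make the sketch runnable:

```python
import numpy as np

class OIndex:
    """Hypothetical indexing attribute that refuses to run if the
    subclass overrides __getitem__ (sketch of rule 7's guard)."""

    def __init__(self, arr):
        self.arr = arr

    def __getitem__(self, key):
        # Attribute lookup on the class walks the MRO, so a subclass
        # that does not override __getitem__ resolves to ndarray's.
        if type(self.arr).__getitem__ is not np.ndarray.__getitem__:
            raise NotImplementedError(
                "subclass overrides __getitem__; it must define oindex itself")
        # Emulated outer indexing for 1D integer array indices only.
        return self.arr[np.ix_(*key)]

class Plain(np.ndarray):
    pass  # no custom indexing: the guard lets it through

class Fancy(np.ndarray):
    def __getitem__(self, key):  # custom indexing: the guard trips
        return super().__getitem__(key)

arr = np.arange(9).reshape(3, 3).view(Plain)
assert OIndex(arr)[[0, 1], [0, 1]].shape == (2, 2)

raised = False
try:
    OIndex(np.zeros((2, 2)).view(Fancy))[[0], [0]]
except NotImplementedError:
    raised = True
assert raised
```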
These improvements are not essential to the initial implementation, so they are saved for future work.

Implementation
--------------

Implementation would start with writing special indexing objects available through ``arr.oindex``, ``arr.vindex``, and ``arr.legacy_index`` to allow these indexing operations. Also, we would need to start to deprecate those plain index operations which are ambiguous. Furthermore, the NumPy code base will need to use the new attributes and tests will have to be adapted.

Backward compatibility
----------------------

As a new feature, no backward compatibility issues with the new ``vindex`` and ``oindex`` attributes would arise. To facilitate backwards compatibility as much as possible, we expect a long deprecation cycle for legacy indexing behavior and propose the new ``legacy_index`` attribute. Some forward compatibility issues with subclasses that do not specifically implement the new methods may arise.

Alternatives
------------

NumPy may not choose to offer these different types of indexing methods, or may choose to only offer them through specific functions instead of the proposed notation above. We don't think that new functions are a good alternative, because the indexing notation ``[]`` offers some syntactic advantages in Python (i.e., direct creation of slice objects) compared to functions.

A more reasonable alternative would be to write new wrapper objects for alternative indexing with functions rather than methods (e.g., ``np.oindex(arr)[indices]`` instead of ``arr.oindex[indices]``). Functionally, this would be equivalent, but indexing is such a common operation that we think it is important to minimize syntax and worth implementing it directly on `ndarray` objects themselves. Indexing attributes also define a clear interface that is easier for alternative array implementations to copy, notwithstanding ongoing efforts to make it easier to override NumPy functions [2]_.
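For illustration, the wrapper-function alternative could be sketched as follows. ``oindex`` here is a hypothetical free function, not an existing NumPy API, and the outer indexing is again emulated with ``np.ix_``:

```python
import numpy as np

class _OuterWrapper:
    """Hypothetical wrapper returned by a function-based spelling."""
    def __init__(self, arr):
        self.arr = arr

    def __getitem__(self, indices):
        # Emulated outer indexing for 1D integer array indices.
        return self.arr[np.ix_(*indices)]

def oindex(arr):
    """Hypothetical np.oindex(arr)[indices] spelling."""
    return _OuterWrapper(arr)

x = np.arange(9).reshape(3, 3)
assert oindex(x)[[0, 1], [0, 2]].shape == (2, 2)
```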
Discussion
----------

The original discussion about vectorized vs outer/orthogonal indexing arose on the NumPy mailing list:

* https://mail.python.org/pipermail/numpy-discussion/2015-April/072550.html

Some discussion can be found on the original pull request for this NEP:

* https://github.com/numpy/numpy/pull/6256

Python implementations of the indexing operations can be found at:

* https://github.com/numpy/numpy/pull/5749
* https://gist.github.com/shoyer/c700193625347eb68fee4d1f0dc8c0c8

Examples
~~~~~~~~

Since the various kinds of indexing are hard to grasp in many cases, these examples hopefully give some more insight. Note that they are all in terms of shape. In the examples, all original dimensions have 5 or more elements; advanced indexing inserts smaller dimensions. These examples may be hard to grasp without working knowledge of advanced indexing as of NumPy 1.9.

Example array::

    >>> arr = np.ones((5, 6, 7, 8))

Legacy fancy indexing
---------------------

Note that the same results can be achieved with ``arr.legacy_index``, but the "future error" cases will still work in that case.

A single index is transposed (this is the same for all indexing types)::

    >>> arr[[0], ...].shape
    (1, 6, 7, 8)
    >>> arr[:, [0], ...].shape
    (5, 1, 7, 8)

Multiple indices are transposed *if* consecutive::

    >>> arr[:, [0], [0], :].shape  # future error
    (5, 1, 8)
    >>> arr[:, [0], :, [0]].shape  # future error
    (1, 5, 7)

It is important to note that a scalar *is* an integer array index in this sense (and gets broadcast with the other advanced index)::

    >>> arr[:, [0], 0, :].shape
    (5, 1, 8)
    >>> arr[:, [0], :, 0].shape  # future error (scalar is "fancy")
    (1, 5, 7)

A single boolean index can act on multiple dimensions (especially the whole array). It has to match the dimensions (as of 1.10, a deprecation warning otherwise).
The boolean index is otherwise identical to (multiple consecutive) integer array indices::

    >>> # Create boolean index with one True value for the last two dimensions:
    >>> bindx = np.zeros((7, 8), dtype=np.bool_)
    >>> bindx[0, 0] = True
    >>> arr[:, 0, bindx].shape
    (5, 1)
    >>> arr[0, :, bindx].shape
    (1, 6)

The combination with anything that is not a scalar is confusing, e.g.::

    >>> arr[[0], :, bindx].shape  # bindx result broadcasts with [0]
    (1, 6)
    >>> arr[:, [0, 1], bindx].shape  # IndexError

Outer indexing
--------------

Multiple indices are "orthogonal" and their result axes are inserted at the same place (they are not broadcast)::

    >>> arr.oindex[:, [0], [0, 1], :].shape
    (5, 1, 2, 8)
    >>> arr.oindex[:, [0], :, [0, 1]].shape
    (5, 1, 7, 2)
    >>> arr.oindex[:, [0], 0, :].shape
    (5, 1, 8)
    >>> arr.oindex[:, [0], :, 0].shape
    (5, 1, 7)

Boolean index results are always inserted where the index is::

    >>> # Create boolean index with one True value for the last two dimensions:
    >>> bindx = np.zeros((7, 8), dtype=np.bool_)
    >>> bindx[0, 0] = True
    >>> arr.oindex[:, 0, bindx].shape
    (5, 1)
    >>> arr.oindex[0, :, bindx].shape
    (6, 1)

Nothing changes in the presence of other advanced indices::

    >>> arr.oindex[[0], :, bindx].shape
    (1, 6, 1)
    >>> arr.oindex[:, [0, 1], bindx].shape
    (5, 2, 1)

Vectorized/inner indexing
-------------------------

Multiple indices are broadcast and iterated as one, like fancy indexing, but the new axes are always inserted at the front::

    >>> arr.vindex[:, [0], [0, 1], :].shape
    (2, 5, 8)
    >>> arr.vindex[:, [0], :, [0, 1]].shape
    (2, 5, 7)
    >>> arr.vindex[:, [0], 0, :].shape
    (1, 5, 8)
    >>> arr.vindex[:, [0], :, 0].shape
    (1, 5, 7)

Boolean index results are always inserted where the index is, exactly as in ``oindex``, given how specific they are to the axes they operate on::

    >>> # Create boolean index with one True value for the last two dimensions:
    >>> bindx = np.zeros((7, 8), dtype=np.bool_)
    >>> bindx[0, 0] = True
    >>> arr.vindex[:, 0, bindx].shape
    (5, 1)
    >>> arr.vindex[0, :, bindx].shape
    (6, 1)

But other advanced indices are again transposed to the front::

    >>> arr.vindex[[0], :, bindx].shape
    (1, 6, 1)
    >>> arr.vindex[:, [0, 1], bindx].shape
    (2, 5, 1)

Motivational Example
~~~~~~~~~~~~~~~~~~~~

Imagine a data acquisition program that stores ``D`` channels and ``N`` datapoints along time into an ``(N, D)`` shaped array. During data analysis, we need to fetch a pool of channels, for example to calculate a mean over them.

This data can be faked using::

    >>> arr = np.random.random((100, 10))

Now one may remember indexing with an integer array and find the correct code::

    >>> group = arr[:, [2, 5]]
    >>> mean_value = group.mean()

However, assume that there were some specific time points (first dimension of the data) that need to be specially considered. These time points are already known and given by::

    >>> interesting_times = np.array([1, 5, 8, 10], dtype=np.intp)

Now to fetch them, we may try to modify the previous code::

    >>> group_at_it = arr[interesting_times, [2, 5]]
    IndexError: Ambiguous index, use `.oindex` or `.vindex`

An error such as this will point the user to the indexing documentation. This should make it clear that ``oindex`` behaves more like slicing, so out of the different methods it is the obvious choice (for now, this is a shape mismatch, but that could possibly also mention ``oindex``)::

    >>> group_at_it = arr.oindex[interesting_times, [2, 5]]

Now of course one could also have used ``vindex``, but it is much less obvious how to achieve the right thing!::

    >>> reshaped_times = interesting_times[:, np.newaxis]
    >>> group_at_it = arr.vindex[reshaped_times, [2, 5]]

One may find that, for example, our data is corrupt in some places. So we need to replace these values by zero (or anything else) for these times.
The first column may, for example, give the necessary information, so that changing the values becomes easy by remembering boolean indexing::

    >>> bad_data = arr[:, 0] > 0.5
    >>> arr[bad_data, :] = 0  # (corrupts further examples)

Again, however, the columns may need to be handled more individually (but in groups), and the ``oindex`` attribute works well::

    >>> arr.oindex[bad_data, [2, 5]] = 0

Note that it would be very hard to do this using legacy fancy indexing. The only way would be to create an integer array first::

    >>> bad_data_indx = np.nonzero(bad_data)[0]
    >>> bad_data_indx_reshaped = bad_data_indx[:, np.newaxis]
    >>> arr[bad_data_indx_reshaped, [2, 5]]

In any case, we can use only ``oindex`` to do all of this without getting into any trouble or being confused by the whole complexity of advanced indexing.

But now some new features are added to the data acquisition: different sensors have to be used depending on the times. Let us assume we have already created an array of indices::

    >>> correct_sensors = np.random.randint(10, size=(100, 2))

which lists for each time the two correct sensors in an ``(N, 2)`` array.

A first try to achieve this may be ``arr[:, correct_sensors]``, and this does not work. It should quickly be clear that slicing cannot achieve the desired thing. But hopefully users will remember that there is ``vindex`` as a more powerful and flexible approach to advanced indexing. One may, if trying ``vindex`` randomly, be confused about::

    >>> new_arr = arr.vindex[:, correct_sensors]

which is neither the same, nor the correct result (see transposing rules)! This is because slicing still works the same in ``vindex``.
However, reading the documentation and examples, one can hopefully quickly find the desired solution::

    >>> rows = np.arange(len(arr))
    >>> rows = rows[:, np.newaxis]  # make shape fit with correct_sensors
    >>> new_arr = arr.vindex[rows, correct_sensors]

At this point we have left the straightforward world of ``oindex`` but can do random picking of any element from the array. Note that in the last example a method such as mentioned in the ``Related Questions`` section could be more straightforward. But this approach is even more flexible, since ``rows`` does not have to be a simple ``arange``, but could be ``interesting_times``::

    >>> interesting_times = np.array([0, 4, 8, 9, 10])
    >>> correct_sensors_at_it = correct_sensors[interesting_times, :]
    >>> interesting_times_reshaped = interesting_times[:, np.newaxis]
    >>> new_arr_it = arr[interesting_times_reshaped, correct_sensors_at_it]

Truly complex situations would arise if you were, for example, to pool ``L`` experiments into an array shaped ``(L, N, D)``. But for ``oindex`` this should not result in surprises. ``vindex``, being more powerful, will quite certainly create some confusion in this case, but also cover pretty much all eventualities.

Copyright
---------

This document is placed under the CC0 1.0 Universal (CC0 1.0) Public Domain Dedication [1]_.

References and Footnotes
------------------------

.. [1] To the extent possible under law, the person who associated CC0
   with this work has waived all copyright and related or neighboring
   rights to this work. The CC0 license may be found at
   https://creativecommons.org/publicdomain/zero/1.0/

.. [2] e.g., see NEP 18,
   http://www.numpy.org/neps/nep-0018-array-function-protocol.html

Generally +1 on this, but I don't think we need:

> To ensure that existing subclasses of ndarray that override indexing
> do not inadvertently revert to default behavior for indexing
> attributes, these attributes should have explicit checks that disable
> them if __getitem__ or __setitem__ has been overridden.

Repeating my proposal from github, I think we should introduce some internal indexing objects - something simple like:

    # np.core.*
    class Indexer(object):  # importantly not iterable
        def __init__(self, value):
            self.value = value

    class OrthogonalIndexer(Indexer):
        pass

    class VectorizedIndexer(Indexer):
        pass

Keeping the proposed syntax, we'd implement:

- arr.oindex[ind] as arr[np.core.OrthogonalIndexer(ind)]
- arr.vindex[ind] as arr[np.core.VectorizedIndexer(ind)]

This means that subclasses like the following:

    class LoggingIndexer(np.ndarray):
        def __getitem__(self, ind):
            ret = super().__getitem__(ind)
            print("Got an index")
            return ret

will continue to work without issues. This includes np.ma.MaskedArray and np.memmap, so this already has value internally.

For classes like np.matrix which inspect the index object itself, an error will still be raised from __getitem__, since it looks nothing like the values normally passed - most likely of the form:

    TypeError: 'numpy.core.VectorizedIndexer' object does not support indexing
    TypeError: 'numpy.core.VectorizedIndexer' object is not iterable

This could potentially be caught in oindex.__getitem__ and converted into a more useful error message.
So to summarize the benefits of the above tweaks:

- Pass-through subclasses get the new behavior for free
- No additional descriptor helpers are needed to let non-passthrough subclasses implement the new indexable attributes - only a change to __getitem__ is needed

And the costs:

- A less clear error message when new indexing is used on old types (can chain with a more useful exception on python 3)
- Class construction overhead for indexing via the attributes (skippable for base ndarray if significant)

Eric

On Mon, 25 Jun 2018 at 14:30 Stephan Hoyer <shoyer@gmail.com> wrote:
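A runnable end-to-end sketch of this idea (all names are hypothetical, and the orthogonal dispatch is emulated with np.ix_ just to make it concrete; the real vindex logic would of course differ):

```python
import numpy as np

class Indexer(object):  # importantly not iterable
    def __init__(self, value):
        self.value = value

class OrthogonalIndexer(Indexer):
    pass

class VectorizedIndexer(Indexer):
    pass

class IndexableArray(np.ndarray):
    """Toy stand-in for the proposed ndarray behavior: __getitem__
    unwraps Indexer objects and dispatches on their type."""
    def __getitem__(self, ind):
        if isinstance(ind, OrthogonalIndexer):
            return np.asarray(self)[np.ix_(*ind.value)]
        if isinstance(ind, VectorizedIndexer):
            # Placeholder: legacy broadcasting stands in for real vindex.
            return np.asarray(self)[ind.value]
        return super().__getitem__(ind)

class LoggingIndexer(IndexableArray):
    def __getitem__(self, ind):
        ret = super().__getitem__(ind)  # Indexer passes through untouched
        print("Got an index")
        return ret

arr = np.arange(9).reshape(3, 3).view(LoggingIndexer)
out = arr[OrthogonalIndexer(([0, 1], [0, 1]))]  # prints "Got an index"
assert out.shape == (2, 2)
```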

    rr, cc = coords.T  # coords is an (n, 2) array of integer coordinates
    values = image[rr, cc]

Are you saying that this use is deprecated? Because we love it at scikit-image. I would be very very very sad to lose this syntax.
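For concreteness, the pattern in question as a self-contained example (array contents made up for illustration):

```python
import numpy as np

image = np.arange(12.0).reshape(3, 4)
coords = np.array([[0, 1], [2, 3], [1, 0]])  # (n, 2) integer coordinates
rr, cc = coords.T

# Current vectorized indexing: one pixel value per coordinate pair.
values = image[rr, cc]
assert values.shape == (3,)
assert (values == np.array([image[0, 1], image[2, 3], image[1, 0]])).all()
```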
Other general comments:

- oindex in general seems very intuitive and I'm :+1:
- I would much prefer some extremely compact notation such as arr.ox[] and arr.vx[].
- Depending on the above concern I am either -1 or (-1/0) on the deprecation. Deprecating (all) old vindex behaviour doesn't seem to bring many benefits while potentially causing a lot of pain to downstream libraries.

Juan.

On Mon, Jun 25, 2018 at 11:29 PM Andrew Nelson <andyfaff@gmail.com> wrote:
And thirded. This should not be considered deprecated or discouraged. As I mentioned in the previous iteration of this discussion, this is the behavior I want more often than the orthogonal indexing. It's a really common way to work with images and other kinds of raster data, so I don't think it should be relegated to the "officially discouraged" ghetto of `.legacy_index`. It should not issue warnings or (eventual) errors. I would reserve warnings for the cases where the current behavior is something no one really wants, like mixing slices and integer arrays. -- Robert Kern

I don't think it should be relegated to the "officially discouraged" ghetto of `.legacy_index`
The way I read it, the new spelling of that would be the explicit but not discouraged `image.vindex[rr, cc]`.
I would reserve warnings for the cases where the current behavior is something no one really wants, like mixing slices and integer arrays.
These are the cases that would only be available under `legacy_index`.

Eric

On Mon, 25 Jun 2018 at 23:54 Robert Kern <robert.kern@gmail.com> wrote:

On Tue, 26 Jun 2018 at 17:12, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
If I'm understanding correctly, what can be achieved now by `arr[rr, cc]` would have to be modified to use `arr.vindex[rr, cc]`, which is a very large change in behaviour. I suspect that there are a lot of situations out there which use `arr[idxs]` where `idxs` can mean one of a range of things depending on the code path followed. If any of those change, or a mix of nomenclatures is required to access the different cases, then havoc will probably ensue.

On Tue, 2018-06-26 at 17:30 +1000, Andrew Nelson wrote:
Yes, that is true, but I doubt you will find a lot of code paths that need the current indexing as opposed to vindex here, and the idea was to have a method to get the old behaviour indefinitely. You will need to add the `.vindex`, but that should be the only code change needed, and it would be easy to find where with errors/warnings.

I see a possible problem with code that has to work on different numpy versions, but that only means we need to delay deprecations. The only thing I could imagine where this might happen is if you forward someone else's indexing objects and different users are used to different results.

Otherwise, there is mostly one case which would get annoying, and that is `arr[:, rr, cc]`, since `arr.vindex[:, rr, cc]` would not be exactly the same. Because, yes, in some cases the current logic is convenient, just incredibly surprising as well.

- Sebastian
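To make the `arr[:, rr, cc]` point concrete (shapes arbitrary):

```python
import numpy as np

arr = np.zeros((2, 3, 4))
rr = np.array([0, 1, 2])
cc = np.array([0, 1, 2])

# Legacy: the index arrays are consecutive, so their joint axis
# stays where the arrays were, giving shape (2, 3).
assert arr[:, rr, cc].shape == (2, 3)

# The proposed arr.vindex[:, rr, cc] would instead always put the
# broadcast index axis first, i.e. shape (3, 2) - hence "not exactly
# the same" (vindex does not exist yet, so this is only stated).
```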

On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
That's probably true! But I think it's besides the point. I'd wager that most code paths that will use .vindex would work perfectly well with current indexing, too. Most of the time, people aren't getting into the hairy corners of advanced indexing. Adding to the toolbox is great, but I don't see a good reason to take out the ones that are commonly used quite safely.
It's not necessarily hard; it's just churn for no benefit to the downstream code. They didn't get a new feature; they just have to run faster to stay in the same place. -- Robert Kern

On Tue, 2018-06-26 at 01:21 -0700, Robert Kern wrote:
On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
<snip>
Right, the proposal was to have DeprecationWarnings when they differ; now I also thought DeprecationWarnings on two advanced indexes in general is good, because it is good for new users. I have to agree with your argument that most of the confused users should be running into broadcast errors (if they expect oindex vs. fancy). So I see this as a point that we likely should just limit ourselves, at least for now, to the cases where, for example, sudden transposing is going on. However, I would like to point out that the reason for the broader warnings is that it could allow changing normal indexing at some point. Also, it decreases traps with array-likes that behave differently.
So, yes, it is annoying for quite a few projects that correctly use fancy indexing, but if we choose to not annoy you a little now, we will have far fewer long-term options, which also includes such projects' compatibility with new/current array-likes. So basically one point is: if we annoy scikit-image now, their code will hopefully work better for dask arrays in the future. - Sebastian

On Tue, Jun 26, 2018 at 1:36 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
I don't really understand this. You would discourage the "normal" syntax in favor of these more specific named syntaxes, so you can introduce different behavior for the "normal" syntax and encourage everyone to use it again? Just add more named syntaxes if you want new behavior! That's the beauty of the design underlying this NEP.
Also it decreases traps with array-likes that behave differently.
If we were to take this seriously, then no one should use a bare [] ever. I'll go on record as saying that array-likes should respond to `a[rr, cc]`, as in Juan's example, with the current behavior. And if they don't, they don't deserve to be operated on by skimage functions.

If I'm reading the NEP correctly, the main thrust of the issue with array-likes is that it is difficult for some of them to implement the full spectrum of indexing possibilities. This NEP does not actually make it *easier* for those array-likes to implement every possibility. It just offers some APIs that more naturally express common use cases which can sometimes be implemented more naturally than if expressed in the current indexing. For instance, you can achieve the same effect as orthogonal indexing with the current implementation, but you have to manipulate the indices before you pass them over to __getitem__(), losing information along the way that could be used to make a more efficient lookup in some array-likes.

The NEP design is essentially more of a way to give these array-likes standard places to raise NotImplementedError than it is to help them get rid of all of their NotImplementedErrors. More specifically, if these array-likes can't implement `a[rr, cc]`, they're not going to implement `a.vindex[rr, cc]`, either.

I think most of the problems that caused these libraries to make different choices in their __getitem__() implementation are due to the fact that these expressive APIs didn't exist, so they had to shoehorn them into __getitem__(); orthogonal indexing was too useful and efficient not to implement! I think that once we have .oindex and .vindex out there, they will be able to clean up their __getitem__()s to consistently support whatever of the current behavior that they can and raise NotImplementedError where they can't. -- Robert Kern
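Robert's point that orthogonal indexing is already achievable by manipulating the indices before passing them to `__getitem__()` can be illustrated with `np.ix_`, an existing NumPy helper that performs exactly such a pre-processing step:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Vectorized (current fancy) indexing: zip-like pairing of the indices.
paired = x[rows, cols]         # elements at (0, 1) and (2, 3)

# Orthogonal indexing emulated today by pre-processing the indices:
# np.ix_ reshapes them so that broadcasting forms the outer product.
outer = x[np.ix_(rows, cols)]  # 2x2 block: rows {0, 2} x cols {1, 3}

print(paired)   # [ 1 11]
print(outer)    # [[ 1  3]
                #  [ 9 11]]
```

Note that the `np.ix_` trick only covers 1-D integer (or boolean) indices, not slices, which is part of the information loss Robert mentions.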

On Tue, 2018-06-26 at 02:27 -0700, Robert Kern wrote:
Right, it helps mostly to be clear about what an object can and cannot do. So h5py or whatever could error out for plain indexing and only support `.oindex`, and we have all options cleanly available. And yes, I agree that in itself is a big step forward.

The thing is there are also very strong opinions that the fancy indexing behaviour is so confusing that it would ideally not be the default, since it breaks the analogy with slice objects. So, personally, I would argue that if we were to start over from scratch, fancy indexing (multiple indexes) would not be the default plain-indexing behaviour. Now, maybe the pain of a few warnings is too high, but if we wish to move, no matter how slowly, in that direction, we will have to swallow it eventually. The suggestion was to make that as easy as possible by adding an attribute indefinitely. Otherwise, even a possible numpy replacement might have difficulty choosing a different default for indexing for years to come... Practically, I guess some warnings might have to wait a longer while, just because it could be almost impossible to avoid them in code working with different numpy versions. - Sebastian

On Tue, Jun 26, 2018 at 3:50 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Okay, great. Before we move on to your next point, can we agree that the array-likes aren't a motivating factor for deprecating the current behavior of __getitem__()?
So I think we've moved past the technical objections. In the post-NEP .oindex/.vindex order, everyone can get the behavior that they want. Your argument for deprecation is now just about what the default is, the semantics that get pride of place with the shortest spelling. I am sympathetic to the feeling that you wish you had a time machine to go fix a design with your new insight. But it seems to me that just changing which semantics are the default has relatively attenuated value, while breaking compatibility for a fundamental feature of numpy has significant costs. Just introducing .oindex is the bulk of the value of this NEP. Everything else is window dressing.

You have my sympathies, but not enough for me to consent to deprecation. You might get more of my sympathy a year or two from now when the community has had a chance to work with .oindex. It's entirely possible that everyone will leap to using .oindex (and .vindex only rarely), and we will be flooded with complaints that "I only use .oindex, but the name is so long it messes up the readability of my lengthy expressions". But it's also possible that it sort of fizzles: people use it, but maybe use .vindex more, or about the same. Or just keep on happily using neither. We don't know which of those futures are going to be true. Anecdatally, you want .oindex semantics most often; I would almost exclusively use .vindex. I don't know which of us is more representative. Probably neither.

I maintain that considering deprecation is premature at this time. Please take it out of this NEP. Let us get a feel for how people actually use .oindex/.vindex. Then we can talk about deprecation. This NEP gets my enthusiastic approval, except for the deprecation. I will be happy to talk about deprecation with an open mind in a few years. With some more actual experience under our belt, rather than prediction and theory, we can be more confident about the approach we want to take.
Deprecation is not a fundamental part of this NEP and can be decided independently at a later time. -- Robert Kern

On Tue, Jun 26, 2018 at 4:34 PM Robert Kern <robert.kern@gmail.com> wrote:
I agree, we should scale back most of the deprecations proposed in this NEP, leaving them for possible future work. In particular, you're not convinced yet that "outer indexing" is a more intuitive default indexing mode than "vectorized indexing", so it is premature to deprecate vectorized indexing behavior that conflicts with outer indexing. OK, fair enough.

I would still like to include at least two more limited forms of deprecation that I hope will be less controversial:
- Mixed boolean/integer array indexing. This is not very intuitive nor useful, and I don't think I've ever seen it used. Usually "outer indexing" behavior is what is desired here.
- Mixed array/slice indexing, for cases with arrays separated by slices, so NumPy can't do the "intuitive" transpose on the output. As noted in the NEP, this is a common source of bugs. Users who want this should really switch to vindex.

In the long term, although I agree with Sebastian that "outer indexing" is more intuitive for default indexing behavior, I would really like to eliminate the "dimension reordering" behavior of mixed array/slice indexing altogether. This is a weird special case that differs between array[...] and array.vindex[...]. So if we don't choose to deprecate all cases where [] and oindex[] are different, I would at least like to deprecate all cases where [] and vindex[] are different.
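The "dimension reordering" special case Stephan refers to is easy to demonstrate with current NumPy: when index arrays are separated by a slice, the broadcast dimension jumps to the front of the result.

```python
import numpy as np

arr = np.arange(60).reshape(3, 4, 5)
idx1 = np.array([0, 1])
idx2 = np.array([2, 3])

# Index arrays adjacent: the broadcast dimension replaces them in place.
print(arr[:, idx1, idx2].shape)   # (3, 2)

# Index arrays separated by a slice: NumPy cannot keep the broadcast
# dimension "in place", so it is moved to the front. This is the
# common source of bugs that this part of the NEP targets.
print(arr[idx1, :, idx2].shape)   # (2, 4)
```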

On Tue, Jun 26, 2018 at 6:14 PM Stephan Hoyer <shoyer@gmail.com> wrote:
Actually, I do think outer indexing is more "intuitive"*, as far as that goes. It's just rarely what I actually want to accomplish. * I do not like using "intuitive" in programming. Nipples are intuitive. Everything else is learned. But in this case, I think that outer indexing is a more concordant extension of the concepts that a new numpy user would have learned earlier: integer indices and slices. I would still like to include at least two more limited forms of deprecation
I'd still prefer not talking deprecation, per se, in this NEP (but my objection is weaker). I would definitely start adding in informative, noisy warnings in these cases, though. Along the lines of, "Hey, this is a dodgy construction that typically gives unexpected results. Here are .oindex/.vindex that might do what you actually want, but you can use .legacy_index if you just want to silence this warning". Rather than "Hey, this is going to go away at some point." -- Robert Kern

They didn't get a new feature; they just have to run faster to stay in the same place.
Let me start by thanking Robert for articulating my viewpoints far better than I could have done myself. I want to explicitly flag the following statements for endorsement. First, though:

In [1]: from skimage import data
In [2]: astro = data.astronaut()
In [3]: astro.shape
Out[3]: (512, 512, 3)
In [4]: rr, cc = np.array([1, 3, 3, 3]), np.array([1, 8, 9, 10])
In [5]: astro[rr, cc].shape
Out[5]: (4, 3)
In [6]: astro[rr, cc, :].shape
Out[6]: (4, 3)

This does exactly what I would expect. Going back to the motivation for the NEP, I think this bit, emphasis mine, is crucial:

* (I don't think of us highly enough to use the word "deserve", but I would say that we would hesitate to support arrays that don't use this convention.)

* It is also probably true, as mentioned elsewhere, that we could go through our entire codebase and append `.vidx` to every array indexing op. Perhaps others on this list find this a reasonable request, but I don't. Aside from the churn involved, it would make our codebase significantly uglier and less readable. I should also emphasise that NumPy is really *the* foundational project for the entire Scientific Python ecosystem. Changing the meaning of [] should only be considered if it delivers an *extreme* benefit. Robert's statement would apply to a stupid number of projects.

* :+10**6:

To Sebastian's comment:
Let's get rid of the hopefully. Let NumPy implement .oindex and .vindex. Let Dask arrays do the same. Let's have an announcement on the scikit-image mailing list, "hey guys, if you switch all your indexing operations to .vindex, suddenly all of your library works with dask arrays!" At that point, we have a value proposition on our hands. Currently, it amounts to gambling with others' time. To Stephan's options that were sent while I was composing this:
Some options, in roughly descending order of severity:
I favour 4, or at the limit 3. (See use case above, which I would argue is totally unsurprising.) I'm happy that option 1 appears to be off the table. Hameer,

On Tue, Jun 26, 2018 at 10:21 PM Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
Yup, sorry, I didn't mean those. I meant when there is an explicit slice in between index arrays. (And maybe when index arrays follow slices; I'll need to think more on that.)
Ahem, yes, I was being provocative in a moment of weakness. May the array-like authors forgive me. -- Robert Kern

On Tue, 2018-06-26 at 22:26 -0700, Robert Kern wrote:
OK, sounds fine to me; I see that we just can't start planning for a possible long-term future yet. I personally do not care really what the warnings themselves say for now (Deprecation or not); larger packages will have to avoid them in any case though. But I guess we have a consensus on a certain amount of warnings (we will probably have to see how many actually appear) and then can revisit in a longer while. - Sebastian

On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Okay, I missed that the first time through. I think having more self-contained descriptions of the semantics of each of these would be a good idea. The current description of `.vindex` spends more time talking about what it doesn't do, compared to the other methods, than what it does. Some more typical, less-exotic examples would be a good idea.
I'm still leaning towards not warning on current, unproblematic common uses. It's unnecessary churn for currently working, understandable code. I would still reserve warnings and deprecation for the cases where the current behavior gives us something that no one wants. Those are the real traps that people need to be warned away from. If someone is mixing slices and integer indices, that's a really good sign that they thought indexing behaved in a different way (e.g. orthogonal indexing). If someone is just using multiple index arrays that would currently not give an error, that's actually a really good sign that they are using it correctly and are getting the semantics that they desired. If they wanted orthogonal indexing, it is *really* likely that their index arrays would *not* broadcast together. And even if they did, the wrong shape of the result is one of the more easily noticed things. These are not silent errors that would motivate adding a new warning. -- Robert Kern
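Robert's broadcast-error claim is easy to check in current NumPy: a user who expected orthogonal semantics from mismatched 1-D index arrays gets a loud error rather than a silently wrong result.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)
rr = np.array([0, 1, 2])       # 3 row indices
cc = np.array([0, 1, 2, 3])    # 4 column indices

# A user expecting orthogonal semantics (a 3x4 result) instead hits a
# broadcasting error, because (3,) and (4,) do not broadcast together:
try:
    x[rr, cc]
except IndexError as e:
    print("IndexError:", e)
```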

On Tue, Jun 26, 2018 at 12:46 AM Robert Kern <robert.kern@gmail.com> wrote:
Of course, I would definitely support adding more information to the various IndexError messages to point people to `.oindex` and `.vindex`. I think that would guide more people to correct their code than adding a new warning to code that currently executes (which is likely not erroneous), and it would cause no churn. -- Robert Kern

I second this design. If we were to consider the general case of a tuple `idx`, then we’d not be moving forward at all; design changes would be impossible. I’d argue that this newer model would be easier for library maintainers overall (who are the kind of people using this), reducing maintenance cost in the long run because it’d lead to simpler code.

I would also say that the “internal” classes expressing outer and vectorised indexing etc. should be exposed, for maintainers of duck arrays to use. God knows how many utility functions I’ve had to write to avoid relying on undocumented NumPy internals for pydata/sparse, fearing that I’d have to rewrite/modify them when behaviour changes or I find other corner cases. Best Regards, Hameer Abbasi Sent from Astro <https://www.helloastro.com> for Mac

I like the proposal generally. NumPy could use a good orthogonal indexing method and a vectorized-indexing method is fine too. Robert Kern is spot on with his concerns as well. Please do not change what arr[idx] does except to provide warnings and perhaps point people to new .oix and .vix methods. What indexing does is documented (if hard to understand and surprising in a particular sub-case). There is one specific place in the code where I would make a change to raise an error rather than change the order of the axes of the output to provide a consistent subspace. Even then, it should be done as a deprecation warning and then raise the error. Otherwise, just add the new methods and don't make any other changes until a major release. -Travis On Tue, Jun 26, 2018 at 2:03 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:

On Tue, Jun 26, 2018 at 1:26 AM Travis Oliphant <teoliphant@gmail.com> wrote:
I'd suggest that the NEP explicitly disclaim deprecating current behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues. -- Robert Kern

I would disagree here. For libraries like Dask, XArray, pydata/sparse, XND, etc., it would be bad for them if there was continued use of “weird” indexing behaviour (no warnings means more code written that’s… well… not exactly the best design). Of course, we could just choose to not support it. But that means a lot of code won’t support us, or support us later than we desire. I agree with your design of “let’s limit the number of warnings/deprecations to cases that make very little sense” but there should be warnings. Specifically, I recommend warnings for mixed slices and fancy indexes, and warnings followed by errors for cases where the transposing behaviour occurs. Best Regards, Hameer Abbasi

On Tue, Jun 26, 2018 at 1:49 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
On 26. Jun 2018 at 10:33, Robert Kern <robert.kern@gmail.com> wrote:
I'd suggest that the NEP explicitly disclaim deprecating current
behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues.
I would disagree here. For libraries like Dask, XArray, pydata/sparse,
XND, etc., it would be bad for them if there was continued use of “weird” indexing behaviour (no warnings means more code written that’s… well… not exactly the best design). Of course, we could just choose to not support it. But that means a lot of code won’t support us, or support us later than we desire.
I agree with your design of “let’s limit the number of
warnings/deprecations to cases that make very little sense” but there should be warnings. I'm still in favor of warnings in these cases. I didn't mean to suggest excluding those from the NEP. I just don't think they should be deprecations; we shouldn't suggest that they will eventually turn into errors. At least until we get these features out there, get some experience with them, then have a new NEP at that time just proposing deprecation. P.S. Would you mind bottom-posting? It helps maintain the context of what you are commenting on and my reply to those comments. I tried writing this reply without it, and it felt like it was missing context. Thanks! -- Robert Kern

On Tue, 2018-06-26 at 04:01 -0400, Hameer Abbasi wrote:
Could you list some examples what you would need? We can expose some of the internals, or maybe even provide funcs to map e.g. oindex to vindex or vindex to plain indexing, etc. but it would be helpful to know what downstream actually might need. For all I know the things that you are thinking of may not even exist... - Sebastian

We can expose some of the internals

These could be expressed as methods on the internal indexing objects I proposed in the first reply to this thread, which has seen no responses. I think Hameer Abbasi is looking for something like `OrthogonalIndexer(...).to_vindex() -> VectorizedIndexer`, such that `arr.oindex[ind]` selects the same elements as `arr.vindex[OrthogonalIndexer(ind).to_vindex()]`. Eric On Tue, 26 Jun 2018 at 08:04 Sebastian Berg <sebastian@sipsolutions.net> wrote:
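As a sketch of what such a conversion might do: `OrthogonalIndexer` and `to_vindex` are Eric's proposed names, not an existing API, but for the simple case of 1-D integer indices the conversion amounts to what `np.ix_` already does.

```python
import numpy as np

def oindex_to_vindex(indices):
    """Sketch of the hypothetical OrthogonalIndexer(...).to_vindex():
    reshape 1-D integer indices so that broadcasting them against each
    other enumerates the outer product (this is what np.ix_ does).
    Slices and scalars would need extra handling in a real version."""
    return np.ix_(*indices)

x = np.arange(12).reshape(3, 4)
rows, cols = np.array([0, 2]), np.array([1, 3])

# Plain fancy indexing with the converted indices reproduces what
# arr.oindex[rows, cols] would select under the NEP:
print(x[oindex_to_vindex((rows, cols))])
# [[ 1  3]
#  [ 9 11]]
```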

On Tue, Jun 26, 2018 at 9:38 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
It is probably worth noting that xarray already uses very similar classes internally for keeping track of indexing operations. See BasicIndexer, OuterIndexer and VectorizedIndexer: https://github.com/pydata/xarray/blob/v0.10.7/xarray/core/indexing.py#L295-L... This turns out to be a pretty convenient model even when not using subclassing. In xarray, we use them internally in various "partial duck array" classes that do some lazy computation upon indexing with __getitem__. It's nice to simply be able to forward on Indexer objects rather than implement separate vindex/oindex methods. We also have utility functions for converting between different forms, e.g., from OuterIndexer to VectorizedIndexer: https://github.com/pydata/xarray/blob/v0.10.7/xarray/core/indexing.py#L654 I guess this is a case for using such classes internally in NumPy, and possibly for exposing them publicly as well.

On Tue, Jun 26, 2018 at 12:46 AM Robert Kern <robert.kern@gmail.com> wrote:
Will do.
I agree, but I'm still not entirely sure where to draw the line on behavior that should issue a warning. Some options, in roughly descending order of severity:

1. Warn if [] would give a different result than .oindex[]. This is the current proposal in the NEP, but based on the feedback we should hold back on it for now.

2. Warn if there is a mixture of arrays/slice objects in indices for [], even implicitly (e.g., including arr[idx] when it is equivalent to arr[idx, :]). In this case, indices end up at the end both for legacy_index and vindex, but arguably that is only a happy coincidence.

3. Warn if [] would give a different result from .vindex[]. This is a little weaker than the previous condition, because arr[idx, :] or arr[idx, ...] would not give a warning. However, cases like arr[..., idx] or arr[:, idx, :] would still start to give warnings, even though they are arguably well defined according to either outer indexing (if idx.ndim == 1) or legacy indexing (due to dimension reordering rules that will be omitted from vindex).

4. Warn if there are multiple arrays/integer indices separated by a slice object, e.g., arr[idx1, :, idx2]. This is the edge case that really trips up users.

As I said in my other response, in the long term, I would prefer to either (a) drop support for vectorized indexing in [] or (b) if we stick with supporting vectorized indexing in [], at least ensure consistent dimension ordering rules for [] and vindex[]. That would suggest using either my proposed rule 2 or 3.

I also agree with you that anyone mixing slices and integers probably is confused about how indexing works, at least in edge cases. But given the lengths that legacy indexing goes to to support "outer indexing-like" behavior in the common case of a single integer array and many slices, I am hesitant to start warning in this case.
The result of arr[..., idx, :] is relatively easy to understand, even though it uses its own set of rules, which happen to be more consistent with oindex[] than vindex[]. We certainly could make the conservative choice of only adopting 4 for now and leaving further cleanup for later. I guess this uncertainty about whether direct indexing should be more like vindex[] or oindex[] in the long term is a good argument for holding off on other warnings for now. But I think we are almost certainly going to want to make further warnings/deprecations of some form.
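The line that rule 3 draws can be illustrated with current NumPy. The `vindex[]` shapes in the comments are what the NEP proposes, not existing behaviour:

```python
import numpy as np

arr = np.arange(60).reshape(3, 4, 5)
idx = np.array([0, 2])

# With the array index leading, [] already puts the broadcast
# dimension first, so [] and the proposed vindex[] would agree;
# rule 3 would not warn here:
print(arr[idx, ...].shape)   # (2, 4, 5)

# With the array index after a slice, [] keeps the broadcast dimension
# in place, while the proposed vindex[] would move it to the front
# (giving (2, 3, 5)); this is the kind of expression rule 3 would
# warn about:
print(arr[:, idx, :].shape)  # (3, 2, 5)
```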

On Tue, Jun 26, 2018 at 9:50 PM Stephan Hoyer <shoyer@gmail.com> wrote:
I'd have to deep dive through my email archive to double check, but I'm pretty sure this is intentional design, not coincidence. There is a long-standing pattern of using the first axes as the "collection" axes when the objects that we are concerned with are vectors or matrices or more. For example, if you evaluate a scalar field on a grid in 3D space (nx, ny, nz), then the gradient at those points is usually represented as (nx, ny, nz, 3). It is desirable to be able to apply the same indices to the scalar grid and the vector grid to select out the scalar and vector values at the same set of points. It's why we implicitly tack on empty slices to the end of any partial index tuple (e.g. with just integer scalars). The current rules for mixing slices and integer array indices are possibly the simplest way to effect this use case; it is the behaviors for the other cases that are the unhappy coincidences.

3. Warn if [] would give a different result from .vindex[].
I'd prefer 4, could be talked into 3, but any higher is not a good idea, I don't think. -- Robert Kern
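Robert's scalar-field/vector-field pattern above, sketched with made-up grid data (the grid shapes and index points are illustrative only):

```python
import numpy as np

nx, ny, nz = 4, 5, 6
scalar = np.random.rand(nx, ny, nz)        # scalar field on the grid
gradient = np.random.rand(nx, ny, nz, 3)   # vector field, components last

# Pick out the same set of points from both fields with one index tuple.
ii = np.array([0, 1, 3])
jj = np.array([2, 2, 4])
kk = np.array([5, 0, 1])

s = scalar[ii, jj, kk]     # shape (3,): one scalar per point
g = gradient[ii, jj, kk]   # shape (3, 3): the trailing slice is implicit

print(s.shape, g.shape)
```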

On Tue, Jun 26, 2018 at 10:22 PM Robert Kern <robert.kern@gmail.com> wrote:
OK, I think 4 is the safe option for now. Eventually, I want either 1 or 3. But:
- We don't agree yet on whether the right long-term solution would be for [] to support vectorized indexing, outer indexing or neither.
- This will certainly cause some amount of churn, so let's save it for later, when vindex/oindex are widely used and libraries don't need to worry about whether or not they are available in all NumPy versions they support.

Boolean indices are not supported. All indices must be integers, integer arrays or slices.
I would hope that there’s at least some way to do boolean indexing. I often find myself needing it. I realise that `arr.vindex[np.nonzero(boolean_idx)]` works, but it is slightly too verbose for my liking. Maybe we can have `arr.bindex[boolean_index]` as an alias for exactly that? Or is boolean indexing preserved as-is in the newest proposal? If so, great!

Another thing I’d say is `arr.?index` should be replaced with `arr.?idx`. I personally prefer `arr.?x` for my fingers, but I realise that for someone not super into NumPy indexing, this is kind of opaque to read, so I propose this less verbose but hopefully equally clear version, for my (and others’) brains. Best Regards, Hameer Abbasi
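The `np.nonzero` equivalence Hameer leans on already holds for plain indexing in current NumPy: a boolean index selects the same elements as indexing with the integer arrays that `np.nonzero` returns.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
mask = arr % 5 == 0

# A boolean index is equivalent to indexing with the integer arrays
# returned by np.nonzero(mask); the proposed
# arr.vindex[np.nonzero(boolean_idx)] spelling builds on this.
print(arr[mask])              # [ 0  5 10]
print(arr[np.nonzero(mask)])  # [ 0  5 10]
```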

Another thing I’d say is arr.?index should be replaced with arr.?idx. Or perhaps arr.o_[] and arr.v_[], to match the style of our existing np.r_, np.c_, np.s_, etc?

I actually had to think a lot, read docs, use SO and so on to realise what those meant the first time around; I didn’t understand them on sight. And I had to keep coming back to the docs from time to time, as I wasn’t exactly using them too much (for exactly this reason, when some problems could be solved more simply by doing just that). I’d prefer something that sticks in your head, and “underscore” for “indexing” didn't do that for me. Of course, this was my experience as a first-timer. I’d prefer not to up the learning curve for others in the same situation. An experienced user might disagree. :-) Best Regards, Hameer Abbasi

On Tue, 2018-06-26 at 04:23 -0400, Hameer Abbasi wrote:
That part is limited to `vindex` only. A single boolean index would always work in plain indexing, and you can mix it all up inside of `oindex`. But with fancy indexing, mixing boolean + integer seems currently pretty much useless (and thus the same is true for `vindex`; in `oindex` things make sense). Now you could invent some new logic for such a mixing case in `vindex`, but it seems easier to just ignore it for the moment. - Sebastian

Generally +1 on this, but I don’t think we need

To ensure that existing subclasses of ndarray that override indexing do not inadvertently revert to default behavior for indexing attributes, these attributes should have explicit checks that disable them if __getitem__ or __setitem__ has been overridden.

Repeating my proposal from github, I think we should introduce some internal indexing objects - something simple like:

# np.core.*
class Indexer(object):  # importantly not iterable
    def __init__(self, value):
        self.value = value

class OrthogonalIndexer(Indexer):
    pass

class VectorizedIndexer(Indexer):
    pass

Keeping the proposed syntax, we’d implement:
- arr.oindex[ind] as arr[np.core.OrthogonalIndexer(ind)]
- arr.vindex[ind] as arr[np.core.VectorizedIndexer(ind)]

This means that subclasses like the following

class LoggingIndexer(np.ndarray):
    def __getitem__(self, ind):
        ret = super().__getitem__(ind)
        print("Got an index")
        return ret

will continue to work without issues. This includes np.ma.MaskedArray and np.memmap, so this already has value internally. For classes like np.matrix which inspect the index object itself, an error will still be raised from __getitem__, since it looks nothing like the values normally passed - most likely of the form

TypeError: 'numpy.core.VectorizedIndexer' object does not support indexing
TypeError: 'numpy.core.VectorizedIndexer' object is not iterable

This could potentially be caught in oindex.__getitem__ and converted into a more useful error message.
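A runnable toy version of this scheme (the class names follow the proposal or are invented for the demo; none of this is released NumPy API), using `np.ix_` to supply the orthogonal semantics in a small ndarray subclass:

```python
import numpy as np

class Indexer:  # importantly, not iterable
    def __init__(self, value):
        self.value = value

class OrthogonalIndexer(Indexer):
    pass

class OIndex:
    """The .oindex attribute: wrap the index in a marker object and
    defer to the array's own __getitem__, so pass-through subclasses
    inherit the behaviour for free."""
    def __init__(self, arr):
        self.arr = arr
    def __getitem__(self, ind):
        return self.arr[OrthogonalIndexer(ind)]

class DemoArray(np.ndarray):
    def __getitem__(self, ind):
        if isinstance(ind, OrthogonalIndexer):
            # Translate orthogonal to plain fancy indexing via np.ix_
            # (demo only: handles tuples of 1-D integer arrays).
            return super().__getitem__(np.ix_(*ind.value))
        return super().__getitem__(ind)
    @property
    def oindex(self):
        return OIndex(self)

x = np.arange(12).reshape(3, 4).view(DemoArray)
print(x.oindex[np.array([0, 2]), np.array([1, 3])])  # 2x2 block
```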
So to summarize the benefits of the above tweaks:

- Pass-through subclasses get the new behavior for free
- No additional descriptor helpers are needed to let non-passthrough subclasses implement the new indexable attributes - only a change to __getitem__ is needed

And the costs:

- A less clear error message when new indexing is used on old types (can chain with a more useful exception on python 3)
- Class construction overhead for indexing via the attributes (skippable for base ndarray if significant)

Eric

On Mon, 25 Jun 2018 at 14:30 Stephan Hoyer <shoyer@gmail.com> wrote:

    rr, cc = coords.T  # coords is an (n, 2) array of integer coordinates
    values = image[rr, cc]

Are you saying that this use is deprecated? Because we love it at scikit-image. I would be very very very sad to lose this syntax.
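[Editorial sketch, not part of the thread.] A minimal illustration of the coordinate-pairing behaviour scikit-image relies on here, using a toy array in place of a real image (array values and coordinates are arbitrary):

```python
import numpy as np

# Toy stand-in for an image; values are arbitrary.
image = np.arange(12).reshape(3, 4)
coords = np.array([[0, 1], [2, 3], [1, 0]])  # (n, 2) integer coordinates

rr, cc = coords.T
# "Vectorized" indexing pairs the indices elementwise:
# image[0, 1], image[2, 3], image[1, 0]
values = image[rr, cc]
print(values)  # [ 1 11  4]
```

Each coordinate row selects one pixel, which is exactly the behaviour under discussion.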
Other general comments:

- oindex in general seems very intuitive and I'm :+1:
- I would much prefer some extremely compact notation such as arr.ox[] and arr.vx[].
- Depending on the above concern I am either -1 or (-1/0) on the deprecation. Deprecating (all) old vindex behaviour doesn't seem to bring many benefits while potentially causing a lot of pain to downstream libraries.

Juan.

On Mon, Jun 25, 2018 at 11:29 PM Andrew Nelson <andyfaff@gmail.com> wrote:
And thirded. This should not be considered deprecated or discouraged. As I mentioned in the previous iteration of this discussion, this is the behavior I want more often than the orthogonal indexing. It's a really common way to work with images and other kinds of raster data, so I don't think it should be relegated to the "officially discouraged" ghetto of `.legacy_index`. It should not issue warnings or (eventual) errors. I would reserve warnings for the cases where the current behavior is something no one really wants, like mixing slices and integer arrays. -- Robert Kern

I don't think it should be relegated to the "officially discouraged" ghetto of `.legacy_index`
The way I read it, the new spelling of that would be the explicit but not discouraged `image.vindex[rr, cc]`.
I would reserve warnings for the cases where the current behavior is something no one really wants, like mixing slices and integer arrays.
These are the cases that would only be available under `legacy_index`. Eric On Mon, 25 Jun 2018 at 23:54 Robert Kern <robert.kern@gmail.com> wrote:

On Tue, 26 Jun 2018 at 17:12, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
If I'm understanding correctly, what can be achieved now by `arr[rr, cc]` would have to be modified to use `arr.vindex[rr, cc]`, which is a very large change in behaviour. I suspect that there are a lot of situations out there which use `arr[idxs]` where `idxs` can mean one of a range of things depending on the code path followed. If any of those change, or a mix of nomenclatures is required to access the different cases, then havoc will probably ensue.

On Tue, 2018-06-26 at 17:30 +1000, Andrew Nelson wrote:
Yes, that is true, but I doubt you will find a lot of code paths that need the current indexing as opposed to vindex here, and the idea was to have a method to get the old behaviour indefinitely. You will need to add the `.vindex`, but that should be the only code change needed, and it would be easy to find where with errors/warnings. I see a possible problem with code that has to work on different numpy versions, but that only means we need to delay deprecations. The only thing I could imagine where this might happen is if you forward someone else's indexing objects and different users are used to different results. Otherwise, there is mostly one case which would get annoying, and that is `arr[:, rr, cc]`, since `arr.vindex[:, rr, cc]` would not be exactly the same. Because, yes, in some cases the current logic is convenient, just incredibly surprising as well. - Sebastian
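[Editorial sketch, not part of the thread.] The `arr[:, rr, cc]` incompatibility Sebastian mentions is visible in the result shapes. Note that `.vindex` is only proposed, not implemented, so the sketch below shows today's behaviour and merely notes the proposed difference in a comment:

```python
import numpy as np

# Current ("legacy") behaviour of arr[:, rr, cc]: the index arrays are
# adjacent, so the broadcast dimension stays in place.
arr = np.arange(2 * 3 * 4).reshape(2, 3, 4)
rr = np.array([0, 2, 1])
cc = np.array([3, 0, 1])

print(arr[:, rr, cc].shape)  # (2, 3)

# Under the proposed .vindex, the broadcast dimension would always come
# first, so arr.vindex[:, rr, cc] would have shape (3, 2) instead --
# that is the difference that makes the two spellings not equivalent.
```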

On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
That's probably true! But I think it's beside the point. I'd wager that most code paths that will use .vindex would work perfectly well with current indexing, too. Most of the time, people aren't getting into the hairy corners of advanced indexing. Adding to the toolbox is great, but I don't see a good reason to take out the ones that are commonly used quite safely.
It's not necessarily hard; it's just churn for no benefit to the downstream code. They didn't get a new feature; they just have to run faster to stay in the same place. -- Robert Kern

On Tue, 2018-06-26 at 01:21 -0700, Robert Kern wrote:
On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
<snip>
Right, the proposal was to have DeprecationWarnings when they differ; now I also thought DeprecationWarnings on two advanced indexes in general is good, because it is good for new users. I have to agree with your argument that most of the confused users should be running into broadcast errors (if they expect oindex vs. fancy). So I see this as a point that we likely should just limit ourselves, at least for now, to cases such as the ones with sudden transposing going on. However, I would like to point out that the reason for the broader warnings is that it could allow changing normal indexing at some point. Also it decreases traps with array-likes that behave differently.
So, yes, it is annoying for quite a few projects that correctly use fancy indexing, but if we choose not to annoy you a little, we will have far fewer long-term options, which also includes such projects' compatibility with new/current array-likes. So basically one point is: if we annoy scikit-image now, their code will hopefully work better for dask arrays in the future. - Sebastian

On Tue, Jun 26, 2018 at 1:36 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
I don't really understand this. You would discourage the "normal" syntax in favor of these more specific named syntaxes, so you can introduce different behavior for the "normal" syntax and encourage everyone to use it again? Just add more named syntaxes if you want new behavior! That's the beauty of the design underlying this NEP.
Also it decreases traps with array-likes that behave differently.
If we were to take this seriously, then no one should use a bare [] ever. I'll go on record as saying that array-likes should respond to `a[rr, cc]`, as in Juan's example, with the current behavior. And if they don't, they don't deserve to be operated on by skimage functions. If I'm reading the NEP correctly, the main thrust of the issue with array-likes is that it is difficult for some of them to implement the full spectrum of indexing possibilities. This NEP does not actually make it *easier* for those array-likes to implement every possibility. It just offers some APIs that more naturally express common use cases which can sometimes be implemented more naturally than if expressed in the current indexing. For instance, you can achieve the same effect as orthogonal indexing with the current implementation, but you have to manipulate the indices before you pass them over to __getitem__(), losing information along the way that could be used to make a more efficient lookup in some array-likes. The NEP design is essentially more of a way to give these array-likes standard places to raise NotImplementedError than it is to help them get rid of all of their NotImplementedErrors. More specifically, if these array-likes can't implement `a[rr, cc]`, they're not going to implement `a.vindex[rr, cc]`, either. I think most of the problems that caused these libraries to make different choices in their __getitem__() implementation are due to the fact that these expressive APIs didn't exist, so they had to shoehorn them into __getitem__(); orthogonal indexing was too useful and efficient not to implement! I think that once we have .oindex and .vindex out there, they will be able to clean up their __getitem__()s to consistently support whatever of the current behavior that they can and raise NotImplementedError where they can't. -- Robert Kern

On Tue, 2018-06-26 at 02:27 -0700, Robert Kern wrote:
Right, it helps mostly to be clear about what an object can and cannot do. So h5py or whatever could error out for plain indexing and only support `.oindex`, and we have all options cleanly available. And yes, I agree that in itself is a big step forward. The thing is there are also very strong opinions that the fancy indexing behaviour is so confusing that it would ideally not be the default, since it breaks the analogy with slice objects. So, personally, I would argue that if we were to start over from scratch, fancy indexing (multiple indexes) would not be the default plain indexing behaviour. Now, maybe the pain of a few warnings is too high, but if we wish to move, no matter how slowly, in that direction, we will have to swallow it eventually. The suggestion was to make that as easy as possible by adding an attribute indefinitely. Otherwise, even a possible numpy replacement might have difficulties choosing a different default for indexing for years to come... Practically, I guess some warnings might have to wait a longer while, just because it could be almost impossible to avoid them in code working with different numpy versions. - Sebastian

On Tue, Jun 26, 2018 at 3:50 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Okay, great. Before we move on to your next point, can we agree that the array-likes aren't a motivating factor for deprecating the current behavior of __getitem__()?
So I think we've moved past the technical objections. In the post-NEP .oindex/.vindex order, everyone can get the behavior that they want. Your argument for deprecation is now just about what the default is, the semantics that get pride of place with the shortest spelling. I am sympathetic to the feeling like you wish you had a time machine to go fix a design with your new insight. But it seems to me that just changing which semantics are the default has relatively attenuated value while breaking compatibility for a fundamental feature of numpy has significant costs. Just introducing .oindex is the bulk of the value of this NEP. Everything else is window dressing. You have my sympathies, but not enough for me to consent to deprecation. You might get more of my sympathy a year or two from now when the community has had a chance to work with .oindex. It's entirely possible that everyone will leap to using .oindex (and .vindex only rarely), and we will be flooded with complaints that "I only use .oindex, but the name is so long it messes up the readability of my lengthy expressions". But it's also possible that it sort of fizzles: people use it, but maybe use .vindex more, or about the same. Or just keep on happily using neither. We don't know which of those futures are going to be true. Anecdatally, you want .oindex semantics most often; I would almost exclusively use .vindex. I don't know which of us is more representative. Probably neither. I maintain that considering deprecation is premature at this time. Please take it out of this NEP. Let us get a feel for how people actually use .oindex/.vindex. Then we can talk about deprecation. This NEP gets my enthusiastic approval, except for the deprecation. I will be happy to talk about deprecation with an open mind in a few years. With some more actual experience under our belt, rather than prediction and theory, we can be more confident about the approach we want to take. 
Deprecation is not a fundamental part of this NEP and can be decided independently at a later time. -- Robert Kern

On Tue, Jun 26, 2018 at 4:34 PM Robert Kern <robert.kern@gmail.com> wrote:
I agree, we should scale back most of the deprecations proposed in this NEP, leaving them for possible future work. In particular, you're not convinced yet that "outer indexing" is a more intuitive default indexing mode than "vectorized indexing", so it is premature to deprecate vectorized indexing behavior that conflicts with outer indexing. OK, fair enough. I would still like to include at least two more limited forms of deprecation that I hope will be less controversial:

- Mixed boolean/integer array indexing. This is not very intuitive nor useful, and I don't think I've ever seen it used. Usually "outer indexing" behavior is what is desired here.
- Mixed array/slice indexing, for cases with arrays separated by slices so NumPy can't do the "intuitive" transpose on the output. As noted in the NEP, this is a common source of bugs. Users who want this should really switch to vindex.

In the long term, although I agree with Sebastian that "outer indexing" is more intuitive for default indexing behavior, I would really like to eliminate the "dimension reordering" behavior of mixed array/slice indexing altogether. This is a weird special case that differs between array[...] and array.vindex[...]. So if we don't choose to deprecate all cases where [] and oindex[] are different, I would at least like to deprecate all cases where [] and vindex[] are different.
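[Editorial sketch, not part of the thread.] The "dimension reordering" special case Stephan refers to can be demonstrated with a small example (shapes and index values are arbitrary):

```python
import numpy as np

# When index arrays are *separated* by a slice, NumPy cannot keep the
# broadcast dimension in place, so it moves it to the front.
arr = np.zeros((3, 4, 5))
idx1 = np.array([0, 1])
idx2 = np.array([2, 3])

print(arr[idx1, :, idx2].shape)  # (2, 4) -- broadcast dim moved to front
print(arr[idx1, idx2, :].shape)  # (2, 5) -- adjacent arrays: no reordering
```

The silent transpose in the first case is the common source of bugs mentioned above.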

On Tue, Jun 26, 2018 at 6:14 PM Stephan Hoyer <shoyer@gmail.com> wrote:
Actually, I do think outer indexing is more "intuitive"*, as far as that goes. It's just rarely what I actually want to accomplish. * I do not like using "intuitive" in programming. Nipples are intuitive. Everything else is learned. But in this case, I think that outer indexing is a more concordant extension of the concepts that a new numpy user would have learned earlier: integer indices and slices. I would still like to include at least two more limited form of deprecation
I'd still prefer not talking deprecation, per se, in this NEP (but my objection is weaker). I would definitely start adding in informative, noisy warnings in these cases, though. Along the lines of, "Hey, this is a dodgy construction that typically gives unexpected results. Here are .oindex/.vindex that might do what you actually want, but you can use .legacy_index if you just want to silence this warning". Rather than "Hey, this is going to go away at some point." -- Robert Kern

*They didn't get a new feature; they just have to run faster to stay in the same place.*
Let me start by thanking Robert for articulating my viewpoints far better than I could have done myself. I want to explicitly flag the following statements for endorsement: though:

    In [1]: from skimage import data
    In [2]: astro = data.astronaut()
    In [3]: astro.shape
    Out[3]: (512, 512, 3)
    In [4]: rr, cc = np.array([1, 3, 3, 3]), np.array([1, 8, 9, 10])
    In [5]: astro[rr, cc].shape
    Out[5]: (4, 3)
    In [6]: astro[rr, cc, :].shape
    Out[6]: (4, 3)

This does exactly what I would expect. Going back to the motivation for the NEP, I think this bit, emphasis mine, is crucial:

* (I don't think of us highly enough to use the word "deserve", but I would say that we would hesitate to support arrays that don't use this convention.)
* It is also probably true, as mentioned elsewhere, that we could go through our entire codebase and append `.vidx` to every array indexing op. Perhaps others on this list find this a reasonable request, but I don't. Aside from the churn involved, it would make our codebase significantly uglier and less readable. I should also emphasise that NumPy is really *the* foundational project for the entire Scientific Python ecosystem. Changing the meaning of [] should only be considered if it delivers an *extreme* benefit. Robert's statement would apply to a stupid number of projects.
* :+10**6:

To Sebastian's comment:
Let's get rid of the hopefully. Let NumPy implement .oindex and .vindex. Let Dask arrays do the same. Let's have an announcement on the scikit-image mailing list, "hey guys, if you switch all your indexing operations to .vindex, suddenly all of your library works with dask arrays!" At that point, we have a value proposition on our hands. Currently, it amounts to gambling with others' time. To Stephan's options that were sent while I was composing this:
Some options, in roughly descending order of severity:
I favour 4, or at the limit 3. (See use case above, which I would argue is totally unsurprising.) I'm happy that option 1 appears to be off the table. Hameer,

On Tue, Jun 26, 2018 at 10:21 PM Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
Yup, sorry, I didn't mean those. I meant when there is an explicit slice in between index arrays. (And maybe when index arrays follow slices; I'll need to think more on that.)
Ahem, yes, I was being provocative in a moment of weakness. May the array-like authors forgive me. -- Robert Kern

On Tue, 2018-06-26 at 22:26 -0700, Robert Kern wrote:
OK, sounds fine to me; I see that we just can't start planning for a possible long-term future yet. I personally do not really care what the warnings themselves say for now (Deprecation or not); larger packages will have to avoid them in any case. But I guess we have a consensus on a certain amount of warnings (we will probably have to see how often they actually appear) and can then revisit in a longer while. - Sebastian

On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Okay, I missed that the first time through. I think having more self-contained descriptions of the semantics of each of these would be a good idea. The current description of `.vindex` spends more time talking about what it doesn't do, compared to the other methods, than what it does. Some more typical, less-exotic examples would be a good idea.
I'm still leaning towards not warning on current, unproblematic common uses. It's unnecessary churn for currently working, understandable code. I would still reserve warnings and deprecation for the cases where the current behavior gives us something that no one wants. Those are the real traps that people need to be warned away from. If someone is mixing slices and integer indices, that's a really good sign that they thought indexing behaved in a different way (e.g. orthogonal indexing). If someone is just using multiple index arrays that would currently not give an error, that's actually a really good sign that they are using it correctly and are getting the semantics that they desired. If they wanted orthogonal indexing, it is *really* likely that their index arrays would *not* broadcast together. And even if they did, the wrong shape of the result is one of the more easily noticed things. These are not silent errors that would motivate adding a new warning. -- Robert Kern

On Tue, Jun 26, 2018 at 12:46 AM Robert Kern <robert.kern@gmail.com> wrote:
Of course, I would definitely support adding more information to the various IndexError messages to point people to `.oindex` and `.vindex`. I think that would guide more people to correct their code than adding a new warning to code that currently executes (which is likely not erroneous), and it would cause no churn. -- Robert Kern

I second this design. If we were to consider the general case of a tuple `idx`, then we’d not be moving forward at all. Design changes would be impossible. I’d argue that this newer model would be easier for library maintainers overall (who are the kind of people using this), reducing maintenance cost in the long run because it’d lead to simpler code. I would also say that the “internal” classes expressing outer as vectorised indexing etc. should be exposed, for maintainers of duck arrays to use. God knows how many utility functions I’ve had to write to avoid relying on undocumented NumPy internals for pydata/sparse, fearing that I’d have to rewrite/modify them when behaviour changes or I find other corner cases. Best Regards, Hameer Abbasi Sent from Astro <https://www.helloastro.com> for Mac On 26. Jun 2018 at 09:46, Robert Kern <robert.kern@gmail.com> wrote: On Tue, Jun 26, 2018 at 12:13 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
Okay, I missed that the first time through. I think having more self-contained descriptions of the semantics of each of these would be a good idea. The current description of `.vindex` spends more time talking about what it doesn't do, compared to the other methods, than what it does. Some more typical, less-exotic examples would be a good idea.
I'm still leaning towards not warning on current, unproblematic common uses. It's unnecessary churn for currently working, understandable code. I would still reserve warnings and deprecation for the cases where the current behavior gives us something that no one wants. Those are the real traps that people need to be warned away from. If someone is mixing slices and integer indices, that's a really good sign that they thought indexing behaved in a different way (e.g. orthogonal indexing). If someone is just using multiple index arrays that would currently not give an error, that's actually a really good sign that they are using it correctly and are getting the semantics that they desired. If they wanted orthogonal indexing, it is *really* likely that their index arrays would *not* broadcast together. And even if they did, the wrong shape of the result is one of the more easily noticed things. These are not silent errors that would motivate adding a new warning. -- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

I like the proposal generally. NumPy could use a good orthogonal indexing method and a vectorized-indexing method is fine too. Robert Kern is spot on with his concerns as well. Please do not change what arr[idx] does except to provide warnings and perhaps point people to new .oix and .vix methods. What indexing does is documented (if hard to understand and surprising in a particular sub-case). There is one specific place in the code where I would make a change to raise an error rather than change the order of the axes of the output to provide a consistent subspace. Even then, it should be done as a deprecation warning and then raise the error. Otherwise, just add the new methods and don't make any other changes until a major release. -Travis On Tue, Jun 26, 2018 at 2:03 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:

On Tue, Jun 26, 2018 at 1:26 AM Travis Oliphant <teoliphant@gmail.com> wrote:
I'd suggest that the NEP explicitly disclaim deprecating current behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues. -- Robert Kern

I would disagree here. For libraries like Dask, XArray, pydata/sparse, XND, etc., it would be bad for them if there was continued use of “weird” indexing behaviour (no warnings means more code written that’s… well… not exactly the best design). Of course, we could just choose to not support it. But that means a lot of code won’t support us, or support us later than we desire. I agree with your design of “let’s limit the number of warnings/deprecations to cases that make very little sense” but there should be warnings. Specifically, I recommend warnings for mixed slices and fancy indexes, and warnings followed by errors for cases where the transposing behaviour occurs. Best Regards, Hameer Abbasi Sent from Astro <https://www.helloastro.com> for Mac On 26. Jun 2018 at 10:33, Robert Kern <robert.kern@gmail.com> wrote: On Tue, Jun 26, 2018 at 1:26 AM Travis Oliphant <teoliphant@gmail.com> wrote:
I'd suggest that the NEP explicitly disclaim deprecating current behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues. -- Robert Kern

On Tue, Jun 26, 2018 at 1:49 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
On 26. Jun 2018 at 10:33, Robert Kern <robert.kern@gmail.com> wrote:
I'd suggest that the NEP explicitly disclaim deprecating current
behavior. Let the NEP just be about putting the new features out there. Once we have some experience with them for a year or three, then let's talk about deprecating parts of the current behavior and make a new NEP then if we want to go that route. We're only contemplating *long* deprecation cycles anyways; we're not in a race. The success of these new features doesn't really rely on the deprecation of current indexing, so let's separate those issues.
I would disagree here. For libraries like Dask, XArray, pydata/sparse,
XND, etc., it would be bad for them if there was continued use of “weird” indexing behaviour (no warnings means more code written that’s… well… not exactly the best design). Of course, we could just choose to not support it. But that means a lot of code won’t support us, or support us later than we desire.
I agree with your design of “let’s limit the number of
warnings/deprecations to cases that make very little sense” but there should be warnings.

I'm still in favor of warnings in these cases. I didn't mean to suggest excluding those from the NEP. I just don't think they should be deprecations; we shouldn't suggest that they will eventually turn into errors. At least until we get these features out there, get some experience with them, then have a new NEP at that time just proposing deprecation.

P.S. Would you mind bottom-posting? It helps maintain the context of what you are commenting on and my reply to those comments. I tried writing this reply without it, and it felt like it was missing context. Thanks! -- Robert Kern

On Tue, 2018-06-26 at 04:01 -0400, Hameer Abbasi wrote:
Could you list some examples of what you would need? We can expose some of the internals, or maybe even provide funcs to map e.g. oindex to vindex or vindex to plain indexing, etc., but it would be helpful to know what downstream actually might need. For all I know, the things that you are thinking of may not even exist... - Sebastian

We can expose some of the internals

These could be expressed as methods on the internal indexing objects I proposed in the first reply to this thread, which has seen no responses. I think Hameer Abbasi is looking for something like:

    OrthogonalIndexer(...).to_vindex() -> VectorizedIndexer

such that arr.oindex[ind] selects the same elements as arr.vindex[OrthogonalIndexer(ind).to_vindex()]

Eric

On Tue, 26 Jun 2018 at 08:04 Sebastian Berg <sebastian@sipsolutions.net> wrote:
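[Editorial sketch, not part of the thread.] A hypothetical sketch of such a conversion, reusing the `OrthogonalIndexer`/`VectorizedIndexer` names from Eric's proposal (none of these classes exist in NumPy; `np.ix_` does the actual work of turning 1-D outer indices into orthogonally broadcasting arrays):

```python
import numpy as np

class VectorizedIndexer:
    """Hypothetical wrapper for a vectorized (broadcasting) index tuple."""
    def __init__(self, ind):
        self.ind = ind

class OrthogonalIndexer:
    """Hypothetical wrapper for a tuple of 1-D integer outer indices."""
    def __init__(self, ind):
        self.ind = ind

    def to_vindex(self):
        # np.ix_ reshapes each 1-D index so the arrays broadcast
        # orthogonally -- the vectorized equivalent of outer indexing.
        return VectorizedIndexer(np.ix_(*self.ind))

arr = np.arange(12).reshape(3, 4)
oind = OrthogonalIndexer((np.array([0, 2]), np.array([1, 3])))

# Plain indexing with the converted index selects rows {0, 2} x cols {1, 3}:
print(arr[oind.to_vindex().ind])
# [[ 1  3]
#  [ 9 11]]
```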

On Tue, Jun 26, 2018 at 9:38 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
It is probably worth noting that xarray already uses very similar classes internally for keeping track of indexing operations. See BasicIndexer, OuterIndexer and VectorizedIndexer: https://github.com/pydata/xarray/blob/v0.10.7/xarray/core/indexing.py#L295-L... This turns out to be a pretty convenient model even when not using subclassing. In xarray, we use them internally in various "partial duck array" classes that do some lazy computation upon indexing with __getitem__. It's nice to simply be able to forward on Indexer objects rather than implement separate vindex/oindex methods. We also have utility functions for converting between different forms, e.g., from OuterIndexer to VectorizedIndexer: https://github.com/pydata/xarray/blob/v0.10.7/xarray/core/indexing.py#L654 I guess this is a case for using such classes internally in NumPy, and possibly for exposing them publicly as well.

On Tue, Jun 26, 2018 at 12:46 AM Robert Kern <robert.kern@gmail.com> wrote:
Will do.
I agree, but I'm still not entirely sure where to draw the line on behavior that should issue a warning. Some options, in roughly descending order of severity:

1. Warn if [] would give a different result than .oindex[]. This is the current proposal in the NEP, but based on the feedback we should hold back on it for now.
2. Warn if there is a mixture of arrays/slice objects in indices for [], even implicitly (e.g., including arr[idx] when it is equivalent to arr[idx, :]). In this case, indices end up at the end both for legacy_index and vindex, but arguably that is only a happy coincidence.
3. Warn if [] would give a different result from .vindex[]. This is a little weaker than the previous condition, because arr[idx, :] or arr[idx, ...] would not give a warning. However, cases like arr[..., idx] or arr[:, idx, :] would still start to give warnings, even though they are arguably well defined according to either outer indexing (if idx.ndim == 1) or legacy indexing (due to dimension reordering rules that will be omitted from vindex).
4. Warn if there are multiple arrays/integer indices separated by a slice object, e.g., arr[idx1, :, idx2]. This is the edge case that really trips up users.

As I said in my other response, in the long term, I would prefer to either (a) drop support for vectorized indexing in [] or (b) if we stick with supporting vectorized indexing in [], at least ensure consistent dimension ordering rules for [] and vindex[]. That would suggest using either my proposed rule 2 or 3.

I also agree with you that anyone mixing slices and integers probably is confused about how indexing works, at least in edge cases. But given the lengths that legacy indexing goes to to support "outer indexing-like" behavior in the common case of a single integer array and many slices, I am hesitant to start warning in this case.
The result of arr[..., idx, :] is relatively easy to understand, even though it uses its own set of rules, which happen to be more consistent with oindex[] than vindex[]. We certainly could make the conservative choice of only adopting 4 for now and leaving further cleanup for later. I guess this uncertainty about whether direct indexing should be more like vindex[] or oindex[] in the long term is a good argument for holding off on other warnings for now. But I think we are almost certainly going to want to make further warnings/deprecations of some form.

On Tue, Jun 26, 2018 at 9:50 PM Stephan Hoyer <shoyer@gmail.com> wrote:
I'd have to deep dive through my email archive to double check, but I'm pretty sure this is intentional design, not coincidence. There is a long-standing pattern of using the first axes as the "collection" axes when the objects that we are concerned with are vectors or matrices or more. For example, evaluate a scalar field on a grid in 3D space (nx, ny, nz); the gradient at those points is then usually represented as (nx, ny, nz, 3). It is desirable to be able to apply the same indices to the scalar grid and the vector grid to select out the scalar and vector values at the same set of points. It's why we implicitly tack on empty slices to the end of any partial index tuple (e.g. with just integer scalars). The current rules for mixing slices and integer array indices are possibly the simplest way to effect this use case; it is the behaviors for the other cases that are the unhappy coincidences.

3. Warn if [] would give a different result from .vindex[].
I'd prefer 4, could be talked into 3, but any higher is not a good idea, I don't think. -- Robert Kern
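[Editorial sketch, not part of the thread.] The scalar-grid/gradient-grid pattern Robert describes can be illustrated as follows (shapes and index values are arbitrary):

```python
import numpy as np

# A scalar field on a 3-D grid, and its gradient with a trailing axis of
# length 3. The same point indices select matching entries from both.
nx, ny, nz = 4, 5, 6
scalar = np.zeros((nx, ny, nz))
gradient = np.zeros((nx, ny, nz, 3))

ii = np.array([0, 3, 2])
jj = np.array([1, 4, 0])
kk = np.array([2, 5, 5])

print(scalar[ii, jj, kk].shape)    # (3,)   -- one value per point
print(gradient[ii, jj, kk].shape)  # (3, 3) -- trailing slice implied
```

The implied trailing slice is what lets identical index tuples work on both arrays.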

On Tue, Jun 26, 2018 at 10:22 PM Robert Kern <robert.kern@gmail.com> wrote:
OK, I think 4 is the safe option for now. Eventually, I want either 1 or 3. But:

- We don't agree yet on whether the right long-term solution would be for [] to support vectorized indexing, outer indexing or neither.
- This will certainly cause some amount of churn, so let's save it for later, when vindex/oindex are widely used and libraries don't need to worry about whether they are available in all NumPy versions they support.

Boolean indices are not supported. All indices must be integers, integer arrays or slices.
I would hope that there's at least some way to do boolean indexing. I often find myself needing it. I realise that `arr.vindex[np.nonzero(boolean_idx)]` works, but it is slightly too verbose for my liking. Maybe we can have `arr.bindex[boolean_index]` as an alias for exactly that? Or is boolean indexing preserved as-is in the newest proposal? If so, great! Another thing I'd say is that `arr.?index` should be replaced with `arr.?idx`. I personally prefer `arr.?x` for my fingers, but I realise that for someone not super into NumPy indexing this is kind of opaque to read, so I propose this less verbose but hopefully equally clear version, for my (and others') brains. Best Regards, Hameer Abbasi
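For reference, the `np.nonzero` workaround mentioned above relies on an equivalence that already holds in plain NumPy indexing today (the array here is made up):

```python
import numpy as np

x = np.array([-2, 1, -3, 4, 0, 5])
mask = x > 0

# Boolean indexing is equivalent to integer indexing with the
# indices returned by np.nonzero:
print(x[mask])              # [1 4 5]
print(x[np.nonzero(mask)])  # [1 4 5]
```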

> Another thing I'd say is `arr.?index` should be replaced with `arr.?idx`.

Or perhaps `arr.o_[]` and `arr.v_[]`, to match the style of our existing `np.r_`, `np.c_`, `np.s_`, etc?
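For readers unfamiliar with them, the existing `?_` helpers behave like this in current NumPy (the proposed `arr.o_[]`/`arr.v_[]` attributes do not exist; they are only named here by analogy):

```python
import numpy as np

# np.s_ builds slice objects from index syntax:
print(np.s_[1:5])        # slice(1, 5, None)

# np.r_ concatenates along the first axis, expanding slices:
print(np.r_[0, 3, 1:4])  # [0 3 1 2 3]

# np.c_ stacks 1-D inputs as columns:
print(np.c_[[1, 2], [3, 4]])
# [[1 3]
#  [2 4]]
```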

On 26. Jun 2018 at 10:28, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
> Or perhaps arr.o_[] and arr.v_[], to match the style of our existing np.r_, np.c_, np.s_, etc?

I actually had to think a lot, read docs, use SO and so on to realise what those meant the first time around; I didn't understand them on sight. And I had to keep coming back to the docs from time to time, as I wasn't using them much (for exactly this reason, when some problems could be solved more simply without them). I'd prefer something that sticks in your head, and "underscore" for "indexing" didn't do that for me. Of course, this was my experience as a first-timer; I'd prefer not to up the learning curve for others in the same situation. An experienced user might disagree. :-) Best Regards, Hameer Abbasi

On Tue, 2018-06-26 at 04:23 -0400, Hameer Abbasi wrote:
That part is limited to `vindex` only. A single boolean index would always work in plain indexing, and you can mix it all up inside of `oindex`. But with fancy indexing, mixing boolean + integer currently seems pretty much useless (and thus the same is true for `vindex`; in `oindex` things make sense). Now you could invent some new logic for such a mixing case in `vindex`, but it seems easier to just ignore it for the moment. - Sebastian
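Sebastian's point about boolean + integer mixing under legacy fancy indexing can be seen in current NumPy (the array is made up for illustration): the boolean index is converted to its nonzero indices and then broadcast against the integer index.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])

# The boolean index is treated as np.nonzero(mask)[0] == [0, 2] and
# broadcast against [0, 1], picking x[0, 0] and x[2, 1]:
print(x[mask, [0, 1]])                 # [0 9]
print(x[np.nonzero(mask)[0], [0, 1]])  # [0 9] -- identical
```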
participants (8)

- Andrew Nelson
- Eric Wieser
- Hameer Abbasi
- Juan Nunez-Iglesias
- Robert Kern
- Sebastian Berg
- Stephan Hoyer
- Travis Oliphant