Two different interpretations of how to handle multi-valued (array/sequence) indices are in common use. The numpy style is to consider all multi-valued indices together, which allows arbitrary points to be extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to consider each multi-valued index independently.
For example:
>>> type(v)
<type 'netCDF4.Variable'>
>>> v.shape
(240, 37, 49)
>>> v[(0, 1), (0, 2, 3)].shape
(2, 3, 49)
>>> np.array(v)[(0, 1), (0, 2, 3)].shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
In a netcdf4-python GitHub issue https://github.com/Unidata/netcdf4-python/issues/385 the authors of various orthogonal indexing packages have been discussing how to distinguish the two behaviours and have currently settled on a boolean __orthogonal_indexing__ attribute.
1. Is there any appetite for adding that attribute (with the value `False`) to ndarray?
2. As suggested by shoyer https://github.com/Unidata/netcdf4-python/issues/385#issuecomment-87775034, is there any appetite for adding an alternative indexer to ndarray where __orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)]
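For concreteness, here is a rough sketch of the semantics such an indexer could have. This is illustrative only: `OrthogonalIndexer` is a hypothetical name, not proposed API, and it only covers integer sequences, slices and scalars, implemented on top of the existing `np.ix_`.

```python
import numpy as np

class OrthogonalIndexer:
    """Hypothetical wrapper: apply each index independently per axis."""
    __orthogonal_indexing__ = True

    def __init__(self, arr):
        self.arr = np.asarray(arr)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        converted, scalar_axes = [], []
        for axis, k in enumerate(key):
            if isinstance(k, slice):
                # np.ix_ does not accept slices, so expand them by hand.
                converted.append(np.arange(*k.indices(self.arr.shape[axis])))
            elif np.isscalar(k):
                converted.append(np.array([k]))
                scalar_axes.append(axis)
            else:
                converted.append(np.asarray(k))
        result = self.arr[np.ix_(*converted)]
        # Scalar indices should drop their axis, as in plain indexing.
        if scalar_axes:
            result = result.squeeze(axis=tuple(scalar_axes))
        return result

v = np.zeros((240, 37, 49))
print(OrthogonalIndexer(v)[(0, 1), (0, 2, 3)].shape)  # (2, 3, 49)
```

With this, the netcdf4-python example above gives the orthogonal (2, 3, 49) result instead of a broadcast error.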
Richard
On Wed, Apr 1, 2015 at 2:17 AM, R Hattersley rhattersley@gmail.com wrote:
Two different interpretations of how to handle multi-valued (array/sequence) indices are in common use. The numpy style is to consider all multi-valued indices together, which allows arbitrary points to be extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to consider each multi-valued index independently.
For example:
>>> type(v)
<type 'netCDF4.Variable'>
>>> v.shape
(240, 37, 49)
>>> v[(0, 1), (0, 2, 3)].shape
(2, 3, 49)
>>> np.array(v)[(0, 1), (0, 2, 3)].shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
In a netcdf4-python GitHub issue https://github.com/Unidata/netcdf4-python/issues/385 the authors of various orthogonal indexing packages have been discussing how to distinguish the two behaviours and have currently settled on a boolean __orthogonal_indexing__ attribute.
- Is there any appetite for adding that attribute (with the value `False`) to ndarray?
- As suggested by shoyer https://github.com/Unidata/netcdf4-python/issues/385#issuecomment-87775034, is there any appetite for adding an alternative indexer to ndarray where __orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)]
Is there any other package implementing non-orthogonal indexing aside from numpy? I understand that it would be nice to do:
if x.__orthogonal_indexing__:
    return x[idx]
else:
    return x.ix_[idx]
But I think you would get the exact same result doing:
if isinstance(x, np.ndarray):
    return x[np.ix_(*idx)]
else:
    return x[idx]
If `not x.__orthogonal_indexing__` is going to be a proxy for `isinstance(x, ndarray)`, I don't really see the point of disguising it; explicit is better than implicit and all that.
If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all for improving that to provide the full functionality of "orthogonal indexing". I just need a little more convincing that those new attributes/indexers are ever going to see any real use.
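To illustrate the gap Jaime mentions: `np.ix_` accepts integer and boolean sequences but not slices, so today a slice has to be expanded against the axis length by hand (a sketch using only public numpy API):

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# np.ix_ works with integer sequences:
print(a[np.ix_([0, 2], [1, 3, 5])].shape)  # (2, 3)

# ...but passing a slice fails, because np.ix_ requires 1-d sequences:
try:
    np.ix_(slice(2), [1, 3])
except Exception as e:
    print("np.ix_ rejected the slice:", e)

# The manual workaround: expand the slice into an explicit index range.
rows = np.arange(*slice(2).indices(a.shape[0]))
print(a[np.ix_(rows, [1, 3, 5])].shape)  # (2, 3)
```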
Jaime
Richard
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río < jaime.frio@gmail.com> wrote:
Is there any other package implementing non-orthogonal indexing aside from numpy?
I think we can safely say that NumPy's implementation of broadcasting indexing is unique :).
The issue is that many other packages rely on numpy for implementation of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not immediately obvious what sort of indexing these objects represent.
If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all for improving that to provide the full functionality of "orthogonal indexing". I just need a little more convincing that those new attributes/indexers are ever going to see any real use.
Orthogonal indexing is close to the norm for packages that implement labeled data structures, both because it's easier to understand and implement, and because it's difficult to maintain associations with labels through complex broadcasting indexing.
Unfortunately, the lack of a full-featured implementation of orthogonal indexing has led to that wheel being reinvented at least three times (in Iris, xray [1] and pandas). So it would be nice to have a canonical implementation that supports slices and integers in numpy for that reason alone. This could be done by building on the existing `np.ix_` function, but a new indexer seems more elegant: there's just much less noise with `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
It's also well known that indexing with __getitem__ can be much slower than np.take. It seems plausible to me that a careful implementation of orthogonal indexing could close or eliminate this speed gap, because the model for orthogonal indexing is so much simpler than that for broadcasting indexing: each element of the key tuple can be applied separately along the corresponding axis.
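The "one axis at a time" model Stephan describes can be stated directly in terms of np.take. This toy sketch (the helper name is made up) produces the same result as np.ix_ for integer-sequence keys:

```python
import numpy as np

def orthogonal_take(arr, keys):
    # Apply each 1-d integer index independently along its own axis.
    # Axes without a key are left untouched.
    for axis, k in enumerate(keys):
        arr = np.take(arr, k, axis=axis)
    return arr

a = np.arange(60).reshape(3, 4, 5)
result = orthogonal_take(a, ([0, 2], [1, 3]))
print(result.shape)                                      # (2, 2, 5)
print(np.array_equal(result, a[np.ix_([0, 2], [1, 3])]))  # True
```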
So I think there could be a real benefit to having the feature in numpy. In particular, if somebody is up for implementing it in C or Cython, I would be very pleased.
Cheers, Stephan
[1] Here is my implementation of remapping from orthogonal to broadcasting indexing. It works, but it's a real mess, especially because I try to optimize by minimizing the number of times slices are converted into arrays: https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/x...
On Do, 2015-04-02 at 01:29 -0700, Stephan Hoyer wrote:
On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río jaime.frio@gmail.com wrote: Is there any other package implementing non-orthogonal indexing aside from numpy?
I think we can safely say that NumPy's implementation of broadcasting indexing is unique :).
The issue is that many other packages rely on numpy for implementation of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not immediately obvious what sort of indexing these objects represent.
If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all for improving that to provide the full functionality of "orthogonal indexing". I just need a little more convincing that those new attributes/indexers are ever going to see any real use.
Orthogonal indexing is close to the norm for packages that implement labeled data structures, both because it's easier to understand and implement, and because it's difficult to maintain associations with labels through complex broadcasting indexing.
Unfortunately, the lack of a full-featured implementation of orthogonal indexing has led to that wheel being reinvented at least three times (in Iris, xray [1] and pandas). So it would be nice to have a canonical implementation that supports slices and integers in numpy for that reason alone. This could be done by building on the existing `np.ix_` function, but a new indexer seems more elegant: there's just much less noise with `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
It's also well known that indexing with __getitem__ can be much slower than np.take. It seems plausible to me that a careful implementation of orthogonal indexing could close or eliminate this speed gap, because the model for orthogonal indexing is so much simpler than that for broadcasting indexing: each element of the key tuple can be applied separately along the corresponding axis.
Wrong (sorry, couldn't resist ;)): since 1.9, take is not typically faster unless you have a small subspace ("subspace" meaning the non-indexed/slice-indexed axes; though I guess a small subspace is common in some cases, i.e. an Nx3 array). It should typically be noticeably slower for large subspaces at the moment.
Anyway, unfortunately while orthogonal indexing may seem simpler, as you probably noticed, mapping it fully featured to advanced indexing does not seem like a walk in the park due to how axis remapping works when you have a combination of slices and advanced indices.
It might be possible to basically implement a second MapIterSwapaxis in addition to adding extra axes to the inputs (which I think would need a post-processing step, but that is not that bad). If you do that, you can mostly reuse the current machinery and avoid most of the really annoying code blocks which set up the iterators for the various special cases. Otherwise, for hacking it of course you can replace the slices by arrays as well ;).
So I think there could be a real benefit to having the feature in numpy. In particular, if somebody is up for implementing it in C or Cython, I would be very pleased.
Cheers,
Stephan
[1] Here is my implementation of remapping from orthogonal to broadcasting indexing. It works, but it's a real mess, especially because I try to optimize by minimizing the number of times slices are converted into arrays: https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/x...
On Thu, Apr 2, 2015 at 1:29 AM, Stephan Hoyer shoyer@gmail.com wrote:
On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río < jaime.frio@gmail.com> wrote:
Is there any other package implementing non-orthogonal indexing aside from numpy?
I think we can safely say that NumPy's implementation of broadcasting indexing is unique :).
The issue is that many other packages rely on numpy for implementation of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not immediately obvious what sort of indexing these objects represent.
If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all for improving that to provide the full functionality of "orthogonal indexing". I just need a little more convincing that those new attributes/indexers are ever going to see any real use.
Orthogonal indexing is close to the norm for packages that implement labeled data structures, both because it's easier to understand and implement, and because it's difficult to maintain associations with labels through complex broadcasting indexing.
Unfortunately, the lack of a full-featured implementation of orthogonal indexing has led to that wheel being reinvented at least three times (in Iris, xray [1] and pandas). So it would be nice to have a canonical implementation that supports slices and integers in numpy for that reason alone. This could be done by building on the existing `np.ix_` function, but a new indexer seems more elegant: there's just much less noise with `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
It's also well known that indexing with __getitem__ can be much slower than np.take. It seems plausible to me that a careful implementation of orthogonal indexing could close or eliminate this speed gap, because the model for orthogonal indexing is so much simpler than that for broadcasting indexing: each element of the key tuple can be applied separately along the corresponding axis.
So I think there could be a real benefit to having the feature in numpy. In particular, if somebody is up for implementing it in C or Cython, I would be very pleased.
Cheers, Stephan
[1] Here is my implementation of remapping from orthogonal to broadcasting indexing. It works, but it's a real mess, especially because I try to optimize by minimizing the number of times slices are converted into arrays:
https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/x...
I believe you can leave all slices unchanged if you later reshuffle your axes. Basically all the fancy-indexed axes go in the front of the shape in order, and the subspace follows, e.g.:
>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, np.array([1, 2, 3])].shape
(1, 3, 2)
So you would need to swap the second and last axes and be done. You would not get a contiguous array without a copy, but that's a different story. Assigning to an orthogonally indexed subarray is an entirely different beast, not sure if there is a use case for that.
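Jaime's reshuffling observation, spelled out on the example above (a sketch; `np.ix_` provides the orthogonal reference result):

```python
import numpy as np

a = np.arange(60).reshape(3, 4, 5)

# Mixed fancy/slice indexing: the broadcast fancy axes land in front
# because the slice separates the two index arrays.
mixed = a[np.array([1])[:, None], ::2, np.array([1, 2, 3])]
print(mixed.shape)  # (1, 3, 2)

# Swapping the trailing (subspace) axis back into place recovers the
# orthogonal result that np.ix_ would give.
reordered = mixed.swapaxes(1, 2)
expected = a[np.ix_([1], range(0, 4, 2), [1, 2, 3])]
print(reordered.shape)                       # (1, 2, 3)
print(np.array_equal(reordered, expected))   # True
```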
We probably need more traction on the "should this be done?" discussion than on the "can this be done?" one. The need for a reordering of the axes swings me slightly in favor, but I mostly don't see it yet. Nathaniel usually has good insights on the who-we-are, where-do-we-come-from, where-are-we-going type of questions; it would be good to have him chime in.
Jaime
On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:
We probably need more traction on the "should this be done?" discussion than on the "can this be done?" one. The need for a reordering of the axes swings me slightly in favor, but I mostly don't see it yet.
As a long-time user of numpy, and an advocate and teacher of Python for science, here is my perspective:
Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember. Its use easily leads to unreadable code and hard-to-see errors. Here is the essence of an example that a student presented me with just this week, in the context of reordering eigenvectors based on argsort applied to eigenvalues:
In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))
In [26]: ii = np.arange(4)
In [27]: print(xx[0])
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

In [28]: print(xx[0, :, ii])
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
Quickly now, how many numpy users would look at that last expression and say, "Of course, that is equivalent to transposing xx[0]"? And, "Of course that expression should give a completely different result from xx[0][:, ii]."?
I would guess it would be less than 1%. That should tell you right away that we have a real problem here. Fancy indexing can't be *read* by a sub-genius--it has to be laboriously figured out piece by piece, with frequent reference to the baffling descriptions in the Numpy docs.
So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it? I have taken advantage of it occasionally, maybe you have too, but I think a survey of existing code would show that the need for it is *far* less common than the need for simple orthogonal indexing. That tells me that it is fancy indexing, not orthogonal indexing, that should be available through a function and/or special indexing attribute. The question is then how to make that transition.
Eric
On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing efiring@hawaii.edu wrote:
Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember.
Well put!
I also failed to correctly predict your example.
So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it?
I'll just note that indexing with a boolean array with the same shape as the array (e.g., x[x < 0] when x has greater than 1 dimension) technically falls outside a strict interpretation of orthogonal indexing. But there's not any ambiguity in adding that as an extension to orthogonal indexing (which otherwise does not allow ndim > 1), so I think your point still stands.
Stephan
The distinction that boolean indexing has over the other two methods of indexing is that it can guarantee that it references a position at most once. Slicing and scalar indexes are also this way, which is why these methods allow for in-place assignments. I don't see boolean indexing as an extension of orthogonal indexing because of that.
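Ben's point can be seen in a short example: a full-shape boolean mask picks each cell at most once, while an integer fancy index can repeat positions, which matters for in-place operations.

```python
import numpy as np

x = np.array([[1.0, -2.0],
              [-3.0, 4.0]])

# A full-shape boolean mask selects each position at most once, so
# in-place assignment through it is unambiguous:
x[x < 0] = 0.0
print(x)  # [[1. 0.]
          #  [0. 4.]]

# Integer fancy indexing has no such guarantee: a repeated index is
# written (buffered) only once in an in-place operation.
y = np.zeros(3)
y[np.array([0, 0, 1])] += 1
print(y)  # [1. 1. 0.] -- index 0 appeared twice but was incremented once
```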
Ben Root
On Thu, Apr 2, 2015 at 2:41 PM, Stephan Hoyer shoyer@gmail.com wrote:
On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing efiring@hawaii.edu wrote:
Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember.
Well put!
I also failed to correctly predict your example.
So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it?
I'll just note that indexing with a boolean array with the same shape as the array (e.g., x[x < 0] when x has greater than 1 dimension) technically falls outside a strict interpretation of orthogonal indexing. But there's not any ambiguity in adding that as an extension to orthogonal indexing (which otherwise does not allow ndim > 1), so I think your point still stands.
Stephan
On Thu, Apr 2, 2015 at 2:03 PM, Eric Firing efiring@hawaii.edu wrote:
On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:
We probably need more traction on the "should this be done?" discussion than on the "can this be done?" one. The need for a reordering of the axes swings me slightly in favor, but I mostly don't see it yet.
As a long-time user of numpy, and an advocate and teacher of Python for science, here is my perspective:
Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember. Its use easily leads to unreadable code and hard-to-see errors. Here is the essence of an example that a student presented me with just this week, in the context of reordering eigenvectors based on argsort applied to eigenvalues:
In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))
In [26]: ii = np.arange(4)
In [27]: print(xx[0])
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

In [28]: print(xx[0, :, ii])
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
Quickly now, how many numpy users would look at that last expression and say, "Of course, that is equivalent to transposing xx[0]"? And, "Of course that expression should give a completely different result from xx[0][:, ii]."?
I would guess it would be less than 1%. That should tell you right away that we have a real problem here. Fancy indexing can't be *read* by a sub-genius--it has to be laboriously figured out piece by piece, with frequent reference to the baffling descriptions in the Numpy docs.
So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it? I have taken advantage of it occasionally, maybe you have too, but I think a survey of existing code would show that the need for it is *far* less common than the need for simple orthogonal indexing. That tells me that it is fancy indexing, not orthogonal indexing, that should be available through a function and/or special indexing attribute. The question is then how to make that transition.
Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.
>>> np.triu_indices(5)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4], dtype=int64),
 array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4], dtype=int64))
>>> m = np.arange(25).reshape(5, 5)[np.triu_indices(5)]
>>> m
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 12, 13, 14, 18, 19, 24])
>>> m2 = np.zeros((5, 5))
>>> m2[np.triu_indices(5)] = m
>>> m2
array([[  0.,   1.,   2.,   3.,   4.],
       [  0.,   6.,   7.,   8.,   9.],
       [  0.,   0.,  12.,  13.,  14.],
       [  0.,   0.,   0.,  18.,  19.],
       [  0.,   0.,   0.,   0.,  24.]])
(I don't remember what's "fancy" in indexing, just that broadcasting rules apply.)
Josef
Eric
On 2015/04/02 10:22 AM, josef.pktd@gmail.com wrote:
Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.
I'm not saying there should be no fancy indexing capability; I am saying that it should be available through a function or method, rather than via the square brackets. Square brackets should do things that people expect them to do--the most common and easy-to-understand style of indexing.
Eric
On 02-Apr-15 4:35 PM, Eric Firing wrote:
On 2015/04/02 10:22 AM, josef.pktd@gmail.com wrote:
Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.
I'm not saying there should be no fancy indexing capability; I am saying that it should be available through a function or method, rather than via the square brackets. Square brackets should do things that people expect them to do--the most common and easy-to-understand style of indexing.
Eric
+1
On 03 Apr 2015, at 00:04, Colin J. Williams cjw@ncf.ca wrote:
On 02-Apr-15 4:35 PM, Eric Firing wrote:
On 2015/04/02 10:22 AM, josef.pktd@gmail.com wrote:
Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself.
I'm not saying there should be no fancy indexing capability; I am saying that it should be available through a function or method, rather than via the square brackets. Square brackets should do things that people expect them to do--the most common and easy-to-understand style of indexing.
Eric
+1
Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.
If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of the bracket slicing is probably not a good idea.
Hanno
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.
Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?
If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of the bracket slicing is probably not a good idea.
I'm not advocating adding even more complexity, I'm trying to think about ways to make it *less* complex from the typical user's standpoint.
Eric
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efiring@hawaii.edu wrote:
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.
Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?
I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits.
I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery and being easy to learn and teach, then the design philosophy went in favor of power.
I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2.
I think it should be DOA, except as a discussion topic for numpy 3000.
just my opinion
Josef
If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of the bracket slicing is probably not a good idea.
I'm not advocating adding even more complexity, I'm trying to think about ways to make it *less* complex from the typical user's standpoint.
Eric
On Thu, Apr 2, 2015 at 9:09 PM, josef.pktd@gmail.com wrote:
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efiring@hawaii.edu wrote:
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.
Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?
I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits.
I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery and being easy to learn and teach, then the design philosophy went in favor of power.
I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2.
I think it should be DOA, except as a discussion topic for numpy 3000.
just my opinion
is this fancy?
>>> vals
array([6, 5, 4, 1, 2, 3])
>>> a + b
array([[3, 2, 1, 0],
       [4, 3, 2, 1],
       [5, 4, 3, 2]])
>>> vals[a + b]
array([[1, 4, 5, 6],
       [2, 1, 4, 5],
       [3, 2, 1, 4]])
https://github.com/scipy/scipy/blob/v0.14.0/scipy/linalg/special_matrices.py...
(I thought about this because I was looking at accessing off-diagonal elements, m2[np.arange(4), np.arange(4) + 1] )
How would you find all the code that would no longer be correct with a changed definition of indexing and slicing, if there is insufficient test coverage and it doesn't raise an exception? If we find it, who fixes all the legacy code? (I don't think it will be minor, unless there is a new method `fix_[...]` (fancy ix).)
Josef
Josef
If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of the bracket slicing is probably not a good idea.
I'm not advocating adding even more complexity, I'm trying to think about ways to make it *less* complex from the typical user's standpoint.
Eric
On Thu, Apr 2, 2015 at 6:35 PM, josef.pktd@gmail.com wrote:
(I thought about this because I was looking at accessing off-diagonal elements, m2[np.arange(4), np.arange(4) + 1] )
Psst: np.diagonal(m2, offset=1)
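The two spellings pick out the same elements; a quick check (m2 is rebuilt here from the earlier triu_indices example so the snippet stands on its own):

```python
import numpy as np

# Rebuild the m2 from the earlier example: upper triangle filled in.
m2 = np.zeros((5, 5))
m2[np.triu_indices(5)] = np.arange(25).reshape(5, 5)[np.triu_indices(5)]

# First superdiagonal, by hand with fancy indexing...
fancy = m2[np.arange(4), np.arange(4) + 1]
# ...and via np.diagonal with an offset.
print(np.array_equal(fancy, np.diagonal(m2, offset=1)))  # True
```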
On Thu, Apr 2, 2015 at 11:30 PM, Nathaniel Smith njs@pobox.com wrote:
On Thu, Apr 2, 2015 at 6:35 PM, josef.pktd@gmail.com wrote:
(I thought about this because I was looking at accessing off-diagonal elements, m2[np.arange(4), np.arange(4) + 1] )
Psst: np.diagonal(m2, offset=1)
It was just an example (banded or Toeplitz). (I know how indexing works, kind of, but don't remember exactly what diag or other functions are doing.)
>>> m2b = m2.copy()
>>> m2b[np.arange(4), np.arange(4) + 1]
array([  1.,   7.,  13.,  19.])
>>> m2b[np.arange(4), np.arange(4) + 1] = np.nan
>>> m2b
array([[  0.,  nan,   2.,   3.,   4.],
       [  0.,   6.,  nan,   8.,   9.],
       [  0.,   0.,  12.,  nan,  14.],
       [  0.,   0.,   0.,  18.,  nan],
       [  0.,   0.,   0.,   0.,  24.]])
>>> m2c = m2.copy()
>>> np.diagonal(m2c, offset=1) = np.nan
SyntaxError: can't assign to function call
>>> dd = np.diagonal(m2c, offset=1)
>>> dd[:] = np.nan
Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    dd[:] = np.nan
ValueError: assignment destination is read-only
>>> np.__version__
'1.9.2rc1'
>>> m2d = m2.copy()
>>> m2d[np.arange(4)[::-1], np.arange(4) + 1] = np.nan
Josef
-- Nathaniel J. Smith -- http://vorpus.org
Hi,
On Thu, Apr 2, 2015 at 6:09 PM, josef.pktd@gmail.com wrote:
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efiring@hawaii.edu wrote:
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.
Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?
I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits.
I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery and being easy to learn and teach, then the design philosophy went in favor of power.
I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2.
I'm sure y'all are totally on top of this, but for myself, I would like to distinguish:
* fancy indexing with boolean arrays - I use it all the time and don't get confused;
* fancy indexing with non-boolean arrays - horrendously confusing, almost never use it, except on a single axis when I can't confuse it with orthogonal indexing:
In [3]: a = np.arange(24).reshape(6, 4)
In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [5]: a[[1, 2, 4]]
Out[5]:
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [16, 17, 18, 19]])
I also remember a discussion with Travis O where he was also saying that this indexing was confusing and that it would be good if there was some way to transition to what he called outer product indexing (I think that's the same as 'orthogonal' indexing).
I think it should be DOA, except as a discussion topic for numpy 3000.
I think there are two proposals here:
1) Add some syntactic sugar to allow orthogonal indexing of numpy arrays, no backward compatibility break.
That seems like a very good idea to me - were there any big objections to that?
2) Over some long time period, move the default behavior of np.array non-boolean indexing from the current behavior to the orthogonal behavior.
That is going to be very tough, because it will cause very confusing breakage of legacy code.
On the other hand, maybe it is worth going some way towards that, like this:
* implement orthogonal indexing as a method arr.sensible_index[...];
* implement the current non-boolean fancy indexing behavior as a method - arr.crazy_index[...];
* deprecate non-boolean fancy indexing as standard arr[...] indexing;
* wait a long time;
* remove non-boolean fancy indexing as standard arr[...] (errors are preferable to change in behavior).
Then if we are brave we could:
* wait a very long time; * make orthogonal indexing the default.
But the not-brave steps above seem less controversial, and fairly reasonable.
What about that as an approach?
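The not-brave steps could be prototyped today on a thin wrapper. A sketch, under the assumption that `sensible_index` and `crazy_index` are just the placeholder names from the list above, with orthogonal behaviour limited to integer-sequence keys via np.ix_:

```python
import warnings
import numpy as np

class _Indexer:
    """Tiny helper so the attributes support bracket syntax."""
    def __init__(self, arr, func):
        self._arr, self._func = arr, func

    def __getitem__(self, key):
        return self._func(self._arr, key)

class ArrayWrapper:
    """Toy migration sketch -- not a real proposal for ndarray."""
    def __init__(self, arr):
        self._arr = np.asarray(arr)

    @property
    def crazy_index(self):
        # Current broadcasting ("fancy") behaviour, under an explicit name.
        return _Indexer(self._arr, lambda a, k: a[k])

    @property
    def sensible_index(self):
        # Orthogonal behaviour: each 1-d integer key applies to its own axis.
        return _Indexer(self._arr, lambda a, k: a[np.ix_(*k)])

    def __getitem__(self, key):
        # The deprecation step of the plan: plain [] still works but warns.
        warnings.warn("non-boolean fancy indexing via [] is deprecated "
                      "in this sketch", DeprecationWarning, stacklevel=2)
        return self._arr[key]

w = ArrayWrapper(np.arange(12).reshape(3, 4))
print(w.sensible_index[[0, 2], [1, 3]].shape)  # (2, 2)
print(w.crazy_index[[0, 2], [1, 3]].shape)     # (2,)
```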
Cheers,
Matthew
On Thu, Apr 2, 2015 at 10:30 PM, Matthew Brett matthew.brett@gmail.com wrote:
Hi,
On Thu, Apr 2, 2015 at 6:09 PM, josef.pktd@gmail.com wrote:
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efiring@hawaii.edu wrote:
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
Well, I have written quite a bit of code that relies on fancy indexing, and I think the question of whether the behaviour of the [] operator should be changed has sailed, with numpy now at version 1.9. Given the number of packages that rely on numpy, changing this fundamental behaviour would not be a clever move.
Are you *positive* that there is no clever way to make a transition? It's not worth any further thought?
I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits.
I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery and being easy to learn and teach, then the design philosophy went in favor of power.
I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2.
I'm sure y'all are totally on top of this, but for myself, I would like to distinguish:
- fancy indexing with boolean arrays - I use it all the time and don't get confused;
- fancy indexing with non-boolean arrays - horrendously confusing, almost never use it, except on a single axis when I can't confuse it with orthogonal indexing:
In [3]: a = np.arange(24).reshape(6, 4)
In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [5]: a[[1, 2, 4]]
Out[5]:
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [16, 17, 18, 19]])
<snip>
I also thought the transition would have to be something like that, or a clear break point like numpy 3.0. I would be in favor of something like this for the axis-swapping case with ndim > 2.
However, before going to that, you would still have to provide a list of behaviors that will be deprecated, and poll various libraries for how much it is actually used.
My impression is that fancy indexing is used more often than orthogonal indexing (beyond the trivial case x[:, idx]). Also, many use cases for orthogonal indexing moved to using pandas, and numpy is left with the non-orthogonal indexing use cases. And third, fancy indexing is a superset of orthogonal indexing (with proper broadcasting), so you still need to justify why everyone should be restricted to the subset, instead of leaving it as a voluntary constraint to use code that is easier to understand.
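The superset point can be illustrated with np.ix_, which builds broadcastable index arrays, so an orthogonal selection is just a special case of fancy indexing (a sketch, not from the original message):

```python
import numpy as np

a = np.arange(24).reshape(6, 4)

# Orthogonal selection of rows [1, 2, 4] and columns [0, 3] via np.ix_:
sub = a[np.ix_([1, 2, 4], [0, 3])]
print(sub.shape)  # (3, 2)

# The same selection written as plain fancy indexing with explicit
# broadcasting -- orthogonal indexing is exactly this special case:
rows = np.array([1, 2, 4])[:, None]   # shape (3, 1)
cols = np.array([0, 3])               # shape (2,)
assert np.array_equal(sub, a[rows, cols])
```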
I checked numpy.random.choice which I would have implemented with fancy indexing, but it uses only `take`, AFAICS.
Switching to an explicit method is not really a problem for maintained library code, but I still don't really see why we should do this.
Josef
Cheers,
Matthew

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett matthew.brett@gmail.com wrote:
Hi,
On Thu, Apr 2, 2015 at 6:09 PM, josef.pktd@gmail.com wrote:
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efiring@hawaii.edu wrote:
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
<snip>
Your option 1 was what was being discussed before the posse was assembled to bring fancy indexing before justice... ;-)
My background is in image processing, and I have used fancy indexing in all its fanciness far more often than orthogonal or outer product indexing. I actually have a vivid memory of the moment I fell in love with NumPy: after seeing a code snippet that ran a huge image through a look-up table by indexing the LUT with the image. Beautifully simple. And here http://stackoverflow.com/questions/12014186/fancier-fancy-indexing-in-numpy is a younger me, learning to ride NumPy without the training wheels.
Another obvious use case that you can find all over the place in scikit-image is drawing a curve on an image from the coordinates.
If there is such strong agreement on an orthogonal indexer, we might as well go ahead and implement it. But before considering any bolder steps, we should probably give it a couple of releases to see how many people out there really use it.
Jaime
P.S. As an aside on the remapping of axes when arrays and slices are mixed, there really is no better way. Once you realize that the array indexing a dimension does not have to be 1-D, it becomes clear that the seemingly obvious way does not generalize to the general case. E.g.:
One may rightfully think that:
>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, [0, 1, 3]].shape
(1, 3, 2)
should not reorder the axes, and return an array of shape (1, 2, 3). But what do you do in the following case?
>>> idx0 = np.random.randint(3, size=(10, 1, 10))
>>> idx2 = np.random.randint(5, size=(1, 20, 1))
>>> a[idx0, ::2, idx2].shape
(10, 20, 10, 2)
What is the right place for that 2 now?
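Both cases above can be checked deterministically; only the shapes matter. When the index arrays are separated by a slice, numpy moves the broadcast dimensions to the front of the result, which is why the 2 from the slice ends up last (a sketch restating the examples above):

```python
import numpy as np

a = np.arange(60).reshape(3, 4, 5)

# Index arrays on axes 0 and 2, slice on axis 1: the arrays broadcast
# to shape (1, 3), which is moved to the front, then the slice adds 2.
r1 = a[np.array([1])[:, None], ::2, [0, 1, 3]]
print(r1.shape)  # (1, 3, 2)

# Higher-dimensional index arrays broadcast to (10, 20, 10); again the
# slice dimension is appended at the end.
idx0 = np.random.randint(3, size=(10, 1, 10))
idx2 = np.random.randint(5, size=(1, 20, 1))
r2 = a[idx0, ::2, idx2]
print(r2.shape)  # (10, 20, 10, 2)
```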
Hi,
On Thu, Apr 2, 2015 at 8:20 PM, Jaime Fernández del Río jaime.frio@gmail.com wrote:
On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett matthew.brett@gmail.com wrote:
Hi,
On Thu, Apr 2, 2015 at 6:09 PM, josef.pktd@gmail.com wrote:
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efiring@hawaii.edu wrote:
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
<snip>
Your option 1 was what was being discussed before the posse was assembled to bring fancy indexing before justice... ;-)
Yes, sorry - I was trying to bring the argument back there.
My background is in image processing, and I have used fancy indexing in all its fanciness far more often than orthogonal or outer product indexing. I actually have a vivid memory of the moment I fell in love with NumPy: after seeing a code snippet that ran a huge image through a look-up table by indexing the LUT with the image. Beautifully simple. And here is a younger me, learning to ride NumPy without the training wheels.
Another obvious use case that you can find all over the place in scikit-image is drawing a curve on an image from the coordinates.
No question at all that it does have its uses - but then again, no-one thinks that it should not be available, only, maybe, in the very far future, not what you get by default...
Cheers,
Matthew
On 03.04.2015 at 04:09, josef.pktd@gmail.com wrote: [clip]
I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2.
I think it should be DOA, except as a discussion topic for numpy 3000.
If you change how Numpy indexing works, you need to scrap a nontrivial amount of existing code, at which point everybody should just go back to Matlab, which at least provides a stable API.
I have an all-Python implementation of an OrthogonalIndexer class, loosely based on Stephan's code plus some axis remapping, that provides all the needed functionality for getting and setting with orthogonal indices.
Would those interested rather see it as a gist to play around with, or as a PR adding an orthogonally indexable `.ix_` argument to ndarray?
Jaime
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río < jaime.frio@gmail.com> wrote:
I have an all-Python implementation of an OrthogonalIndexer class, loosely based on Stephan's code plus some axis remapping, that provides all the needed functionality for getting and setting with orthogonal indices.
Awesome, thanks!
Would those interested rather see it as a gist to play around with, or as a PR adding an orthogonally indexable `.ix_` argument to ndarray?
My preference would be for a PR (even if it's purely a prototype) because it supports inline comments better than a gist.
Stephan
On 2015/04/03 7:59 AM, Jaime Fernández del Río wrote:
I have an all-Python implementation of an OrthogonalIndexer class, loosely based on Stephan's code plus some axis remapping, that provides all the needed functionality for getting and setting with orthogonal indices.
Excellent!
Would those interested rather see it as a gist to play around with, or as a PR adding an orthogonally indexable `.ix_` argument to ndarray?
I think the PR would be easier to test.
Eric
Jaime
--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río < jaime.frio@gmail.com> wrote:
I have an all-Python implementation of an OrthogonalIndexer class, loosely based on Stephan's code plus some axis remapping, that provides all the needed functionality for getting and setting with orthogonal indices.
Would those interested rather see it as a gist to play around with, or as a PR adding an orthogonally indexable `.ix_` argument to ndarray?
A PR it is, #5749 https://github.com/numpy/numpy/pull/5749 to be precise. I think it has all the bells and whistles: integers, boolean and integer 1-D arrays, slices, ellipsis, and even newaxis, both for getting and setting. No tests yet, so correctness of the implementation is dubious at best. As a small example:
>>> a = np.arange(60).reshape(3, 4, 5)
>>> a.ix_
<numpy.core._indexer.OrthogonalIndexer at 0x1027979d0>
>>> a.ix_[[0, 1], :, [True, False, True, False, True]]
array([[[ 0,  2,  4],
        [ 5,  7,  9],
        [10, 12, 14],
        [15, 17, 19]],

       [[20, 22, 24],
        [25, 27, 29],
        [30, 32, 34],
        [35, 37, 39]]])
>>> a.ix_[[0, 1], :, [True, False, True, False, True]] = 0
>>> a
array([[[ 0,  1,  0,  3,  0],
        [ 0,  6,  0,  8,  0],
        [ 0, 11,  0, 13,  0],
        [ 0, 16,  0, 18,  0]],

       [[ 0, 21,  0, 23,  0],
        [ 0, 26,  0, 28,  0],
        [ 0, 31,  0, 33,  0],
        [ 0, 36,  0, 38,  0]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])
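The PR's actual implementation is not reproduced here, but the integer/boolean core of such an indexer can be sketched with np.ix_, which also accepts boolean masks (slices and newaxis, which the PR supports, need the extra axis-remapping machinery):

```python
import numpy as np

def orthogonal_getitem(arr, key):
    # Sketch only: handles 1-D integer and boolean indexers via np.ix_,
    # which converts boolean masks to their nonzero positions.
    return arr[np.ix_(*key)]

a = np.arange(60).reshape(3, 4, 5)
out = orthogonal_getitem(
    a, ([0, 1], range(4), [True, False, True, False, True]))
print(out.shape)  # (2, 4, 3)
```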
Jaime
On So, 2015-04-05 at 00:45 -0700, Jaime Fernández del Río wrote:
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río
<snip>
A PR it is, #5749 to be precise. I think it has all the bells and whistles: integers, boolean and integer 1-D arrays, slices, ellipsis, and even newaxis, both for getting and setting. No tests yet, so correctness of the implementation is dubious at best. As a small example:
Looks neat, I am sure there will be some details. Just a quick thought, I wonder if it might make sense to even introduce a context manager. Not sure how easy it is to make sure that it is thread safe, etc?
If the code is not too difficult, maybe it can even be moved to C. Though I have to think about it, I think currently we parse from first index to last, maybe it would be plausible to parse from last to first so that adding dimensions could be done easily inside the preparation function. The second axis remapping is probably reasonably easy (if, like the first thing, tedious).
- Sebastian
PS: One side comment about the discussion. I don't think anyone suggests that we should not, or do not, even consider proposals as such, even if it might look like that. Not that I can compare, but my guess is that numpy is actually very open (though I have no idea if it appears that way, too).
But also to me it does seem like a lost cause to try to actually change indexing itself. So maybe that does not sound diplomatic, but without specific reasoning about how the change does not wreak havoc, talking about switching indexing behaviour seems a waste of time to me. Please try to surprise me, but until then....
>>> a = np.arange(60).reshape(3, 4, 5)
>>> a.ix_
<snip>
Jaime
On So, 2015-04-05 at 14:13 +0200, Sebastian Berg wrote:
On So, 2015-04-05 at 00:45 -0700, Jaime Fernández del Río wrote:
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río
<snip>
Looks neat, I am sure there will be some details. Just a quick thought, I wonder if it might make sense to even introduce a context manager. Not sure how easy it is to make sure that it is thread safe, etc?
Also wondering, because while I think that actually changing numpy is probably impossible, I do think we can talk about something like:
np.enable_outer_indexing()

or along the lines of:

from numpy.future import outer_indexing
or some such, to do a module-wide switch, and maybe also at some point make it easier to write code that is compatible with a possible follow-up such as blaze (or also pandas, I guess) that uses incompatible indexing. I have no clue if this is technically feasible, though.
The python equivalent would be teaching someone to use:
from __future__ import division
even though you don't even tell them that python 3 exists ;), just because you like the behaviour more.
<snip>
>>> a = np.arange(60).reshape(3, 4, 5)
>>> a.ix_
<snip>
On Apr 1, 2015 2:17 AM, "R Hattersley" rhattersley@gmail.com wrote:
There are two different interpretations in common use of how to handle multi-valued (array/sequence) indexes. The numpy style is to consider all multi-valued indices together which allows arbitrary points to be extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to consider each multi-valued index independently.
For example:
>>> type(v)
<type 'netCDF4.Variable'>
>>> v.shape
(240, 37, 49)
>>> v[(0, 1), (0, 2, 3)].shape
(2, 3, 49)
>>> np.array(v)[(0, 1), (0, 2, 3)].shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
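The same contrast can be reproduced with plain numpy, using a stand-in array of the same shape as the netCDF variable (np.ix_ gives the orthogonal result today):

```python
import numpy as np

# Stand-in for the (240, 37, 49) netCDF variable above:
v = np.zeros((240, 37, 49))

# numpy-style (point-wise) indexing tries to broadcast (2,) against (3,):
try:
    v[(0, 1), (0, 2, 3)]
except IndexError as e:
    print("IndexError:", e)

# The orthogonal-style result is available via np.ix_:
print(v[np.ix_([0, 1], [0, 2, 3])].shape)  # (2, 3, 49)
```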
In a netcdf4-python GitHub issue the authors of various orthogonal indexing packages have been discussing how to distinguish the two behaviours and have currently settled on a boolean __orthogonal_indexing__ attribute.
I guess my feeling is that this attribute is a fine solution to the wrong problem. If I understand the situation correctly: users are writing two copies of their indexing code to handle two different array-duck-types (those that do broadcasting indexing and those that do Cartesian product indexing), and then have trouble knowing which set of code to use for a given object. The problem that __orthogonal_indexing__ solves is that it makes it easier to decide which code to use. It works well for this, great.
But, the real problem here is that we have two different array duck types that force everyone to write their code twice. This is a terrible state of affairs! (And exactly analogous to the problems caused by np.ndarray disagreeing with np.matrix & scipy.sparse about the proper definition of *, which PEP 465 may eventually alleviate.) IMO we should be solving this indexing problem directly, not applying bandaids to its symptoms, and the way to do that is to come up with some common duck type that everyone can agree on.
Unfortunately, AFAICT this means our only options here are to have some kind of backcompat break in numpy, some kind of backcompat break in pandas, or to do nothing and continue indefinitely with the status quo where the same indexing operation might silently return different results depending on the types passed in. All of these options have real costs for users, and it isn't at all clear to me what the relative costs will be when we dig into the details of our various options. So I'd be very happy to see worked out proposals for any or all of these approaches. It strikes me as really premature to be issuing proclamations about what changes might be considered. There is really no danger to *considering* a proposal; the worst case is that we end up rejecting it anyway, but based on better information.
-n
On Fri, Apr 3, 2015 at 4:54 PM, Nathaniel Smith njs@pobox.com wrote:
Unfortunately, AFAICT this means our only options here are to have some kind of backcompat break in numpy, some kind of backcompat break in pandas, or to do nothing and continue indefinitely with the status quo where the same indexing operation might silently return different results depending on the types passed in.
For what it's worth, DataFrame.__getitem__ is also pretty broken in pandas (even worse than in NumPy). Not even the pandas devs can keep straight how it works! https://github.com/pydata/pandas/issues/9595
So we'll probably need a backwards incompatible switch there at some point, too.
That said, the issues are somewhat different, and in my experience the strict label and integer based indexers .loc and .iloc work pretty well. I haven't heard any complaints about how they do cartesian indexing rather than fancy indexing.
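For concreteness, pandas' label-based .loc selects the cartesian product when given a list per axis (a small sketch, assuming a reasonably recent pandas):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("abc"), columns=list("wxyz"))

# .loc treats each list independently: 2 rows x 2 columns, not 2 points.
sel = df.loc[["a", "c"], ["w", "z"]]
print(sel.shape)  # (2, 2)
```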
Stephan
On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith njs@pobox.com wrote:
But, the real problem here is that we have two different array duck types that force everyone to write their code twice. This is a terrible state of affairs! (And exactly analogous to the problems caused by np.ndarray disagreeing with np.matrix & scipy.sparse about the proper definition of *, which PEP 465 may eventually alleviate.) IMO we should be solving this indexing problem directly, not applying bandaids to its symptoms, and the way to do that is to come up with some common duck type that everyone can agree on.
Unfortunately, AFAICT this means our only options here are to have some kind of backcompat break in numpy, some kind of backcompat break in pandas, or to do nothing and continue indefinitely with the status quo where the same indexing operation might silently return different results depending on the types passed in. All of these options have real costs for users, and it isn't at all clear to me what the relative costs will be when we dig into the details of our various options.
I doubt that there is a reasonable way to quantify those costs, especially those of breaking backwards compatibility. If someone has a good method, I'd be interested though.
So I'd be very happy to see worked out proposals for any or all of these approaches. It strikes me as really premature to be issuing proclamations about what changes might be considered. There is really no danger to *considering* a proposal;
Sorry, I have to disagree. Numpy is already seen by some as having a poor track record on backwards compatibility. Having core developers say "propose some backcompat break to how indexing works and we'll consider it" makes our stance on that look even worse. Of course everyone is free to make any technical proposal they deem fit and we'll consider the merits of it. However I'd like us to be clear that we do care strongly about backwards compatibility and that the fundamentals of the core of Numpy (things like indexing, broadcasting, dtypes and ufuncs) will not be changed in backwards-incompatible ways.
Ralf
P.S. also not for a possible numpy 2.0 (or have we learned nothing from Python3?).
On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gommers@gmail.com wrote:
On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith njs@pobox.com wrote:
But, the real problem here is that we have two different array duck types that force everyone to write their code twice. This is a terrible state of affairs! (And exactly analogous to the problems caused by np.ndarray disagreeing with np.matrix & scipy.sparse about the proper definition of *, which PEP 465 may eventually alleviate.) IMO we should be solving this indexing problem directly, not applying bandaids to its symptoms, and the way to do that is to come up with some common duck type that everyone can agree on.
Unfortunately, AFAICT this means our only options here are to have some kind of backcompat break in numpy, some kind of backcompat break in pandas, or to do nothing and continue indefinitely with the status quo where the same indexing operation might silently return different results depending on the types passed in. All of these options have real costs for users, and it isn't at all clear to me what the relative costs will be when we dig into the details of our various options.
I doubt that there is a reasonable way to quantify those costs, especially those of breaking backwards compatibility. If someone has a good method, I'd be interested though.
I'm a little nervous about how easily this argument might turn into "either A or B is better but we can't be 100% *certain* which it is so instead of doing our best using the data available we should just choose B." Being a maintainer means accepting uncertainty and doing our best anyway.
But that said I'm still totally on board with erring on the side of caution (in particular, you can never go back and *un*break backcompat). An obvious challenge to anyone trying to take this forward (in any direction!) would definitely be to gather the most useful data possible. And it's not obviously impossible -- maybe one could do something useful by scanning ASTs of lots of packages (I have a copy of pypi if anyone wants it, that I downloaded with the idea of making some similar arguments for why core python should slightly break backcompat to allow overloading of a < b < c syntax), or adding instrumentation to numpy, or running small-scale usability tests, or surveying people, or ...
(I was pretty surprised by some of the data gathered during the PEP 465 process, e.g. on how common dot() calls are relative to existing built-in operators, and on its associativity in practice.)
So I'd be very happy to see worked out proposals for any or all of these approaches. It strikes me as really premature to be issuing proclamations about what changes might be considered. There is really no danger to *considering* a proposal;
Sorry, I have to disagree. Numpy is already seen by some as having a poor track record on backwards compatibility. Having core developers say "propose some backcompat break to how indexing works and we'll consider it" makes our stance on that look even worse. Of course everyone is free to make any technical proposal they deem fit and we'll consider the merits of it. However I'd like us to be clear that we do care strongly about backwards compatibility and that the fundamentals of the core of Numpy (things like indexing, broadcasting, dtypes and ufuncs) will not be changed in backwards-incompatible ways.
Ralf
P.S. also not for a possible numpy 2.0 (or have we learned nothing from Python3?).
I agree 100% that we should and do care strongly about backwards compatibility. But you're saying in one sentence that we should tell people that we won't consider backcompat breaks, and then in the next sentence that of course we actually will consider them (even if we almost always reject them). Basically, I think saying one thing and doing another is not a good way to build people's trust.
Core python broke backcompat on a regular basis throughout the python 2 series, and almost certainly will again -- the bar to doing so is *very* high, and they use elaborate mechanisms to ease the way (__future__, etc.), but they do it. A few months ago there was even some serious consideration given to changing py3 bytestring indexing to return bytestrings instead of integers. (Consensus was unsurprisingly that this was a bad idea, but there were core devs seriously exploring it, and no-one complained about the optics.)
It's true that numpy has something of a bad reputation in this area, and I think it's because until ~1.7 or so, we randomly broke stuff by accident on a pretty regular basis, even in "bug fix" releases. I think the way to rebuild that trust is to honestly say to our users that when we do break backcompat, we will never do it by accident, and we will do it only rarely, after careful consideration, with the smoothest transition possible, only in situations where we are convinced that it is the net best possible solution for our users, and only after public discussion and getting buy-in from stakeholders (e.g. major projects affected). And then follow through on that to the best of our ability. We've certainly gotten a lot better at this over the last few years.
If we say we'll *never* break backcompat then we'll inevitably end up convincing some people that we're liars, just because one person's bugfix is another's backcompat break. (And they're right, it is a backcompat break; it's just one where the benefits of the fix obviously outweigh the cost of the break.) Or we could actually avoid breaking backcompat by descending into Knuth-style stasis... but even there notice that none of us are actually using Knuth's TeX, we all use forks like XeTeX that have further changes added, which goes to show how futile this would be.
In particular, I'd *not* willingly say that we'll never incompatibly change the core pieces of numpy, b/c I'm personally convinced that rewriting how e.g. dtypes work could be a huge win with minimal real-world breakage -- even though technically there's practically nothing we can touch there without breaking backcompat to some extent b/c dtype structs are all public, including even silly things like the ad hoc, barely-used refcounting system. OTOH I'm happy to say that we won't incompatibly change the core of how dtypes work except in ways that make the userbase glad that we did. How's that? :-)
-n
On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith njs@pobox.com wrote:
On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gommers@gmail.com wrote:
On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith njs@pobox.com wrote:
So I'd be very happy to see worked out proposals for any or all of these approaches. It strikes me as really premature to be issuing proclamations about what changes might be considered. There is really no danger to *considering* a proposal;
Sorry, I have to disagree. Numpy is already seen by some as having a poor track record on backwards compatibility. Having core developers say "propose some backcompat break to how indexing works and we'll consider it" makes our stance on that look even worse. Of course everyone is free to make any technical proposal they deem fit and we'll consider the merits of it. However I'd like us to be clear that we do care strongly about backwards compatibility and that the fundamentals of the core of Numpy (things like indexing, broadcasting, dtypes and ufuncs) will not be changed in backwards-incompatible ways.
Ralf
P.S. also not for a possible numpy 2.0 (or have we learned nothing from Python3?).
I agree 100% that we should and do care strongly about backwards compatibility. But you're saying in one sentence that we should tell people that we won't consider backcompat breaks, and then in the next sentence that of course we actually will consider them (even if we almost always reject them). Basically, I think saying one thing and doing another is not a good way to build people's trust.
There is a difference between politely considering what proposals people send us uninvited and inviting people to work on specific proposals. That is what Ralf was getting at.
-- Robert Kern
On Sat, Apr 4, 2015 at 2:15 AM, Robert Kern robert.kern@gmail.com wrote:
I mean, I get that Ralf read my bit quoted above and got worried that people would read it as "numpy core team announces they don't care about backcompat", which is fair enough. Sometimes people jump to all kinds of conclusions, esp. when confirmation bias meets skim-reading meets hastily-written emails.
But it's just not true that I read people's proposals out of politeness; I read them because I'm interested, because they might surprise us by being more practical/awesome/whatever than we expect, and because we all learn things by giving them due consideration regardless of the final outcome. So yeah, I do honestly do want to see people work on specific proposals for important problems (and this indexing thing strikes me as important), even proposals that involve breaking backcompat. Pretending otherwise would still be a lie, at least on my part. So the distinction you're making here doesn't help me much.
-n
On Sat, Apr 4, 2015 at 11:38 AM, Nathaniel Smith njs@pobox.com wrote:
But it's just not true that I read people's proposals out of politeness; I read them because I'm interested, because they might surprise us by being more practical/awesome/whatever than we expect, and because we all learn things by giving them due consideration regardless of the final outcome.
Thanks for explaining, good perspective.
So yeah, I do honestly do want to see people work on specific proposals for important problems (and this indexing thing strikes me as important), even proposals that involve breaking backcompat. Pretending otherwise would still be a lie, at least on my part. So the distinction you're making here doesn't help me much.
A change in semantics would help already. If you'd phrased it for example as:
"I'd personally be interested in seeing a description of what changes, including backwards-incompatible ones, would need to be made to numpy indexing behavior to resolve this situation. We could learn a lot from such an exercise.",
that would have invited the same investigation from interested people without creating worries about Numpy stability. And without potentially leading new enthusiastic contributors to believe that this is an opportunity to make an important change to Numpy: >99.9% chance that they'd be disappointed after having their well thought out proposal rejected.
Cheers, Ralf
On Apr 4, 2015 10:54 AM, "Nathaniel Smith" njs@pobox.com wrote:
On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gommers@gmail.com wrote:
On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith njs@pobox.com wrote:
But, the real problem here is that we have two different array duck types that force everyone to write their code twice. This is a terrible state of affairs! (And exactly analogous to the problems caused by np.ndarray disagreeing with np.matrix & scipy.sparse about the proper definition of *, which PEP 465 may eventually alleviate.) IMO we should be solving this indexing problem directly, not applying bandaids to its symptoms, and the way to do that is to come up with some common duck type that everyone can agree on.
Unfortunately, AFAICT this means our only options here are to have some kind of backcompat break in numpy, some kind of backcompat break in pandas, or to do nothing and continue indefinitely with the status quo where the same indexing operation might silently return different results depending on the types passed in. All of these options have real costs for users, and it isn't at all clear to me what the relative costs will be when we dig into the details of our various options.
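For readers following the thread, the divergence between the two duck types is easy to reproduce with plain numpy; `np.ix_` gives the orthogonal behaviour on an ndarray, mirroring the netCDF4 example at the top of the thread:

```python
import numpy as np

a = np.arange(240 * 37 * 49).reshape(240, 37, 49)

# NumPy-style ("fancy") indexing broadcasts the index arrays together,
# so equal-length index arrays pick out individual points:
print(a[[0, 1], [0, 2]].shape)  # (2, 49)

# Orthogonal indexing treats each index array independently, selecting
# the outer product of the indices; np.ix_ emulates it on an ndarray:
print(a[np.ix_([0, 1], [0, 2, 3])].shape)  # (2, 3, 49)

# The same index tuple means different things to the two duck types:
# a netCDF4.Variable returns shape (2, 3, 49) for v[(0, 1), (0, 2, 3)],
# while an ndarray raises IndexError because the index arrays of shapes
# (2,) and (3,) cannot be broadcast together.
```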
I doubt that there is a reasonable way to quantify those costs, especially those of breaking backwards compatibility. If someone has a good method, I'd be interested though.
I'm a little nervous about how easily this argument might turn into "either A or B is better but we can't be 100% *certain* which it is so instead of doing our best using the data available we should just choose B." Being a maintainer means accepting uncertainty and doing our best anyway.
I think the burden of proof needs to be on the side proposing a change, and the more invasive the change the higher that burden needs to be.
When faced with a situation like this, where the proposed change will cause fundamental alterations to the most basic, high-level operation of numpy, and where there is an alternative approach with no backwards-compatibility issues, I think the burden of proof would necessarily be nearly impossibly large.
But that said I'm still totally on board with erring on the side of caution (in particular, you can never go back and *un*break backcompat). An obvious challenge to anyone trying to take this forward (in any direction!) would definitely be to gather the most useful data possible. And it's not obviously impossible -- maybe one could do something useful by scanning ASTs of lots of packages (I have a copy of pypi if anyone wants it, that I downloaded with the idea of making some similar arguments for why core python should slightly break backcompat to allow overloading of a < b < c syntax), or adding instrumentation to numpy, or running small-scale usability tests, or surveying people, or ...
(I was pretty surprised by some of the data gathered during the PEP 465 process, e.g. on how common dot() calls are relative to existing built-in operators, and on its associativity in practice.)
Surveys like this have the problem of small sample size and selection bias. Usability studies can't measure the effect of the compatibility break, not to mention the effect on numpy's reputation. Scanning existing projects for this is considerably harder than scanning for .dot, because it depends on the type being passed (which may not even be defined in the same project). And I am not sure I much like the idea of numpy "phoning home" by default, and an opt-in has the same issues as a survey.
So to make a long story short, in this sort of situation I have a hard time imaging ways to get enough reliable, representative data to justify this level of backwards compatibility break.
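For concreteness, the AST scan floated above might look like the sketch below. As noted, a purely syntactic scan cannot see the runtime type of the object being indexed, so this is only a rough heuristic; the function name is made up for illustration (Python 3.9+).

```python
import ast

def count_multi_axis_subscripts(source: str) -> int:
    """Count subscript expressions like a[i, j] whose index is a tuple
    containing at least two non-slice elements -- rough syntactic
    candidates for multi-axis (possibly fancy) indexing."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Subscript) and isinstance(node.slice, ast.Tuple):
            # Slices (a[0, :]) are unambiguous; only count indexes with
            # two or more non-slice components.
            non_slices = [e for e in node.slice.elts
                          if not isinstance(e, ast.Slice)]
            if len(non_slices) >= 2:
                count += 1
    return count

sample = "b = a[[0, 1], [0, 2, 3]]\nc = a[0, :]\nd = a[1:5]"
print(count_multi_axis_subscripts(sample))  # 1
```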
Core python broke backcompat on a regular basis throughout the python 2 series, and almost certainly will again -- the bar to doing so is *very* high, and they use elaborate mechanisms to ease the way (__future__, etc.), but they do it. A few months ago there was even some serious consideration given to changing py3 bytestring indexing to return bytestrings instead of integers. (Consensus was unsurprisingly that this was a bad idea, but there were core devs seriously exploring it, and no-one complained about the optics.)
There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. It would be better to have a new language, or in this case a new project.
It's true that numpy has something of a bad reputation in this area, and I think it's because until ~1.7 or so, we randomly broke stuff by accident on a pretty regular basis, even in "bug fix" releases. I think the way to rebuild that trust is to honestly say to our users that when we do break backcompat, we will never do it by accident, and we will do it only rarely, after careful consideration, with the smoothest transition possible, only in situations where we are convinced that it the net best possible solution for our users, and only after public discussion and getting buy-in from stakeholders (e.g. major projects affected). And then follow through on that to the best of our ability. We've certainly gotten a lot better at this over the last few years.
If we say we'll *never* break backcompat then we'll inevitably end up convincing some people that we're liars, just because one person's bugfix is another's backcompat break. (And they're right, it is a backcompat break; it's just one where the benefits of the fix obviously outweigh the cost of the break.) Or we could actually avoid breaking backcompat by descending into Knuth-style stasis... but even there notice that none of us are actually using Knuth's TeX, we all use forks like XeTeX that have further changes added, which goes to show how futile this would be.
I think it is fair to say that some things are just so fundamental to what makes numpy numpy that they are off-limits, and that people will always be able to count on them working.
On Sat, Apr 4, 2015 at 1:11 PM, Todd toddrjen@gmail.com wrote:
There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch.
Well, the impact of what Python3 did to everyone's string handling code caused so much work that it's close to impossible to top that within numpy, I'd say :)
Ralf
On Apr 4, 2015 4:12 AM, "Todd" toddrjen@gmail.com wrote:
There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible.
I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change. I may well agree with you when I do see it; I just prefer to base important decisions on as much data as possible.
-n
On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith njs@pobox.com wrote:
I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change.
It doesn't take any cleverness. The change in question was to change the default indexing semantics to orthogonal indexing. No matter the details of the ultimate proposal to achieve that end, it has known minimum consequences, at least in broad outline. Current documentation and books become obsolete for a fundamental operation. Current code must be modified by some step to continue working. These are consequences inherent in the end, not just the means to the end; we don't need a concrete proposal in front of us to know what they are. There are ways to mitigate these consequences, but there are no silver bullets that eliminate them. And we can compare those consequences to approaches like Jaime's that achieve a majority of the benefits of such a change without any of the negative consequences. That comparison does not bode well for any proposal.
-- Robert Kern
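The opt-in alternative discussed in the thread (a separate indexer attribute with orthogonal semantics, in the spirit of the myarray.ix_[...] idea) can be sketched on top of `np.ix_`. This is a hypothetical illustration, not any project's actual API; a real implementation would also need to handle slices, booleans, and scalars:

```python
import numpy as np

class OrthogonalIndexer:
    """Hypothetical sketch of an opt-in orthogonal indexer: each
    array-like index is treated independently via np.ix_, while the
    array's default (fancy) indexing is left untouched."""

    def __init__(self, array):
        self._array = array

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # np.ix_ builds an open mesh from 1-D integer sequences; a real
        # implementation would need extra care for slices and scalars.
        return self._array[np.ix_(*key)]

a = np.arange(240 * 37 * 49).reshape(240, 37, 49)
oix = OrthogonalIndexer(a)
print(oix[(0, 1), (0, 2, 3)].shape)  # (2, 3, 49), vs. IndexError for a[(0, 1), (0, 2, 3)]
```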