On Apr 1, 2015 2:17 AM, "R Hattersley"
There are two different interpretations in common use of how to handle multi-valued (array/sequence) indices. The numpy style is to consider all multi-valued indices together, which allows arbitrary points to be extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to consider each multi-valued index independently.
For example:
>>> type(v)
>>> v.shape
(240, 37, 49)
>>> v[(0, 1), (0, 2, 3)].shape
(2, 3, 49)
>>> np.array(v)[(0, 1), (0, 2, 3)].shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)

In a netcdf4-python GitHub issue the authors of various orthogonal indexing packages have been discussing how to distinguish the two behaviours and have currently settled on a boolean __orthogonal_indexing__ attribute.
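For readers without a netCDF file to hand, the two interpretations can be reproduced with plain NumPy. This is only a rough sketch: the array `v` is a stand-in with the same shape as the variable above, `np.ix_` is used to emulate orthogonal selection, and `orthogonal_take` is a hypothetical helper (not part of any of the packages mentioned) showing the kind of branch that an `__orthogonal_indexing__` flag would make possible.

```python
import numpy as np

# Stand-in for the netCDF variable above: same shape, made-up contents.
v = np.arange(240 * 37 * 49).reshape(240, 37, 49)

rows, cols = [0, 1], [0, 2, 3]

# Orthogonal style: each index list is applied to its own axis independently.
# np.ix_ builds the open mesh that makes NumPy behave this way.
outer = v[np.ix_(rows, cols)]
print(outer.shape)  # (2, 3, 49)

# NumPy style: the index lists are broadcast together and select points pairwise,
# here the two points (0, 0) and (1, 2) along the first two axes.
points = v[[0, 1], [0, 2]]
print(points.shape)  # (2, 49)

# With lengths 2 and 3 the lists cannot be broadcast, so NumPy raises IndexError.
try:
    v[rows, cols]
    mismatch_raises = False
except IndexError:
    mismatch_raises = True


def orthogonal_take(arr, rows, cols):
    """Hypothetical helper: orthogonal selection on either kind of array.

    Assumes objects that index orthogonally advertise it through a boolean
    __orthogonal_indexing__ attribute, as discussed in the GitHub issue.
    """
    if getattr(arr, "__orthogonal_indexing__", False):
        return arr[rows, cols]
    return arr[np.ix_(rows, cols)]
```

Calling `orthogonal_take(v, rows, cols)` on the plain ndarray takes the `np.ix_` branch and gives the same `(2, 3, 49)` result that the netCDF variable produced directly.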
I guess my feeling is that this attribute is a fine solution to the wrong problem. If I understand the situation correctly: users are writing two copies of their indexing code to handle two different array-duck-types (those that do broadcasting indexing and those that do Cartesian product indexing), and then have trouble knowing which set of code to use for a given object. The problem that __orthogonal_indexing__ solves is that it makes it easier to decide which code to use. It works well for this, great.

But the real problem here is that we have two different array duck types that force everyone to write their code twice. This is a terrible state of affairs! (And exactly analogous to the problems caused by np.ndarray disagreeing with np.matrix & scipy.sparse about the proper definition of *, which PEP 465 may eventually alleviate.) IMO we should be solving this indexing problem directly, not applying bandaids to its symptoms, and the way to do that is to come up with some common duck type that everyone can agree on.

Unfortunately, AFAICT this means our only options here are to have some kind of backcompat break in numpy, some kind of backcompat break in pandas, or to do nothing and continue indefinitely with the status quo where the same indexing operation might silently return different results depending on the types passed in. All of these options have real costs for users, and it isn't at all clear to me what the relative costs will be when we dig into the details of our various options.

So I'd be very happy to see worked out proposals for any or all of these approaches. It strikes me as really premature to be issuing proclamations about what changes might be considered. There is really no danger to *considering* a proposal; the worst case is that we end up rejecting it anyway, but based on better information.

-n