
On Tue, Jul 21, 2020 at 9:15 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-07-20 at 22:27 -0700, Christopher Barker wrote:
On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Ironically, that example pushes me back to -1. It may look a lot like how xarray and pandas work, but that just means it belongs in xarray and/or pandas.
After following most of this discussion, I'm still not sure what we'd get with keywords in indexing.
But I do think it would be nice if we could use slice syntax in other places. That would allow libraries like xarray and pandas to use slices in regular function calls. Here's an example from the xarray docs:
da.isel(space=0, time=slice(None, 2))
Wouldn't that be nicer as:
da.isel(space=0, time=:2)
or:
da.sel(time=slice("2000-01-01", "2000-01-02"))
could be:
da.sel(time="2000-01-01":"2000-01-02")
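Until such syntax exists, the usual workaround is a small helper object whose `__getitem__` returns its key unchanged, so slice syntax written inside its brackets produces the slice object (NumPy ships one as `numpy.s_`). A minimal sketch, where the helper name `S` is a hypothetical choice:

```python
class _SliceMaker:
    """Return the key passed to [] unchanged, so S[a:b] evaluates to slice(a, b)."""

    def __getitem__(self, key):
        return key


# Hypothetical helper name; numpy.s_ behaves the same way for simple slices.
S = _SliceMaker()

# The xarray calls above could then be written as, e.g.:
#   da.isel(space=0, time=S[:2])
#   da.sel(time=S["2000-01-01":"2000-01-02"])
print(S[:2])
print(S["2000-01-01":"2000-01-02"])
```

The cost is the extra `S[...]` wrapper on every argument, which is exactly the noise the proposed syntax would remove.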
As far as I can tell, slicing syntax is currently a syntax error in these cases, and in any others I thought to test.
Is there a reason not to allow the syntax for creating a slice object to be used anywhere (or at least in more places)?
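This is easy to confirm with the built-in `compile()`, which only parses the source text, so `da` never needs to exist. A quick sketch; `is_valid_syntax` is an illustrative helper name:

```python
def is_valid_syntax(source: str) -> bool:
    """Return True if `source` parses as a Python expression (nothing is run)."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


examples = [
    'da.isel(space=0, time=slice(None, 2))',          # valid today
    'da.isel(space=0, time=:2)',                      # proposed spelling
    'da.sel(time=slice("2000-01-01", "2000-01-02"))', # valid today
    'da.sel(time="2000-01-01":"2000-01-02")',         # proposed spelling
    'da[1:2]',                                        # slices are only allowed inside []
]
for src in examples:
    print(f"{is_valid_syntax(src)!s:5}  {src}")
```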
By the way, I just noticed this note in the xarray docs:
"""Note: We would love to be able to do indexing with labeled dimension names inside brackets, but unfortunately, Python does yet not support indexing with keyword arguments like da[space=0] """
This is the first thing I would think of when it comes to indexing with keywords. But there are a few points to make about named dimensions:
First, using keywords for named dimensions means you don't actually need to mix them with normal tuple indexing; mixing both seems rather confusing to me (I cannot think of how to interpret it).
Second, using keyword arguments in indexing for `mode=` or `method=` switches, as Stephan Hoyer also mentioned, seems neat. But I worry that the two potential uses clash far too much, and my gut feeling is to prefer the labeled use (which is why I would be extremely hesitant to add mode-switching keywords to NumPy or pandas). I would rather see mode switching spelled as::
temperature.loc(method="nearest")[longitude=longs, latitude=lats]
even if that has to create an intermediate indexer object (twice, since `.loc` is itself already an indexing helper object). (This means axis labels must be strings; that is likely not an issue, but it should probably be mentioned.)
Thus, for most containers, my knee-jerk reaction would be to discourage the use of keywords in indexing for mode switching. But some of the use cases seemed more like class factories, where these two applications do not clash.
That said, labeled dimensions/axes do seem to me like nice syntax with quite a bit of potential. Even with three dimensions, remembering whether your coordinate order was x,y,z or z,x,y or z,y,x can be annoying (especially if you mix in a 1-D dataset with only a z axis).
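A rough sketch of the intermediate-indexer spelling in today's Python, assuming a hypothetical `LabeledGrid` class (not xarray's actual implementation). Because `[longitude=..., latitude=...]` is not valid syntax, the labels are passed as a dict; `.loc` returns the first helper object, and calling it with `method=` builds the second one that carries the mode switch:

```python
class _LocIndexer:
    """Hypothetical label-based indexer; `method` switches exact vs. nearest matching."""

    def __init__(self, array, method=None):
        self._array = array
        self._method = method

    def __call__(self, method=None):
        # Calling the helper creates a second indexer object carrying the mode switch.
        return _LocIndexer(self._array, method=method)

    def __getitem__(self, labels):
        # Keyword indexing is a SyntaxError today, so labels arrive as a dict.
        # Every dimension must be given; the order of keys in the dict does not matter.
        value = self._array.data
        for dim in self._array.dims:
            coord = self._array.coords[dim]
            label = labels[dim]
            if self._method is None:
                pos = coord.index(label)  # exact match; ValueError if missing
            elif self._method == "nearest":
                distances = [abs(c - label) for c in coord]
                pos = distances.index(min(distances))
            else:
                raise ValueError(f"unknown method: {self._method!r}")
            value = value[pos]  # descend one nesting level per dimension
        return value


class LabeledGrid:
    """Hypothetical nested-list array with named dimensions and 1-D coordinates."""

    def __init__(self, data, dims, coords):
        self.data = data
        self.dims = dims
        self.coords = coords

    @property
    def loc(self):
        # First indexer object: supports both .loc[...] and .loc(method=...)[...]
        return _LocIndexer(self)


temperature = LabeledGrid(
    data=[[10.0, 11.0, 12.0], [20.0, 21.0, 22.0]],
    dims=("latitude", "longitude"),
    coords={"latitude": [0.0, 10.0], "longitude": [100.0, 110.0, 120.0]},
)
print(temperature.loc[{"latitude": 10.0, "longitude": 120.0}])
print(temperature.loc(method="nearest")[{"longitude": 108.0, "latitude": 2.0}])
```

Both spellings end up in the same `__getitem__`; the mode lives on the indexer object rather than in the subscript, so it cannot collide with a dimension name.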
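The convenience of labeled dimensions comes from a simple translation: given the array's dimension names, name-based indexers are reordered into the positional index tuple, and dimensions that are not mentioned select everything. A sketch with a hypothetical `to_positional` helper:

```python
def to_positional(dims, **indexers):
    """Translate name-based indexers into a positional index tuple ordered by `dims`."""
    unknown = set(indexers) - set(dims)
    if unknown:
        raise KeyError(f"unknown dimension(s): {sorted(unknown)}")
    # Dimensions not mentioned select everything, like ':' in positional indexing.
    return tuple(indexers.get(dim, slice(None)) for dim in dims)


# The caller no longer needs to remember whether the order was x,y,z or z,y,x:
print(to_positional(("x", "y", "z"), z=0, x=slice(1, 3)))
print(to_positional(("z", "y", "x"), z=0, x=slice(1, 3)))
print(to_positional(("z",), z=0))  # a 1-D dataset with only a z axis
```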
For what it's worth, I (as the original author of xarray) totally agree with both Sebastian and Christopher. For indexing labeled arrays, the most compelling use case is cleaner syntax for creating slice() objects, combined with keyword arguments for dimension names. I don't particularly care whether that's spelled with [] or (), e.g., da.sel(time="2000-01-01":"2000-01-02") or da.loc[time="2000-01-01":"2000-01-02"], neither of which is currently valid syntax.
The further advantages of supporting keyword arguments in __getitem__/__setitem__ would be:
1. We wouldn't need separate methods for positional vs. keyword argument indexing. Currently, xarray has both .loc[] and .sel().
2. We could support matching syntax with keyword arguments in assignment. This is mostly relevant for inexperienced Python users, who will try something like "da.sel(x=0) = value" and encounter a SyntaxError. (This does come up with some regularity, because xarray's target audience includes scientists who often aren't experienced programmers.)
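Until keyword subscripts exist, one way to get a single entry point for both styles, including assignment, is to accept a dict inside the brackets: a subscription is a valid assignment target that dispatches to `__setitem__`, whereas `da.sel(x=0) = value` fails because a function call cannot be assigned to. A minimal sketch with a hypothetical one-dimensional `LabeledVector` class:

```python
class LabeledVector:
    """Hypothetical 1-D array with a single named dimension."""

    def __init__(self, dim, values):
        self.dim = dim
        self.values = list(values)

    def _position(self, key):
        # A dict names the dimension ({"x": 1}); any other key is positional.
        if isinstance(key, dict):
            if set(key) != {self.dim}:
                raise KeyError(f"expected exactly the dimension {self.dim!r}")
            return key[self.dim]
        return key

    def __getitem__(self, key):
        return self.values[self._position(key)]

    def __setitem__(self, key, value):
        # v[{"x": 1}] = value is legal syntax; v.sel(x=1) = value is not.
        self.values[self._position(key)] = value


v = LabeledVector("x", [0.0, 1.0, 2.0])
v[{"x": 1}] = 5.0         # name-based assignment through __setitem__
print(v[1], v[{"x": 2}])  # positional and name-based reads share __getitem__
```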