On Tue, Jul 21, 2020 at 9:15 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-07-20 at 22:27 -0700, Christopher Barker wrote:
> On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
>
> > Ironically that example pushes me back to -1.  It may look a lot like
> > xarray and pandas working, but that just means it should be in xarray
> > and/or pandas.
>
> After following most of this discussion, I'm still not sure what we'd get
> with keywords in indexing.
>
> But I do think it would be nice if we could use slice syntax in other
> places. That would allow things like xarray and pandas to use slices in
> regular function calls. Here's an example from the xarray docs:
>
>  da.isel(space=0, time=slice(None, 2))
>
> wouldn't that be nice as:
>
> da.isel(space=0, time=:2)
>
> or:
>
> da.sel(time=slice("2000-01-01", "2000-01-02"))
>
> could be:
>
> da.sel(time="2000-01-01":"2000-01-02")
>
> As far as I can tell, slicing syntax is currently a syntax error in these
> cases, and any others I thought to test.
>
> Is there a reason to not allow syntax for creating a slice object to be
> used anywhere (or more places, anyway)?
>
> By the way, I just noticed this note in the xarray docs:
>
> """Note: We would love to be able to do indexing with labeled dimension
> names inside brackets, but unfortunately, Python does yet not support
> indexing with keyword arguments like da[space=0]
> """

This would be the thing I would think of first when indexing with
keywords.  But, there are a few points about named dimensions:

First, using it for named dimensions means you don't actually need to
mix it with normal tuple indexing. Mixing both seems rather confusing
(I cannot think of how to interpret it).

Second, using keyword arguments in indexing for `mode=` or `method=`
switches, as Stephan Hoyer mentioned, seems neat as well.  But I am
worried that the two potential uses clash way too much, and my gut
feeling is to prefer the labeled use (which is why I would be extremely
hesitant to add mode-switching things to NumPy or pandas).  I would
rather prefer mode switching to be spelled as::

    temperature.loc(method="nearest")[longitude=longs, latitude=lats]

even if that has to create an intermediate indexer object (twice, since
`.loc` here is also an index helper object).
(The labeled use means axis labels must be strings, which is likely no
issue, but should maybe be mentioned.)

Thus, for most containers, my knee-jerk reaction would be to discourage
the use of keywords in indexing for mode switching.  But some of the
use-cases seemed more like class factories, for which there is no clash
of these two concepts/applications.

That said, labeled dimensions/axes do seem like nice syntax with quite
a bit of potential to me.  Even with 3 dimensions, remembering whether
your coordinate order was x,y,z or z,x,y or z,y,x can be annoying
(especially if you mix in a 1-D dataset with only a z axis).

For what it's worth, I (as the original author of xarray) totally agree with both Sebastian and Christopher. For indexing labeled arrays, the most compelling use-case is cleaner syntax for creating slice() objects along with keyword arguments for dimension names.

I don't particularly care whether that's spelled with [] or (), e.g.,

    da.sel(time="2000-01-01":"2000-01-02")

or

    da.loc[time="2000-01-01":"2000-01-02"]

neither of which is currently valid syntax.
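
As a quick check of the "not currently valid syntax" point, the sketch below asks the compiler about each spelling and shows a workaround that is available today: a small helper whose `__getitem__` hands back whatever the subscript produced. The names `is_valid_syntax` and `S` are made up for this example; NumPy's `np.s_` plays a similar role.

```python
def is_valid_syntax(source):
    """Return True if `source` compiles; nothing is executed."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


class _SliceMaker:
    """Hypothetical helper: S[a:b] simply returns slice(a, b)."""

    def __getitem__(self, key):
        return key


S = _SliceMaker()

# The two proposed spellings, and one that works with current syntax.
proposed_call = is_valid_syntax('da.sel(time="2000-01-01":"2000-01-02")')
proposed_subscript = is_valid_syntax('da.loc[time="2000-01-01":"2000-01-02"]')
workaround = is_valid_syntax('da.sel(time=S["2000-01-01":"2000-01-02"])')

# Slice notation is only accepted inside a subscript, so borrow one.
time_slice = S["2000-01-01":"2000-01-02"]
```

The workaround saves typing `slice(...)` but still needs an extra name in scope, which is the gap the proposed syntax would close.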

The further advantages of supporting keyword arguments in __getitem__/__setitem__ would be:
1. We wouldn't need separate methods for positional vs keyword argument indexing. Currently, xarray has both .loc[] and .sel().
2. We could support matching syntax with keyword arguments in assignment. This is mostly relevant for inexperienced Python users, who will try something like "da.sel(x=0) = value" and encounter a SyntaxError. (This does come up with some regularity, because xarray's target audience includes scientists who often aren't experienced programmers.)
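
Both points can be sketched with a toy container, assuming a dict as the stand-in for keyword subscripts. `LabeledValues` is hypothetical and is not how xarray is implemented; it only illustrates that one `__getitem__`/`__setitem__` pair covers reading and assignment, while the call-based spelling cannot be an assignment target.

```python
class LabeledValues:
    """Toy container addressed by {dimension: label} dicts (illustration only)."""

    def __init__(self, dim, coords, values):
        self.dim = dim
        self.coords = list(coords)
        self.values = list(values)

    def _position(self, key):
        # Accept only a dict whose single key is our dimension name.
        if not isinstance(key, dict) or set(key) != {self.dim}:
            raise KeyError(key)
        return self.coords.index(key[self.dim])

    def __getitem__(self, key):
        return self.values[self._position(key)]

    def __setitem__(self, key, value):
        self.values[self._position(key)] = value

    def sel(self, **labels):
        # Method spelling: fine for reading, but cannot appear left of "=".
        return self[labels]


da = LabeledValues("x", [0, 1, 2], [10.0, 11.0, 12.0])
da[{"x": 0}] = 99.0      # subscript form supports assignment
read_back = da.sel(x=0)  # method form supports reading only

# What newcomers tend to type; the compiler rejects it before anything runs.
try:
    compile("da.sel(x=0) = value", "<example>", "exec")
    call_assignment_compiles = True
except SyntaxError:
    call_assignment_compiles = False
```

With keyword support in `__getitem__`/`__setitem__`, the dict in `da[{"x": 0}]` could become `da[x=0]`, and the separate `sel` method would no longer be needed for the keyword case.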