[Python-ideas] Re: PEP 472 -- Support for indexing with keyword arguments

July 21, 2020

On Tue, Jul 21, 2020 at 9:15 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
...
On Mon, 2020-07-20 at 22:27 -0700, Christopher Barker wrote:
...
On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
...
Ironically that example pushes me back to -1.  It may look a lot like
xarray and pandas working, but that just means it should be in xarray
and/or pandas.
After following most of this discussion, I'm still not sure what we'd get
with keywords in indexing.
But I do think it would be nice if we could use slice syntax in other
places. That would allow things like xarray and pandas to use slices in
regular function calls. Here's an example from the xarray docs:
da.isel(space=0, time=slice(None, 2))
wouldn't that be nice as:
da.isel(space=0, time=:2)
or:
da.sel(time=slice("2000-01-01", "2000-01-02"))
could be:
da.sel(time="2000-01-01":"2000-01-02")
As far as I can tell, slicing syntax is currently a syntax error in these
cases, and any others I thought to test.
Is there a reason to not allow syntax for creating a slice object to be
used anywhere (or more places, anyway)?
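To make the current situation concrete, here is a minimal sketch. It assumes nothing beyond the standard library; the names `compiles`, `SliceMaker` and `S` are made up for illustration. It checks that slice syntax is rejected in a call but accepted inside square brackets, and shows the usual workaround of a helper whose `__getitem__` simply returns the key it was given (NumPy's `np.s_` works along these lines).

```python
def compiles(source):
    """Return True if `source` parses as an expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


class SliceMaker:
    """Hypothetical helper: S[a:b] hands back the slice object Python built."""

    def __getitem__(self, key):
        return key


S = SliceMaker()

# Slice syntax inside a call is rejected by the parser ...
slice_in_call_ok = compiles("da.isel(space=0, time=:2)")
# ... but the same slice wrapped in a subscript is accepted.
slice_in_subscript_ok = compiles("da.isel(space=0, time=S[:2])")

# The helper produces ordinary slice objects.
first_two = S[:2]
date_range = S["2000-01-01":"2000-01-02"]
```

With such a helper the xarray call could be written today as `da.isel(space=0, time=S[:2])`, which is shorter than `slice(None, 2)` but still not the bare `time=:2` asked about above.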
By the way, I just noticed this note in the xarray docs:
"""Note: We would love to be able to do indexing with labeled dimension
names inside brackets, but unfortunately, Python does yet not support
indexing with keyword arguments like da[space=0]
"""
This would be the thing I would think of first when indexing with
keywords.  But, there are a few points about named dimensions:
First, using it for named dimensions means you don't actually need to
mix it with normal tuple indexing; mixing both seems rather confusing.
(I cannot think of how to interpret it.)
Second, using keyword arguments in indexing for `mode=` or `method=`
switches, as Stephan Hoyer mentioned as well, seems neat.  But I am
worried that the two potential uses clash way too much, and my gut
feeling is to prefer the labeled use (which is why I would be extremely
hesitant to add mode-switching things to NumPy or pandas).  I might
rather prefer mode switching to be spelled as::
temperature.loc(method="nearest")[longitude=longs, latitude=lats]
even if that has to create an intermediate indexer object (twice, since
`.loc` here is also an index helper object).
(This means axis labels must be strings, that is likely no issue, but
should maybe be mentioned.)
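The "intermediate indexer object" spelling can be sketched in today's Python. This is a toy illustration, not how xarray or pandas implement `.loc`; the classes `Series1D` and `_Loc` are hypothetical, and the index is a plain positional label because keyword arguments inside `[]` are not valid syntax.

```python
class _Loc:
    """Hypothetical index helper: call it to set options, subscript it to index."""

    def __init__(self, owner, method=None):
        self._owner = owner
        self._method = method

    def __call__(self, method=None):
        # Calling the helper returns a second, configured helper object.
        return _Loc(self._owner, method=method)

    def __getitem__(self, label):
        coords = self._owner.coords
        if self._method == "nearest":
            # Pick the position whose coordinate is closest to the label.
            i = min(range(len(coords)), key=lambda j: abs(coords[j] - label))
        else:
            # Exact lookup; raises ValueError when the label is absent.
            i = coords.index(label)
        return self._owner.values[i]


class Series1D:
    """Toy 1-D labeled container with a `.loc` helper."""

    def __init__(self, coords, values):
        self.coords = list(coords)
        self.values = list(values)

    @property
    def loc(self):
        return _Loc(self)


temperature = Series1D([0.0, 10.0, 20.0], [15.2, 16.1, 17.3])
exact = temperature.loc[10.0]
nearest = temperature.loc(method="nearest")[12.0]
```

Here the mode switch lives in the call and the brackets stay reserved for the labels, which is the separation of the two uses argued for above.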
Thus, for most containers, my knee-jerk reaction would be to discourage
the use of keywords in indexing for mode switching.  But some of the
use-cases seemed more like class factories, for which there is no clash
between these two concepts/applications.
That said, labeled dimensions/axes do seem like nice syntax with quite
a bit of potential to me.  Even with 3 dimensions, remembering whether
your coordinate order was x,y,z or z,x,y or z,y,x can be annoying
(especially if you mix in a 1-D dataset with only a z axis).
For what it's worth, I (as the original author of xarray) totally agree
with both Sebastian and Christopher. For indexing labeled arrays, the most
compelling use-case is cleaner syntax for creating slice() objects along
with keyword arguments for dimension names.

I don't particularly care whether that's spelled with [] or (), e.g.,
da.sel(time="2000-01-01":"2000-01-02")
or
da.loc[time="2000-01-01":"2000-01-02"]
neither of which is currently valid syntax.

The further advantages of supporting keyword arguments in
__getitem__/__setitem__ would be:
1. We wouldn't need separate methods for positional vs keyword argument
indexing. Currently, xarray has both .loc[] and .sel().
2. We could support matching syntax with keyword arguments in assignment.
This is mostly relevant for inexperienced Python users, who will try
something like "da.sel(x=0) = value" and encounter a SyntaxError. (This
does come up with some regularity, because xarray's target audience
includes scientists who often aren't experienced programmers.)
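The asymmetry behind point 2 can be checked directly. The sketch below is illustrative only: `is_valid_statement` and the toy `Labeled` class are invented for this example, and a dict key stands in for the keyword arguments that `[]` does not accept.

```python
def is_valid_statement(source):
    """Return True if `source` compiles as a statement, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Assigning to a method call is rejected at compile time ...
call_assignment_ok = is_valid_statement("da.sel(x=0) = value")
# ... while assigning to a subscript is fine, here with a dict as the key.
subscript_assignment_ok = is_valid_statement("da.loc[{'x': 0}] = value")


class Labeled:
    """Toy 1-D container indexed by a {dimension name: position} dict."""

    def __init__(self, dim, values):
        self.dim = dim
        self.values = list(values)

    def _position(self, key):
        # Expect exactly one entry naming this container's only dimension.
        ((name, index),) = key.items()
        if name != self.dim:
            raise KeyError(name)
        return index

    def __getitem__(self, key):
        return self.values[self._position(key)]

    def __setitem__(self, key, value):
        self.values[self._position(key)] = value


da = Labeled("x", [1, 2, 3])
da[{"x": 0}] = 10  # the same key works for reading and for assignment
```

Because `__getitem__` and `__setitem__` receive the same key, a single bracket spelling covers both reading and assignment, which a method such as `.sel()` cannot offer.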