[Python-ideas] Re: PEP 472 -- Support for indexing with keyword arguments

July 18, 2020

      On Sat, Jul 18, 2020 at 12:18:38AM -0400, Ricky Teachey wrote:
...
On Fri, Jul 17, 2020 at 7:21 PM Steven D'Aprano <steve@pearwood.info> wrote:
...
On Fri, Jul 17, 2020 at 11:11:17AM -0400, Ricky Teachey wrote:
...
For backwards-compatibility, there will only ever be a single positional
argument passed into the method. That's because comma-separated values
in a subscript are already passed as a tuple:
    # this calls __getitem__ with a single tuple argument
    obj[a, b:c, d]  ==> (a, slice(b, c), d)
So that's not going to change (at least not without a long and painful
deprecation process). But adding support for keyword arguments requires
no changes to any existing class or a new builtin "key object" type.
This strikes me as problematic for having a consistent mental model of how
stuff works in python. I think that for many the difference in the meaning
of the syntax between item-getting/setting and function-calling would be...
glaring.
Yes, but what are we going to do about it?

Break a million existing scripts, applications and libraries that rely 
on `__getitem__` only receiving a single tuple argument when passed 
comma-separated values? I don't think the core devs will accept that, I 
think the numpy devs will object strongly, and I'm pretty sure that the 
Steering Council will say no.

But if you disagree, then feel free to start writing a PEP.

The fact that multiple comma-separated subscripts are passed to the 
method as a single tuple argument is a historical fact we (almost 
certainly) cannot change now. But that is orthogonal to how we choose to 
proceed with keyword arguments. We aren't obliged to repeat the same 
design.

We have a few choices:

(1) There is a minor inconsistency between subscripts and function 
calls, so let's just forget all about the whole idea. If we cannot agree 
on a decision, this is the default. (Status quo wins a stalemate.)

(2) Let the Perfect be the enemy of the Good. No compromises! Insist on 
breaking the entire Python ecosystem for the sake of fixing this minor 
inconsistency between subscripting and function calls.

(3) Reinforce that inconsistency, and continue to obfuscate the 
similarities, by handling keyword arguments in the same fashion as 
comma-separated subscripts. This will require a new builtin "key-object" 
class, and it will require every class that cares about keyword 
arguments in their subscripts to parse them themselves.

We'll also need to decide how to combine subscripts and keywords:

    obj[a, b:c, x=1]
    # is this a tuple argument (a, slice(b, c), key(x=1))
    # or key argument key(a, slice(b, c), x=1)

(4) Or keep the subscript processing as-is, for backwards-compatibility, 
but pass keyword arguments as normal for functions.

Both (3) and (4) would get the job done, but (3) requires everyone who 
needs keyword arguments to parse the tuple and/or key object by hand to 
extract them. Having done something similar in the past (emulating 
keyword-only arguments in Python 2), I can tell you this is painful.

With (4), the interpreter automatically matches up passed keyword 
arguments to my `__getitem__` parameters, filling in defaults if needed, 
and I can concentrate on using the arguments, not parsing them.
...
On the one hand, a fairly experienced person (who is familiar with the
history of the item dunders, and a preexisting mental model that they are
always being supplied a single positional argument) should not have too
much of a problem understanding WHY these would behave differently.
But on the other hand, even as an experienced person, this really messes
with my brain, looking at it. It's hard for me to believe this isn't going
to be a painful distinction for a large number of people to hold in their
head, especially beginners (but not only beginners).
I think you are overthinking this.

Inside a subscript, multiple positional arguments are collected into a 
tuple and passed as a single argument. (Vaguely similar to the way 
`*args` positional arguments are collected.)
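
The packing is easy to observe in current Python. A minimal sketch, using a 
throwaway class (the name `Show` is purely illustrative) whose `__getitem__` 
returns whatever it receives:

```python
class Show:
    """Illustrative class: return the subscript exactly as received."""

    def __getitem__(self, item):
        return item


obj = Show()
single = obj[1]          # one subscript arrives unchanged
packed = obj[1, 2:3, 4]  # several subscripts arrive as one tuple

print(single)  # 1
print(packed)  # (1, slice(2, 3, None), 4)
```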

Why? Because of historical reasons and backwards compatibility. If 
someone wants to trawl the archives looking for a discussion, I look 
forward to hearing the result, but we don't need to care about the past 
reason to learn it.

If you define your getitem like this:

    def __getitem__(self, item, more):

then you'll get a TypeError when you try to subscript, because `more` 
doesn't get a value. This is already the case! So anyone writing getitem 
methods already knows that positional arguments aren't handled the same 
way as function calls.

If you give `more` a default, then you won't get an error... but even 
the tiniest bit of testing will reveal that `item` receives a tuple, and 
`more` always gets the default.
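
Both cases can be checked in a few lines. This is only a sketch; the class 
names `TwoParams` and `WithDefault` are made up for the illustration:

```python
class TwoParams:
    def __getitem__(self, item, more):
        return item, more


class WithDefault:
    def __getitem__(self, item, more="default"):
        return item, more


# Without a default, subscripting fails: the tuple (1, 2) is bound to
# `item` and nothing is left over for `more`.
try:
    TwoParams()[1, 2]
except TypeError as exc:
    error = exc
else:
    error = None

# With a default, `item` still receives the whole tuple.
result = WithDefault()[1, 2]

print(type(error).__name__)  # TypeError
print(result)                # ((1, 2), 'default')
```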

In practice, anyone writing getitem methods only ever gives it a single 
argument (aside from self of course :-) so if they add keyword 
arguments, the most natural way to do so is to make them keyword only:

    def __getitem__(self, item, *, more, keyword, arguments):

(with or without defaults). Problem solved.
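
A rough sketch of how that would behave. Keyword subscripts are not valid 
syntax in current Python, so the sketch calls `__getitem__` explicitly as a 
stand-in for what option (4) would do; the class `Grid` and the parameters 
`order` and `fill` are invented for the example:

```python
class Grid:
    def __getitem__(self, item, *, order, fill=0):
        # The interpreter has already matched keywords and defaults.
        return item, order, fill


def error_of(call):
    """Return the TypeError message raised by call(), or None."""
    try:
        call()
    except TypeError as exc:
        return str(exc)
    return None


g = Grid()

# Stand-in for the proposed spelling g[1, 2:3, order="row"]:
# the positional subscripts still arrive as a single tuple.
got = g.__getitem__((1, slice(2, 3)), order="row")

# Missing and unexpected keywords are rejected without any
# hand-written checking in the method body.
missing = error_of(lambda: g.__getitem__(1))
unexpected = error_of(lambda: g.__getitem__(1, order="row", colour="red"))

print(got)
print(missing)
print(unexpected)
```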
...
A potentially elegant way around this glaring difference in the meaning of
the syntax might be the key-object paradigm Jonathan Fine has suggested.
However, that only works if you *disallow mixing together* positional
arguments and kwd args inside the [ ]:
No, we can still mix them. We just have to decide whether to mix them 
together into a tuple, or into a key-object:

    obj[a,b:c, x=1, y=2]
    # tuple (a, slice(b, c), key(x=1, y=2))
    # or key-object key(a, slice(b, c), x=1, y=2)

Either way, it means that the getitem method itself has to pull the 
object (tuple or key-object) apart, parsing keyword arguments, filling 
in defaults, and dealing with missing values. Why would we choose to do 
that when the interpreter can do it for us?

If you do want to do it yourself, you can always just use `**kwargs` 
like you would in any other method.

Likewise, if you want an atomic "keyword object", just pass your kwargs 
to something like SimpleNamespace:

    py> from types import SimpleNamespace
    py> kwargs = {'spam': 1, 'eggs': 2}
    py> SimpleNamespace(**kwargs)
    namespace(spam=1, eggs=2)
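
Putting the two together, a method that wants a single atomic object could 
look something like the sketch below. The class name `Table` is hypothetical, 
and because keyword subscripts are not valid in current Python the method is 
called explicitly:

```python
from types import SimpleNamespace


class Table:
    def __getitem__(self, item, **kwargs):
        # Collect whatever keywords were passed into one object.
        options = SimpleNamespace(**kwargs)
        return item, options


# Stand-in for the proposed spelling Table()[1, 2, spam=1, eggs=2]
item, options = Table().__getitem__((1, 2), spam=1, eggs=2)

print(item)     # (1, 2)
print(options)  # namespace(spam=1, eggs=2)
```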
...
This raises a question that needs to be answered, then: what would be the
utility of mixing together positional and kwd arguments in this way?
That's easy to answer. Positional subscripts represent a key or index or 
equivalent; keyword arguments can represent *modifiers*.

So I might index into a tree:

    tree[18, order='preorder']  # or postorder, inorder

or a two-dimensional array:

    matrix[18, order='row']  # row-column order rather than column-row

I don't think builtin dicts should support this concept, but third-party 
mappings might allow you to modify what happens if the key already 
exists:

    # add or replace if the key already exists?
    mapping[key, if_exist='add'] = 5
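
As a sketch of that last case only: the class `CountingMap`, the accepted 
`if_exist` values, and the order of the `__setitem__` parameters are all 
assumptions made for the illustration, and the keyword form is spelled as an 
explicit method call because the subscript syntax does not exist yet:

```python
class CountingMap:
    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value, *, if_exist="replace"):
        if if_exist not in ("add", "replace"):
            raise ValueError(f"unknown if_exist mode: {if_exist!r}")
        if if_exist == "add" and key in self._data:
            self._data[key] += value   # modifier: accumulate
        else:
            self._data[key] = value    # default: overwrite

    def __getitem__(self, key):
        return self._data[key]


m = CountingMap()
m["spam"] = 5                             # ordinary assignment
m.__setitem__("spam", 5, if_exist="add")  # m["spam", if_exist="add"] = 5
total = m["spam"]

m["spam"] = 1                             # default mode replaces
replaced = m["spam"]

print(total)     # 10
print(replaced)  # 1
```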

[...]
...
Getting to the end here, I guess I'm really just wondering whether mixing
positional and kwd args is worth doing. If it isn't, then the key-object
paradigm seems like it might be a nicer solution to me for the sole reason
that the mental model gets confused otherwise.
Here is an exercise for you. Let's pretend that function calls existed 
with the same limitation that Jonathan is suggesting for subscripting. Go 
through your code and find any method or function that currently uses 
keyword arguments (that will be nearly all of them, if we include 
"positional-or-keyword arguments").

Now imagine that instead of receiving named keyword parameters, all of 
your functions received an opaque namespace "key" object, which you can 
pretend is just a dict. Re-write your methods to have this signature:

    def method(self, **key):

That's Jonathan's model. If you pass keyword args, they all get packed 
into a single parameter. Now you get to pull it apart, test for 
unwanted keywords, deal with missing keywords, assign defaults, etc.

Go through the exercise. I have -- I've written Python 2 code that 
needed to handle keyword-only arguments, and this was the only way to do 
so.

The "only one parameter, which may receive a keyobject" design will 
have us writing code something like this:

    # I want this: def __getitem__(self, item, *, a, b, c, d=0)
    # but have to write this:

    def __getitem__(self, item):
        # Determine whether we got any keyword arguments.
        if isinstance(item, keyobject):
            keys = item
            item = ()

        elif isinstance(item, tuple):
            # Assuming that all keyword args are at the end;
            # if there could be more than one keyobject, or if
            # they could be anywhere in the tuple, this becomes
            # even more complex. I don't even want to think
            # about that case.
            if item and isinstance(item[-1], keyobject):
                keys = item[-1]
                item = item[:-1]
            else:
                # A plain tuple of subscripts, no keyword arguments.
                keys = keyobject()

        else:
            keys = keyobject()

        # Now extract the parameters from the key object.
        if 'a' in keys:
            a = keys.pop('a')
        else:
            raise TypeError('missing keyword argument "a"')
        # same for b and c

        d = keys.pop('d', 0)
        # Check for unexpected keywords.
        if keys:
            raise TypeError('unexpected keyword')

(Any bugs in the above are not intentional.) And now finally we can 
actually use the keyword parameters and write the method.
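
The first half of such a method can be tried in current Python by passing an 
explicit stand-in object inside the subscript. In the sketch below, 
`keyobject` is a hypothetical dict subclass (no such builtin exists), and 
`split_subscript` and `Demo` are names invented here:

```python
class keyobject(dict):
    """Hypothetical stand-in: a dict built from keyword arguments."""

    def __init__(self, **kwargs):
        super().__init__(kwargs)


def split_subscript(item):
    """Return (positional, keys) for the three shapes handled above."""
    if isinstance(item, keyobject):
        return (), item
    if isinstance(item, tuple) and item and isinstance(item[-1], keyobject):
        return item[:-1], item[-1]
    return item, keyobject()


class Demo:
    def __getitem__(self, item):
        return split_subscript(item)


d = Demo()
only_keys = d[keyobject(a=1)]             # keywords only
mixed = d[1, 2:3, keyobject(a=1, d=4)]    # subscripts, then keywords
plain = d[1]                              # no keywords at all

print(only_keys)
print(mixed)
print(plain)
```

Even with the splitting factored out, the method would still have to check 
for missing and unexpected keys and apply defaults by hand.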

-- 
Steven