
On Sat, Jul 18, 2020 at 12:18:38AM -0400, Ricky Teachey wrote:
On Fri, Jul 17, 2020 at 7:21 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 17, 2020 at 11:11:17AM -0400, Ricky Teachey wrote:
...
For backwards-compatibility, there will only ever be a single positional argument passed into the method. That's because comma-separated values in a subscript are already passed as a tuple:
    # this calls __getitem__ with a single tuple argument
    obj[a,b:c,d]  ==> (1, slice(2, 3), 4)
So that's not going to change (at least not without a long and painful deprecation process). But adding support for keyword arguments requires neither changes to any existing class nor a new builtin "key object" type.
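The tuple-packing behaviour described above can be checked with a short sketch. The `Probe` class is a hypothetical helper, not anything from the thread; it simply returns whatever `__getitem__` receives.

```python
class Probe:
    """Hypothetical helper that returns whatever __getitem__ receives."""

    def __getitem__(self, item):
        return item


probe = Probe()
single = probe[1]          # a lone subscript is passed through unchanged
multi = probe[1, 2:3, 4]   # comma-separated subscripts arrive as one tuple

print(single)  # 1
print(multi)   # (1, slice(2, 3, None), 4)
```

In both cases the method receives exactly one positional argument besides `self`.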
This strikes me as problematic for having a consistent mental model of how stuff works in python. I think that for many the difference in the meaning of the syntax between item-getting/setting and function-calling would be... glaring.
Yes, but what are we going to do about it? Break a million existing scripts, applications and libraries that rely on `__getitem__` only receiving a single tuple argument when passed comma-separated values?

I don't think the core devs will accept that, I think the numpy devs will object strongly, and I'm pretty sure that the Steering Council will say no. But if you disagree, then feel free to start writing a PEP.

The fact that multiple comma-separated subscripts are passed to the method as a single tuple argument is a historical fact we (almost certainly) cannot change now. But that is orthogonal to how we choose to proceed with keyword arguments. We aren't obliged to repeat the same design.

We have a few choices:

(1) There is a minor inconsistency between subscripts and function calls, so let's just forget all about the whole idea. If we cannot agree on a decision, this is the default. (Status quo wins a stalemate.)

(2) Let the Perfect be the enemy of the Good. No compromises! Insist on breaking the entire Python ecosystem for the sake of fixing this minor inconsistency between subscripting and function calls.

(3) Reinforce that inconsistency, and continue to obfuscate the similarities, by handling keyword arguments in the same fashion as comma-separated subscripts. This will require a new builtin "key-object" class, and it will require every class that cares about keyword arguments in their subscripts to parse them themselves. We'll also need to decide how to combine subscripts and keywords:

    obj[a, b:c, x=1]
    # is this a tuple argument (a, slice(b, c), key(x=1))
    # or key argument key(a, slice(b, c), x=1)

(4) Or keep the subscript processing as-is, for backwards-compatibility, but pass keyword arguments as normal for functions.

Both (3) and (4) would get the job done, but (3) requires everyone who needs keyword arguments to parse the tuple and/or key object by hand to extract them.
Having done something similar in the past (emulating keyword-only arguments in Python 2), I can tell you this is painful. With (4), the interpreter automatically matches up passed keyword arguments to my `__getitem__` parameters, filling in defaults if needed, and I can concentrate on using the arguments, not parsing them.
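A minimal sketch of what option (4) would look like from the class author's side. Keyword arguments inside `[ ]` are not valid syntax in current Python, so the sketch calls `__getitem__` directly as a stand-in for the proposed subscript form. The `Tree` class and its `order` parameter are hypothetical.

```python
class Tree:
    """Hypothetical container; only the signature matters here."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    # Keyword-only parameter with a default, as option (4) would allow.
    def __getitem__(self, item, *, order="preorder"):
        return (self._nodes[item], order)


tree = Tree(["a", "b", "c"])

# Today's syntax: only the positional subscript is available.
default_result = tree[1]

# Stand-in for the proposed tree[1, order="postorder"].
explicit_result = tree.__getitem__(1, order="postorder")

# An unexpected keyword is rejected by the interpreter, not by hand-written code.
try:
    tree.__getitem__(1, colour="red")
except TypeError as exc:
    error = str(exc)
```

The default is filled in and the bad keyword is rejected without any parsing code in the method body.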
On the one hand, a fairly experienced person (who is familiar with the history of the item dunders, and has a preexisting mental model that they are always supplied a single positional argument) should not have too much of a problem understanding WHY these would behave differently.
But on the other hand, even as an experienced person, this really messes with my brain, looking at it. It's hard for me to believe this isn't going to be a painful distinction for a large number of people to hold in their head, especially beginners (but not only beginners).
I think you are overthinking this.

Inside a subscript, multiple positional arguments are collected into a tuple and passed as a single argument. (Vaguely similar to the way `*args` positional arguments are collected.)

Why? Because of historical reasons and backwards compatibility. If someone wants to trawl the archives looking for a discussion, I look forward to hearing the result, but we don't need to care about the past reason to learn it.

If you define your getitem like this:

    def __getitem__(self, item, more):

then you'll get a TypeError when you try to subscript, because `more` doesn't get a value. This is already the case! So anyone writing getitem methods already knows that positional arguments aren't handled the same way as function calls.

If you give `more` a default, then you won't get an error... but even the tiniest bit of testing will reveal that `item` receives a tuple, and `more` always gets the default.

In practice, anyone writing getitem methods only ever gives it a single argument (aside from self of course :-) so if they add keyword arguments, the most natural way to do so is to make them keyword only:

    def __getitem__(self, item, *, more, keyword, arguments):

(with or without defaults). Problem solved.
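Both claims about a second positional parameter can be checked with a short sketch. The class names `TwoParams` and `WithDefault` are hypothetical.

```python
class TwoParams:
    # `more` has no default, so subscripting can never supply it.
    def __getitem__(self, item, more):
        return (item, more)


class WithDefault:
    # `more` has a default, so subscripting works but never sets it.
    def __getitem__(self, item, more="default"):
        return (item, more)


try:
    TwoParams()[1, 2]
    raised = False
except TypeError:
    raised = True

result = WithDefault()[1, 2]

print(raised)  # True
print(result)  # ((1, 2), 'default')
```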
A potentially elegant way around this glaring difference in the meaning of the syntax might be the key-object paradigm Jonathan Fine has suggested. However, that only works if you *disallow mixing together* positional arguments and kwd args inside the [ ]:
No, we can still mix them. We just have to decide whether to mix them together into a tuple, or into a key-object:

    obj[a,b:c, x=1, y=2]
    # tuple (a, slice(b, c), key(x=1, y=2))
    # or key-object key(a, slice(b, c), x=1, y=2)

Either way, it means that the getitem method itself has to pull the object (tuple or key-object) apart, parsing keyword arguments, filling in defaults, and dealing with missing values. Why would we choose to do that when the interpreter can do it for us?

If you do want to do it yourself, you can always just use `**kwargs` like you would in any other method. Likewise, if you want an atomic "keyword object", just pass your kwargs to something like SimpleNamespace:

    py> from types import SimpleNamespace
    py> kwargs = {'spam': 1, 'eggs': 2}
    py> SimpleNamespace(**kwargs)
    namespace(spam=1, eggs=2)
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way?
That's easy to answer. Positional subscripts represent a key or index or equivalent; keyword arguments can represent *modifiers*. So I might index into a tree:

    tree[18, order='preorder']  # or postorder, inorder

or a two-dimensional array:

    matrix[18, order='row']  # row-column order rather than column-row

I don't think builtin dicts should support this concept, but third-party mappings might allow you to modify what happens if the key already exists:

    # add or replace if the key already exists?
    mapping[key, if_exist='add'] = 5

[...]
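A rough sketch of the mapping-with-modifier idea, under stated assumptions: `AddingMap` is a hypothetical class, the `'replace'` mode and the keyword-only placement of `if_exist` after `value` are illustrative choices, and `__setitem__` is called directly because the keyword subscript form is not valid syntax in current Python.

```python
class AddingMap(dict):
    """Hypothetical third-party mapping with an `if_exist` modifier."""

    def __setitem__(self, key, value, *, if_exist="replace"):
        if if_exist == "add" and key in self:
            # Combine the new value with the one already stored.
            value = self[key] + value
        elif if_exist not in ("add", "replace"):
            raise ValueError(f"unknown if_exist mode: {if_exist!r}")
        super().__setitem__(key, value)


mapping = AddingMap()
mapping["spam"] = 5  # ordinary assignment uses the default mode

# Stand-in for the proposed mapping["spam", if_exist="add"] = 5.
mapping.__setitem__("spam", 5, if_exist="add")

print(mapping["spam"])  # 10
```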
Getting to the end here, I guess I'm really just wondering whether mixing positional and kwd args is worth doing. If it isn't, then the key-object paradigm seems like it might be a nicer solution to me for the sole reason that the mental model gets confused otherwise.
Here is an exercise for you. Let's pretend that function calls existed with the same limitation that Jonathan is suggesting for subscripting. Go through your code and find any method or function that currently uses keyword arguments (that will be nearly all of them, if we include "positional-or-keyword arguments").

Now imagine that instead of receiving named keyword parameters, all of your functions received an opaque namespace "key" object, which you can pretend is just a dict. Re-write your methods to have this signature:

    def method(self, **key):

That's Jonathan's model. If you pass keyword args, they all get packed into a single parameter. Now you get to pull it apart, test for unwanted keywords, deal with missing keywords, assign defaults, etc.

Go through the exercise. I have -- I've written Python 2 code that needed to handle keyword-only arguments, and this was the only way to do so.

The "only one parameter, which may receive a keyobject" design will have us writing code something like this:

    # I want this:
    def __getitem__(self, item, *, a, b, c, d=0):

    # but have to write this:
    def __getitem__(self, item):
        # Determine whether we got any keyword arguments.
        if isinstance(item, keyobject):
            keys = item
            item = ()
        elif isinstance(item, tuple):
            # Assuming that all keyword args are at the end;
            # if there could be more than one keyobject, or if
            # they could be anywhere in the tuple, this becomes
            # even more complex. I don't even want to think
            # about that case.
            if item and isinstance(item[-1], keyobject):
                keys = item[-1]
                item = item[:-1]
            else:
                keys = keyobject()
        else:
            # A single non-tuple subscript carries no keywords.
            keys = keyobject()
        # Now extract the parameters from the key object.
        if 'a' in keys:
            a = keys.pop('a')
        else:
            raise TypeError('missing keyword argument "a"')
        # same for b and c
        d = keys.pop('d', 0)
        # Check for unexpected keywords.
        if keys:
            raise TypeError('unexpected keyword')

(Any bugs in the above are not intentional.)

And now finally we can actually use the keyword parameters and write the method.

-- 
Steven
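For comparison, the two designs discussed above can be run side by side in a rough sketch. It assumes `keyobject` is just a dict subclass (the proposed builtin does not exist), trims the parameters to `a` and `d` for brevity, and calls `__getitem__` directly for the keyword-only version because the keyword subscript syntax is not valid in current Python. The `Manual` and `Automatic` class names are hypothetical.

```python
class keyobject(dict):
    """Hypothetical stand-in for the proposed builtin; here just a dict."""

    def __init__(self, **kwargs):
        super().__init__(kwargs)


class Manual:
    """Parses keywords by hand, as the key-object design would require."""

    def __getitem__(self, item):
        # Split the subscript into its positional and keyword parts.
        if isinstance(item, keyobject):
            keys, item = dict(item), ()
        elif isinstance(item, tuple) and item and isinstance(item[-1], keyobject):
            keys, item = dict(item[-1]), item[:-1]
        else:
            keys = {}
        # Required keyword.
        try:
            a = keys.pop("a")
        except KeyError:
            raise TypeError('missing keyword argument "a"') from None
        # Optional keyword with a default.
        d = keys.pop("d", 0)
        # Anything left over was not expected.
        if keys:
            raise TypeError(f"unexpected keyword(s): {sorted(keys)}")
        return item, a, d


class Automatic:
    """Same parameters, with the interpreter doing the matching."""

    def __getitem__(self, item, *, a, d=0):
        return item, a, d


manual = Manual()[1, 2, keyobject(a=10)]
automatic = Automatic().__getitem__((1, 2), a=10)

print(manual)     # ((1, 2), 10, 0)
print(automatic)  # ((1, 2), 10, 0)
```

Everything in the body of `Manual.__getitem__` before the `return` is work that the one-line signature in `Automatic` hands to the interpreter.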