On Tue, Aug 25, 2020 at 9:50 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Aug 24, 2020 at 01:10:26PM -0400, Ricky Teachey wrote: 
> SIGNATURE === SEMANTICS
> (self, a, b) === (self, key_tuple, value)
>
> In the above, a on the left side, semantically, is the key tuple, and b
> is the value on the RHS.

That's not how Python works today. Individual values aren't packed into
a tuple.

Sorry, when I wrote that my intention was that `a` would be a tuple. But you're right, a better version would be to say:

SIGNATURE === SEMANTICS
(self, a, b) === (self, key_tuple_or_atomic_key_value, value)
 
 
I think that the best (correct?) way of interpreting the subscript
behaviour is that the subscript pseudo-operator `[...]` has a lower
precedence than the comma pseudo-operator. So when the parser sees the
subscript square brackets:

    obj[   ]

it parses the contents of those brackets as an expression. What do
commas do inside expressions? They make tuples. So "positional
arguments" to a subscript are naturally bundled together into a tuple.
It's not that the interpreter has a special rule for this, it's just a
way the precedence works out.

Makes sense to me.
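For instance, this is all it takes to see the bundling (a minimal sketch; `Probe` is just an invented name):

class Probe:
    def __getitem__(self, key):
        return key

p = Probe()
p[1, 2]     # (1, 2) -- the commas built a tuple
p[(1, 2)]   # (1, 2) -- identical call; the parentheses change nothing
p[1]        # 1      -- a single value is passed through as-is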
 
 
The only way it could tell that would be to inspect *at runtime* the
`__setitem__` method. And it would have to do this on every subscript
call. Introspection is likely to be slow, possibly very slow. This would
make subscripting slow.

Well, it would not have to inspect the signature on every subscript call (that can happen once, when the class is created), but it would have to go through a wrapper function like this on every call:

from functools import wraps

def _adapt_get(func):
    @wraps(func)
    def wrapped(self, key):
        # unpack a tuple key; pass an atomic key through unchanged
        if isinstance(key, tuple):
            return func(self, *key)
        return func(self, key)
    return wrapped

So you're right that it would require a second function call, but only if the writer of the class chose to write their method with more than the "normal" number of arguments.
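Concretely, here is how the wrapper above would be applied (a minimal sketch; `Grid` is an invented example, and in the real proposal the wrapping would happen automatically when the class is created):

class Grid:
    def __getitem__(self, x, y=0):
        return (x, y)

# what the interpreter (or a metaclass) would do behind the scenes
Grid.__getitem__ = _adapt_get(Grid.__getitem__)

g = Grid()
g[1, 2]   # the wrapper unpacks the tuple: returns (1, 2)
g[1]      # an atomic key passes straight through: returns (1, 0)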

In the case where there is exactly one argument (besides self) in the signature of __getitem__ and __delitem__, and two in __setitem__, all with no default values, the method would not need to be wrapped and everything remains exactly as it does today:

from inspect import Parameter, signature

def _adapt_item_dunders(cls):
    # _adapt_set and _adapt_del would mirror _adapt_get above
    for m_name, adapter, n_params in zip(("getitem", "setitem", "delitem"),
                                         (_adapt_get, _adapt_set, _adapt_del),
                                         (2, 3, 2),
                                         ):
        m_dunder = f"__{m_name}__"
        if method := getattr(cls, m_dunder, None):
            params = signature(method).parameters
            if (len(params) == n_params
                    and all(p.default is Parameter.empty
                            for p in params.values())):
                continue  # conventional signature: leave it alone
            setattr(cls, m_dunder, adapter(method))
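And to show the intent of the whole thing (a sketch, not the real mechanism; I alias the set/del adapters to `_adapt_get` purely so the snippet runs, since only `__getitem__` is exercised here):

_adapt_set = _adapt_del = _adapt_get  # stand-ins for this sketch

class Conventional:
    def __getitem__(self, key):
        return key

class Extended:
    def __getitem__(self, x, y=0):
        return f"x={x}, y={y}"

_adapt_item_dunders(Conventional)   # conventional signature: untouched
_adapt_item_dunders(Extended)       # extra parameter with a default: wrapped

Conventional()[1, 2]   # (1, 2) -- still the plain tuple, as today
Extended()[1, 2]       # 'x=1, y=2' -- the tuple was unpacked
Extended()[1]          # 'x=1, y=0' -- atomic key, default used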
  
And it would break backwards compatibility. Right now, it is legal for
subscript methods to be written like this:

    def __setitem__(self, item, value, extra=something)

and there is no way for subscripting to call the method with that extra
argument provided. Only if the user intentionally calls the dunder
method themselves can they provide that extra argument.

Not only is that legal, but it's also useful. E.g. the class might call
the dunder directly, and provide non-default extra arguments, or it
might use parameters as static storage (see the random module for
examples of methods that do that).

But your proposal will break that code.

Right now, I could even define my dunder method like this:

    def __setitem__(*args)

and it will Just Work because there is nothing special at all about
parameter passing, even self is handled as a standard argument. Your
proposal will break that...

I'll respond to this at the end.
 
So, we have a proposal for a change that nobody has requested, that adds
no useful functionality and fixes no problems, that is backwards
incompatible and will slow down every single subscripting operation.
What's not to like about it? :-)


People have actually been asking for ways to make the subscript operator act more like a function call, so that's not true. And it could be useful. And it does help address the incongruity (though not perfectly) between the way a function call handles args and kwargs and the way the subscript operator does. And it is totally backwards compatible, except in the case of what I'd call skirting a very clear convention (more on that below). And it does not slow down any subscripting operation unless the class author chooses to use it.
 
Maybe we wouldn't have designed subscripting this way back in Python 1
if we knew what we know now, but it works well enough, and we have heard
from numpy developers like Stephan Hoyer that this is not a problem that
needs fixing. Can we please stop trying to "fix" positional subscripts?

I'm not done trying, sorry. I think the incongruity is a problem.
 
Adding keywords to subscripts is a genuinely useful new feature that I
personally am really hoping I can use in the future, and it is really
frustrating to see the PEP being derailed time and time again.


--
Steve

I'm sorry to derail it if that is what I am doing, truly I am. But at this point it has honestly started to feel likely to me that adding kwargs support to [ ] is going to happen. All I am intending to do is explore other, possibly better, ways of doing that than the quick and easy way. 

On Tue, Aug 25, 2020 at 10:03 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Aug 26, 2020 at 11:31:25AM +1200, Greg Ewing wrote:

> I think an argument can be made that the new dunder -- let's call
> it __getindex__ for now -- shouldn't be considered part of the
> mapping protocol. It's a new thing for classes that want to use the
> indexing notation in ways that don't fit the simple idea of mapping a
> key object to a value object.

Most existing uses of subscripts already don't fit that key:value
mapping idea, starting with lists and tuples.

Given `obj[spam]`, how does the interpreter know whether to call
`__getitem__` or `__getindex__`?

What if the class defines both?


> The only drawback I can think of right now is that it would make
> possible objects for which x[1, 2] and x[(1, 2)] were not equivalent.

Right now, both sets of syntax mean the same thing and call the same
method, so you are introducing a backwards incompatible change that will
break code. Can we postpone this proposal for Python 6000 in 2050?


--
Steve


Above and in your previous response to me, I think you're overstating your case on this by a large amount.

Remember: the signature-dependent semantics proposal maintains backwards compatibility, 100%, for any code that follows the very clear intent of the item dunder methods.

The only time the tuples will get broken up is if the author of the class signals their intent for that to occur.

Sure, it might break some code because somebody, somewhere, has a __getitem__ method written like this:

def __getitem__(self, a, b=None, c=None): ...

...and they are counting on an obj[1, 2, 3] operation to call:

obj.__getitem__((1,2,3))

...and not:

obj.__getitem__(1,2,3)
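That behaviour is easy to check today (`T` is just a throwaway name):

class T:
    def __getitem__(self, a, b=None, c=None):
        return (a, b, c)

T()[1, 2, 3]   # ((1, 2, 3), None, None) today; (1, 2, 3) under the proposal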

But are you really saying there is a very important responsibility not to break that person's code? Come on. The intent of the item dunders is extremely clear. People writing code like the above are skirting convention, and there really should not be much expectation, on their part, of being able to do that forever with nothing breaking. I certainly wouldn't.