Changing item dunder method signatures to utilize positional arguments (open thread)

The following comment is from the thread about adding kwd arg support to the square bracket operator (e.g., `Vec = Dict[i=float, j=float]`).

On Tue, Aug 4, 2020, 2:57 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: On 4/08/20 1:16 pm, Steven D'Aprano wrote:
These methods are already kind of screwy in that they don't handle *positional* arguments in the usual way -- packing them into a tuple instead of passing them as individual arguments. I think this is messing up everyone's intuition on how indexing should be extended to incorporate keyword args, or even whether this should be done at all. -- Greg

So here is the main question of this thread: is there really no backwards compatible, creative way that a transition from a tuple to positional args in the item dunders could be accomplished? It's not my intention to champion any specific idea here, just to create a specific space for ideas to be suggested and mulled over.

There could be several reasons for changing the item dunder signatures. The immediate reason is making the intuition around how to add kwd arg support to square brackets more obvious and sane. A second reason is that it might be more intuitive for users, who currently have to learn and remember that multiple arguments to [ ] get packed into a tuple -- this doesn't happen anywhere else. Another reason: it could make writing code for specialized libraries that tend to abuse (for the good of us all!) item dunders, like pandas, much easier. Right now such libraries have to rely on their own efforts to break up a key:

    def __getitem__(self, key):
        try:
            k1, k2 = key
        except TypeError:
            raise TypeError("two tuple key required")

But for regular function calls (as opposed to item getting) we get to write our signature however we want and rely on the language to handle all of this for us:

    def f(k1, k2):
        # no worries about parsing out the arguments
        ...

-----------

One idea: change the "real" names of the dunders. Give `type` default versions of the new dunders that direct the call to the old dunder names. The new get and del dunders would have behavior and signatures like this (I am including **__kwargs since that could be an option in the future):

    def __getx__(self, /, *__key, **__kwargs):
        return self.__getitem__(__key, **__kwargs)

    def __delx__(self, /, *__key, **__kwargs):
        self.__delitem__(__key, **__kwargs)

However the set dunder signature would be a problem, because the behavior we need -- "the last positional argument is the value" -- is not something a signature can currently express. In the closest attempt below, `__value` ends up keyword-only rather than the last positional argument:

    def __setx__(self, /, *__key, __value, **__kwargs):
        self.__setitem__(__key, __value, **__kwargs)

The intended meaning above is that the last positional argument gets assigned to __value. Maybe someone could suggest a way to fix this.

Item getting, setting and deleting would call these new dunders instead of the old ones. I haven't thought through how to handle inheritance -- I'm sort of hoping someone smarter than me could come up with a way to solve that:

    class My:
        def __getx__(self, my_arg, *args, my_kwarg, **kwargs):
            # the way I have written things this super call will cause a recursion error
            v = super().__getitem__(*args, **kwargs)
            return combine(my_arg, my_kwarg, v)

By the way, I'm not bike-shedding on what these new dunder names should be, though we could probably do worse than getx, setx, and delx (nice and short!).

-----------

Ok, now this is the part where I get to wait for everyone smarter than me to show the errors of my ways/how naive I am. :)
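[One conceivable answer to the __setx__ question above, offered only as a hedged sketch -- the dunder names come from the proposal, nothing here exists in Python: take bare *args and unpack inside the body, where "the last positional argument is the value" is expressible today:

    def __setx__(self, *args, **kwargs):
        # Sketch only: emulate "last positional argument is the value"
        # by unpacking in the body rather than in the signature.
        *key, value = args  # raises ValueError if no positional args at all
        self.__setitem__(tuple(key), value, **kwargs)

The cost is losing the named, introspectable signature -- which is part of what the proposal wanted to gain in the first place.]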

On Tue, Aug 4, 2020 at 8:17 AM Ricky Teachey <ricky@teachey.org> wrote:
My main issue with this is that, in my opinion, dunders are not something a beginner should be messing with anyway. By the time someone is experienced enough to start working on this, they are also experienced enough to understand that special cases like this exist for historical reasons.
But this is still a pretty simple piece of code. Is it worth having everyone start over from scratch to avoid dealing with 4 lines of code? Especially since knowing the number of indices ahead of time is a special case for a small number of projects like pandas. In most cases, the number of indices cannot be known until runtime, so this would provide no practical benefit for most projects.
The simplest way would be to put "value" first:

    def __setx__(self, __value, /, *__key, **__kwargs): ...

Hi Todd thanks for your response. On Tue, Aug 4, 2020 at 11:01 AM Todd <toddrjen@gmail.com> wrote:
Yeah, I understand and agree, but even non-beginners benefit a lot from having consistent behaviors in the language, and not having to remember exceptions. As for my specific idea of how to accomplish the signature change, it's true that adding replacement getx/setx/delx dunders to the language would in fact be *additional* things to remember, not *fewer* things. But the goal would be to eventually replace getitem/setitem/delitem -- so this would be a temporary situation that would eventually go away, similar to how, after a few years of transition, most people don't have a daily need to remember any number of old, previously standard and important ways of doing things.
This alone wouldn't be enough of a benefit, I agree. I find the combined benefits taken together compelling enough to at least warrant the exploration.
I didn't mention that as an option because -- since we are dealing with positional-only arguments much of the time -- it could become very confusing to switch the order of the key and value arguments in actual code. But if that is the case, then the __setx__ hurdle appears insurmountable, apart from modifying the language so that function signatures can behave similarly to this unpacking feature:

    first, *rest, last = [1, 2, 3, 4, 5, 6]
    assert first == 1
    assert last == 6
    assert rest == [2, 3, 4, 5]

...so that you could write a function signature in this way (not valid syntax today):

    def f(first, *rest, last, /):
        return first, rest, last

    first, rest, last = f(1, 2, 3, 4, 5, 6)
    assert first == 1
    assert last == 6
    assert rest == (2, 3, 4, 5)

So I suppose that would be yet another change that would need to happen first.

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
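[For what it's worth, the effect Ricky wants from the hypothetical signature can be approximated today by accepting bare *args and unpacking inside the body -- a hedged sketch, not anything proposed in the thread:

    def f(*args):
        # Sketch: emulate "first, *rest, last" in a signature by
        # unpacking inside the body instead.
        first, *rest, last = args
        return first, rest, last

    assert f(1, 2, 3, 4, 5, 6) == (1, [2, 3, 4, 5], 6)

The downside, as with __setx__, is that the real parameter structure no longer appears in the signature.]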

On Wed, Aug 5, 2020, 09:47 Ricky Teachey <ricky@teachey.org> wrote:
But it isn't really an exception. Lots of arguments accept sequences of various types. It doesn't take full advantage of what the language can do, but it also isn't inconsistent with the language. Any change has to be balanced against the cost of rewriting every textbook and tutorial on the subject. Adding labelled indexes wouldn't be as much of an issue, since nobody who doesn't need it has to think about it. But changing the default dunder signature is something everyone dealing with those dunder methods would need to deal with.

As for my specific idea of how to accomplish the signature change, it's
The problem is backwards-compatibility. The current solution isn't going to go away any time soon, if ever. It is too fundamental to the language to change. The time to do that would have been the python 2-to-3 switch. Nothing like that is planned.
I guess that is where we disagree. I don't see the advantages as all that large compared to the disadvantage of everyone having to deal with two implementations, perhaps for decades.
More confusing than changing from a tuple to separate arguments, with specific requirements for where the "/" must go for it to work? I think if they are making such a big switch already, the order of the arguments is a relatively small change.

But if that is the case then the __setx__ hurdle appears insurmountable,
Yes, that is a completely different discussion.

On 2020-08-04 at 10:58:51 -0400, Todd <toddrjen@gmail.com> wrote:
Ouch. Who am I to tell beginners what they should and shouldn't be messing with, and in what order? IMNSHO, history (let alone understanding it) comes *way* after writing a few dunders, even if you don't count __init__ as a dunder. Special cases aren't special enough to break the rules.

On Wed, Aug 5, 2020, 10:38 <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
I should have been more clear: I was talking about these specific dunder methods. Overriding indexing isn't something that people will typically deal with before understanding the sequences indexing acts on, which is all they really need to understand to make sense of the current API. And history is hard to avoid. Why is it "def" instead of "function"? Why do dunder methods use "__" at all? Why does indexing use 0 instead of 1? Some things they are just going to need to accept are the way they are, and these are all things people are almost certainly going to encounter before making their own classes.

Special cases aren't special enough to break the rules.
But there aren't really any rules being broken here. It may be a sub-optimal solution in this case, but it is not a forbidden or even uncommon approach.

On Tue, Aug 04, 2020 at 10:58:51AM -0400, Todd wrote:
Define "beginner". I'm serious -- beginners to Python vary from eight year olds who have never programmed before, to people with fifty years of programming experience in a dozen different languages aside from Python. I'm not going to teach newcomers to programming object oriented techniques in the first day, but as soon as a programmer wants to create their own class, they will surely need to understand how to write dunders.
This proposal doesn't say anything about reversing the decision made all those years ago to bundle all positional arguments in a subscript into a single positional parameter. What's done is done, that's not going to change. Nobody has to start over from scratch. Nobody needs to change a single line of code unless they want to add support for keyword arguments to their class, and only some classes will do that. This proposal is completely 100% backwards compatible, except that what was a SyntaxError turns into a TypeError:

    obj[param=value]
    TypeError: __getitem__ got an unexpected keyword argument 'param'

(or something like that).

-- Steven
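[Under the behaviour Steven describes, opting in might look something like this -- a hedged sketch of hypothetical user code, since the keyword-subscript syntax does not exist yet:

    class Matrix:
        def __init__(self, data):
            self.data = data

        # Positional handling is unchanged: the subscript still arrives as a
        # single index object. Only the keyword is new (hypothetical syntax).
        def __getitem__(self, index, *, base=0):
            row, col = index
            return self.data[row - base][col - base]

    # m[1, 1, base=1] would then call m.__getitem__((1, 1), base=1);
    # classes that never declare keyword parameters are untouched.]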

On Fri, Aug 07, 2020 at 05:54:18PM +1000, Steven D'Aprano wrote:
Sorry, I was referring to the proposal that inspired this thread, to add keyword arguments to subscripting. There's an actual concrete use-case for adding this, specifically for typing annotations, and I cannot help but feel that this thread is derailing the conversation to something that has not been requested by anyone actually affected by it. I may have allowed my frustration to run ahead of me, sorry. There is a tonne of code that relies on subscripting positional arguments to be bundled into a single parameter. Even if we agreed that this was suboptimal -- and I don't, because I don't know the rationale for doing it in the first place -- I would be very surprised if the Steering Council gave the go-ahead to a major disruption and complication to the language just for the sake of making subscript dunders like other functions. Things would be different if, say, numpy or pandas or other heavy users of subscripting said "we want the short term churn and pain for long term benefit". But unless that happens, I feel this is just a case of piggy-backing a large, disruptive change of minimal benefit onto a small, focused change, which tends to ruin the chances of the small change. So please excuse my frustration, I will try to be less grumpy about it. -- Steven

On Fri, Aug 7, 2020 at 4:19 AM Steven D'Aprano <steve@pearwood.info> wrote:
Well, I wonder if they haven't asked because it would be such a huge change, and it seems unlikely to happen. But I surely don't know enough about the implementation details of these libraries to be able to say for certain one way or the other.

I may have allowed my frustration to run ahead of me, sorry.
I understand the grumpiness given your explanation. I'm really not wanting to derail that kwd args proposal -- I really like it, whatever the semantics of it turn out to be. I was actually trying to help the kwd arg case here. As illustrated by the quote I included from Greg Ewing, there seems to be nothing even close to a consensus over what the semantic meaning of this should be:

    m[1, 2, a=3, b=4]

Which could be made to mean one of the following things, or another thing I haven't considered:

    1. m.__get__((1, 2), a=3, b=4)  # handling of positional arguments unchanged from current behavior
    2. m.__get__(1, 2, a=3, b=4)  # change positional argument handling from current behavior
    3. m.__get__((1, 2), {'a': 3, 'b': 4})  # handling of positional arguments unchanged from current behavior
    4. m.__get__(KeyObject((1, 2), {'a': 3, 'b': 4}))  # change positional argument handling only when kwd args are provided

As Greg said, these methods "are already kind of screwy in that they don't handle *positional* arguments in the usual way".
To illustrate the comments of "kind of screwy" and "the usual way": using semantic meaning #1 above, these are totally equivalent*:

    m[1, 2, a=3, b=4]
    m[(1, 2), a=3, b=4]

...even though these are totally different:

    f(1, 2, a=3, b=4)
    f((1, 2), a=3, b=4)

So my intention here isn't to derail, but to help the kwd argument proposal along by solving this screwiness problem. It is to suggest that maybe a way forward -- to make the intuition of the semantics of kwd args to [ ] much more obvious -- would be to change the signature so that this incongruity between what happens with "normal" method calls and the "call" for item get/set/del can be smoothed out. If that incongruity were to be fixed, it seems to me it would become *obvious* that the semantic meaning of `m[1, 2, a=3, b=4]` should definitely be:

    m.__get__(1, 2, a=3, b=4)

But if all of this is not helping but hindering, I am happy to withdraw the idea.

* Citing my source: I borrowed these examples from Jonathan Fine's message in the other thread

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
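[The equivalence Ricky points at is easy to verify today with a minimal probe class -- an illustrative snippet, not from the thread:

    class Probe:
        def __getitem__(self, key):
            return key  # echo back whatever the subscript delivers

    p = Probe()
    assert p[1, 2] == p[(1, 2)] == (1, 2)  # commas build one tuple
    assert p[1] == 1                       # a single index arrives unwrapped]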

On 8/08/20 4:09 am, Ricky Teachey wrote:
It would certainly achieve that goal. The question is whether it would be worth the *enormous* upheaval of replacing the whole __getitem__ protocol. It's hard to overstate what a big deal that would be. The old protocol would still be with us, complicating everything unnecessarily, for a very long time. It took Python 3 to finally get rid of the __getslice__ protocol, and I don't think anyone has the appetite for a Python 4 any time soon. -- Greg

On 8/7/2020 8:28 PM, Greg Ewing wrote:
I don't think anyone has the appetite for a Python 4 any time soon.
I'm included in "anyone" here. From reading this list, it seems to me that "Python 4" is invoked as some folks' favorite magical justification for proposing major breaking changes. Python 3 works quite well, I think. Non-breaking, incremental changes suit me much better than large breaking ones. I have better things to do with my time than complete software rewrites of all the software projects I work on. --Edwin

On Fri, Aug 7, 2020 at 9:25 PM Edwin Zimmerman <edwin@211mainstreet.net> wrote:
Nobody is asking you to rewrite anything in this thread. Quoting my first message: On Tue, Aug 4, 2020 at 8:14 AM Ricky Teachey <ricky@teachey.org> wrote:
That's what I'm after, and making a (likely poor) attempt at a proposal to accomplish. If the answer is no, fine. On Fri, Aug 7, 2020 at 8:30 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My hope -- with the proposal I made (new getx/setx/delx dunders that call the old getitem/setitem/delitem dunders) -- was to avoid most of that. Folks would be free to continue using the existing dunder methods as long as they find it to be beneficial. But maybe it wouldn't work out that way.

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Fri, Aug 7, 2020 at 9:12 AM Ricky Teachey <ricky@teachey.org> wrote:
NumPy and pandas both care a lot about backwards compatibility, and don't like churn for the sake of churn. Speaking as someone who has been contributing to these libraries for years, the present syntax for positional arguments in __getitem__ is not a big deal, and certainly isn't worth breaking backwards compatibility over. In practice, this is worked around with a simple normalization step, e.g., something like:

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # normal __getitem__ method continues from here

This precludes incompatible definitions for x[(1, 2)] and x[1, 2], but really, this minor inconsistency is not a big deal. It is easy to write x[(1, 2), :] if you want to indicate a tuple for indexing along the first axis of an array. From my perspective, the other reasonable way to add keyword arguments to indexing would be in a completely backwards compatible with **kwargs.

On Fri, Aug 7, 2020 at 6:29 PM Stephan Hoyer <shoyer@gmail.com> wrote:
I'm sorry, I did a poor job of editing this. To fill in my missing word: From my perspective, the *only* reasonable way to add keyword arguments to indexing would be in a completely backwards compatible way with **kwargs.

On Sat, 8 Aug 2020 at 02:34, Stephan Hoyer <shoyer@gmail.com> wrote:
I'm sorry, I did a poor job of editing this.
To fill in my missing word: From my perspective, the *only* reasonable way to add keyword arguments to indexing would be in a completely backwards compatible way with **kwargs.
Let me clarify that I don't like the kwargs solution to the current getitem, and the reason is simple. The original motivation for PEP 472 was to do axis naming, that is, to be sure you would be able to clearly indicate (if you so wished)

    data[23, 2]

explicitly as e.g.

    data[day=23, detector=2]

or

    data[detector=2, day=23]

If you have **kwargs, now the first case would send the two values to the nameless index, the second case to the kwargs, and inside the getitem you would have to reconcile the two, especially if someone then writes

    data[24, 5, day=23, detector=2]

Typing of course has the same issue. What this extended syntax is supposed to be used for is to define specialisation of generics, but it's difficult to then handle the cases List[int] vs List[T=int] vs List[int, T=int]: the first two really say the same thing, and the last says something wrong (TypeError: multiple values for argument). -- Kind regards, Stefano Borini
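[To make the reconciliation burden concrete, here is a hedged sketch -- illustrative only, the class and axis names are invented -- of what a **kwargs-based __getitem__ would have to do by hand:

    class Data:
        AXES = ("day", "detector")

        def __getitem__(self, index=(), **named):
            # Sketch: merge nameless positional indices with named axes.
            if not isinstance(index, tuple):
                index = (index,)
            if len(index) > len(self.AXES):
                raise TypeError("too many positional indices")
            resolved = dict(zip(self.AXES, index))
            for axis, value in named.items():
                if axis not in self.AXES:
                    raise TypeError(f"unknown axis {axis!r}")
                if axis in resolved:
                    raise TypeError(f"got multiple values for axis {axis!r}")
                resolved[axis] = value
            return resolved  # a real class would index its storage here

With data[24, 5, day=23, detector=2], both axes arrive twice, and it is this method -- not the interpreter -- that must notice and complain.]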

On Tue, Aug 25, 2020 at 4:41 PM Stefano Borini <stefano.borini@gmail.com> wrote:
But that assumes that all keyword axes can be mapped to positional axes. This is not the case in xarray: there are axes that are keyword-only. And it precludes other use-cases that have been mentioned, such as parameters for indexing. So while I understand that this was the original motivation, I don't think it is good to restrict the feature to only this use-case when there are other valid use-cases.
That is a problem that anyone using **kwargs in functions has to deal with. It is neither a new problem nor a unique one, and there are plenty of ways to deal with it.

On Fri, Aug 07, 2020 at 12:09:28PM -0400, Ricky Teachey wrote:
This is Python-Ideas. If we asked for a consensus on what the print function should do, I'm sure we would find at least one person who would seriously insist that it ought to erase your hard drive *wink* But seriously, getting consensus is difficult, especially when people seem to be unwilling or unable to articulate why they prefer one behaviour over another, or the advantages vs disadvantages of a proposal.
By the way, I assume you meant `__getitem__` in each of your examples, since `__get__` is part of the descriptor protocol.

Advantages:

(1) Existing positional only subscripting does not change (backwards compatible).
(2) Easy to handle keyword arguments.
(3) Those who want to bundle all their keywords into a single object can just define a single `**kw` parameter.
(4) Probably requires little special handling in the interpreter?
(5) Probably requires the minimum amount of implementation effort?
(6) Requires no extra effort for developers who don't need or want keyword parameters in their subscript methods. Just do nothing.

Disadvantages: none that I can see. (Assuming we agree that this is a useful feature.)
2. m.__get__(1, 2, a=3, b=4) # change positional argument handling from current behavior
Advantages:

1. Consistency with other methods and functions.

Disadvantages:

1. Breaks backwards compatibility.
2. Will require a long and painful transition period, during which time libraries will have to somehow support both calling conventions.
3. m.__get__((1, 2), {'a': 3, 'b': 4}) # handling of positional arguments unchanged from current behavior
I assume that if there are no keyword arguments given, only the first argument is passed to the method (as opposed to passing an empty dict). If not, the advantages listed below disappear.

Advantages:

(1) Existing positional only subscripting does not change (backwards compatible).
(2) Requires no extra effort for developers who don't need or want keyword parameters in their subscript methods. Just do nothing.

Disadvantages:

(1) Forces people to do their own parsing of keyword arguments to local variables inside the method, instead of allowing the interpreter to do it.
(2) Compounds the "Special case breaks the rules" nature of subscript methods, extending it to keyword arguments as well as positional arguments.
(3) It's not really clear to me that anyone actually wants this, apart from just suggesting it as an option. What's the concrete use-case for this?
Use-case: you want to wrap an arbitrary number of positional arguments, plus an arbitrary set of keyword arguments, into a single hashable "key object", for some unstated reason, and be able to store that key object into a dict.

Advantage (double-edged, possible):

(1) Requires no change to the method signature to support keyword parameters (whether you want them or not, you will get them).

Disadvantages:

(1) If you don't want keyword parameters in your subscript methods, you can't just *do nothing* and have them be a TypeError; you have to explicitly check for a KeyObject argument and raise:

    def __getitem__(self, index):
        if isinstance(index, KeyObject):
            raise TypeError('MyClass index takes no keyword arguments')

(2) Seems to be a completely artificial and useless use-case to me. If there is a concrete use-case for this, either I have missed it (in which case my apologies) or Jonathan seems to be unwilling or unable to give it. But if you really wanted it, you could get it with this signature and a single line in the body:

    def __getitem__(self, *args, **kw):
        key = KeyObject(*args, **kw)

(3) Forces those who want named keyword parameters to parse them from the KeyObject value themselves. Since named keyword parameters are surely going to be the most common use-case (just as they are for other functions), this makes the common case difficult and the rare and unusual case easy.

(4) KeyObject doesn't exist. We would need a new builtin type to support this, as well as the new syntax. This increases the complexity and maintenance burden of this new feature.

(5) Compounds the "kind of screwy" (Greg's words) nature of subscripting by extending it to keyword arguments as well as positional arguments.

-- Steven
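[For readers wondering what such a KeyObject might look like, here is a minimal hedged sketch; the name and API come from the thread's hypothetical, not from any existing builtin:

    class KeyObject:
        """Sketch of a hashable bundle of positional and keyword index parts."""

        def __init__(self, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs
            # Precompute the hash; unhashable parts raise TypeError here.
            self._hash = hash((self.args, frozenset(self.kwargs.items())))

        def __hash__(self):
            return self._hash

        def __eq__(self, other):
            if not isinstance(other, KeyObject):
                return NotImplemented
            return (self.args, self.kwargs) == (other.args, other.kwargs)

    d = {KeyObject(1, 2, a=3): "value"}
    assert d[KeyObject(1, 2, a=3)] == "value"]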

On Fri, Aug 7, 2020 at 10:47 PM Steven D'Aprano <steve@pearwood.info> wrote:
Yeah I get it. Thanks. I just noticed it didn't seem like anything close to a consensus was even sort of rising to the surface.
By the way, I assume you meant `__getitem__` in each of your examples, since `__get__` is part of the descriptor protocol.
Whoops, thanks for the correction. And thanks for being less grumpy. And thank you also for going through the 4 options I gave. I accept and agree with you on all of them. But I had forgotten the fifth. The semantic meaning of

    m[1, 2, a=3, b=4]

might be made to mean:

    5. m.__getx__(1, 2, a=3, b=4)

...which would in turn call, by default:

    m.__getitem__((1, 2), a=3, b=4)

I was a little bit more detailed about it in my first message so I'll quote that:
When overriding `__getx__` et al, you would always call super() on the __getx__ method, never use super().__getitem__:

    class My:
        def __getx__(self, my_arg, *args, my_kwarg, **kwargs):
            # note: super().__getx__, not super().__getitem__, avoiding
            # the recursion problem from my first message
            v = super().__getx__(*args, **kwargs)
            return combine(my_arg, my_kwarg, v)

Many of the advantages are shared with #1 as you recounted them:

(1) Existing positional only subscripting does not have to change for any existing code (backwards compatible).
(2) Easy to handle keyword arguments.
(3) Those who want to bundle all their keywords into a single object can just define a single `**kw` parameter.
(4) Probably requires little special handling in the interpreter?
(5) Requires no extra effort for developers who don't need or want keyword parameters in their subscript methods. Just do nothing.
(6) Consistency with other methods and functions for those that want to use the new dunders.

Disadvantages:

(1) Probably requires more implementation effort than #1.
(2) Similar to #2, will also create a long transition period -- but hopefully quite a bit less painful than outright changing the signature of __getitem__ etc. However, libraries do not have to support both calling conventions at all. They should be encouraged to start using the new one, but the old one will continue to work, perhaps perpetually. But maybe things would eventually get to the point that it could be done away with.
(3) Creates a new "kind of screwy" (Greg's words) situation that will at times need to be explained and understood.
(4) Creates a sort of dual MRO for square bracket usage. You would end up with situations like this:

    class A:
        def __getitem__(self, key, **kwargs):
            print("A")

    class B(A):
        def __getitem__(self, key, **kwargs):
            print("B")
            super().__getitem__(key, **kwargs)

    class C(B):
        def __getx__(self, *key, **kwargs):
            print("C")
            super().__getx__(*key, **kwargs)

    class D(C):
        def __getx__(self, *key, **kwargs):
            print("D")
            super().__getx__(*key, **kwargs)
This code obviously looks just a little bit odd, but the result is fine. However, when different libraries and classes start getting mashed together over time, you might end up with a situation like this:

    class A:
        def __getitem__(self, key, **kwargs):
            print("A")

    class B(A):
        def __getx__(self, *key, **kwargs):
            print("B")
            super().__getx__(*key, **kwargs)

    class C(B):
        def __getitem__(self, key, **kwargs):
            print("C")
            super().__getitem__(key, **kwargs)

    class D(C):
        def __getx__(self, *key, **kwargs):
            print("D")
            super().__getx__(*key, **kwargs)
Seems like this could be easily fixed with a class decorator, or perhaps the language could know to call the methods in the right order (which should be possible so long as the rule -- never call super().__getitem__ inside the new __getx__ method, and vice versa, never call super().__getx__ from a __getitem__ method -- is followed). Also, Steven, please ignore the messages I accidentally sent just to your email; apologies.

On Sat, 8 Aug 2020 at 05:12, Ricky Teachey <ricky@teachey.org> wrote:
I am currently in the process of scouting the whole set of threads and rewriting PEP 472, somehow. But just as my 2 cents to the discussion, the initial idea was focused on one thing only: give names to axes. When you have a getitem operation, you are acting on a set of axes. e.g.

    a[4, 5, 6]

acts on three axes. The first axis index is 4, the second is 5 and the third is 6. These axes currently are anonymous, but the whole idea is that a name could be assigned to them. Which is kind of asymmetric with the whole args/kwargs structure of a function. In a function, your "axes" (which are your arguments) _always_ have a name. Not so in getitem operations: naming axes is optional. There's no such thing as optionally named function arguments.

Given the asymmetry, and the need for backward compat, would it make a possible solution to have __getitem__() accept one additional argument "names" containing a tuple with the names? e.g.

    a[1, 2]          ->  __getitem__ receives index=(1, 2), names=(None, None)
    a[foo=1, bar=2]  ->  __getitem__ receives index=(1, 2), names=("foo", "bar")
    a[1, bar=2]      ->  __getitem__ receives index=(1, 2), names=(None, "bar")

Now, I personally don't like this solution, especially because now passing names depends on whether it was declared in the signature to begin with, but I am just throwing also this idea in the mix. Apologies if it was already passed by someone else. My point is that the core of the issue is to name axes (with a loose definition of what axes are; in the case of generic types, they are the degrees of freedom of the type). How these names are then handled (and recognised) _could_ be put in the hands of the user (in other words, python will say "here are the names, I have no idea if they mean something or not. it's your duty to find out, if you care about it").

But I would like to raise another question. Is there another language out there that provides the same feature? I am not aware of any. -- Kind regards, Stefano Borini

On Sun, Aug 23, 2020 at 10:48 AM Stefano Borini <stefano.borini@gmail.com> wrote:
It's a way of handling it I for one haven't seen suggested yet. I don't think I like it either. I suppose a trouble with the basis of this is the assumption that most users will be using what's inside a [ ] to define axes. Maybe this is correct. But it could easily be that the most popular usage of providing these named indices will be more akin to named arguments. Indeed, nearly all of the discussion (that I've seen at least) has seemed to presuppose that adding the ability to provide these named indices is really going to be more akin to a function call. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Sun, Aug 23, 2020 at 03:47:59PM +0100, Stefano Borini wrote:
That is one of the motivating use-cases, but "axis name" is not the only use-case for this. Right now I would give my left arm for the ability to write `matrix[row, column, base=1]` to support standard mathematical notation which starts indexes at 1 rather than 0.
Advantages over the other suggestions already made: none. Disadvantages: everything.

(1) Fragile. You mentioned backwards compatibility, so presumably a single, non-comma separated argument won't receive a names argument at all:

    a[index]  =>  a.__getitem__(index)

rather than receiving a "names=None" or "names=()" argument. That will at least allow code that never uses a comma in the subscript to work unchanged, but it will break as soon as somebody includes a comma without a protective set of parentheses:

    a[(2,3,4)]  =>  a.__getitem__((2, 3, 4))
    a[2,3,4]    =>  a.__getitem__((2, 3, 4), names=(None, None, None))

So the first version is backwards compatible but the semantically identical second version breaks.

(2) Not actually backwards compatible. Every `__getitem__` will need to have a new "names" parameter added or it will break even if no keywords are used. (See point 1 above.)

(3) We already have two models for matching up arguments to formal parameters: there is the standard model that uses named parameters and matches arguments to the named parameter, used everywhere in functions and methods; and there is the slightly odd, but long established, convention for subscripts that collects comma-separated positional arguments into a tuple before matching them to a single named parameter. This proposal adds a third way to do it: split the parameter names into a separate positional argument.

(4) This will require the method to do the work of matching up names with values:

- eliminate values which are paired up with None (positional only);
- match up remaining values with the provided names;
- deal with duplicate names (raise a TypeError);
- deal with missing names (raise a TypeError or provide a default);
- deal with unexpected names (raise a TypeError).

We get this for free with normal parameter passing, the interpreter handles it all for us. This proposal will require everyone to do it for themselves (see the illustrative sketch after this message). I once had to emulate keyword-only arguments in Python 2, for a project that required identical signatures in both 2 and 3. So I wrote the methods to accept `**kwargs` and parsed it by hand. It really made me appreciate how much work the interpreter does for us, and how easy it makes argument passing. If I had to do this by hand every single time I wanted to support keyword arguments, I wouldn't support keyword arguments :-)

(5) Being able to see the parameters in the method signature line is important. It is (partially) self-documenting, it shows default values, and allows type-hints. There are lots of introspection tools that operate on the signatures of functions and methods. We lose all of that with this.
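[As an illustration of point (4), here is a hedged sketch -- the helper and parameter names are invented, not from the thread -- of the by-hand matching every __getitem__ would need under the names-tuple idea:

    def match_names(index, names, params=("day", "detector")):
        # Sketch: redo by hand what the interpreter normally does for us.
        bound = {}
        unnamed = []
        for value, name in zip(index, names):
            if name is None:
                unnamed.append(value)                  # positional-only value
            elif name not in params:
                raise TypeError(f"unexpected name {name!r}")
            elif name in bound:
                raise TypeError(f"duplicate name {name!r}")
            else:
                bound[name] = value
        free = (p for p in params if p not in bound)   # params still unfilled
        for value in unnamed:
            param = next(free, None)
            if param is None:
                raise TypeError("too many positional values")
            bound[param] = value
        missing = [p for p in params if p not in bound]
        if missing:
            raise TypeError(f"missing names: {missing!r}")
        return bound

    assert match_names((23, 2), (None, None)) == {"day": 23, "detector": 2}
    assert match_names((1, 2), (None, "detector")) == {"day": 1, "detector": 2}]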
Now, I personally don't like this solution
Then why suggest it? Can we just eliminate this from contention right now, without a six week debate about ways to tweak it so that it's ever so marginally less awful? You're the PEP author, you don't have to propose this as part of your PEP if you don't like it. Anyone who thinks they want this can write a competing PEP :-) [...]
But I would like to raise another question. Is there another language out there that provides the same feature? I am not aware of any.
It quite astonishes me that the majority of programming languages don't even support named keyword arguments in function calls, so I would not be surprised if there are none that support named arguments to subscripting. -- Steven

Here is another way forward -- inspired by a conversation off-list with Jonathan Fine. I am calling it "signature dependent semantics". Right now, the semantic meaning of a __setitem__ function like this:

    # using ambiguous names for the parameters on purpose
    def __setitem__(self, a, b): ...

...is currently as follows. *Note: below, the === is not supposed to be code -- I am using it as a way to state the semantics (on the RHS) of the signature (on the LHS):*

    SIGNATURE    ===  SEMANTICS
    (self, a, b) ===  (self, key_tuple, value)

In the above, a on the LHS, semantically, is the key tuple, and b is the value. So right now, line [1] below calls line [2]:

    [1]: d[1, 2] = foo
    [2]: d.__setitem__(key_tuple, value)

And the call occurs this way:

    d.__setitem__((1, 2), foo)

So far, all of this is just a description of what currently happens. The signature dependent semantics proposal is NOT to change the semantic meaning of the above code in any way. These semantics would be maintained. Signature dependent semantics, as the name suggests, would instead change the semantic meaning of the __setitem__ signature in the case that more than two parameters are given in the signature. In other words, if a signature is provided like this, with 3 or more arguments:

    def __setitem__(self, a, b, c): ...

...then the language would know there is a different semantic meaning intended. Right now, the above signature would raise a TypeError:

    d[1, 2] = foo
    TypeError: __setitem__() missing 1 required positional argument: 'c'
However, using signature dependent semantics, the language would know to call using semantics like this instead of raising a TypeError:

    d.__setitem__(foo, 1, 2)  # value is first positional argument

In other words, in the case that there are three or more parameters present in the __setitem__ signature, the semantics of the signature CHANGE:

    SIGNATURE       ===  CURRENT SEMANTICS
    (self, a, b, c) ===  (self, key_tuple, value, CURRENTLY_UNUSED_PARAMETER)
    # where key_tuple on the RHS contains (a, b), and value on the RHS is just c

    SIGNATURE       ===  NEW SEMANTICS
    (self, a, b, c) ===  (self, value, pos1, pos2)
    # a is value, b is positional argument 1, c is positional argument 2

I'm no cpython expert, but I think signature dependent semantics for __setitem__, as well as for __getitem__ and __delitem__, could be implemented. And that change alone could be made INDEPENDENT of supporting kwd args, or KeyObjects, or anything else. Furthermore, after signature dependent semantics were implemented (or perhaps at the same time), semantics could easily be extended so that code like this:

    d[1, b=2] = foo

...on an object with a __setitem__ signature like this:

    def __setitem__(self, value, a, b): ...

...gets called like this:

    d.__setitem__(foo, 1, b=2)

BUT, it could also be implemented such that if that same item setting code were made against a signature like this:

    def __setitem__(self, a, value): ...

...you would get a TypeError, because the semantic meaning of the signature with just two positional arguments, key_tuple and value, does not support kwd arguments.

DOWNSIDE

The biggest downside to this that I see is that it would be confusing to the uninitiated, especially in the following case. If the signature dependent semantics proposal were to go forward, and one wrote a signature with two arguments, the following semantic meaning could be mistakenly expected:

    # mistaken perception of __setitem__ method semantics
    SIGNATURE    ===  MISTAKEN SEMANTICS
    (self, a, b) ===  (self, value, atomic_key)

The mistaken belief above, stated outright, is that the RHS atomic_key is NOT a tuple; it is just b from the LHS. But this would not be correct. The actual semantics, in order to maintain backward compatibility, would not change from what is currently true:

    # backward compatible semantics
    SIGNATURE    ===  CORRECT, CURRENT SEMANTICS
    (self, a, b) ===  (self, key_tuple, value)

The correct understanding illustrated above is that the RHS key_tuple is a tuple (containing b from the LHS) of the form (b,). That seems like potentially a big downside. But maybe not? I don't know.

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
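[A hedged sketch of what user code might look like under signature dependent semantics; the Grid class and the dispatch shown in comments are hypothetical, since nothing in Python performs this today:

    class Grid:
        def __init__(self):
            self._cells = {}

        # Three-parameter signature: under the proposal, the value arrives
        # first and the subscript is unpacked into x and y.
        def __setitem__(self, value, x, y):
            self._cells[x, y] = value

    # g = Grid()
    # g[3, 4] = "treasure"   # would call g.__setitem__("treasure", 3, 4)]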

I’m not at all sure this Idea is possible, But even if so, there’s a real trick here. The [] operator is not a function call, it is a special operator that “takes” a single expression. Thing[a, b] is not “getting” two objects, it is getting a single tuple, which is created by the comma. That is, expression: a,b Has the same value as: (a, b) Or tuple(a,b) Which means: t = tuple(a, b) thing[a, b] Is exactly the same as thing[a,b] And that equivalency needs to be maintained. In practice, two common use cases treat the resulting tuple essentially semantically differently: 1) Using tuples as dict keys i.e. a single value that happens to be a tuple is a common practice. 2) numpy uses the elements of a tuple as separate indices. I don’t think the interpreter would have any way to know which of these is intended. -CHB On Mon, Aug 24, 2020 at 10:12 AM Ricky Teachey <ricky@teachey.org> wrote:
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, Aug 24, 2020 at 1:52 PM Christopher Barker <pythonchb@gmail.com> wrote:
The interpreter wouldn't. I'm talking about adding this knowledge of signature dependent semantics to `type`. To implement this, under the hood `type` would detect the signatures with different semantics, and choose to wrap the functions with those signatures in a closure based on the intended semantic meaning. Then everything proceeds as it does today. All of this is possible today, of course, using a metaclass, or using a regular class and the __init_subclass__ method, or using decorators. But my suggestion is to roll it into type. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

Here is an illustration of what I am talking about: sigdepsem: Signature Dependent Semantics <https://github.com/Ricyteach/sigdepsem> --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler On Mon, Aug 24, 2020 at 2:10 PM Ricky Teachey <ricky@teachey.org> wrote:

On Mon, Aug 24, 2020 at 11:10 AM Ricky Teachey <ricky@teachey.org> wrote:
but it would still not know that:

    t = (1, 2, 3)
    something[t]

is the same as:

    something[1, 2, 3]

would it?
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Aug 25, 2020, 2:09 AM Christopher Barker <pythonchb@gmail.com> wrote:
Definitely not. But I'm not sure that poses any kind of problem unless a person is trying to abuse the subscript operator to be some kind of alternative function call syntax, and they are wanting/expecting d[t] and d(t) to behave the same way. If I were to imagine myself explaining this to a beginner:

1. If you provide an item related dunder method that only accepts a single argument for the index or key, the index or key will be passed as a single object to that argument, no matter how many "parameters" were passed to the subscript operator. This is because you are not passing positional parameters like a function call; you are passing an index. This is why, unlike a regular function call, you cannot unpack a list or tuple in a subscript operation. There would be no purpose.

2. Sometimes it is convenient to write item dunder methods that partially mimic a function call, breaking up an index into positional parameters in a similar way. When this is needed, just provide the desired number of positional arguments in the signature for it to be broken up. But this isn't intended to be a real function call. So there's no difference between these, no matter how many arguments are in your dunder method signatures:

    d[1, 2, 3]
    d[(1, 2, 3)]

But maybe I'm not a great teacher.

On Mon, Aug 24, 2020 at 01:10:26PM -0400, Ricky Teachey wrote:
That's not how Python works today. Individual values aren't packed into a tuple.

    mylist[5] = "value"

calls `__setitem__(5, "value")`. The argument here is the int 5, not the tuple (5,). Presumably you wouldn't say that the semantics are:

    __setitem__(self, key_tuple, value_tuple)

just because we have this:

    obj[1] = spam, eggs, cheese

I think that the best (correct?) way of interpreting the subscript behaviour is that the subscript pseudo-operator `[...]` has a lower precedence than the comma pseudo-operator. So when the parser sees the subscript square brackets:

    obj[ ]

it parses the contents of those brackets as an expression. What do commas do inside expressions? They make tuples. So "positional arguments" to a subscript are naturally bundled together into a tuple. It's not that the interpreter has a special rule for this, it's just the way the precedence works out. (I welcome correction if that is wrong.) In the same way that assignment has lower precedence than comma, the parser automatically bundles the `spam, eggs, cheese` into a single value, which just happens to be a tuple, before doing the assignment. [...]
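[Steven's precedence description is easy to check interactively -- an illustrative snippet, not from the thread:

    d = {}
    d[1, 2] = "spam", "eggs"   # the comma binds first on both sides
    assert d[(1, 2)] == ("spam", "eggs")

    key = 1, 2                 # the same tuple, built outside the brackets
    assert d[key] == ("spam", "eggs")]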
The only way it could tell that would be to inspect *at runtime* the `__setitem__` method. And it would have to do this on every subscript call. Introspection is likely to be slow, possibly very slow. This would make subscripting slow.

And it would break backwards compatibility. Right now, it is legal for subscript methods to be written like this:

    def __setitem__(self, item, value, extra=something): ...

and there is no way for subscripting to call the method with that extra argument provided. Only if the user intentionally calls the dunder method themselves can they provide that extra argument. Not only is that legal, but it's also useful. E.g. the class might call the dunder directly, and provide non-default extra arguments, or it might use parameters as static storage (see the random module for examples of methods that do that). But your proposal will break that code. Right now, I could even define my dunder method like this:

    def __setitem__(*args): ...

and it will Just Work, because there is nothing special at all about parameter passing; even self is handled as a standard argument. Your proposal will break that:

    obj[1, 2, 3] = None
    # current semantics will pass in
    #     args = (obj, (1, 2, 3), None)
    # your semantics will presumably pass in
    #     args = (obj, 1, 2, 3, None)

So, we have a proposal for a change that nobody has requested, that adds no useful functionality and fixes no problems, that is backwards incompatible and will slow down every single subscripting operation. What's not to like about it? :-)

Maybe we wouldn't have designed subscripting this way back in Python 1 if we knew what we know now, but it works well enough, and we have heard from numpy developers like Stephan Hoyer that this is not a problem that needs fixing. Can we please stop trying to "fix" positional subscripts? Adding keywords to subscripts is a genuinely useful new feature that I personally am really hoping I can use in the future, and it is really frustrating to see the PEP being derailed time and time again. -- Steve

There's another option (but I am not endorsing it):

    a[1:2, 2, j=4:7, k=3]

means:

    a.__getitem__((slice(1, 2, None), 2, named("j", slice(4, 7, None)), named("k", 3)))

where named is an object kind of like slice: it evaluates to the pure value, but also has a .name, like slice() has .start. In any case, I think that Ricky Teachey might be onto something. Imagine we start from zero. There's no __getitem__. How would you envision it to work? The answer is probably going to be "like a function". That is, if you write a[1, 2] it will receive two arguments, as if it were __new_getitem__(self, v1, v2), with only a couple of magic behaviours for slices and Ellipsis. From this, everything would behave rather naturally: a[v1=1, v2=2] would be automatic, and so the flipped one a[v2=2, v1=1] would return the same value.

Now, my question is: would it _really_ be a major issue to introduce this __new_getitem__, to be used if available, like we have done already in the past for similar enhanced protocols? Because to me, this solution seems the most rational, flexible, and clear compared to any other option, which in the end are mostly workarounds fraught with either manual work or odd syntax. It would solve the ambiguity of the API, and it would allow the parser to handle the assignment of named arguments to the correct parameters, with the only problem of potentially allowing a[] as valid syntax if __new_getitem__(self, *args, **kwargs) is used.

On Sat, 8 Aug 2020 at 03:49, Steven D'Aprano <steve@pearwood.info> wrote:
-- Kind regards, Stefano Borini
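[A hedged sketch of what such a `named` wrapper could look like; the name and API are Stefano's hypothetical, not an existing type:

    class named:
        """Sketch: pairs a subscript value with the keyword it was given as."""

        def __init__(self, name, value):
            self.name = name
            self.value = value

        def __repr__(self):
            return f"named({self.name!r}, {self.value!r})"

    # Under the hypothetical syntax, a[j=4:7] could deliver
    # named("j", slice(4, 7, None)) to __getitem__, which would then
    # unwrap .value and dispatch on .name.]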

On Tue, Aug 25, 2020 at 4:26 PM Stefano Borini <stefano.borini@gmail.com> wrote:
In any case, I think that Ricky Teachey might be onto something.
Well when it comes to python ideas, that actually might be a first for me. ;)
Actually I think this *may not* be true. Consider: if we were starting at zero, and we agreed we wanted dict literal behavior to work like the following:

    d = {(1, 2): "value"}
This is what we have today, and I think most agree this is all a Good Thing. Then, we might reason, if I want to be able to do a lookup using tuple literals, I'd do this, just as we do today:

    d[1, 2]
No tuple brackets required. I think it is REALLY NICE not to have to do THIS, which is what we'd have to do if things behaved the function-call-way inside square brackets:

    d[(1, 2)]
Of course all of these DO work today-- it just isn't required to use the tuple brackets. They're optional. Could it be that the convenience of not needing to give the tuple brackets, if we were starting at zero, might win out over the function-call syntax even today? I don't know. I wonder. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Tue, 25 Aug 2020 at 22:42, Ricky Teachey <ricky@teachey.org> wrote:
If we were to start from scratch, dict.__getitem__ would probably accept *args and **kwargs, and use them to create the index -- though I would also say that very likely it would not accept kwargs at all. But you have a point that whatever the implementation might be, it has to play nice with the current dict() behavior. Yet, if we were to add an enhanced dunder, nothing for the current dict would change. It would still use the old getitem, and it would still create a tuple from its (nameless) index group to be used as a key. -- Kind regards, Stefano Borini

On 26/08/20 10:03 am, Stefano Borini wrote:
Despite arguing against it earlier, I think I'm starting to shift towards being in favour of a new dunder. But only if it can be justified as existing in its own right alongside __getitem__, without any intention to deprecate or remove the latter. I think an argument can be made that the new dunder -- let's call it __getindex__ for now -- shouldn't be considered part of the mapping protocol. It's a new thing for classes that want to use the indexing notation in ways that don't fit the simple idea of mapping a key object to a value object. The only drawback I can think of right now is that it would make possible objects for which x[1, 2] and x[(1, 2)] were not equivalent. I don't think that's a serious problem. A class would always be able to arrange for its __getindex__ method to *make* them equivalent, and things like pandas would be encouraged to do so. For things like generic types, which use the index notation for something that isn't really indexing at all, it probably doesn't matter much. -- Greg

On Tue, Aug 25, 2020, 7:35 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
To be backwards compatible, any proposed change has to pass this test:

    ############
    """Test #1"""
    class D:
        def __getitem__(self, key):
            return key

    d = D()

    # note the comma
    assert d[1,] == f((1,))
    ################

Where f is some function that mimics how python "translates" the "stuff" inside the square brackets. It is something like this (not quite):

    def f(*args):
        if not args:
            # empty brackets not allowed
            raise SyntaxError()
        if len(args) == 1:
            return args[0]
        return args

The desire of many (me included) seems to be to make it possible for the subscript operator to utilize syntax very similar, or identical, to the behavior of a function call. In other words, to make it possible to write some class, Q, that allows the following tests to pass:

    #################
    """Test #2"""
    # note the commas
    assert q[1,] == f(1,)
    assert q[1] == f(1)
    assert q[(1,)] == f((1,))
    ##################

The problem is, I don't think it is possible to change python internals in a backwards compatible way so that we can write a function that passes all of these tests. Because the index operator recognizes this as a tuple:

    1,

But function calls do not. This is an impasse. Does this constitute a big problem for the idea of providing new dunders? I'm not sure.
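[The impasse can be shown concretely -- an illustrative snippet, not from the thread:

    def f(*args):
        return args

    class D:
        def __getitem__(self, key):
            return key

    d = D()
    assert f(1,) == f(1) == (1,)   # in a call, a trailing comma changes nothing
    assert d[1,] == (1,)           # in a subscript, it builds a one-tuple
    assert d[1] == 1               # ...while a bare index arrives unwrapped]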

On Wed, Aug 26, 2020 at 11:31:25AM +1200, Greg Ewing wrote:
Most existing uses of subscripts already don't fit that key:value mapping idea, starting with lists and tuples. Given `obj[spam]`, how does the interpreter know whether to call `__getitem__` or `__getindex__`? What if the class defines both?
The only drawback I can think of right now is that it would make possible objects for which x[1, 2] and x[(1, 2)] were not equivalent.
Right now, both sets of syntax mean the same thing and call the same method, so you are introducing a backwards incompatible change that will break code. Can we postpone this proposal for Python 6000 in 2050? -- Steve

On Tue, Aug 25, 2020 at 9:50 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Aug 24, 2020 at 01:10:26PM -0400, Ricky Teachey wrote:
Sorry, when I wrote that my intention was that `a` would be a tuple. But you're right, a better version would be to say:

    SIGNATURE    ===  SEMANTICS
    (self, a, b) ===  (self, key_tuple_or_atomic_key_value, value)
Makes sense to me.
Well, it would not have to inspect the signature on every subscript call, but it would have to call a wrapped function like this on every call <https://github.com/Ricyteach/sigdepsem/blob/master/sigdepsem/main.py#L5-L9>:

    def _adapt_get(func):
        @wraps(func)
        def wrapped(self, key):
            return func(self, *key)
        return wrapped

So you're right, it would result in the need to make a second function call *if and only if* the writer of the class chose to write their function with more than the "normal" number of arguments. In the case that only one argument is in the signature of __getitem__ and __delitem__, and two in __setitem__, all with no default values, it would not need to be wrapped and everything remains exactly as it does today <https://github.com/Ricyteach/sigdepsem/blob/master/sigdepsem/main.py#L26-L36>:

    def _adapt_item_dunders(cls):
        for m_name, adapter, n_params in zip(("getitem", "setitem", "delitem"),
                                             (_adapt_get, _adapt_set, _adapt_del),
                                             (2, 3, 2)):
            m_dunder = f"__{m_name}__"
            if method := getattr(cls, m_dunder, None):
                if all(p.default == _empty for p in signature(method).parameters.values()) \
                        and len(signature(method).parameters) == n_params:
                    return setattr(cls, m_dunder, adapter(method))
I'll respond to this at the end.
People have actually been asking for ways to make the subscript operator act more like a function call, so that's not true. And it could be useful. And it does help address the problem of incongruity (though not perfectly) between the way a function call handles args and kwargs, and the way the subscript operator does it. And it is totally backwards compatible, except for the case of what I'd call skirting very clear convention (more on that below). And it does not slow down any subscripting operation unless the class author chooses to use it.
I'm not done trying, sorry. I think the incongruity is a problem.
I'm sorry to derail it if that is what I am doing, truly I am. But at this point it has honestly started to feel likely to me that adding kwargs support to [ ] is going to happen. All I am intending to do is explore other, possibly better, ways of doing that than the quick and easy way. On Tue, Aug 25, 2020 at 10:03 PM Steven D'Aprano <steve@pearwood.info> wrote:
Above, and in your previous response to me, I think you're overstating your case on this by a large amount. Remember: the signature dependent semantics proposal is to maintain backwards compatibility, 100%, for any code that has been following the very clear intent of the item dunder methods. The only time the tuples will get broken up is if the author of the class signals their intent for that to occur. Sure, it might break some code, because somebody, somewhere, has a __getitem__ method written like this:

    def __getitem__(self, a, b=None, c=None): ...

...and they are counting on a [1,2,3] subscript operation to call:

    obj.__getitem__((1, 2, 3))

...and not:

    obj.__getitem__(1, 2, 3)

But are you really saying there is a very important responsibility not to break that person's code? Come on. The intent of the item dunders is extremely clear. People writing code like the above are skirting convention, and there really should not be much expectation, on their part, to be able to do that forever with nothing breaking. I certainly wouldn't have such an expectation.

On Tue, Aug 25, 2020 at 10:51:42PM -0400, Ricky Teachey wrote:
Python is a very dynamic language and objects can change their own methods and even their class at any time, so, yes, it will have to inspect the signature on every call. Otherwise you are changing the language execution model. Demonstration:

    py> class Dynamic:
    ...     def __getitem__(self, arg):
    ...         print("original")
    ...         type(self).__getitem__ = lambda *args: print("replaced")
    ...
    py> x = Dynamic()
    py> x[0]
    original
    py> x[0]
    replaced

[...]
People have actually been asking for ways to make subscripting operator act more like a function call, so that's not true.
Yes, people have asked for keyword arguments. This proposal doesn't get us any closer to the feature wanted.
And it could be useful.
It doesn't give subscripting any additional functionality that doesn't already exist. It's a purely cosmetic change.
So not actually totally backwards compatible. Please stop calling things "totally backwards compatible" if they aren't totally backwards compatible. If you change the behaviour of existing legal code, it's not backwards compatible.
A problem for whom? A problem in what way? PEP 472 goes into detail about the sorts of things people find really hard to do with subscripting because of the lack of keywords. Where is the discussion by people about the things that are really hard to do because of the comma handling in subscripts? [...]
The Python language doesn't follow the rule "Do what we intend you to do", it follows the rule "this is how the language works, do whatever the language allows". PEP 5 doesn't mention the word "intent" and doesn't say "its okay to break people's code if they are using Python in ways we don't like". You don't get to say that breaking legal code that works today is okay because it doesn't follow "the intent". It's still a breaking change and you have to convince the Steering Council that breaking other people's code for the sake of fixing an incongruity is worthwhile. Imagine it was code *you* relied on that broke, and you no longer had the source code for it, it was a library compiled away in a .pyc file and the company that distributed it has gone broke and the source code lost, and replacing that library is going to cost your business a year of time and a hundred thousand dollars in development costs. But that's okay, because an incongruity that makes no difference to anyone's code has been fixed. Now maybe you can convince the Steering Council that this is a clear win for the broader community. We do break backwards compatibility, sometimes, if the risks are small enough and the benefits large enough. But be honest about what you are doing.
Yes. -- Steve

On Wed, Aug 26, 2020 at 9:48 AM Steven D'Aprano <steve@pearwood.info> wrote:
I see what you're saying there. So any solution involving `type` or `object` wrapping a child class method at class creation time should probably not be considered as a CPython implementation option because of the dynamic nature of classes. Understood.
Alright, you've made progress with me on this one. I'm still not totally convinced such code isn't asking to be broken, but you're right: just dismissing that as a nonissue isn't correct. It has to be weighed.
Well I will point to Greg Ewing's message from a while ago that I quoted at the start of this thread: On Tue, Aug 4, 2020, 2:57 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: On 4/08/20 1:16 pm, Steven D'Aprano wrote:
These methods are already kind of screwy in that they don't handle *positional* arguments in the usual way -- packing them into a tuple instead of passing them as individual arguments. I think this is messing up everyone's intuition on how indexing should be extended to incorporate keyword args, or even whether this should be done at all. -- Greg If we simply add kwarg passing to item dunders, the incongruity between the subscripting operator and regular function calls will really hinder people's ability to understand the two. I think most people in these conversations seem to now agree that this isn't a good enough reason to exercise the do nothing option, but I think it is a very good reason to explore other ways of doing it to alleviate that incongruity. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On 26/08/20 1:59 pm, Steven D'Aprano wrote:
Most existing uses of subscripts already don't fit that key:value mapping idea, starting with lists and tuples.
Not sure what you mean by that.
Given `obj[spam]`, how does the interpreter know whether to call `__getitem__` or `__getindex__`? What if the class defines both?
If it has a __getindex__ method, it calls that using normal function parameter passing rules. Otherwise it uses a fallback something like:

    def __getindex__(self, *args, **kwds):
        if kwds:
            raise TypeError("Object does not support keyword indexes")
        if not args:
            raise TypeError("Object does not accept empty indexes")
        if len(args) == 1:
            args = args[0]
        return self.__getitem__(args)
No existing object will have a __getindex__ method[1], so it won't change any existing behaviour. [1] Or at least not one that we aren't entitled to break. -- Greg

On Wed, Aug 26, 2020 at 03:06:25PM +1200, Greg Ewing wrote:
Lists and tuples aren't key:value mappings.
Presumably the below method is provided by `object`. If not, what provides it?
Empty subscripts will remain a syntax error, so I don't think this case is possible. What is your reasoning behind prohibiting keywords, when we're right in the middle of a discussion over PEP 472 which aims to allow keywords?
This is going to slow down the most common cases of subscripting: the interpreter has to follow the entire MRO to find `__getindex__` in object, which then dispatches to the `__getitem__` method. It would make more sense to put the logic in the byte-code rather than in `object` (if the object supports __getindex__, call it; otherwise call __getitem__), which is closer to how other operators work, and avoids giving object a spurious dunder that it doesn't use. However it still has to check the MRO, so I don't know if that actually gains us much performance. The fundamental issue with this proposal is that, as far as I can see, it solves no problems and offers no new functionality. It just adds complexity. As Stephan Hoyer has said, the existing handling of commas in subscripts is not a problem for Numpy. I doubt it is a problem for Pandas either. Is it a problem for anyone?
In your earlier statement, you said that it would be possible for subscripting to mean something different depending on whether the comma-separated subscripts had parentheses around them or not:

    obj[(2, 3)]
    obj[2, 3]

How does that happen? I thought I understood what you meant by that, but I obviously didn't. Can you explain how they would be different? I understood that you meant that the interpreter would choose which dunder to call according to whether or not there are parentheses around the items, but now I'm not sure what you meant. -- Steve

On 27/08/20 12:53 am, Steven D'Aprano wrote:
It's not literally a method, I just wrote it like that to illustrate the semantics. It would be done by the interpreter as part of the process of translating indexing operations into dunder calls.
What is your reasoning behind prohibiting keywords, when we're right in the middle of a discussion over PEP 472 which aims to allow keywords?
We're falling back to __getitem__ here, which doesn't currently allow keywords, and would stay that way. The point of this proposal is to not change __getitem__. If you want to get keywords, you provide __getindex__.
No, it would be done by checking type slots, no MRO search involved.
If the object has a __getindex__ method, it gets whatever is between the [] the same way as a normal function call, so comma-separated expressions become separate positional arguments. -- Greg

On Wed, Aug 26, 2020 at 11:30 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Can you elaborate on this for my understanding?
I am interested in this proposal, but I am pretty sure that last goal isn't entirely possible; it is only mostly possible. I believe that the way Python handles the comma-separated expression inside the [ ] operator is just due to operator precedence, as Steve pointed out previously. So without a major change we cannot write code where this test suite fully passes (three-part test):

    # TEST SUITE 1
    assert q[1,] == f(1,)      # part 1
    assert q[1] == f(1)        # part 2
    assert q[1,] == f((1,))    # part 3

The only way around this would be for the function, f, to "know" about the presence of hanging commas. Similarly, without such "runtime comma detection" we cannot make it possible for this test suite to fully pass:

    # TEST SUITE 2
    assert q[1,2] == f(1,2)          # part 1
    assert q[(1,2)] == f((1,2))      # part 2
    assert q[(1,2),] == f((1,2),)    # part 3

Can we? --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On 27/08/20 3:51 am, Ricky Teachey wrote:
For frequently-used special methods such as __getitem__, the type object contains direct pointers to C functions (so-called "type slots"). These are set up when the type object is created (and updated if the MRO is modified). For classes written in Python, they are filled with wrappers that invoke the user's Python code. So the implementation of the bytecodes that perform indexing would first look for a value in the slot for __getindex__; if it finds one, it calls it like a normal function. Otherwise it does the same as now with the slot for __getitem__. The overhead for this would be just one extra C pointer check, which you would be hard-pressed to measure.
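Roughly, in Python pseudo-code, the dispatch described above might look like the sketch below. This is only an illustration: __getindex__ is the hypothetical new dunder, and the real check would be a C pointer test on the type object, not a getattr.

    def subscript(obj, *args, **kwargs):
        # Stand-in for the type-slot check: look on the type, not the instance.
        getindex = getattr(type(obj), "__getindex__", None)
        if getindex is not None:
            # New protocol: ordinary function-call argument passing.
            return getindex(obj, *args, **kwargs)
        if kwargs:
            raise TypeError("object does not support keyword indexes")
        # Old protocol: multiple indexes arrive packed into a tuple.
        key = args[0] if len(args) == 1 else args
        return type(obj).__getitem__(obj, key)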
We could probably cope with that by generating different bytecode when there is a single argument with a trailing comma, so that a runtime decision can be made as to whether to tupleify it. However, I'm not sure whether it's necessary to go that far. The important thing isn't to make the indexing syntax exactly match function call syntax, it's to pass multiple indexes as positional arguments to __getindex__. So I'd be fine with having to write a[(1,)] to get a one-element tuple in both the old and new cases. It might actually be better that way, because having trailing commas mean different things depending on the type of object being indexed could be quite confusing. -- Greg

On Wed, Aug 26, 2020 at 9:00 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Wow that's great news.
I didn't know that was possible. It's far beyond my knowledge. The idea of a new dunder is starting to sound like a much more realistic possibility than I had hoped. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

A bunch of the conversation here is how to handle both positional and keyword arguments with a single signature. Let me suggest an alternative. At compile time, we know if the call is made with keyword arguments or not:

    a[1]           positional only
    a[b=1]         keyword only
    a[1, b=1]      both
    a[**kwargs]    keyword only

I suggest that in the first case it calls __getitem__(self, args) as it does now, and in the other cases it calls __getitemx__(self, args, **kwargs) instead (the signature is what matters here, don't bikeshed the name). Some people have proposed arbitrary signatures for __getitemx__ but I think that's an unnecessary degree of complexity for little benefit. Personally, I'm not even sure I'd ever want to mix args and kwargs, but including that in the signature preserves that possibility for others who find it useful. To explain further: when I use [...] without keyword arguments the code works exactly as it does today, with all args in a tuple. Therefore, for consistency, when I add a keyword argument that should not change the args value, which is why both signatures include a single args parameter. If you write a form that uses a keyword argument and __getitemx__ does not exist, then it would raise an error other than KeyError (either TypeError or AttributeError, with the latter requiring no special handling). --- Bruce
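A sketch of what a class might look like under this idea (hypothetical: __getitemx__ is Bruce's placeholder name, and the dispatch shown in the comments is what the interpreter would do, not something you can write today):

    class Grid:
        def __getitem__(self, args):
            # Chosen when no keywords appear: g[1, 2] -> args == (1, 2), as today.
            return ("plain", args)

        def __getitemx__(self, args, **kwargs):
            # Chosen when any keyword appears; args keeps today's packing.
            return ("keywords", args, kwargs)

    g = Grid()
    assert g[1, 2] == ("plain", (1, 2))
    # Under the proposal, g[1, 2, axis="x"] would call:
    assert g.__getitemx__((1, 2), axis="x") == ("keywords", (1, 2), {"axis": "x"})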

On Thu, Aug 27, 2020 at 03:28:07AM +1200, Greg Ewing wrote:
Okay, so similar to my suggestion that this would be better implemented in the byte-code rather than as a method of object.
Point of order: **the getitem dunder** already allows keywords, and always has, and always will. It's just a method. It's the **subscript (pseudo-)operator** which doesn't support keywords. This is a syntax limitation, not a limitation of the dunder method. If the interpreter supports the syntax, it's neither here nor there to the interpreter whether it calls `__getitem__` or `__getindex__` or `__my_hovercraft_is_full_of_eels__` for that matter. So if you want to accept keywords, you just add keywords to your existing dunder method. If you don't want them, don't add them. We don't need a new dunder just for the sake of keywords.
Okay, I didn't think of type slots. But type slots are expensive in other ways. Every new type slot increases the size of objects, and I've seen proposals for new dunders knocked back for that reason, so presumably the people who care about the C level care about the increase in memory and complexity from adding new type slots. Looking here: https://docs.python.org/3/c-api/typeobj.html I see that `__setitem__` and `__delitem__` are handled by the same type slot, so presumably the triplet of get-, set- and del-index type slots would be the same. Still, that means adding two new type slots to both sequence and mapping objects. (I assume that it's not *all* objects that carry these slots. If it is all objects, that makes the cost of this proposal correspondingly higher.) So if I understand it correctly, we have some choices when it comes to sequence/mapping types:

1. the existing `__*item__` methods keep their slots, and new `__*index__` slots are created, which makes both the item and index dunders fast but increases the size of every object which uses any of those methods;

2. the existing item slots stay as they are and there are no new index slots, which keeps objects the same size but makes the new index protocol slow;

3. the existing item slots are repurposed for index, which keeps objects the same size and the new protocol fast, but makes calling the item dunders slow;

4. just for completeness, because of course this is not going to happen, we could remove the existing item slots and not add index slots, so that both protocols are equally slow;

5. alternatively, we could leave the existing C-level sequence and mapping objects alone, and create *four* brand new C-level objects:
   - a sequence object that supports only the new index protocol;
   - a sequence object that supports both index and item protocols;
   - and likewise two new mapping objects.

Do I understand this correctly? Have I missed any options? Assuming I do, 4 is never going to happen and each of the others has some fairly large disadvantages and costs in speed, memory, and complexity. Without a correspondingly large advantage to this new `__*index__` protocol, I don't see this going anywhere.
The compiler doesn't know whether the object has the `__getindex__` method at compile time, so any process that relies on that knowledge isn't going to work. There can only be one set of parsing rules that applies regardless of whether the object defines the item dunders or the index dunders or neither. Right now, if you call `obj[1,]` the dunder receives the tuple (1,) as index. If it were treated as function call syntax, that would receive a single argument 1 instead. If it is treated as a tuple, as required by backwards compatibility, that's an inconsistency between subscripts and function calls, and the whole point of your proposal is to remove that inconsistency. Rock (you are here) Hard Place. Do you break existing code, or fail in your effort to remove the inconsistencies? I don't care two hoots about the inconsistencies, I just want to use keywords in my subscripts, so for me the answer is obvious: keep backwards compatibility, and there is no need to add new dunders to only partially fix something which isn't a problem. Another inconsistency: function call syntax looks like this:

    call ::= primary "(" [argument_list [","] | comprehension] ")"

which means we can write generator comprehensions inside function calls without additional parentheses:

    func(expr for x in items)  # unambiguously a generator comprehension

This is nice because the round brackets of the function call match the round brackets used in generator comprehensions, so it is perfectly consistent and unambiguous. But if you do that in a subscript, we currently get a syntax error. If we allowed it, it would be pretty weird for the square brackets of the subscript to create a *generator* comprehension rather than a list comprehension. But we surely don't want a list comprehension by default:

    obj[(expr for x in items)]  # unambiguously a generator comprehension
    obj[[expr for x in items]]  # unambiguously a list comprehension
    obj[expr for x in items]    # and this is... what?

It looks like it should be a list comprehension (it has square brackets, right?), but we probably don't want it to be a list comp; we'd prefer it to be a generator comp because they are more flexible. Only that would look weird and would lead to all sorts of questions about why list comprehension syntax sometimes gives a list and sometimes a generator. But if we leave it out, we have an inconsistency between subscripting and function calls, and for those who are motivated by removing that inconsistency, that's a Bad Thing. For me, again, the answer is obvious: we don't have to support this for the sake of consistency, because consistency isn't the motivation. I just want keywords. -- Steve

On 27/08/20 3:56 pm, Steven D'Aprano wrote:
Yes, I could have worded that better. What I meant was that no existing __getitem__ method expects to get keywords given to it via indexing notation, and under my proposal that wouldn't change.
Nobody disputes that it *could* be made to work that way. But I'm not convinced that it's the *best* way for it to work. The killer argument in my mind is what you would have to do to make an object where all of the following are equivalent:

    a[17, 42]
    a[time = 17, money = 42]
    a[money = 42, time = 17]

With a fresh new dunder, it's dead simple:

    def __getindex__(self, time, money):
        ...

With a __getitem__ that's been enhanced to take keyword args, but still gets positional args packed into a tuple, it's nowhere near as easy.
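For contrast, a minimal sketch of the enhanced-__getitem__ version, assuming keywords arrive as ordinary keyword arguments; note it only covers the all-positional and all-keyword spellings above (mixed forms would need still more code), and all the names are illustrative:

    _MISSING = object()  # sentinel

    class Account:
        def __getitem__(self, key=_MISSING, *, time=_MISSING, money=_MISSING):
            if key is not _MISSING:
                time, money = key  # a[17, 42] arrives packed as (17, 42)
            if time is _MISSING or money is _MISSING:
                raise TypeError("need both 'time' and 'money'")
            return (time, money)

    a = Account()
    assert a.__getitem__((17, 42)) == (17, 42)           # a[17, 42]
    assert a.__getitem__(time=17, money=42) == (17, 42)  # a[time=17, money=42]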
It doesn't seem like a serious problem to me. Type objects are typically created once at program startup, and they're already quite big. If it's really a concern, the new slots could be put in a substructure, so the type object would only be bigger by one pointer if they weren't being used. Another possibility would be to have them share the slots for the existing methods, with flags indicating which variant is being used for each one. A given type is only going to need either __getindex__ or __getitem__, etc., not both.
I don't follow this one. How can there be both old and new protocols without adding new dunder methods?
See my earlier post about that.
As I said earlier, I don't mind if the contents of the square brackets don't behave exactly like a function argument list. I expect the use cases for passing a generator expression as an index to be sufficiently rare that I won't mind having to put parens around it.
obj[expr for x in items] # and this is... what?
I'm fine with it being a syntax error. -- Greg

On Thu, Aug 27, 2020 at 11:13:38PM +1200, Greg Ewing wrote:
I don't think that something like that is exactly the intended use-case for the feature, at least not intended by *me*, that looks more like it should be a function API. But I'm sure people will want to do it, so we should allow it even if it is an (ever-so-slight) abuse of subscript notation. (But I note that one of the PEP co-authors, Joseph Martinot-Lagarde, has now suggested that we bypass this question by simply requiring an index, always.)
Okay, that's an advantage if you're writing this sort of code. I'm not sure it's a big enough advantage to require such changes, new dunders, new byte codes, etc, but YMMV. [...]
*shrug* I don't know enough about type slots, except I am confident that I have seen dunders rejected because it would require adding new type slots. But maybe I have the details wrong and am thinking of something else.
In which case, why do we need new names for the methods? Effectively they're the same method, just with different behaviour and a different name, but since you can only use one even if you define both, why would you define both? I would expect it is less confusing and error-prone to just say "Starting in version 3.10 subscript parsing changes and item dunders should change their signature ..." than to have two sets of dunders, especially when the names are so similar that invariably people will typo the names:

    class X:
        def __getindex__(self, ...):
            ...
        # later:
        def __setitem__(self, ...):  # oops
            ...

I'd rather a conditional:

    class X:
        if sys.version_info >= (3, 10):
            # New signatures
            def __getitem__(self, ...):
                ...
        else:
            # Old signatures
            ...

Or conditionally inherit the dunders from a mixin:

    if sys.version_info >= (3, 10):
        from this_is_the_future import item_mixin
    else:
        from legacy import item_mixin

    class X(item_mixin):
        pass

Having to remember which name goes with which version is going to be annoying and tricky, especially since for many simple cases like lists and dicts the getitem and getindex versions will be identical. [...]
I didn't say anything about inventing new protocols without new dunders. All five of my alternatives were under the assumption that the new protocol and new dunders was going ahead. But as I mention above, if we did introduce this, I'd rather we keep the dunders and change the name rather than introduce confusable names. -- Steve

On 30/08/20 3:06 pm, Steven D'Aprano wrote:
I don't see why we need to pick one use case to bless as the official "true" use case. If keyword indexing becomes a thing, this is something that people will naturally *expect* to be able to do with it, and it will be confusing and frustrating if it can't be done, or is unnecessarily difficult to do. Plus we have at least one real-world use case, i.e. xarray, where it would be a very obvious thing to do.
How is it an abuse? To me it seems less of an abuse than other uses where the keyword args *don't* represent index values.
It's a cost-benefit tradeoff. I'm just pointing out that what we're discussing can be implemented efficiently, if the benefits are judged worth the effort of doing so.
I'm not really serious about that suggestion, I was just saying that it *could* be done if we were really paranoid about making the type object slightly bigger. I would consider that level of paranoia to be excessive. As for why we need new names, it's for backwards compatibility with code that expects to be able to call an object's __xxxitem__ methods directly, e.g. for delegation. It's not that an object can't or won't *have* both __getitem__ and __getindex__, it's that we don't strictly need to *optimise* access to both of them. Dunders can exist without having type slots, they're just slower to invoke then. As part of this, if a class only defines one of __getitem__ and __getindex__, the code that sets up the type object would have to provide a wrapper for the other one that delegates to the defined one.
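A rough sketch of that wrapper generation, assuming the hypothetical __getindex__ protocol (the real machinery would run during type creation, in C):

    def getitem_from_getindex(getindex):
        # Old-style callers pass a single key, possibly a tuple of indexes.
        def __getitem__(self, key):
            if isinstance(key, tuple):
                return getindex(self, *key)
            return getindex(self, key)
        return __getitem__

    def getindex_from_getitem(getitem):
        # New-style callers pass separate positional indexes.
        def __getindex__(self, *args):
            return getitem(self, args[0] if len(args) == 1 else args)
        return __getindex__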
Are you volunteering to rewrite all existing Python code that defines item dunders? :-)
Having to remember which name goes with which version is going to be annoying and tricky,
I'm not wedded to the name __getindex__. It can be called __newgetitem__ or __getitemwithfancyparameterpassing__ if people thought that would be less confusing. -- Greg

On Sun, Aug 30, 2020 at 05:49:50PM +1200, Greg Ewing wrote:
We do that for every other operator and operator-like function. Operator overloading is a thing, so if you want `a + b` to mean looking up a database for `a` and writing it to file `b`, you can. But the blessed use-cases for the `+` operator are numeric addition and sequence concatenation. Anything else is an abuse of the operator: permitted, possibly a *mild* abuse, but at best it's a bit wiffy. (There are a few gray areas, such as array/matrix/vector addition, which sneak into the "acceptable" area by virtue of their long usage in mathematics.) If you don't like the term "abuse", I should say I am heavily influenced by its use in mathematics where "abuse of notation" is not considered a bad thing: https://en.wikipedia.org/wiki/Abuse_of_notation In any case, as you say (and agreeing with me) people will want to do something like this, and it is not my intent to stop them.
Plus we have at least one real-world use case, i.e. xarray, where it would be a very obvious thing to do.
Indeed, and I can think of at least two use-cases where I would immediately use it myself, if I could.
Compared to the majority of realistic examples given in this discussion and the PEP, your example doesn't look to me like an item or key lookup. It looks more like a function call of some sort. I know that, in a sense, subscript lookups are like function calls, and I also acknowledge that I don't know your intended semantics of that example. It might even be that you intended it as a lookup of some kind of record with time=17 and money=42. But let's not get too bogged down in nit-picking. We agree that people should be allowed to do this. [...]
Of course not. This is why we ought to be cautious about introducing backwards-incompatible changes. I don't think that we should care about this "problem" that subscripting only takes a single subscript. But if we do "fix" it, I'd rather the straight-forward future directive (and a long, long deprecation period for the old behaviour) than two sets of confusingly similar getters and setters.
The trouble with naming anything "new" is that in ten years time it's not new any more, and you could end up needing a "new new" method too. -- Steve

On 30/08/20 3:06 pm, Steven D'Aprano wrote:
But yes, it's just another example. And when I teach it I always say "this is a funny but intuitive overloading." -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

very related to this conversation -- isn't the use of [] for indexing and its use in Type hinting functionally very different as well? -CHB On Tue, Sep 1, 2020 at 9:55 AM David Mertz <mertz@gnosis.cx> wrote:
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

A digression on abuse of operators, notation, and "abuse of notation". Steven D'Aprano writes:
On Sun, Aug 30, 2020 at 05:49:50PM +1200, Greg Ewing wrote:
Of course the example is abusive. But not being "blessed" isn't the reason. (And shouldn't the notation be "b += a" for that operation? ;-)
Anything else is an abuse of the operator: permitted, possibly a *mild* abuse, but at best it's a bit wiffy.
I don't agree. The reasons for picking a "blessed" use case for operators do not include "to exclude other uses". Rather, they are to help the language designer decide things like precedence and associativity, to give developers some guidance in choosing an operator to overload for a two-argument function, and to provide readers with a familiar analog for the behavior of an operator. For example, sequence concatenation is not "blessed because it's blessed". It's blessed because it just works. What does "5 * 'a' + 'b'" mean to the reader? Is the result 'aaaaab', or 'ababababab'? The reader knows it's the former, and that is the right answer for the developer, because 'ababababab' == 5 * 'ab' -- there's another way to construct that without parentheses. Why is 5 * 'a' == 'aaaaa'? Because 5 * 'a' == 'a' + 'a' + 'a' + 'a' + 'a'. None of this is an accident. There's nothing wiffy about it. Recall that some people don't like the use of '+' for sequence concatenation because the convention in abstract algebra is that '+' denotes a commutative associative binary operation, while '*' need not be commutative. But I think you don't want to use '*' for sequence concatenation in Python because 5 * 'a' * 'b' would not be associative: it matters which '*' you execute first. Where things get wiffy is when you have conflicting requirements, such as if for some reason you want '*' to bind more tightly than '+', which in turn should bind at least as tightly as '/'. You can swap the semantics of '+' and '/', but that may not "look right" for the types involved. You may want operator notation badly for concise expression of long programs, but it will always be wiffy in this circumstance. And use of "a + b" to denote "looking up a database for `a` and writing it to file `b`" is abusive, but not of notation. It's an abuse because it doesn't help the reader remember the semantics, and I don't think that operation is anywhere near as composable as arithmetic expressions are.
I think operator overloading and abuse of notation are quite different things, though. Using '*' to denote matrix multiplication or "vertical composition of morphisms in a category" is not an abuse of notation. It's overloading, using the same symbol for a different operation in a different domain. Something like numpy broadcasting *is* an abuse of notation, at least until you get used to it: you have nonconforming matrices, so the operation "should" be undefined (in the usual treatment of matrices). If you take "undefined" *literally*, however, there's nothing to stop you from *defining* the operation for a broader class of environments, and it's not an abuse to do so. So if you have a class Vector and a function mean(x: Vector) -> Vector,

    v = Vector(1, 2, 3)
    m = mean(v)  # I Feel Like a Number, but I am not
    variance = (v - m) * (v - m) / len(v)

"looks like" an abuse of notation to the high school student targeted by the "let's have a Matrix class in stdlib" thread. But as Numpy arrays show, that's implementable, it's well-defined, and once you get used to it, the meaning is clear, even in the general case. You could do the same computation with lists

    v = [1, 2, 3]
    m = sum(v) / len(v)
    variance = (v - len(v) * [m]) * (v - len(v) * [m]) / len(v)

but I don't think that's anywhere near as nice. To call definitions extending the domain of an operator "abuse of notation" is itself an abuse of language, because abuse of notation *can't happen* in programming! Abuse of notation is synecdoche (use of a part to denote the whole) or metonymy (use of a thing to represent something related but different). But do either in programming, and you get an incorrect program. No matter how much you yell at Guido, the Python translator is never going to "get" synecdoche or metonymy -- it's just going to do what you don't want it to do.

On Wed, Sep 2, 2020 at 3:48 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Possibly a better example would be path division in Python, or stream left/right shift in C++. What does it mean to divide a path by a string? What does it mean to left-shift std::cout by 5? Those uses don't make a lot of sense based on the operations involved, but people accept them because the symbols look right:
    >>> pathlib.Path("/") / "foo" / "bar" / "quux"
    PosixPath('/foo/bar/quux')
Is that abuse of notation or something else? Whatever it is, it's not "operator overloading" in its normal sense; division is normally the inverse of multiplication, but there's no way you can multiply that by "quux" to undo that last operation, and nobody would expect so. Maybe we need a different term for this kind of overloading, where we're not even TRYING to follow the normal semantics for that operation, but are just doing something because it "feels right". ChrisA

Chris Angelico writes:
Possibly a better example would be path division in Python,
I would have chosen '+' for that operation for the same reason '+' "just works" for sequences. It's a very specialized use case: 5 * pathlib.Path('..') / "foo" makes sense as an operation, but oh, the cognitive dissonance! It's so specialized that <0.99 wink>. '/' is fine for pathlib, and I trust Antoine's intuition that it will "work" for (rather than against ;-) pathlib users.
or stream left/right shift in C++.
I'm very sympathetic to your argument for this one.
Is that abuse of notation or something else?
As I wrote earlier, I don't think "abuse of notation" is a useful analogy here.
But if you spell it '+',

    pathlib.Path("/") + "foo" + "bar" + "quux" - "bar"

makes some sense, modulo the definition of the case with multiple "bar". I don't argue it's useful, just interpretable without head explosion.
"Overloading" for this case doesn't bother me, but if you want to introduce a new term, Greg Ewing's "operator repurposing" WFM. Steve

On 2/09/20 2:24 am, Steven D'Aprano wrote:
It was a reference to earlier discussions of pandas and xarray, where there is a notion of axes having names, and a desire to be able to specify index values by name, for the same reasons that we sometimes want to specify function arguments by name. It's true that you can't tell just by looking at an indexing expression what the semantics of the keywords are, but the same is true of function calls. We rely on contextual knowledge about what the function does in order to interpret the keywords. Likewise here. If you know from context that a is an array-like object with axes named 'time' and 'money', then I think the meaning of the indexing expression will be quite clear. I also think that it's something people will naturally expect to be able to use index keywords for -- to the point that if they can't, their reaction will be "Why the flipping heck not?" Which is why I was somewhat perplexed when it was suggested that we should discount this use case purely because it wasn't cited in the original proposal (which turned out to be wrong anyway). -- Greg

On Tue, Sep 1, 2020 at 4:57 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I agree it's a fine use case. Using the currently prevailing proposal (which I steadfastly will refer to as "Steven's proposal") it's quite possible to implement this. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Tue, Sep 1, 2020, 8:35 PM Guido van Rossum <guido@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)
Can someone tell me: what will be the canonical way to allow both named and unnamed arguments under Steven's proposal? With function call syntax, it is trivial:

    def f(time, money): ...

The interpreter handles it all for us.
But what is the way when we can't fully benefit from Python function signature syntax? Here's my attempt; it is probably lacking. I'm five years into a self-taught venture in Python... if I can't get this right the first time, it worries me a little.

    MISSING = object()

    def __getitem__(self, key=MISSING, time=MISSING, money=MISSING):
        if time is MISSING or money is MISSING:
            if time is not MISSING and key is not MISSING:
                money = key
            elif money is not MISSING and key is not MISSING:
                time = key
            else:
                time, money = key

Is this right? Wrong? The hard way? The slow way? What about when there are three arguments? What then?

On Tue, Sep 1, 2020, 9:06 PM Ricky Teachey <ricky@teachey.org> wrote:
Is this right? Wrong? The hard way? The slow way?
Actually just realized I definitely did get it wrong:

    def __getitem__(self, key=MISSING, time=MISSING, money=MISSING):
        if time is MISSING or money is MISSING:
            if time is not MISSING and key is not MISSING:
                money, key = key, MISSING
            elif money is not MISSING and key is not MISSING:
                time, key = key, MISSING
            elif time is MISSING and money is MISSING:
                (time, money), key = key, MISSING
        if key is not MISSING:
            raise TypeError()

Still not sure if that is right.

On Tue, Sep 1, 2020, 9:20 PM Ricky Teachey <ricky@teachey.org> wrote:
Actually I suppose this is the best way to do it:

    def _real_signature(time, money): ...

    class C:
        def __getitem__(self, key, **kwargs):
            try:
                return _real_signature(*key, **kwargs)
            except TypeError:
                try:
                    return _real_signature(key, **kwargs)
                except TypeError:
                    raise TypeError() from None

Sorry for all the replies, but I'm honestly very unsure how to do this correctly under Steven's proposal.

I may get a chance to look carefully at this in a bit, but for now: On Tue, Sep 1, 2020 at 6:38 PM Ricky Teachey <ricky@teachey.org> wrote:
Sorry for all the replies, but I'm honestly very unsure how to do this correctly under Steven's proposal.
That's OK. It's going to be tricky, and keeping backward compatibility makes that necessary. And, in fact, the current syntax is pretty tricky as well. Say you want to make a class that takes multiple indices -- like a Matrix, for instance. Your __getitem__ looks like:

    def __getitem__(self, index):

as they all do. But what might index be? It could be one of:

* an object with an __index__ method -- so a single index, simple
* a slice object
* a tuple, where each item in the tuple could be either of the above.

Or, of course, it could be invalid, and you need to raise an Exception with a meaningful message. So there's a lot of logic there, and we haven't even started on the keywords :-) But that's all OK, as the number of people that need to write these methods is small, and the number of people that can use the feature is large. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
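A sketch of the kind of normalization logic being described, under today's protocol (the class and helper names are illustrative):

    import operator

    class Matrix:
        def __getitem__(self, index):
            if not isinstance(index, tuple):
                index = (index,)
            normalized = []
            for item in index:
                if isinstance(item, slice):
                    normalized.append(item)
                else:
                    # operator.index() raises TypeError for invalid indices
                    normalized.append(operator.index(item))
            return self._fetch(tuple(normalized))

        def _fetch(self, index):
            ...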

On Wed, 2 Sep 2020 at 05:20, Christopher Barker <pythonchb@gmail.com> wrote:
And this is the reason why personally I would prefer to add a new dunder with more defined semantics. This poor sod of a method already has a lot to handle, exactly because it behaves nothing like a function. As a programmer developing that method, you are essentially doing the job of the interpreter.
But that's all OK, as the number of people that need to write these methods is small, and the number of people that can use the feature is large.
True but there's a point where one should question if the direction getitem has taken is the right one. -- Kind regards, Stefano Borini

On Wed, Sep 2, 2020, 12:19 AM Christopher Barker <pythonchb@gmail.com> wrote:
I really appreciate that, thanks. Good to know this is a hard thing that it's OK to struggle with.

But that's all OK, as the number of people that need to write these methods is small, and the number of people that can use the feature is large. -CHB
A point I haven't seen stated elsewhere yet, so I'll state it: shouldn't we expect the number of people wanting to write these methods to go up quite a bit once the subscript syntax becomes more flexible/capable/expressive with the addition of kwargs, and it becomes pretty obvious that function-like subscript calls are now possible? Perhaps many will even take them to be encouraged.

On Tue, Sep 1, 2020, at 21:06, Ricky Teachey wrote:
I'm using () rather than a MISSING sentinel here because it makes the code simpler. For the same reason, if we need to choose a key to pass in by default for a[**kwargs only] when __getitem__ doesn't define its own default, I have consistently advocated using () for this. Likewise, *not* having a default key [or requiring the method to define its own default] presents a slight problem for __setitem__, illustrated below.

    def __getitem__(self, key=(), /, **kwargs):
        return self._getitem(*(key if isinstance(key, tuple) else (key,)), **kwargs)

    def __setitem__(self, key=(), value=???, /, **kwargs):
        return self._setitem(value, *(key if isinstance(key, tuple) else (key,)), **kwargs)

    def _getitem(self, time, money): ...
    def _setitem(self, value, time, money): ...

    [delitem the same as getitem]

Basically, you can easily write code that leverages the existing function argument parsing, simply by performing a function call.
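For concreteness, a runnable version of that sketch, with the subscript calls it would correspond to shown in comments (the keyword spellings assume the proposed syntax):

    class Demo:
        def __getitem__(self, key=(), /, **kwargs):
            return self._getitem(*(key if isinstance(key, tuple) else (key,)), **kwargs)

        def _getitem(self, time, money):
            return (time, money)

    d = Demo()
    assert d.__getitem__((17, 42)) == (17, 42)            # today's d[17, 42]
    assert d.__getitem__(17, money=42) == (17, 42)        # proposed d[17, money=42]
    assert d.__getitem__(time=17, money=42) == (17, 42)   # proposed d[time=17, money=42]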

On 2020-09-01 07:24, Steven D'Aprano wrote:
I don't agree with that at all. I see the operators as abstract symbols indicating vague areas of semantic space. Any operation that involves "combining" of some sort is a fair and non-wiffy use of `+`. Any operation that involves "removal" of some sort is a fair and non-wiffy use of `-`. And so on. For instance, some parsing libraries use `+` or `>>` to indicate sequences of syntactic constructs, and that's great. (I actually think that this flexibility should be more clearly expressed in the Python docs, but that's a separate "idea" that I'll have to post about another time.) For me the equally important (or more important) question is not what an individual operator in isolation means, but how a given type coherently organizes the operators. In other words, it's not enough to have `+` defined to mean something sensible for a given type; the type must assign its operations to operators in such a way that you don't get confused about which one is `+` and which one is `*` and which one is `>>` and so on. Similarly, even if a particular use of `+` seems odd conceptually on its own, it can still be excellent if it fits in with the overall scheme of how operators work on that type. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Sun, 30 Aug 2020 at 04:09, Steven D'Aprano <steve@pearwood.info> wrote:
The above feature is well intended and part of the original PEP, mostly for two reasons:

1. During my career I encountered issues where people confused the axes of a matrix. This is especially problematic when the matrix is symmetric.

2. pandas has this exact problem with column names, and while they have a workaround, in my opinion it is suboptimal.
But as I mention above, if we did introduce this, I'd rather we keep the dunders and change the name rather than introduce confusable names.
That is something it's hard to fight against. Unless we call it something like _extended_getitem_? getItemEx was proposed by GvR for the C API... it seems to be a frequent convention.

On Sun, Aug 30, 2020 at 6:19 PM Stefano Borini <stefano.borini@gmail.com> wrote:
2. pandas has this exact problem with column names, and while they have a workaround, in my opinion it is suboptimal
What is the workaround? --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Tue, Aug 25, 2020 at 09:23:18PM +0100, Stefano Borini wrote:
This is not another option, it's just a variant on Jonathan Fine's "key object" idea with a new name.
Where named is an object kind of like slice, and it evaluates to the pure value, but also has a .name, like slice() has .start.
This is contradictory, and not possible in Python without a lot of work, if at all. You want `named("k", 3)` to evaluate to 3, but 3 has no `.name` attribute. So you can only have one: either `named("k", 3)` evaluates to a special key object with a .name attribute "k", or it evaluates to 3. Pick one. -- Steve

On Tue, Aug 4, 2020 at 8:17 AM Ricky Teachey <ricky@teachey.org> wrote:
My main issue with this is that, in my opinion, dunders are not something a beginner should be messing with anyway. By the time someone is experienced enough to start working on this, they are also experienced enough to understand that special cases like this exist for historical reasons.
But this is still a pretty simple piece of code. Is it worth having everyone start over from scratch to avoid dealing with 4 lines of code? Especially since knowing the number of indices ahead of time is a special case for a small number of projects like pandas. In most cases, the number of indices cannot be known until runtime, so this would provide no practical benefit for most projects.
The simplest way would be to put "value" first:

    def __setx__(self, __value, /, *__key, **__kwargs):
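To illustrate (hypothetical, since __setx__ does not exist): under a value-first signature, an assignment would map onto the dunder like this:

    class Store:
        def __setx__(self, value, /, *key, **kwargs):
            print(f"value={value!r}, key={key!r}, kwargs={kwargs!r}")

    s = Store()
    # What the interpreter would do for s[1, 2] = "x" under this idea:
    s.__setx__("x", 1, 2)  # prints: value='x', key=(1, 2), kwargs={}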

Hi Todd thanks for your response. On Tue, Aug 4, 2020 at 11:01 AM Todd <toddrjen@gmail.com> wrote:
Yeah I understand and agree, but even non-beginners benefit a lot from having consistent behaviors in the language, and not having to remember exceptions. As for my specific idea of how to accomplish the signature change, it's true that adding replacement getx/setx/delx dunders to the language would in fact be *additional* things to remember, not *fewer* things. But the goal would be to eventually replace getitem/setitem/delitem -- so this would be a temporary situation and eventually go away. Similar to how most people don't have a daily need to remember any number of old, previously standard and important, ways of doing things after a few years of transition.
This alone wouldn't be enough of a benefit, I agree. I find the combined benefits taken together compelling enough to at least warrant the exploration.
I didn't mention that as an option because -- since we are dealing with positional-only arguments much of the time -- it could become very confusing to switch the order of the key and value arguments in actual code. But if that is the case, then the __setx__ hurdle appears insurmountable, apart from modifying the language so that function signatures can behave similarly to this syntax feature:

    first, *rest, last = [1, 2, 3, 4, 5, 6]
    assert first == 1
    assert last == 6
    assert rest == [2, 3, 4, 5]

...so that you could write a function signature in this way:

    def f(first, *rest, last, /):
        return first, rest, last

    first, rest, last = f(1, 2, 3, 4, 5, 6)
    assert first == 1
    assert last == 6
    assert rest == [2, 3, 4, 5]

So I suppose that would be yet another change that would need to happen first. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Wed, Aug 5, 2020, 09:47 Ricky Teachey <ricky@teachey.org> wrote:
But it isn't really an exception. Lots of arguments accept sequences of various types. It doesn't take full advantage of what the language can do, but it also isn't inconsistent with the language. Any change has to be balanced against the cost of rewriting every textbook and tutorial on the subject. Adding labelled indexes wouldn't be as much of an issue, since nobody who doesn't need it needs to think about it. But changing the default dunder signature is something everyone dealing with those dunder methods would need to deal with. As for my specific idea of how to accomplish the signature change, it's
The problem is backwards-compatibility. The current solution isn't going to go away any time soon, if ever. It is too fundamental to the language to change. The time to do that would have been the python 2-to-3 switch. Nothing like that is planned.
I guess that is where we disagree. I don't see the advantages as all that large compared to the disadvantage of everyone having to deal with two implementations, perhaps for decades.
More confusing than changing from a tuple to separate arguments, with specific requirements for where the "/" must go for it to work? I think if they are making such a big switch already, the order of the arguments is a relatively small change. But if that is the case then the __setx__ hurdle appears insurmountable,
Yes, that is a completely different discussion.

On 2020-08-04 at 10:58:51 -0400, Todd <toddrjen@gmail.com> wrote:
Ouch. Who am I to tell beginners what they should and shouldn't be messing with, and in what order? IMNSHO, history (let alone understanding it) comes *way* after writing a few dunders, even if you don't count __init__ as a dunder. Special cases aren't special enough to break the rules.

On Wed, Aug 5, 2020, 10:38 <2QdxY4RzWzUUiLuE@potatochowder.com> wrote:
I should have been more clear, I was talking about these specific dunder methods. Overriding indexing isn't something that people will typically deal with before understanding the sequences indexing acts on, which is all they really need to understand to make sense of the current API. And history is hard to avoid. Why is it "def" instead of "function"? Why do dunder methods use "__" at all? Why does indexing use 0 instead of 1? Some things they are just going to need to accept are the way they are, and these are all things people are almost certainly going to need to encounter before making their own classes. Special cases aren't special enough to break the rules.
But there aren't really any rules being broken here. It may be a sub-optimal solution in this case, but not a forbidden or even uncommon approach.

On Tue, Aug 04, 2020 at 10:58:51AM -0400, Todd wrote:
Define "beginner". I'm serious -- beginners to Python vary from eight year olds who have never programmed before, to people with fifty years of programming experience in a dozen different languages aside from Python. I'm not going to teach newcomers to programming object oriented techniques in the first day, but as soon as a programmer wants to create their own class, they will surely need to understand how to write dunders.
This proposal doesn't say anything about reversing the decision made all those years ago to bundle all positional arguments in a subscript into a single positional parameter. What's done is done; that's not going to change. Nobody has to start over from scratch. Nobody needs to change a single line of code unless they want to add support for keyword arguments to their class, and only some classes will do that. This proposal is completely 100% backwards compatible, except that what was a SyntaxError turns into a TypeError:

    obj[param=value]
    TypeError: __getitem__ got an unexpected keyword argument 'param'

(or something like that). -- Steven

On Fri, Aug 07, 2020 at 05:54:18PM +1000, Steven D'Aprano wrote:
Sorry, I was referring to the proposal that inspired this thread, to add keyword arguments to subscripting. There's an actual concrete use-case for adding this, specifically for typing annotations, and I cannot help but feel that this thread is derailing the conversation to something that has not been requested by anyone actually affected by it. I may have allowed my frustration to run ahead of me, sorry. There is a tonne of code that relies on subscripting positional arguments to be bundled into a single parameter. Even if we agreed that this was suboptimal, and I don't because I don't know the rationale for doing it in the first place, I would be very surprised if the Steering Council gave the go-ahead to a major disruption and complication to the language just for the sake of making subscript dunders like other functions. Things would be different if, say, numpy or pandas or other heavy users of subscripting said "we want the short term churn and pain for long term benefit". But unless that happens, I feel this is just a case of piggy-backing a large, disruptive change of minimal benefit onto a small, focused change, which tends to ruin the chances of the small change. So please excuse my frustration, I will try to be less grumpy about it. -- Steven

On Fri, Aug 7, 2020 at 4:19 AM Steven D'Aprano <steve@pearwood.info> wrote:
Well I wonder if they haven't asked because it would be such a huge change, and it seems unlikely to happen. But I surely don't know enough about the implementation details of these libraries to be able to say for certain one way or the other. I may have allowed my frustration to run ahead of me, sorry.
I understand the grumpiness given your explanation. I'm really not wanting to derail that kwd args proposal -- I really like it, whatever the semantics of it turn out to be. I was actually trying to help the kwd arg case here. As illustrated by the quote I included from Greg Ewing, there seems to be nothing even close to a consensus over what the semantic meaning of this should be:

    m[1, 2, a=3, b=4]

Which could be made to mean one of the following things, or another thing I haven't considered:

    1. m.__get__((1, 2), a=3, b=4)  # handling of positional arguments unchanged from current behavior
    2. m.__get__(1, 2, a=3, b=4)    # change positional argument handling from current behavior
    3. m.__get__((1, 2), {'a': 3, 'b': 4})  # handling of positional arguments unchanged from current behavior
    4. m.__get__(KeyObject((1, 2), {'a': 3, 'b': 4}))  # change positional argument handling only when kwd args are provided

As Greg said: These methods are already kind of screwy in that they don't handle *positional* arguments in the usual way.
To illustrate the comments of "kind of screwy" and "the usual way": using semantic meaning #1 above, these are totally equivalent*:

    m[1, 2, a=3, b=4]
    m[(1, 2), a=3, b=4]

...even though these are totally different:

    f(1, 2, a=3, b=4)
    f((1, 2), a=3, b=4)

So my intention here isn't to derail, but to help the kwd argument proposal along by solving this screwiness problem. It is to suggest that maybe a way forward -- to make the intuition of the semantics of kwd args to [ ] much more obvious -- would be to change the signature so that this incongruity between what happens with "normal" method calls and the "call" for item-get/set/del can be smoothed out. If that incongruity were fixed, it seems to me it would become *obvious* that the semantic meaning of m[1, 2, a=3, b=4] should definitely be:

    m.__get__(1, 2, a=3, b=4)

But if all of this is not helping but hindering, I am happy to withdraw the idea.

* Citing my source: I borrowed these examples from Jonathan Fine's message in the other thread

--- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On 8/08/20 4:09 am, Ricky Teachey wrote:
It would certainly achieve that goal. The question is whether it would be worth the *enormous* upheaval of replacing the whole __getitem__ protocol. It's hard to overstate what a big deal that would be. The old protocol would still be with us, complicating everything unnecessarily, for a very long time. It took Python 3 to finally get rid of the __getslice__ protocol, and I don't think anyone has the appetite for a Python 4 any time soon. -- Greg

On 8/7/2020 8:28 PM, Greg Ewing wrote:
I don't think anyone has the appetite for a Python 4 any time soon.
I'm included in "anyone" here. From reading this list, it seems to me that "Python 4" is invoked as some folks' favorite magical justification for proposing major breaking changes. Python 3 works quite well, I think. Non-breaking, incremental changes suit me much better than large breaking ones. I have better things to do with my time than complete software rewrites of all the software projects I work on. --Edwin

On Fri, Aug 7, 2020 at 9:25 PM Edwin Zimmerman <edwin@211mainstreet.net> wrote:
Nobody is asking you to rewrite anything in this thread. Quoting my first message: On Tue, Aug 4, 2020 at 8:14 AM Ricky Teachey <ricky@teachey.org> wrote:
That's what I'm after, and making a (likely poor) attempt at a proposal to accomplish. If the answer is no, fine. On Fri, Aug 7, 2020 at 8:30 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
My hope-- with the proposal I made (new getx/setx/delx dunders that call the old getitem/setitem/delitem dunders)-- was to avoid most of that. Folks would be free to continue using the existing dunder methods as long as they find it to be beneficial. But maybe it wouldn't work out that way. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Fri, Aug 7, 2020 at 9:12 AM Ricky Teachey <ricky@teachey.org> wrote:
NumPy and pandas both care a lot about backwards compatibility, and don't like churn for the sake of churn. Speaking as someone who has been contributing to these libraries for years, the present syntax for positional arguments in __getitem__ is not a big deal, and certainly isn't worth breaking backwards compatibility over. In practice, this is worked around with a simple normalization step, e.g., something like:

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # normal __getitem__ method

This precludes incompatible definitions for x[(1, 2)] and x[1, 2], but really, this minor inconsistency is not a big deal. It is easy to write x[(1, 2), :] if you want to indicate a tuple for indexing along the first axis of an array. From my perspective, the other reasonable way to add keyword arguments to indexing would be in a completely backwards compatible with **kwargs.

On Fri, Aug 7, 2020 at 6:29 PM Stephan Hoyer <shoyer@gmail.com> wrote:
I'm sorry, I did a poor job of editing this. To fill in my missing word: From my perspective, the *only* reasonable way to add keyword arguments to indexing would be in a completely backwards compatible way with **kwargs.

On Sat, 8 Aug 2020 at 02:34, Stephan Hoyer <shoyer@gmail.com> wrote:
I'm sorry, I did a poor job of editing this.
To fill in my missing word: From my perspective, the *only* reasonable way to add keyword arguments to indexing would be in a completely backwards compatible way with **kwargs.
Let me clarify that I don't like the kwargs solution to the current getitem, and the reason is simple. The original motivation for PEP-472 was to do axis naming, that is, to be sure you would be able to clearly indicate (if you so wished)

    data[23, 2]

explicitly as e.g.

    data[day=23, detector=2]

or

    data[detector=2, day=23]

If you have **kwargs, now the first case would send the two values to the nameless index, the second case to the kwargs, and inside the getitem you would have to reconcile the two, especially if someone then writes

    data[24, 5, day=23, detector=2]

Typing of course has the same issue. What this extended syntax is supposed to be used for is to define specialisation of generics, but it's difficult then to handle the cases List[Int] vs List[T=Int] vs List[Int, T=int]: the first two really say the same thing, and the third says something wrong (TypeError, multiple arguments). -- Kind regards, Stefano Borini

On Tue, Aug 25, 2020 at 4:41 PM Stefano Borini <stefano.borini@gmail.com> wrote:
But that assumes that all keyword axes can be mapped to positional axes. This is not the case in xarray; there are axes that are keyword-only. And it precludes other use-cases that have been mentioned, such as parameters for indexing. So while I understand that this was the original motivation, I don't think it is good to restrict the feature to only this use-case when there are other valid use-cases.
That is a problem that anyone using **kwargs in functions has to deal with. It is neither a new problem nor a unique one, and there are plenty of ways to deal with it.

On Fri, Aug 07, 2020 at 12:09:28PM -0400, Ricky Teachey wrote:
This is Python-Ideas. If we asked for a consensus on what the print function should do, I'm sure we would find at least one person seriously insist that it ought to erase your hard drive *wink* But seriously, getting consensus is difficult, especially when people seem to be unwilling or unable to articulate why they prefer one behaviour over another, or the advantages vs disadvantages of a proposal.
By the way, I assume you meant `__getitem__` in each of your examples, since `__get__` is part of the descriptor protocol.

Advantages:

(1) Existing positional-only subscripting does not change (backwards compatible).
(2) Easy to handle keyword arguments.
(3) Those who want to bundle all their keywords into a single object can just define a single `**kw` parameter.
(4) Probably requires little special handling in the interpreter?
(5) Probably requires the minimum amount of implementation effort?
(6) Requires no extra effort for developers who don't need or want keyword parameters in their subscript methods. Just do nothing.

Disadvantages: none that I can see. (Assuming we agree that this is a useful feature.)
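A sketch of what opting in might look like under this option (assuming keyword subscripts are delivered as ordinary keyword arguments, with positional packing unchanged; names are illustrative):

    class Table:
        def __getitem__(self, key, *, axis=0):
            return (key, axis)

    t = Table()
    assert t[1, 2] == ((1, 2), 0)
    # Under the proposal, t[1, 2, axis=1] would call:
    assert t.__getitem__((1, 2), axis=1) == ((1, 2), 1)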
2. m.__get__(1, 2, a=3, b=4) # change positional argument handling from current behavior
Advantages:
1. Consistency with other methods and functions.

Disadvantages:
1. Breaks backwards compatibility.
2. Will require a long and painful transition period, during which libraries will have to somehow support both calling conventions.
3. m.__get__((1, 2), {'a': 3, 'b': 4}) # handling of positional arguments unchanged from current behavior
I assume that if there are no keyword arguments given, only the first argument is passed to the method (as opposed to passing an empty dict). If not, the advantages listed below disappear.

Advantages:
(1) Existing positional only subscripting does not change (backwards compatible).
(2) Requires no extra effort for developers who don't need or want keyword parameters in their subscript methods. Just do nothing.

Disadvantages:
(1) Forces people to do their own parsing of keyword arguments to local variables inside the method, instead of allowing the interpreter to do it.
(2) Compounds the "special case breaks the rules" nature of subscript methods, extending it to keyword arguments as well as positional arguments.
(3) It's not really clear to me that anyone actually wants this, apart from just suggesting it as an option. What's the concrete use-case for this?
Use-case: you want to wrap an arbitrary number of positional arguments, plus an arbitrary set of keyword arguments, into a single hashable "key object", for some unstated reason, and be able to store that key object in a dict.

Advantage (double-edged, possibly):
(1) Requires no change to the method signature to support keyword parameters (whether you want them or not, you will get them).

Disadvantages:
(1) If you don't want keyword parameters in your subscript methods, you can't just *do nothing* and have them be a TypeError; you have to explicitly check for a KeyObject argument and raise:

def __getitem__(self, index):
    if isinstance(index, KeyObject):
        raise TypeError('MyClass index takes no keyword arguments')

(2) Seems to be a completely artificial and useless use-case to me. If there is a concrete use-case for this, either I have missed it (in which case my apologies) or Jonathan seems to be unwilling or unable to give it. But if you really wanted it, you could get it with this signature and a single line in the body:

def __getitem__(self, *args, **kw):
    key = KeyObject(*args, **kw)

(3) Forces those who want named keyword parameters to parse them from the KeyObject value themselves. Since named keyword parameters are surely going to be the most common use-case (just as they are for other functions), this makes the common case difficult and the rare and unusual case easy.

(4) KeyObject doesn't exist. We would need a new builtin type to support this, as well as the new syntax. This increases the complexity and maintenance burden of this new feature.

(5) Compounds the "kind of screwy" (Greg's words) nature of subscripting by extending it to keyword arguments as well as positional arguments.

-- Steven
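For concreteness, a minimal sketch of what such a KeyObject might look like if it existed (entirely hypothetical; no such builtin exists or is planned):

class KeyObject:
    # bundles positional and keyword subscript arguments into one hashable key
    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

    def __hash__(self):
        return hash((self.args, frozenset(self.kwargs.items())))

    def __eq__(self, other):
        return (isinstance(other, KeyObject)
                and self.args == other.args
                and self.kwargs == other.kwargs)

k = KeyObject(1, 2, a=3, b=4)
assert {k: "value"}[KeyObject(1, 2, a=3, b=4)] == "value"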

On Fri, Aug 7, 2020 at 10:47 PM Steven D'Aprano <steve@pearwood.info> wrote:
Yeah I get it. Thanks. I just noticed it didn't seem like anything close to a consensus was even sort of rising to the surface.
By the way, I assume you meant `__getitem__` in each of your examples, since `__get__` is part of the descriptor protocol.
Whoops, thanks for the correction. And thanks for being less grumpy. And thank you also for going through the 4 options I gave. I accept and agree with you on all of them. But I had forgotten the fifth. The semantic meaning of m[1, 2, a=3, b=4] might be made to mean:

5. m.__getx__(1, 2, a=3, b=4)

...which would in turn call, by default:

m.__getitem__((1, 2), a=3, b=4)

I was a little bit more detailed about it in my first message so I'll quote that:
When overriding `__getx__` et al, you would always call super() on the __getx__ method, never use super().__getitem__:

class My:
    def __getx__(self, my_arg, *args, my_kwarg, **kwargs):
        # unlike super().__getitem__, this super call avoids the recursion problem
        v = super().__getx__(*args, **kwargs)
        return combine(my_arg, my_kwarg, v)

Many of the advantages are shared with #1 as you recounted them:
(1) Existing positional only subscripting does not have to change for any existing code (backwards compatible).
(2) Easy to handle keyword arguments.
(3) Those who want to bundle all their keywords into a single object can just define a single `**kw` parameter.
(4) Probably requires little special handling in the interpreter?
(5) Requires no extra effort for developers who don't need or want keyword parameters in their subscript methods. Just do nothing.
(6) Consistency with other methods and functions for those that want to use the new dunders.

Disadvantages:
(1) Probably requires more implementation effort than #1.
(2) Similar to #2, it will also create a long transition period -- but hopefully quite a bit less painful than just outright changing the signature of __getitem__ etc. However, libraries do not have to support both calling conventions at all. They should be encouraged to start using the new one, but the old one will continue to work, perhaps perpetually. But maybe things would eventually get to the point that it could be done away with.
(3) Creates a new "kind of screwy" (Greg's words) situation that will at times need to be explained and understood.
(4) Creates a sort of dual MRO for square bracket usage. You would end up with situations like this:

class A:
    def __getitem__(self, key, **kwargs):
        print("A")

class B(A):
    def __getitem__(self, key, **kwargs):
        print("B")
        super().__getitem__(key, **kwargs)

class C(B):
    def __getx__(self, *key, **kwargs):
        print("C")
        super().__getx__(*key, **kwargs)

class D(C):
    def __getx__(self, *key, **kwargs):
        print("D")
        super().__getx__(*key, **kwargs)
This code obviously looks just a little bit odd, but the result is fine. However, when different libraries and classes start getting mashed together over time, you might end up with a situation like this:

class A:
    def __getitem__(self, key, **kwargs):
        print("A")

class B(A):
    def __getx__(self, *key, **kwargs):
        print("B")
        super().__getx__(*key, **kwargs)

class C(B):
    def __getitem__(self, key, **kwargs):
        print("C")
        super().__getitem__(key, **kwargs)

class D(C):
    def __getx__(self, *key, **kwargs):
        print("D")
        super().__getx__(*key, **kwargs)
Seems like this could be easily fixed with a class decorator (a sketch follows below), or perhaps the language could know to call the methods in the right order (which should be possible so long as the rule is followed: never make a super().__getitem__ call inside of the new __getx__ method, and vice versa, never call super().__getx__ from a __getitem__ method). Also, Steven, please ignore the messages I accidentally sent just to your email; apologies.
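A sketch of the kind of class decorator I mean (the decorator name and the forwarding strategy here are my own assumptions, not a worked-out design):

def bridge_item_dunders(cls):
    # if a class defines only __getitem__, synthesize a __getx__ that
    # forwards to it, so super().__getx__ chains in subclasses keep working
    if "__getitem__" in cls.__dict__ and "__getx__" not in cls.__dict__:
        getitem = cls.__dict__["__getitem__"]

        def __getx__(self, *key, **kwargs):
            item = key[0] if len(key) == 1 else key
            return getitem(self, item, **kwargs)

        cls.__getx__ = __getx__
    return cls

@bridge_item_dunders
class A:
    def __getitem__(self, key, **kwargs):
        return ("A", key)

class C(A):
    def __getx__(self, *key, **kwargs):
        return ("C",) + super().__getx__(*key, **kwargs)

assert C().__getx__(1, 2) == ("C", "A", (1, 2))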

On Sat, 8 Aug 2020 at 05:12, Ricky Teachey <ricky@teachey.org> wrote:
I am currently in the process of scouting the whole set of threads and rewriting PEP-472, somehow. But just as my 2 cents to the discussion, the initial idea was focused on one thing only: give names to axes. When you have a getitem operation, you are acting on a set of axes. e.g. a[4,5,6] acts on three axes. The first axis index is 4, the second is 5 and the third is 6. These axes currently are anonymous, but the whole idea is that a name could be assigned to them. Which is kind of asymmetric with the whole args/kwargs structure of a function. In a function, your "axes" (which are your arguments) _always_ have a name. Not so in getitem operations: naming axes is optional. There's no such thing as optionally named function arguments. Given the asymmetry, and the need for backward compat, would it be a possible solution to have __getitem__() accept one additional argument "names" containing a tuple with the names? e.g.

if you call a[1,2], __getitem__(index, names) will receive index=(1,2), names=(None, None)
if you call a[foo=1, bar=2], __getitem__ will receive index=(1,2), names=("foo", "bar")
if you call a[1, bar=2], __getitem__ will receive index=(1,2), names=(None, "bar")

Now, I personally don't like this solution, especially because passing names now depends on whether it was declared in the signature to begin with, but I am just throwing this idea into the mix too. Apologies if it was already suggested by someone else. My point is that the core of the issue is to name axes (with a loose definition of what axes are; in the case of generic types, they are the degrees of freedom of the type). How these names are then handled (and recognised) _could_ be put in the hands of the user (in other words, python will say "here are the names, I have no idea if they mean something or not. it's your duty to find out, if you care about it"). But I would like to raise another question. Is there another language out there that provides the same feature? I am not aware of any. -- Kind regards, Stefano Borini
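A sketch of the manual bookkeeping this would push into every __getitem__ (the class and axis names are mine; the extra names parameter is the hypothetical part):

class Grid:
    _axes = ("foo", "bar")

    def __getitem__(self, index, names=None):
        if not isinstance(index, tuple):
            index = (index,)
        if names is None:
            names = (None,) * len(index)
        resolved = {}
        for pos, (value, name) in enumerate(zip(index, names)):
            # None means "positional": fall back to axis order (note that even
            # this naive fallback goes wrong if named values precede positional
            # ones -- more bookkeeping the method would have to do itself)
            axis = name if name is not None else self._axes[pos]
            if axis in resolved:
                raise TypeError(f"duplicate value for axis {axis!r}")
            resolved[axis] = value
        return resolved

g = Grid()
assert g.__getitem__((1, 2), ("foo", "bar")) == {"foo": 1, "bar": 2}
assert g.__getitem__((1, 2), (None, "bar")) == {"foo": 1, "bar": 2}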

On Sun, Aug 23, 2020 at 10:48 AM Stefano Borini <stefano.borini@gmail.com> wrote:
It's a way of handling it I for one haven't seen suggested yet. I don't think I like it either. I suppose a trouble with the basis of this is the assumption that most users will be using what's inside a [ ] to define axes. Maybe this is correct. But it could easily be that the most popular usage of providing these named indices will be more akin to named arguments. Indeed, nearly all of the discussion (that I've seen at least) has seemed to presuppose that adding the ability to provide these named indices is really going to be more akin to a function call. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Sun, Aug 23, 2020 at 03:47:59PM +0100, Stefano Borini wrote:
That is one of the motivating use-cases, but "axis name" is not the only use-case for this. Right now I would give my left arm for the ability to write `matrix[row, column, base=1]` to support standard mathematical notation which starts indexes at 1 rather than 0.
Advantages over the other suggestions already made: none. Disadvantages: everything.

(1) Fragile. You mentioned backwards compatibility, so presumably a single, non-comma separated argument won't receive a name argument at all:

a[index] => a.__getitem__(index)

rather than receiving a "names=None" or "names=()" argument. That will at least allow code that never uses a comma in the subscript to work unchanged, but it will break as soon as somebody includes a comma without a protective set of parentheses:

a[(2,3,4)] => a.__getitem__((2, 3, 4))
a[2,3,4]   => a.__getitem__((2, 3, 4), names=(None, None, None))

So the first version is backwards compatible but the semantically identical second version breaks.

(2) Not actually backwards compatible. Every `__getitem__` will need to have a new "names" parameter added or it will break even if no keywords are used. (See point 1 above.)

(3) We already have two models for matching up arguments to formal parameters: there is the standard model that uses named parameters and matches arguments to the named parameter, used everywhere in functions and methods; and there is the slightly odd, but long established, convention for subscripts that collects comma-separated positional arguments into a tuple before matching them to a single named parameter. This proposal adds a third way to do it: split the parameter names into a separate positional argument.

(4) This will require the method to do the work of matching up names with values:
- eliminate values which are paired up with None (positional only);
- match up remaining values with the provided names;
- deal with duplicate names (raise a TypeError);
- deal with missing names (raise a TypeError or provide a default);
- deal with unexpected names (raise a TypeError).

We get this for free with normal parameter passing; the interpreter handles it all for us. This proposal will require everyone to do it for themselves. I once had to emulate keyword-only arguments in Python 2, for a project that required identical signatures in both 2 and 3. So I wrote the methods to accept `**kwargs` and parsed it by hand. It really made me appreciate how much work the interpreter does for us, and how easy it makes argument passing. If I had to do this by hand every single time I wanted to support keyword arguments, I wouldn't support keyword arguments :-)

(5) Being able to see the parameters in the method signature line is important. It is (partially) self-documenting, it shows default values, and allows type-hints. There are lots of introspection tools that operate on the signatures of functions and methods. We lose all of that with this.
Now, I personally don't like this solution
Then why suggest it? Can we just eliminate this from contention right now, without a six week debate about ways to tweak it so that it's ever so marginally less awful? You're the PEP author, you don't have to propose this as part of your PEP if you don't like it. Anyone who thinks they want this can write a competing PEP :-) [...]
But I would like to raise another question. Is there another language out there that provides the same feature? I am not aware of any.
It quite astonishes me that the majority of programming languages don't even support named keyword arguments in function calls, so I would not be surprised if there are none that support named arguments to subscripting. -- Steven

Here is another way forward -- inspired by a conversation off-list with Jonathan Fine. I am calling it "signature dependent semantics". Right now, the semantic meaning of a __setitem__ function like this:

# using ambiguous names for the parameters on purpose
def __setitem__(self, a, b): ...

...is currently as follows. *Note: below, the === is not supposed to be code - I am using it as a way to state, on the RHS, the semantics of the signature on the LHS.*

SIGNATURE === SEMANTICS
(self, a, b) === (self, key_tuple, value)

In the above, a on the left side, semantically, is the key tuple, and b is the value on the RHS. So right now, line 1 below calls line 2:

[1]: d[1, 2] = foo
[2]: d.__setitem__(key_tuple, value)

And the call occurs this way:

d.__setitem__((1,2), foo)

So far, all of this is just a description of what currently happens. The signature dependent semantics proposal is NOT to change the semantic meaning of the above code in any way. These semantics would be maintained. Signature dependent semantics, as the name suggests, would change the semantic meaning of the __setitem__ signature in the case that more than two parameters are given in the signature. In other words, if a signature is provided like this, with 3 or more arguments:

def __setitem__(self, a, b, c): ...

...then in that case, the language would know there is a different semantic meaning intended. Right now, the above signature would raise a TypeError, since d[1, 2] = foo would call d.__setitem__((1, 2), foo) and leave c unfilled.
However, using signature dependent semantics, the language would know to call using semantics like this instead of raising a TypeError:

d.__setitem__(foo, 1, 2)  # value is the first positional argument

In other words, in the case that there are three or more arguments present in the __setitem__ signature, the semantics of the signature CHANGE:

SIGNATURE === CURRENT SEMANTICS (where key_tuple on the RHS contains (a, b), and value on the RHS is just c):
(self, a, b, c) === (self, key_tuple, value, CURRENTLY_UNUSED_PARAMETER)

SIGNATURE === NEW SEMANTICS (a is the value, b is positional argument 1, c is positional argument 2):
(self, a, b, c) === (self, value, pos1, pos2)

I'm no cpython expert, but I think signature dependent semantics for __setitem__, as well as for __getitem__ and __delitem__, could be implemented. And that change alone could be made INDEPENDENT of supporting kwd args, or KeyObjects, or anything else. Furthermore, after signature dependent semantics were implemented (or perhaps at the same time), the semantics could easily be extended so that code like this:

d[1, b=2] = foo

...on an object with a __setitem__ signature like this:

def __setitem__(self, value, a, b): ...

...gets called like this:

d.__setitem__(foo, 1, b=2)

BUT, it could also be implemented such that if that same item setting code were made against a signature like this:

def __setitem__(self, a, value): ...

...you would get a TypeError, because the semantic meaning of the signature with just two positional arguments, key_tuple and value, does not support kwd arguments.

DOWNSIDE

The biggest downside to this that I see is that it would be confusing to the uninitiated, especially in the following case. If the signature dependent semantics proposal were to go forward, and one wrote a signature with two arguments, the following semantic meaning could be mistakenly expected:

# mistaken perception of __setitem__ method semantics
SIGNATURE === MISTAKEN SEMANTICS
(self, a, b) === (self, value, atomic_key)

The mistaken belief above, stated outright, is that the RHS atomic_key is NOT a tuple, it is just b from the LHS. But this would not be correct. The actual semantics, in order to maintain backward compatibility, would not change from what is currently true:

# backward compatible semantics
SIGNATURE === CORRECT, CURRENT SEMANTICS
(self, a, b) === (self, key_tuple, value)

The correct understanding illustrated above is that the RHS key_tuple is a tuple (containing b from the LHS) of the form (b,). That seems like potentially a big downside. But maybe not? I don't know. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
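A toy demonstration of the intended mapping, via a direct dunder call, since the subscript syntax itself would not change (the class is illustrative only):

class M:
    # three or more parameters after self: the proposed "new semantics"
    def __setitem__(self, value, pos1, pos2):
        self.last = (value, pos1, pos2)

m = M()
# today, m[1, 2] = "foo" raises TypeError against this signature; under the
# proposal it would instead be dispatched as:
m.__setitem__("foo", 1, 2)
assert m.last == ("foo", 1, 2)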

I'm not at all sure this idea is possible, but even if so, there's a real trick here. The [] operator is not a function call; it is a special operator that "takes" a single expression. Thing[a, b] is not "getting" two objects, it is getting a single tuple, which is created by the comma. That is, the expression:

a, b

has the same value as:

(a, b)

or tuple((a, b)). Which means:

t = (a, b)
thing[t]

is exactly the same as:

thing[a, b]

And that equivalency needs to be maintained. In practice, two common use cases treat the resulting tuple in semantically different ways: 1) using tuples as dict keys, i.e. a single value that happens to be a tuple, is a common practice; 2) numpy uses the elements of a tuple as separate indices. I don't think the interpreter would have any way to know which of these is intended. -CHB On Mon, Aug 24, 2020 at 10:12 AM Ricky Teachey <ricky@teachey.org> wrote:

On Mon, Aug 24, 2020 at 1:52 PM Christopher Barker <pythonchb@gmail.com> wrote:
The interpreter wouldn't. I'm talking about adding this knowledge of signature dependent semantics to `type`. To implement this, under the hood `type` would detect the signatures with different semantics, and choose to wrap the functions with those signatures in a closure based on the intended semantic meaning. Then everything proceeds as it does today. All of this is possible today, of course, using a metaclass, or using a regular class and the __init_subclass__ method, or using decorators. But my suggestion is to roll it into type. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

Here is an illustration of what I am talking about: sigdepsem: Signature Dependent Semantics <https://github.com/Ricyteach/sigdepsem> --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler On Mon, Aug 24, 2020 at 2:10 PM Ricky Teachey <ricky@teachey.org> wrote:

On Mon, Aug 24, 2020 at 11:10 AM Ricky Teachey <ricky@teachey.org> wrote:
but it would still not know that:

t = (1, 2, 3)
something[t]

is the same as:

something[1, 2, 3]

would it?
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Aug 25, 2020, 2:09 AM Christopher Barker <pythonchb@gmail.com> wrote:
Definitely not. But I'm not sure that poses any kind of problem unless a person is trying to abuse the subscript operator as some kind of alternative function call syntax, and they are wanting/expecting d[t] and d(t) to behave the same way. If I were to imagine myself explaining this to a beginner:

1. If you provide an item related dunder method that only accepts a single argument for the index or key, the index or key will be passed as a single object to that argument no matter how many "parameters" were passed to the subscript operator. This is because you are not passing positional parameters like a function call, you are passing an index. This is why, unlike a regular function call, you cannot unpack a list or tuple in a subscript operation. There would be no purpose.

2. Sometimes it is convenient to write item dunder methods that partially mimic a function call, breaking up an index into positional parameters in a similar way. When this is needed, just provide the desired number of positional arguments in the signature for it to be broken up. But this isn't intended to be a real function call. So there's no difference between these, no matter how many arguments are in your dunder method signatures:

d[1,2,3]
d[(1,2,3)]

But maybe I'm not a great teacher.

On Mon, Aug 24, 2020 at 01:10:26PM -0400, Ricky Teachey wrote:
That's not how Python works today. Individual values aren't packed into a tuple. mylist[5] = "value" calls `__setitem__(5, "value")`. The argument here is the int 5, not the tuple (5,). Presumably you wouldn't say that the semantics are:

__setitem__(self, key_tuple, value_tuple)

just because we have this:

obj[1] = spam, eggs, cheese

I think that the best (correct?) way of interpreting the subscript behaviour is that the subscript pseudo-operator `[...]` has a lower precedence than the comma pseudo-operator. So when the parser sees the subscript square brackets:

obj[ ]

it parses the contents of those brackets as an expression. What do commas do inside expressions? They make tuples. So "positional arguments" to a subscript are naturally bundled together into a tuple. It's not that the interpreter has a special rule for this, it's just the way the precedence works out. (I welcome correction if that is wrong.) In the same way, assignment has lower precedence than comma, so the parser automatically bundles the `spam, eggs, cheese` into a single value, which just happens to be a tuple, before doing the assignment. [...]
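A quick probe confirms the precedence story (runnable today; the class name is mine):

class Show:
    def __getitem__(self, key):
        return key

s = Show()
assert s[1, 2] == (1, 2)     # commas build a tuple...
assert s[(1, 2)] == (1, 2)   # ...so the parentheses change nothing
assert s[1] == 1             # a lone subscript is passed through unwrapped
assert s[1,] == (1,)         # but a trailing comma still makes a tuple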
The only way it could tell that would be to inspect *at runtime* the `__setitem__` method. And it would have to do this on every subscript call. Introspection is likely to be slow, possibly very slow. This would make subscripting slow. And it would break backwards compatibility. Right now, it is legal for subscript methods to be written like this:

def __setitem__(self, item, value, extra=something)

and there is no way for subscripting to call the method with that extra argument provided. Only if the user intentionally calls the dunder method themselves can they provide that extra argument. Not only is that legal, but it's also useful. E.g. the class might call the dunder directly, and provide non-default extra arguments, or it might use parameters as static storage (see the random module for examples of methods that do that). But your proposal will break that code. Right now, I could even define my dunder method like this:

def __setitem__(*args)

and it will Just Work because there is nothing special at all about parameter passing; even self is handled as a standard argument. Your proposal will break that:

obj[1, 2, 3] = None
# current semantics will pass in
#   args = (obj, (1, 2, 3), None)
# your semantics will presumably pass in
#   args = (obj, 1, 2, 3, None)

So, we have a proposal for a change that nobody has requested, that adds no useful functionality and fixes no problems, that is backwards incompatible and will slow down every single subscripting operation. What's not to like about it? :-) Maybe we wouldn't have designed subscripting this way back in Python 1 if we knew what we know now, but it works well enough, and we have heard from numpy developers like Stephan Hoyer that this is not a problem that needs fixing. Can we please stop trying to "fix" positional subscripts? Adding keywords to subscripts is a genuinely useful new feature that I personally am really hoping I can use in the future, and it is really frustrating to see the PEP being derailed time and time again. -- Steve

There's another option (but I am not endorsing it):

a[1:2, 2, j=4:7, k=3]

means:

a.__getitem__((slice(1, 2, None), 2, named("j", slice(4, 7, None)), named("k", 3)))

where named is a kind of object like slice, and it evaluates to the pure value, but also has a .name, like slice() has .start. In any case, I think that Ricky Teachey might be onto something. Imagine we start from zero. There's no __getitem__. How would you envision it to work? The answer is probably going to be "like a function". That is, if you write a[1, 2] it will receive two arguments as if it were __new_getitem__(self, v1, v2), with only a bit of magic for slices and Ellipsis. From this, everything would behave rather naturally: a[v1=1, v2=2] would be automatic, and so the flipped one a[v2=2, v1=1] would return the same value. Now, my question is, would it _really_ be a major issue to introduce this __new_getitem__ to be used if available, like we have done already in the past for similar enhanced protocols? Because to me, this solution seems the most rational, flexible, and clear compared to any other option; the others are mostly workarounds fraught with either manual work or odd syntax. It would solve the ambiguity of the API, and it would allow the parser to handle the assignment of named arguments to the correct parameters, with the only problem of potentially allowing a[] as valid syntax if __new_getitem__(self, *args, **kwargs) is used. On Sat, 8 Aug 2020 at 03:49, Steven D'Aprano <steve@pearwood.info> wrote:
-- Kind regards, Stefano Borini
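A minimal sketch of the slice-like carrier object described above (hypothetical; named is not a real builtin):

class named:
    # evaluates to a value tagged with a name, as slice() carries .start etc.
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return f"named({self.name!r}, {self.value!r})"

# under the idea above, a[1:2, 2, j=4:7, k=3] would desugar to:
key = (slice(1, 2, None), 2, named("j", slice(4, 7, None)), named("k", 3))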

On Tue, Aug 25, 2020 at 4:26 PM Stefano Borini <stefano.borini@gmail.com> wrote:
In any case, I think that Ricky Teachey might be onto something.
Well when it comes to python ideas, that actually might be a first for me. ;)
Actually I think this *may not* be true. Consider: if we were starting at zero, and we agreed we wanted dict literal behavior to work like the following:

d = {(1, 2): "a", (3, 4): "b"}
This is what we have today, and I think most agree this is all a Good Thing. Then, we might reason, if I want to be able to do a lookup using tuple literals, I'd do this, just as we do today:

d[1, 2]
No tuple brackets required. I think it is REALLY NICE not to have to do THIS, which is what we'd have to do if things behaved the function-call-way inside square brackets:

d[(1, 2)]
Of course all of these DO work today-- it just isn't required to use the tuple brackets. They're optional. Could it be that the convenience of not needing to give the tuple brackets, if we were starting at zero, might win out over the function-call syntax even today? I don't know. I wonder. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Tue, 25 Aug 2020 at 22:42, Ricky Teachey <ricky@teachey.org> wrote:
If we were to start from scratch, dict.__getitem__ would probably accept *args, **kwargs, and use them to create the index. Then again, very likely it would not accept kwargs at all. But you have a point that whatever the implementation might be, it has to play nice with the current dict() behavior. Yet, if we were to add an enhanced dunder, nothing for the current dict would change. It would still use the old getitem, and it would still create a tuple from its (nameless) index group to be used as a key. -- Kind regards, Stefano Borini

On 26/08/20 10:03 am, Stefano Borini wrote:
Despite arguing against it earlier, I think I'm starting to shift towards being in favour of a new dunder. But only if it can be justified as existing in its own right alongside __getitem__, without any intention to deprecate or remove the latter. I think an argument can be made that the new dunder -- let's call it __getindex__ for now -- shouldn't be considered part of the mapping protocol. It's a new thing for classes that want to use the indexing notation in ways that don't fit the simple idea of mapping a key object to a value object. The only drawback I can think of right now is that it would make possible objects for which x[1, 2] and x[(1, 2)] were not equivalent. I don't think that's a serious problem. A class would always be able to arrange for its __getindex__ method to *make* them equivalent, and things like pandas would be encouraged to do so. For things like generic types, which use the index notation for something that isn't really indexing at all, it probably doesn't matter much. -- Greg
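For instance, a class could restore the equivalence explicitly in its hypothetical __getindex__ (a sketch; since the dunder does not exist yet, it is exercised by direct call here):

class Flexible:
    def __getindex__(self, *args):
        # normalize so that x[(1, 2)] behaves like x[1, 2]
        if len(args) == 1 and isinstance(args[0], tuple):
            args = args[0]
        return args

f = Flexible()
assert f.__getindex__(1, 2) == f.__getindex__((1, 2)) == (1, 2)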

On Tue, Aug 25, 2020, 7:35 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
To be backwards compatible, any proposed change has to pass this test:

############
"""Test #1"""
class D:
    def __getitem__(self, key):
        return key

d = D()

# note the commas
assert d[1,] == f((1,))
################

where f is some function that mimics how python "translates" the "stuff" inside the square brackets. It is something like this (though not quite):

def f(*args):
    if not args:
        # empty brackets not allowed
        raise SyntaxError()
    if len(args) == 1:
        return args[0]
    return args

The desire of many (me included) seems to be to make it possible for the subscript operator to utilize syntax very similar, or identical, to the behavior of a function call. In other words, to make it possible to write some class, Q, that allows the following tests to pass:

#################
"""Test #2"""
# note the commas
assert q[1,] == f(1,)
assert q[1] == f(1)
assert q[(1,)] == f((1,))
##################

The problem is, I don't think it is possible to change python internals in a backwards compatible way so that we can write a function that passes all of these tests. Because the index operator recognizes this as a tuple:

1,

But function calls do not. This is an impasse. Does this constitute a big problem for the idea of providing new dunders? I'm not sure.

On Wed, Aug 26, 2020 at 11:31:25AM +1200, Greg Ewing wrote:
Most existing uses of subscripts already don't fit that key:value mapping idea, starting with lists and tuples. Given `obj[spam]`, how does the interpreter know whether to call `__getitem__` or `__getindex__`? What if the class defines both?
The only drawback I can think of right now is that it would make possible objects for which x[1, 2] and x[(1, 2)] were not equivalent.
Right now, both sets of syntax mean the same thing and call the same method, so you are introducing a backwards incompatible change that will break code. Can we postpone this proposal for Python 6000 in 2050? -- Steve

On Tue, Aug 25, 2020 at 9:50 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Aug 24, 2020 at 01:10:26PM -0400, Ricky Teachey wrote:
Sorry, when I wrote that, my intention was that `a` would be a tuple. But you're right, a better version would be to say:

SIGNATURE === SEMANTICS
(self, a, b) === (self, key_tuple_or_atomic_key_value, value)
Makes sense to me.
Well, it would not have to inspect the signature on every subscript call, but it would have to call a wrapped function like this on every call <https://github.com/Ricyteach/sigdepsem/blob/master/sigdepsem/main.py#L5-L9>:

def _adapt_get(func):
    @wraps(func)
    def wrapped(self, key):
        return func(self, *key)
    return wrapped

So you're right, it would result in the need to make a second function call *if and only if* the writer of the class chose to write their function with more than the "normal" number of arguments. In the case that only one argument is in the signature of __getitem__ and __delitem__, and two in __setitem__, all with no default values, it would not need to be wrapped and everything remains exactly as it does today <https://github.com/Ricyteach/sigdepsem/blob/master/sigdepsem/main.py#L26-L36>:

def _adapt_item_dunders(cls):
    for m_name, adapter, n_params in zip(
        ("getitem", "setitem", "delitem"),
        (_adapt_get, _adapt_set, _adapt_del),
        (2, 3, 2),
    ):
        m_dunder = f"__{m_name}__"
        if method := getattr(cls, m_dunder, None):
            params = signature(method).parameters
            # standard signatures (self plus key, or self plus key and value,
            # with no defaults) stay untouched; anything else is adapted
            if all(p.default == _empty for p in params.values()) \
                    and len(params) == n_params:
                continue
            setattr(cls, m_dunder, adapter(method))
I'll respond to this at the end.
People have actually been asking for ways to make the subscripting operator act more like a function call, so that's not true. And it could be useful. And it does help address the problem of incongruity (though not perfectly) between the way a function call handles args and kwargs and the way the subscript operator does it. And it is totally backwards compatible except for the case of what I'd call skirting very clear convention (more on that below). And it does not slow down any subscripting operation unless the class author chooses to use it.
I'm not done trying, sorry. I think the incongruity is a problem.
I'm sorry to derail it if that is what I am doing, truly I am. But at this point it has honestly started to feel likely to me that adding kwargs support to [ ] is going to happen. All I am intending to do is explore other, possibly better, ways of doing that than the quick and easy way. On Tue, Aug 25, 2020 at 10:03 PM Steven D'Aprano <steve@pearwood.info> wrote:
Above and in your previous response to me, I think you're overstating your case on this by a large amount. Remember: the signature dependent semantics proposal is to maintain backwards compatibility, 100%, for any code that has been following the very clear intent of the item dunder methods. The only time the tuples will get broken up is if the author of the class signals their intent for that to occur. Sure, it might break some code because somebody, somewhere, has a __getitem__ method written like this:

def __getitem__(self, a, b=None, c=None): ...

...and they are counting on a [1,2,3] operation to call:

obj.__getitem__((1,2,3))

...and not:

obj.__getitem__(1,2,3)

But are you really saying there is a very important responsibility not to break that person's code? Come on. The intent of the item dunders is extremely clear. People writing code like the above are skirting convention, and there really should not be much expectation, on their part, of being able to do that forever with nothing breaking. I certainly wouldn't.

On Tue, Aug 25, 2020 at 10:51:42PM -0400, Ricky Teachey wrote:
Python is a very dynamic language and objects can change their own methods and even their class at any time, so, yes, it will have to inspect the signature on every call. Otherwise you are changing the language execution model. Demonstration:

py> class Dynamic:
...     def __getitem__(self, arg):
...         print("original")
...         type(self).__getitem__ = lambda *args: print("replaced")
...
py> x = Dynamic()
py> x[0]
original
py> x[0]
replaced

[...]
People have actually been asking for ways to make subscripting operator act more like a function call, so that's not true.
Yes, people have asked for keyword arguments. This proposal doesn't get us any closer to the feature wanted.
And it could be useful.
It doesn't give subscripting any additional functionality that doesn't already exist. It's a purely cosmetic change.
So not actually totally backwards compatible. Please stop calling things "totally backwards compatible" if they aren't totally backwards compatible. If you change the behaviour of existing legal code, it's not backwards compatible.
A problem for whom? A problem in what way? PEP 472 goes into detail about the sorts of things people find really hard to do with subscripting because of the lack of keywords. Where is the discussion by people about the things that are really hard to do because of the comma handling in subscripts? [...]
The Python language doesn't follow the rule "Do what we intend you to do", it follows the rule "this is how the language works, do whatever the language allows". PEP 5 doesn't mention the word "intent" and doesn't say "its okay to break people's code if they are using Python in ways we don't like". You don't get to say that breaking legal code that works today is okay because it doesn't follow "the intent". It's still a breaking change and you have to convince the Steering Council that breaking other people's code for the sake of fixing an incongruity is worthwhile. Imagine it was code *you* relied on that broke, and you no longer had the source code for it, it was a library compiled away in a .pyc file and the company that distributed it has gone broke and the source code lost, and replacing that library is going to cost your business a year of time and a hundred thousand dollars in development costs. But that's okay, because an incongruity that makes no difference to anyone's code has been fixed. Now maybe you can convince the Steering Council that this is a clear win for the broader community. We do break backwards compatibility, sometimes, if the risks are small enough and the benefits large enough. But be honest about what you are doing.
Yes. -- Steve

On Wed, Aug 26, 2020 at 9:48 AM Steven D'Aprano <steve@pearwood.info> wrote:
I see what you're saying there. So any solution involving `type` or `object` wrapping a child class method at class creation time should probably not be considered as a cpython implementation option, because of the dynamic nature of classes. Understood.
Alright, you've made progress with me on this one. I'm still not totally convinced such code isn't asking to be broken, but you're right: just dismissing that as a nonissue isn't correct. It has to be weighed.
Well I will point to Greg Ewing's message from a while ago that I quoted at the start of this thread: On Tue, Aug 4, 2020, 2:57 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: On 4/08/20 1:16 pm, Steven D'Aprano wrote:
These methods are already kind of screwy in that they don't handle *positional* arguments in the usual way -- packing them into a tuple instead of passing them as individual arguments. I think this is messing up everyone's intuition on how indexing should be extended to incorporate keyword args, or even whether this should be done at all. -- Greg If we simply add kwarg passing to item dunders, the incongruity between the subscripting operator and regular function calls will really hinder people's ability to understand the two. I think most people in these conversations seem to now agree that this isn't a good enough reason to exercise the do nothing option, but I think it is a very good reason to explore other ways of doing it to alleviate that incongruity. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On 26/08/20 1:59 pm, Steven D'Aprano wrote:
Most existing uses of subscripts already don't fit that key:value mapping idea, starting with lists and tuples.
Not sure what you mean by that.
Given `obj[spam]`, how does the interpreter know whether to call `__getitem__` or `__getindex__`? What if the class defines both?
If it has a __getindex__ method, it calls that using normal function parameter passing rules. Otherwise it uses a fallback something like:

def __getindex__(self, *args, **kwds):
    if kwds:
        raise TypeError("Object does not support keyword indexes")
    if not args:
        raise TypeError("Object does not accept empty indexes")
    if len(args) == 1:
        args = args[0]
    return self.__getitem__(args)
No existing object will have a __getindex__ method[1], so it won't change any existing behaviour. [1] Or at least not one that we aren't entitled to break. -- Greg

On Wed, Aug 26, 2020 at 03:06:25PM +1200, Greg Ewing wrote:
Lists and tuples aren't key:value mappings.
Presumably the below method is provided by `object`. If not, what provides it?
Empty subscripts will remain a syntax error, so I don't think this case is possible. What is your reasoning behind prohibiting keywords, when we're right in the middle of a discussion over PEP 472, which aims to allow keywords?
This is going to slow down the most common cases of subscripting: the interpreter has to follow the entire MRO to find `__getindex__` in object, which then dispatches to the `__getitem__` method. It would make more sense to put the logic in the byte-code, rather than `object`:

if the object supports __getindex__:
    call it
otherwise:
    call __getitem__

which is closer to how other operators work, and avoids giving object a spurious dunder that it doesn't use. However it still has to check the MRO, so I don't know if that actually gains us much performance. The fundamental issue with this proposal is that, as far as I can see, it solves no problems and offers no new functionality. It just adds complexity. As Stephan Hoyer has said, the existing handling of commas in subscripts is not a problem for Numpy. I doubt it is a problem for Pandas either. Is it a problem for anyone?
In your earlier statement, you said that it would be possible for subscripting to mean something different depending on whether the comma-separated subscripts had parentheses around them or not: obj[(2, 3)] obj[2, 3] How does that happen? I thought I understood what you meant by that, but I obviously didn't. Can you explain how they would be different? I understood that you meant that the intepreter would choose which dunder to call according to whether or not there are parentheses around the items, but now I'm not sure what you meant. -- Steve

On 27/08/20 12:53 am, Steven D'Aprano wrote:
It's not literally a method, I just wrote it like that to illustrate the semantics. It would be done by the interpreter as part of the process of translating indexing operations into dunder calls.
What is your reasoning behind prohibiting keywords, when we're right in the middle of a discussion over PEP 472, which aims to allow keywords?
We're falling back to __getitem__ here, which doesn't currently allow keywords, and would stay that way. The point of this proposal is to not change __getitem__. If you want to get keywords, you provide __getindex__.
No, it would be done by checking type slots, no MRO search involved.
If the object has a __getindex__ method, it gets whatever is between the [] the same way as a normal function call, so comma-separated expressions become separate positional arguments. -- Greg

On Wed, Aug 26, 2020 at 11:30 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Can you elaborate on this for my understanding?
I am interested in this proposal, but I am pretty sure that last goal isn't entirely possible; it is only mostly possible. I believe that the way python handles the comma-separated expression inside the [ ] operator is just due to operator precedence, as Steve pointed out previously. So without a major change we cannot write code where this test suite fully passes (three part test):

# TEST SUITE 1
assert q[1,] == f(1,)      # part 1
assert q[1] == f(1)        # part 2
assert q[1,] == f((1,))    # part 3

The only way around this would be for the function, f, to "know" about the presence of hanging commas. Similarly, without such "runtime comma detection" we cannot make it possible for this test suite to fully pass:

# TEST SUITE 2
assert q[1,2] == f(1,2)        # part 1
assert q[(1,2)] == f((1,2))    # part 2
assert q[(1,2),] == f((1,2),)  # part 3

Can we? --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On 27/08/20 3:51 am, Ricky Teachey wrote:
For frequently-used special methods such as __getitem__, the type object contains direct pointers to C functions (so-called "type slots"). These are set up when the type object is created (and updated if the MRO is modified). For classes written in Python, they are filled with wrappers that invoke the user's Python code. So the implementation of the bytecodes that perform indexing would first look for a value in the slot for __getindex__; if it finds one, it calls it like a normal function. Otherwise it does the same as now with the slot for __getitem__. The overhead for this would be just one extra C pointer check, which you would be hard-pressed to measure.
We could probably cope with that by generating different bytecode when there is a single argument with a trailing comma, so that a runtime decision can be made as to whether to tupleify it. However, I'm not sure whether it's necessary to go that far. The important thing isn't to make the indexing syntax exactly match function call syntax, it's to pass multiple indexes as positional arguments to __getindex__. So I'd be fine with having to write a[(1,)] to get a one-element tuple in both the old and new cases. It might actually be better that way, because having trailing commas mean different things depending on the type of object being indexed could be quite confusing. -- Greg
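In Python pseudocode, the dispatch described above would look roughly like this (a sketch only; the real version would be a single pointer check in the C type slots):

def subscript(obj, args, kwargs):
    cls = type(obj)
    getindex = getattr(cls, "__getindex__", None)
    if getindex is not None:
        # new protocol: ordinary function-call argument passing
        return getindex(obj, *args, **kwargs)
    if kwargs:
        raise TypeError("object does not support keyword indexes")
    # old protocol: pack multiple indexes into a tuple, as today
    key = args[0] if len(args) == 1 else args
    return cls.__getitem__(obj, key)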

On Wed, Aug 26, 2020 at 9:00 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Wow that's great news.
I didn't know that was possible. It's far beyond my knowledge. The idea of a new dunder is starting to sound like a much more realistic possibility than I had hoped. --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

A bunch of the conversation here is about how to handle both positional and keyword arguments with a single signature. Let me suggest an alternative. At compile time, we know if the call is made with keyword arguments or not:

a[1]          positional only
a[b=1]        keyword only
a[1, b=1]     both
a[**kwargs]   keyword only

I suggest that in the first case it calls __getitem__(self, args) as it does now, and in the other cases it calls __getitemx__(self, args, **kwargs) instead (the signature is what matters here, don't bikeshed the name). Some people have proposed arbitrary signatures for __getitemx__ but I think that's an unnecessary degree of complexity for little benefit. Personally, I'm not even sure I'd ever want to mix args and kwargs, but including that in the signature preserves that possibility for others who find it useful. To explain further: when I use [...] without keyword arguments, the code works exactly as it does today, with all args in a tuple. Therefore, for consistency, adding a keyword argument should not change the args value, which is why both signatures include a single args parameter. If you write a form that uses a keyword argument and __getitemx__ does not exist, then it would raise an error other than KeyError (either TypeError or AttributeError, with the latter requiring no special handling). --- Bruce
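A sketch of a class under this two-dunder scheme (__getitemx__ is the placeholder name above; since the dispatch would be done by the compiler, the keyword form is shown here as a direct call):

class Table:
    def __getitem__(self, args):
        # positional-only subscripts: unchanged from today
        return ("plain", args)

    def __getitemx__(self, args, **kwargs):
        # chosen at compile time whenever keywords appear in the subscript
        return ("extended", args, kwargs)

t = Table()
assert t[1, 2] == ("plain", (1, 2))
# t[1, 2, b=3] would compile to the following call:
assert t.__getitemx__((1, 2), b=3) == ("extended", (1, 2), {"b": 3})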

On Thu, Aug 27, 2020 at 03:28:07AM +1200, Greg Ewing wrote:
Okay, so similar to my suggestion that this would be better implemented in the byte-code rather than as a method of object.
Point of order: **the getitem dunder** already allows keywords, and always has, and always will. It's just a method. It's the **subscript (pseudo-)operator** which doesn't support keywords. This is a syntax limitation, not a limitation of the dunder method. If the interpreter supports the syntax, it's neither here nor there to the interpreter whether it calls `__getitem__` or `__getindex__` or `__my_hovercraft_is_full_of_eels__` for that matter. So if you want to accept keywords, you just add keywords to your existing dunder method. If you don't want them, don't add them. We don't need a new dunder just for the sake of keywords.
Okay, I didn't think of type slots. But type slots are expensive in other ways. Every new type slot increases the size of objects, and I've seen proposals for new dunders knocked back for that reason, so presumably the people who care about the C level care about the increase in memory and complexity from adding new type slots. Looking here:

https://docs.python.org/3/c-api/typeobj.html

I see that `__setitem__` and `__delitem__` are handled by the same type slot, so presumably the triplet of get-, set- and del- index type slots would be the same. Still, that means adding two new type slots to both sequence and mapping objects. (I assume that it's not *all* objects that carry these slots. If it is all objects, that makes the cost of this proposal correspondingly higher.) So if I understand it correctly, we have some choices when it comes to sequence/mapping types:

1. the existing `__*item__` methods keep their slots, and new `__*index__` slots are created, which makes both the item and index dunders fast but increases the size of every object which uses any of those methods;

2. the existing item slots stay as they are, and there are no new index slots, which keeps objects the same size but makes the new index protocol slow;

3. the existing item slots are repurposed for index, which keeps objects the same size, and the new protocol fast, but makes calling item dunders slow;

4. and just for completion, because of course this is not going to happen, we could remove the existing item slots and not add index slots, so that both protocols are equally slow;

5. alternatively, we could leave the existing C-level sequence and mapping objects alone, and create *four* brand new C-level objects:
   - a sequence object that supports only the new index protocol;
   - a sequence object that supports both index and item protocols;
   - and likewise two new mapping objects.

Do I understand this correctly? Have I missed any options? Assuming I do, 4 is never going to happen and each of the others has some fairly large disadvantages and costs in speed, memory, and complexity. Without a correspondingly large advantage to this new `__*index__` protocol, I don't see this going anywhere.
The compiler doesn't know whether the object has the `__getindex__` method at compile time, so any process that relies on that knowledge isn't going to work. There can only be one set of parsing rules that applies regardless of whether the object defines the item dunders or the index dunders or neither. Right now, if you call `obj[1,]` the dunder receives the tuple (1,) as index. If it were treated as function call syntax, the method would receive a single argument 1 instead. If it were treated as a tuple, as required by backwards compatibility, that's an inconsistency between subscripts and function calls, and the whole point of your proposal is to remove that inconsistency. Rock (you are here) Hard Place. Do you break existing code, or fail in your effort to remove the inconsistencies? I don't care two hoots about the inconsistencies, I just want to use keywords in my subscripts, so for me the answer is obvious: keep backwards compatibility, and there is no need to add new dunders to only partially fix something which isn't a problem. Another inconsistency: function call syntax looks like this:

call ::= primary "(" [argument_list [","] | comprehension] ")"

which means we can write generator comprehensions inside function calls without additional parentheses:

func(expr for x in items)  # unambiguously a generator comprehension

This is nice because the round brackets of the function call match the round brackets used in generator comprehensions, so it is perfectly consistent and unambiguous. But if you do that in a subscript, we currently get a syntax error. If we allowed it, it would be pretty weird for the square brackets of the subscript to create a *generator* comprehension rather than a list comprehension. But we surely don't want a list comprehension by default:

obj[(expr for x in items)]  # unambiguously a generator comprehension
obj[[expr for x in items]]  # unambiguously a list comprehension
obj[expr for x in items]    # and this is... what?

It looks like it should be a list comprehension (it has square brackets, right?) but we probably don't want it to be a list comp, we'd prefer it to be a generator comp because they are more flexible. Only that would look weird and would lead to all sorts of questions about why list comprehension syntax sometimes gives a list and sometimes a generator. But if we leave it out, we have an inconsistency between subscripting and function calls, and for those who are motivated by removing that inconsistency, that's a Bad Thing. For me, again, the answer is obvious: we don't have to support this for the sake of consistency, because consistency isn't the motivation. I just want keywords. -- Steve

On 27/08/20 3:56 pm, Steven D'Aprano wrote:
Yes, I could have worded that better. What I meant was that no existing __getitem__ method expects to get keywords given to it via indexing notation, and under my proposal that wouldn't change.
Nobody disputes that it *could* be made to work that way. But I'm not convinced that it's the *best* way for it to work. The killer argument in my mind is what you would have to do to make an object where all of the following are equivalent:

a[17, 42]
a[time = 17, money = 42]
a[money = 42, time = 17]

With a fresh new dunder, it's dead simple:

def __getindex__(self, time, money):
    ...

With a __getitem__ that's been enhanced to take keyword args, but still gets positional args packed into a tuple, it's nowhere near as easy.
It doesn't seem like a serious problem to me. Type objects are typically created once at program startup, and they're already quite big. If it's really a concern, the new slots could be put in a substructure, so the type object would only be bigger by one pointer if they weren't being used. Another possibility would be to have them share the slots for the existing methods, with flags indicating which variant is being used for each one. A given type is only going to need either __getindex__ or __getitem__, etc., not both.
I don't follow this one. How can there be both old and new protocols without adding new dunder methods?
See my earlier post about that.
As I said earlier, I don't mind if the contents of the square brackets don't behave exactly like a function argument list. I expect the use cases for passing a generator expression as an index to be sufficiently rare that I won't mind having to put parens around it.
obj[expr for x in items] # and this is... what?
I'm fine with it being a syntax error. -- Greg

On Thu, Aug 27, 2020 at 11:13:38PM +1200, Greg Ewing wrote:
I don't think that something like that is exactly the intended use-case for the feature, at least not intended by *me*, that looks more like it should be a function API. But I'm sure people will want to do it, so we should allow it even if it is an (ever-so-slight) abuse of subscript notation. (But I note that one of the PEP co-authors, Joseph Martinot-Lagarde, has now suggested that we bypass this question by simply requiring an index, always.)
Okay, that's an advantage if you're writing this sort of code. I'm not sure its a big enough advantage to require such changes, new dunders, new byte codes, etc, but YMMV. [...]
*shrug* I don't know enough about type slots, except I am confident that I have seen dunders rejected because it would require adding new type slots. But maybe I have the details wrong and am thinking of something else.
In which case, why do we need new names for the methods? Effectively they're the same method, just with different behaviour and a different name, but since you can only use one even if you define both, why would you define both? I would expect it is less confusing and error prone to just say "Starting in version 3.10, subscript parsing changes and item dunders should change their signature ..." than to have two sets of dunders, especially when the names are so similar that invariably people will typo the names:

class X:
    def __getindex__(self, ...): ...

    # later
    def __setitem__(self, ...): ...  # oops

I'd rather a conditional:

class X:
    if sys.version_info >= (3, 10):
        # New signatures
        def __getitem__(self, ...): ...
    else:
        # Old signatures
        ...

Or conditionally inherit the dunders from a mixin:

if sys.version_info >= (3, 10):
    from this_is_the_future import item_mixin
else:
    from legacy import item_mixin

class X(item_mixin):
    pass

Having to remember which name goes with which version is going to be annoying and tricky, especially since for many simple cases like lists and dicts the getitem and getindex versions will be identical. [...]
I didn't say anything about inventing new protocols without new dunders. All five of my alternatives were under the assumption that the new protocol and new dunders was going ahead. But as I mention above, if we did introduce this, I'd rather we keep the dunders and change the name rather than introduce confusable names. -- Steve

On 30/08/20 3:06 pm, Steven D'Aprano wrote:
I don't see why we need to pick one use case to bless as the official "true" use case. If keyword indexing becomes a thing, this is something that people will naturally *expect* to be able to do with it, and it will be confusing and frustrating if it can't be done, or is unnecessarily difficult to do. Plus we have at least one real-world use case, i.e. xarray, where it would be a very obvious thing to do.
How is it an abuse? To me it seems less of an abuse than other uses where the keyword args *don't* represent index values.
It's a cost-benefit tradeoff. I'm just pointing out that what we're discussing can be implemented efficiently, if the benefits are judged worth the effort of doing so.
I'm not really serious about that suggestion, I was just saying that it *could* be done if we were really paranoid about making the type object slightly bigger. I would consider that level of paranoia to be excessive. As for why we need new names, it's for backwards compatibility with code that expects to be able to call an object's __xxxitem__ methods directly, e.g. for delegation. It's not that an object can't or won't *have* both __getitem__ and __getindex__, it's that we don't strictly need to *optimise* access to both of them. Dunders can exist without having type slots; they're just slower to invoke in that case. As part of this, if a class only defines one of __getitem__ and __getindex__, the code that sets up the type object would have to provide a wrapper for the other one that delegates to the defined one.
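To make that concrete, here is a pure-Python sketch of the kind of wrapper synthesis Greg describes. In CPython this would happen during type creation in C; the __getindex__ name, the decorator, and the calling convention here are only illustrative, not part of any accepted proposal:

def synthesize_item_dunders(cls):
    has_old = "__getitem__" in cls.__dict__
    has_new = "__getindex__" in cls.__dict__
    if has_new and not has_old:
        # Old-style callers pass a single key object; unpack tuples
        # into positional arguments for the new-style method.
        def __getitem__(self, key):
            if isinstance(key, tuple):
                return self.__getindex__(*key)
            return self.__getindex__(key)
        cls.__getitem__ = __getitem__
    elif has_old and not has_new:
        # New-style callers pass separate arguments; repack them into
        # the single key object the old-style method expects.
        def __getindex__(self, *args):
            return self.__getitem__(args[0] if len(args) == 1 else args)
        cls.__getindex__ = __getindex__
    return cls

@synthesize_item_dunders
class Grid:
    def __getindex__(self, x, y):
        return (x, y)

assert Grid()[1, 2] == (1, 2)  # routed through the synthesized __getitem__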
Are you volunteering to rewrite all existing Python code that defines item dunders? :-)
Having to remember which name goes with which version is going to be annoying and tricky,
I'm not wedded to the name __getindex__. It can be called __newgetitem__ or __getitemwithfancyparameterpassing__ if people thought that would be less confusing. -- Greg

On Sun, Aug 30, 2020 at 05:49:50PM +1200, Greg Ewing wrote:
We do that for every other operator and operator-like function. Operator overloading is a thing, so if you want `a + b` to mean looking up a database for `a` and writing it to file `b`, you can. But the blessed use-cases for the `+` operator are numeric addition and sequence concatenation. Anything else is an abuse of the operator: permitted, possibly a *mild* abuse, but at best it's a bit wiffy. (There are a few gray areas, such as array/matrix/vector addition, which sneak into the "acceptable" area by virtue of their long usage in mathematics.) If you don't like the term "abuse", I should say I am heavily influenced by its use in mathematics where "abuse of notation" is not considered a bad thing: https://en.wikipedia.org/wiki/Abuse_of_notation In any case, as you say (and agreeing with me) people will want to do something like this, and it is not my intent to stop them.
Plus we have at least one real-world use case, i.e. xarray, where it would be a very obvious thing to do.
Indeed, and I can think of at least two use-cases where I would immediately use it myself, if I could.
Compared to the majority of realistic examples given in this discussion and the PEP, your example doesn't look to me like an item or key lookup. It looks more like a function call of some sort. I know that, in a sense, subscript lookups are like function calls, and I also acknowledge that I don't know your intended semantics of that example. It might even be that you intended it as a lookup of some kind of record with time=17 and money=42. But let's not get too bogged down in nit-picking. We agree that people should be allowed to do this. [...]
Of course not. This is why we ought to be cautious about introducing backwards-incompatible changes. I don't think that we should care about this "problem" that subscripting only takes a single subscript. But if we do "fix" it, I'd rather the straight-forward future directive (and a long, long deprecation period for the old behaviour) than two sets of confusingly similar getters and setters.
The trouble with naming anything "new" is that in ten years time it's not new any more, and you could end up needing a "new new" method too. -- Steve

On 30/08/20 3:06 pm, Steven D'Aprano wrote:
But yes, it's just another example. And when I teach it I always say "this is a funny but intuitive overloading."

very related to this conversation -- isn't the use of [] for indexing and its use in Type hinting functionally very different as well? -CHB On Tue, Sep 1, 2020 at 9:55 AM David Mertz <mertz@gnosis.cx> wrote:

A digression on abuse of operators, notation, and "abuse of notation". Steven D'Aprano writes:
On Sun, Aug 30, 2020 at 05:49:50PM +1200, Greg Ewing wrote:
Of course the example is abusive. But not being "blessed" isn't the reason. (And shouldn't the notation be "b += a" for that operation? ;-)
Anything else is an abuse of the operator: permitted, possibly a *mild* abuse, but at best it's a bit wiffy.
I don't agree. The reasons for picking a "blessed" use case for operators do not include "to exclude other uses". Rather, they are to help the language designer decide things like precedence and associativity, to give developers some guidance in choosing an operator to overload for a two-argument function, and to provide readers with a familiar analog for the behavior of an operator.

For example, sequence concatenation is not "blessed because it's blessed". It's blessed because it just works. What does "5 * 'a' + 'b'" mean to the reader? Is the result 'aaaaab', or 'ababababab'? The reader knows it's the former, and that is the right answer for the developer because 'ababababab' == 5 * 'ab' -- there's another way to construct that without parentheses. Why is 5 * 'a' == 'aaaaa'? Because 5 * 'a' == 'a' + 'a' + 'a' + 'a' + 'a'. None of this is an accident. There's nothing wiffy about it.

Recall that some people don't like the use of '+' for sequence concatenation because the convention in abstract algebra is that '+' denotes a commutative associative binary operation, while '*' need not be commutative. But I think you don't want to use '*' for sequence concatenation in Python because 5 * 'a' * 'b' is not associative: it matters which '*' you execute first.

Where things get wiffy is when you have conflicting requirements, such as if for some reason you want '*' to bind more tightly than '+' which in turn should bind at least as tightly as '/'. You can swap the semantics of '+' and '/' but that may not "look right" for the types involved. You may want operator notation badly for concise expression of long programs, but it will always be wiffy in this circumstance.

And use of "a + b" to denote "looking up a database for `a` and writing it to file `b`" is abusive, but not of notation. It's an abuse because it doesn't help the reader remember the semantics, and I don't think that operation is anywhere near as composable as arithmetic expressions are.
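A quick demonstration of the precedence point, in today's Python (nothing hypothetical here):

>>> 5 * 'a' + 'b'   # '*' binds tighter, so this reads as ('a' * 5) + 'b'
'aaaaab'
>>> 5 * 'ab'        # and the other result has its own parenthesis-free spelling
'ababababab'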
I think operator overloading and abuse of notation are quite different things, though. Using '*' to denote matrix multiplication or "vertical composition of morphisms in a category" is not an abuse of notation. It's overloading, using the same symbol for a different operation in a different domain.

Something like numpy broadcasting *is* an abuse of notation, at least until you get used to it: you have nonconforming matrices, so the operation "should" be undefined (in the usual treatment of matrices). If you take "undefined" *literally*, however, there's nothing to stop you from *defining* the operation for a broader class of environments, and it's not an abuse to do so. So if you have a class Vector and a function mean(x: Vector) -> Vector,

v = Vector(1, 2, 3)
m = mean(v)   # I Feel Like a Number, but I am not
variance = (v - m) * (v - m) / len(v)

"looks like" an abuse of notation to the high school student targeted by the "let's have a Matrix class in stdlib" thread. But as Numpy arrays show, that's implementable, it's well-defined, and once you get used to it, the meaning is clear, even in the general case. You could do the same computation with lists

v = [1, 2, 3]
m = sum(v) / len(v)
variance = (v - len(v) * [m]) * (v - len(v) * [m]) / len(v)

but I don't think that's anywhere near as nice.

To call definitions extending the domain of an operator "abuse of notation" is itself an abuse of language, because abuse of notation *can't happen* in programming! Abuse of notation is synecdoche (use of a part to denote the whole) or metonymy (use of a thing to represent something related but different). But do either in programming, and you get an incorrect program. No matter how much you yell at Guido, the Python translator is never going to "get" synecdoche or metonymy -- it's just going to do what you don't want it to do.
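The Vector example is indeed implementable with NumPy broadcasting today. A runnable equivalent (assuming NumPy is installed, and reading the '*' between vectors as a dot product, which is my guess at the intent):

import numpy as np

v = np.array([1.0, 2.0, 3.0])
m = v.mean()                 # a scalar, broadcast across the array below
d = v - m                    # broadcasting: the scalar is subtracted elementwise
variance = (d @ d) / len(v)  # dot product sums the squared deviations
print(variance)              # 0.666...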

On Wed, Sep 2, 2020 at 3:48 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Possibly a better example would be path division in Python, or stream left/right shift in C++. What does it mean to divide a path by a string? What does it mean to left-shift std::cout by 5? Those uses don't make a lot of sense based on the operations involved, but people accept them because the symbols look right:
pathlib.Path("/") / "foo" / "bar" / "quux" PosixPath('/foo/bar/quux')
Is that abuse of notation or something else? Whatever it is, it's not "operator overloading" in its normal sense; division is normally the inverse of multiplication, but there's no way you can multiply that by "quux" to undo that last operation, and nobody would expect so. Maybe we need a different term for this kind of overloading, where we're not even TRYING to follow the normal semantics for that operation, but are just doing something because it "feels right". ChrisA

Chris Angelico writes:
Possibly a better example would be path division in Python,
I would have chosen '+' for that operation for the same reason '+' "just works" for sequences. It's a very specialized use case but 5 * pathlib.Path('..') / "foo" makes sense as an operation but oh, the cognitive dissonance! It's so specialized that <0.99 wink>. '/' is fine for pathlib, and I trust Antoine's intuition that it will "work" for (rather than against ;-) pathlib users.
or stream left/right shift in C++.
I'm very sympathetic to your argument for this one.
Is that abuse of notation or something else?
As I wrote earlier, I don't think "abuse of notation" is a useful analogy here.
But if you spell it '+', pathlib.Path("/") + "foo" + "bar" + "quux" - "bar" makes some sense, modulo the definition of the case with multiple "bar". I don't argue it's useful, just interpretable without head explosion.
"Overloading" for this case doesn't bother me, but if you want to introduce a new term, Greg Ewing's "operator repurposing" WFM. Steve

On 2/09/20 2:24 am, Steven D'Aprano wrote:
It was a reference to earlier discussions of pandas and xarray, where there is a notion of axes having names, and a desire to be able to specify index values by name, for the same reasons that we sometimes want to specify function arguments by name. It's true that you can't tell just by looking at an indexing expression what the semantics of the keywords are, but the same is true of function calls. We rely on contextual knowledge about what the function does in order to interpret the keywords. Likewise here. If you know from context that a is an array-like object with axes named 'time' and 'money', then I think the meaning of the indexing expression will be quite clear. I also think that it's something people will naturally expect to be able to use index keywords for -- to the point that if they can't, their reaction will be "Why the flipping heck not?" Which is why I was somewhat perplexed when it was suggested that we should discount this use case purely because it wasn't cited in the original proposal (which turned out to be wrong anyway). -- Greg
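To see how natural the named-axis case is, here is a toy sketch of the kind of object Greg describes. Since a[time=17, money=42] is not yet legal syntax, the keyword form calls the dunder directly; the Table class and its argument handling are invented for illustration only:

class Table:
    def __init__(self, axes, data):
        self.axes = axes   # e.g. ('time', 'money')
        self.data = data   # dict keyed by full index tuples

    def __getitem__(self, key=(), **named):
        key = key if isinstance(key, tuple) else (key,)
        values = dict(zip(self.axes, key))   # positional, left to right
        for axis, v in named.items():        # then named, by axis name
            if axis not in self.axes or axis in values:
                raise TypeError(f"bad or duplicate axis {axis!r}")
            values[axis] = v
        return self.data[tuple(values[a] for a in self.axes)]

t = Table(('time', 'money'), {(17, 42): 'a record'})
assert t[17, 42] == 'a record'                         # positional indexing
assert t.__getitem__(time=17, money=42) == 'a record'  # named, called directly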

On Tue, Sep 1, 2020 at 4:57 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I agree it's a fine use case. Using the currently prevailing proposal (which I steadfastly will refer to as "Steven's proposal") it's quite possible to implement this. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Tue, Sep 1, 2020, 8:35 PM Guido van Rossum <guido@python.org> wrote:
Can someone tell me: what will be the canonical way to allow both named or unnamed arguments under Steven's proposal? With function call syntax, it is trivial:

def f(time, money): ...

The interpreter handles it all for us:
But what is the way when we can't fully benefit from Python function signature syntax? Here's my attempt; it is probably lacking. I'm five years into a self-taught venture in Python... If I can't get this right the first time, it worries me a little.

MISSING = object()

def __getitem__(self, key=MISSING, time=MISSING, money=MISSING):
    if time is MISSING or money is MISSING:
        if time is not MISSING and key is not MISSING:
            money = key
        elif money is not MISSING and key is not MISSING:
            time = key
        else:
            time, money = key

Is this right? Wrong? The hard way? The slow way? What about when there are three arguments? What then?

On Tue, Sep 1, 2020, 9:06 PM Ricky Teachey <ricky@teachey.org> wrote:
Is this right? Wrong? The hard way? The slow way?
Actually just realized I definitely did get it wrong:

def __getitem__(self, key=MISSING, time=MISSING, money=MISSING):
    if time is MISSING or money is MISSING:
        if time is not MISSING and key is not MISSING:
            money, key = key, MISSING
        elif money is not MISSING and key is not MISSING:
            time, key = key, MISSING
        elif time is MISSING and money is MISSING:
            (time, money), key = key, MISSING
    if key is not MISSING:
        raise TypeError()

Still not sure if that is right.

On Tue, Sep 1, 2020, 9:20 PM Ricky Teachey <ricky@teachey.org> wrote:
Actually I suppose this is the best way to do it:

def _real_signature(time, money): ...

class C:
    def __getitem__(self, key, **kwargs):
        try:
            _real_signature(*key, **kwargs)
        except TypeError:
            try:
                _real_signature(key, **kwargs)
            except TypeError:
                raise TypeError() from None

Sorry for all the replies, but I'm honestly very unsure how to do this correctly under Steven's proposal.

I may get a chance to look carefully at this in a bit, but for now: On Tue, Sep 1, 2020 at 6:38 PM Ricky Teachey <ricky@teachey.org> wrote:
Sorry for all the replies, but I'm honestly very unsure how to do this correctly under Steven's proposal.
That's OK. It's going to be tricky, and keeping backward compatibility makes that necessary. And, in fact, the current syntax is pretty tricky as well. Say you want to make a class that takes multiple indices -- like a Matrix, for instance. Your __getitem__ looks like:

def __getitem__(self, index):
    ...

as they all do. But what might index be? It could be one of:

* an object with an __index__ method -- so a single index, simple
* a slice object
* a tuple, where each item in the tuple could be either of the above.

Or, of course, it could be invalid, and you need to raise an Exception with a meaningful message. So there's a lot of logic there, and we haven't even started on the keywords :-) But that's all OK, as the number of people that need to write these methods is small, and the number of people that can use the feature is large. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
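To give a sense of how much parsing that implies, here is a sketch of the dispatch logic for a toy two-dimensional matrix. The Matrix2D class and its internals are invented for illustration, not taken from any library:

import operator

class Matrix2D:
    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, index):
        if isinstance(index, tuple):           # m[i, j], m[i:j, k], ...
            if len(index) != 2:
                raise TypeError("expected exactly two indices")
            i, j = index
            if isinstance(i, slice):
                return [row[j] for row in self.rows[i]]
            return self.rows[i][j]
        if isinstance(index, slice):           # m[i:j] -> a block of rows
            return self.rows[index]
        try:                                   # m[i] -> a single row
            return self.rows[operator.index(index)]
        except TypeError:
            raise TypeError(f"invalid index: {index!r}") from None

m = Matrix2D([[1, 2], [3, 4]])
assert m[0, 1] == 2
assert m[1] == [3, 4]
assert m[0:1] == [[1, 2]]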

On Wed, 2 Sep 2020 at 05:20, Christopher Barker <pythonchb@gmail.com> wrote:
And this is the reason why personally I would prefer to add a new dunder with more defined semantics. This poor sod of a method already has a lot to handle, exactly because it behaves nothing like a function. As a programmer developing that method, you are essentially doing the job of the interpreter.
But that's all OK, as the number of people that need to write these methods is small, and the number of people that can use the feature is large.
True but there's a point where one should question if the direction getitem has taken is the right one. -- Kind regards, Stefano Borini

On Wed, Sep 2, 2020, 12:19 AM Christopher Barker <pythonchb@gmail.com> wrote:
I really appreciate that, thanks. Good to know this is a hard thing that it's OK to struggle with. But that's all OK, as the number of people that need to write these methods
is small, and the number of people that can use the feature is large.
-CHB
A point I haven't seen stated elsewhere yet, so I'll state it: shouldn't we expect the number of people wanting to write these methods to go up quite a bit once the subscript syntax becomes more flexible/capable/expressive with the addition of kwargs, and it becomes pretty obvious that function-like subscript calls are now possible? Perhaps they might even appear to many to be encouraged.

On Tue, Sep 1, 2020, at 21:06, Ricky Teachey wrote:
I'm using () rather than a MISSING as the sentinel here because it makes the code simpler. For the same reason, if we need to choose a key to pass in by default for a[**kwargs only] without defining a () in __getitem__, I have consistently advocated using () for this. Likewise, *not* having a default [or requiring the method to define its own default] key presents a slight problem for __setitem__, illustrated below.

def __getitem__(self, key=(), /, **kwargs):
    return self._getitem(*(key if isinstance(key, tuple) else (key,)), **kwargs)

def __setitem__(self, key=(), value=???, /, **kwargs):
    return self._setitem(value, *(key if isinstance(key, tuple) else (key,)), **kwargs)

def _getitem(self, time, money): ...

def _setitem(self, value, time, money): ...

[delitem the same as getitem]

Basically, you can easily write code that leverages the existing function argument parsing, simply by performing a function call.
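The core trick there -- re-dispatching through an ordinary function call so Python parses the arguments for you -- already works today for the positional part. A quick check, with invented names:

class Point:
    def __getitem__(self, key=(), **kwargs):
        key = key if isinstance(key, tuple) else (key,)
        return self._getitem(*key, **kwargs)

    def _getitem(self, time, money):  # the "real" signature
        return (time, money)

p = Point()
assert p[17, 42] == (17, 42)                    # tuple key unpacked for us
assert p.__getitem__(17, money=42) == (17, 42)  # keywords via a direct call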

On 2020-09-01 07:24, Steven D'Aprano wrote:
I don't agree with that at all. I see the operators as abstract symbols indicating vague areas of semantic space. Any operation that involves "combining" of some sort is a fair and non-wiffy use of `+`. Any operation that involves "removal" of some sort is a fair and non-wiffy use of `-`. And so on. For instance, some parsing libraries use `+` or `>>` to indicate sequences of syntactic constructs, and that's great. (I actually think that this flexibility should be more clearly expressed in the Python docs, but that's a separate "idea" that I'll have to post about another time.)

For me the equally important (or more important) question is not what an individual operator in isolation means, but how a given type coherently organizes the operators. In other words, it's not enough to have `+` defined to mean something sensible for a given type; the type must assign its operations to operators in such a way that you don't get confused about which one is `+` and which one is `*` and which one is `>>` and so on. Similarly, even if a particular use of `+` seems odd conceptually on its own, it can still be excellent if it fits in with the overall scheme of how operators work on that type. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Sun, 30 Aug 2020 at 04:09, Steven D'Aprano <steve@pearwood.info> wrote:
The above feature is well intended and part of the original PEP, mostly for two reasons: 1. During my career I encountered issues where people confused the axis of a matrix. This is especially problematic when the matrix is symmetric. 2. pandas has this exact problem with column names, and while they have a workaround, in my opinion it is suboptimal
But as I mention above, if we did introduce this, I'd rather we keep the dunders and change the name rather than introduce confusable names.
That is something it's hard to fight against. Unless we call it something like _extended_getitem_? getItemEx was proposed by GvR for the C API... it seems to be a frequent convention.

On Sun, Aug 30, 2020 at 6:19 PM Stefano Borini <stefano.borini@gmail.com> wrote:
2. pandas has this exact problem with column names, and while they have a workaround, in my opinion it is suboptimal
What is the workaround? --- Ricky. "I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler

On Tue, Aug 25, 2020 at 09:23:18PM +0100, Stefano Borini wrote:
This is not another option, it's just a variant on Jonathan Fine's "key object" idea with a new name.
Where named is a kind of object like slice, and it evaluates to the pure value, but also has a .name like slice() has .start.
This is contradictory, and not possible in Python without a lot of work, if at all. You want `named("k", 3)` to evaluate to 3, but 3 has no `.name` attribute. So you can only have one: either `named("k", 3)` evaluates to a special key object with a .name attribute "k", or it evaluates to 3. Pick one. -- Steve
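For concreteness, here is what the first of those two choices might look like: a slice-like key object that does *not* evaluate to 3, so the receiving __getitem__ has to unwrap it itself. Everything here is hypothetical:

class named:
    __slots__ = ("name", "value")

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return f"named({self.name!r}, {self.value!r})"

class Record:
    def __getitem__(self, key):
        if isinstance(key, named):
            return (key.name, key.value)  # dispatch on the name here
        return (None, key)

r = Record()
assert r[named("k", 3)] == ("k", 3)  # a key object, not the bare value 3
assert r[3] == (None, 3)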