Proposed new syntax for subscripting (was PEP 472)
This is a slightly revised version of something I sent to Stefano two weeks ago. I hope he is planning to use this, or something similar, in the PEP, but for what it's worth here it is for discussion.

This is, as far as I can tell, the minimum language change needed to support keywords in subscripts, and it will support all the desired use-cases.

* * *

(1) An empty subscript is still illegal, regardless of context.

    obj[]  # SyntaxError

(2) A single subscript remains a single argument:

    obj[index]
    # calls type(obj).__getitem__(index)

    obj[index] = value
    # calls type(obj).__setitem__(index, value)

    del obj[index]
    # calls type(obj).__delitem__(index)

(This remains the case even if the index is followed by keywords; see point 5 below.)

(3) Comma-separated arguments are still parsed as a tuple and passed as a single positional argument:

    obj[spam, eggs]
    # calls type(obj).__getitem__((spam, eggs))

    obj[spam, eggs] = value
    # calls type(obj).__setitem__((spam, eggs), value)

    del obj[spam, eggs]
    # calls type(obj).__delitem__((spam, eggs))

Points (1) to (3) mean that classes which do not want to support keyword arguments in subscripts need do nothing at all. (Completely backwards compatible.)

(4) Keyword arguments, if any, must follow positional arguments.

    obj[1, 2, spam=None, 3]  # SyntaxError

This is like function calls, where intermixing positional and keyword arguments gives a SyntaxError.

(5) Keyword subscripts, if any, will be handled like they are in function calls. Examples:

    # Single index with keywords:

    obj[index, spam=1, eggs=2]
    # calls type(obj).__getitem__(index, spam=1, eggs=2)

    obj[index, spam=1, eggs=2] = value
    # calls type(obj).__setitem__(index, value, spam=1, eggs=2)

    del obj[index, spam=1, eggs=2]
    # calls type(obj).__delitem__(index, spam=1, eggs=2)

    # Comma-separated indices with keywords:

    obj[foo, bar, spam=1, eggs=2]
    # calls type(obj).__getitem__((foo, bar), spam=1, eggs=2)

and *mutatis mutandis* for the set and del cases.
(6) The same rules apply with respect to keyword subscripts as for keywords in function calls:

- the interpreter matches up each keyword subscript to a named parameter in the appropriate method;

- if a named parameter is used twice, that is an error;

- if there are any named parameters left over (without a value) when the keywords are all used, they are assigned their default value (if any);

- if any such parameter doesn't have a default, that is an error;

- if there are any keyword subscripts remaining after all the named parameters are filled, and the method has a `**kwargs` parameter, they are bound to the `**kwargs` parameter as a dict;

- but if no `**kwargs` parameter is defined, it is an error.

(7) Sequence unpacking remains a syntax error inside subscripts:

    obj[*items]

Reason: unpacking items would result in it being immediately repacked into a tuple. Anyone using sequence unpacking in the subscript is probably confused as to what is happening, and it is best if they receive an immediate syntax error with an informative error message.

(8) Dict unpacking is permitted:

    items = {'spam': 1, 'eggs': 2}
    obj[index, **items]
    # equivalent to obj[index, spam=1, eggs=2]

(9) Keyword-only subscripts are permitted:

    obj[spam=1, eggs=2]
    # calls type(obj).__getitem__(spam=1, eggs=2)

    del obj[spam=1, eggs=2]
    # calls type(obj).__delitem__(spam=1, eggs=2)

but note that the setter is awkward since the signature requires the first parameter:

    obj[spam=1, eggs=2] = value
    # wants to call type(obj).__setitem__(???, value, spam=1, eggs=2)

Proposed solution: this is a runtime error unless the setitem method gives the first parameter a default, e.g.:

    def __setitem__(self, index=None, value=None, **kwargs)

Note that the second parameter will always be present; nevertheless, to satisfy the interpreter, it too will require a default value.

(Editorial comment: this is undoubtedly an awkward and ugly corner case, but I am reluctant to prohibit keyword-only assignment.)
Comments
--------

(a) Non-keyword subscripts are treated the same as the status quo, giving full backwards compatibility.

(b) Technically, if a class defines its getter like this:

    def __getitem__(self, index):

then the caller could call that using keyword syntax:

    obj[index=1]

but this should be harmless with no behavioural difference. But classes that wish to avoid this can define their parameters as positional-only:

    def __getitem__(self, index, /):

(c) If the method is declared with no positional arguments (aside from self), only keyword subscripts can be given:

    def __getitem__(self, *, index)
    # requires obj[index=1] not obj[1]

Although this is unavoidably awkward for setters:

    # Intent is for the object to only support keyword subscripts.
    def __setitem__(self, i=None, value=None, /, *, index):
        if i is not None:
            raise TypeError('only keyword arguments permitted')

Gotchas
-------

If the subscript dunders are declared to use positional-or-keyword parameters, there may be some surprising cases when arguments are passed to the method. Given the signature:

    def __getitem__(self, index, direction='north')

if the caller uses this:

    obj[0, 'south']

they will probably be surprised by the method call:

    # expected type(obj).__getitem__(0, direction='south')
    # but actually get:
    obj.__getitem__((0, 'south'), direction='north')

Solution: best practice suggests that keyword subscripts should be flagged as keyword-only when possible:

    def __getitem__(self, index, *, direction='north')

The interpreter need not enforce this rule, as there could be scenarios where this is the desired behaviour. But linters may choose to warn about subscript methods which don't use the keyword-only flag.

-- Steve
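The positional half of the gotcha above can already be checked in current Python, because comma-separated subscripts are packed into a tuple today. The following is only a minimal sketch: the `Compass` class is hypothetical and simply echoes what it receives, and the keyword spelling is shown as an explicit method call because `obj[0, direction='south']` is proposed syntax, not something current interpreters accept.

```python
class Compass:
    """Hypothetical class that echoes what __getitem__ receives."""

    def __getitem__(self, index, direction='north'):
        return index, direction


obj = Compass()

# The comma-separated subscript is packed into ONE tuple argument,
# so 'south' never reaches the `direction` parameter.
result = obj[0, 'south']
print(result)

# The proposed obj[0, direction='south'] is described as translating
# to this call; today it has to be written out by hand.
explicit = type(obj).__getitem__(obj, 0, direction='south')
print(explicit)
```

Declaring `direction` as keyword-only does not change what `obj[0, 'south']` passes; it only stops `direction` from being filled positionally in a direct call.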
Thanks Steve, it's good to see this so clearly laid out. And indeed, for backward compatibility, while also being as similar to function call behavior as possible, this is pretty much the only option.

One request:

# Comma-separated indices with keywords:
obj[foo, bar, spam=1, eggs=2] # calls type(obj).__getitem__((foo, bar), spam=1, eggs=2)
and *mutatis mutandis* for the set and del cases.
The set case is non-trivial (which you do discuss later), but could you provide an example or two here as well?

I think:

    obj[foo, bar, spam=1, eggs=2] = a_value
    # calls type(obj).__setitem__((foo, bar), a_value, spam=1, eggs=2)

Is that right? Good to lay it out.

The setitem case is particularly awkward, but fortunately, only for the writers of classes that use these features, not the users of those classes. So manageable complexity.

I like your gotchas section -- and we do want to have this well documented, along with the "best practices" that you suggest. In fact, I've never been able to find a good source of best practices for writing a class that takes multiple positional indexes -- and it's pretty awkward.

-CHB

--
Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
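A minimal sketch of the set case, written as the explicit method call that the proposed syntax is described as translating to (the `obj[..., spam=1] = value` spelling itself is not valid in current Python). The `Recorder` class and the sample values are hypothetical and only serve to show which arguments arrive where.

```python
class Recorder:
    """Hypothetical class that records what __setitem__ receives."""

    def __init__(self):
        self.calls = []

    def __setitem__(self, index, value, **kwargs):
        self.calls.append((index, value, kwargs))


obj = Recorder()
foo, bar, a_value = 'foo', 'bar', 42

# Status quo: the indices are packed into a tuple, the value comes second.
obj[foo, bar] = a_value

# Proposed obj[foo, bar, spam=1, eggs=2] = a_value, spelled as the
# method call it is described as translating to.
type(obj).__setitem__(obj, (foo, bar), a_value, spam=1, eggs=2)

print(obj.calls)
```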
Hi Steven

Thank you for reviewing and combining your previous statements into a single message. This is a great help for the creation and editing of the revised PEP.

I intend to do something similar myself. Perhaps others might also want to do something similar.

-- Jonathan
On Tue, Sep 1, 2020 at 9:34 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Hi Steven
Thank you for reviewing and combining your previous statements into a single message. This is a great help for the creation and editing of the revised PEP.
I intend to do something similar myself. Perhaps others might also want to do something similar.
A point of order here: I dislike PEPs that lay out multiple options for the reader. (This was one of the flaws of PEP 472, and IMO it's held up PEP 505.) If you have a competing proposal, you should write it up in a separate PEP and find a separate sponsor. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Tue, Sep 01, 2020 at 09:04:31AM -0700, Christopher Barker wrote:
The set case is non-trivial (which you do discuss later), but could you provide an example or two here as well?
I think:
obj[foo, bar, spam=1, eggs=2] = a_value # calls type(obj).__setitem__((foo, bar), a_value, spam=1, eggs=2)
Is that right? good to lay it out.
Yes, that's what I had in mind.
The setitem case is particularly awkward, but fortunately, only for the writers of classes that use these features, not the users of those classes. So manageable complexity.
There is unfortunately no interpreter support for making an earlier parameter optional while a later one is mandatory (like the range builtin). We can write this signature:

    range(start=0, end, step=1)

but we can't implement it directly.
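The usual workaround in pure Python is to accept `*args` and decide by argument count which parameter was omitted. A minimal sketch of that trick follows; `my_range` is a hypothetical name, and this is not a claim about how the builtin is implemented internally.

```python
def my_range(*args):
    """Hypothetical range-like function.

    Accepts my_range(end), my_range(start, end) or
    my_range(start, end, step), so `start` is optional even though
    it comes before the mandatory `end`.
    """
    if len(args) == 1:
        start, end, step = 0, args[0], 1
    elif len(args) == 2:
        start, end = args
        step = 1
    elif len(args) == 3:
        start, end, step = args
    else:
        raise TypeError(f'expected 1 to 3 arguments, got {len(args)}')
    return list(range(start, end, step))


print(my_range(3))
print(my_range(1, 4))
print(my_range(0, 6, 2))
```

The cost is that the real signature disappears from introspection, which is the same awkwardness a keyword-only `__setitem__` would face.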
I like your gotchas section -- and we do want to have this well documented, along with the "best practices" that you suggest.
Thank you. -- Steve
On Tue, Sep 1, 2020 at 8:45 AM Steven D'Aprano <steve@pearwood.info> wrote:
This is a slightly revised version of something I sent to Stefano two weeks ago. I hope he is planning to use this, or something similar, in the PEP, but for what it's worth here it is for discussion.
Excellent. Could you add some cases that show how `d[1]` differs from `d[1,]`? (And perhaps explain the reason why `d[1, k=2]` follows `d[1]` instead of `d[1,]`. I know the answer, but it's worth being clear about this particular edge case because it has tripped up various attempts.)

--Guido van Rossum (python.org/~guido)
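The status-quo half of this edge case can be shown in current Python. The sketch below uses a hypothetical `Echo` class that returns whatever index it is given; the `d[1, k=2]` spelling is not runnable today, so it appears only in a comment, with the behaviour taken from point (5) of the proposal.

```python
class Echo:
    """Hypothetical class that returns the index it was given."""

    def __getitem__(self, index):
        return index


d = Echo()

single = d[1]       # a bare int
one_tuple = d[1,]   # a one-element tuple: the trailing comma matters
pair = d[1, 2]      # a two-element tuple

print(single, one_tuple, pair)

# Under the proposal, d[1, k=2] is described as calling
# type(d).__getitem__(d, 1, k=2): the index is 1, as in d[1],
# not (1,) as in d[1,].
```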
Thank you Steven, This exactly matches what my goal would be, except the below. On Tue, Sep 1, 2020 at 11:45 AM Steven D'Aprano <steve@pearwood.info> wrote:
(8) Dict unpacking is permitted:
items = {'spam': 1, 'eggs': 2} obj[index, **items] # equivalent to obj[index, spam=1, eggs=2]
I would prefer to disallow this, at least initially. None of the use cases I've seen have an actual need for dict unpacking, and it generally just seems to follow from the analogy with function calls. I think not allowing that encourages use of index-related operations rather than "another spelling of `.__call__()`".

--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
On Tue, Sep 1, 2020 at 9:59 AM David Mertz <mertz@gnosis.cx> wrote:
On Tue, Sep 1, 2020 at 11:45 AM Steven D'Aprano <steve@pearwood.info> wrote:
(8) Dict unpacking is permitted:
items = {'spam': 1, 'eggs': 2} obj[index, **items] # equivalent to obj[index, spam=1, eggs=2]
I would prefer to disallow this, at least initially. None of the use cases I've seen have an actual need for dict unpacking, and it generally just seems to follow from the analogy with function calls. I think not allowing that encourages use of index-related operations rather than "another spelling of `.__call__()`".
I think we need this for the same reason why we need **kwargs in normal function calls: it makes it possible to forward dictionaries of arguments to another method. In particular, indexing methods with a signature like __getitem__(self, value=None, /, **kwargs) will be useful for indexing on "labeled arrays", where dimension names are determined dynamically. Then assuredly someone (e.g., in library code) is going to want to build up these kwargs with dynamic keys. If we don't have dict unpacking, you'd have to write obj.__getitem__(**kwargs) rather than obj[**kwargs], which is much more verbose and goes against the principle that you shouldn't need to call double-underscore methods in normal Python code. It's also more error prone, particularly in the mixed positional/keyword argument case, because you have to reverse engineer Python's syntax to figure out tuple packing.
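A minimal sketch of the forwarding problem described above, using only syntax that current Python accepts. The `LabeledArray` class, its `dims` attribute and the `select` helper are all hypothetical; the point is only that, without `obj[**kwargs]`, library code has to call the dunder method directly.

```python
class LabeledArray:
    """Hypothetical labeled array whose dimension names are dynamic."""

    def __init__(self, dims):
        self.dims = tuple(dims)

    def __getitem__(self, value=None, /, **kwargs):
        unknown = set(kwargs) - set(self.dims)
        if unknown:
            raise KeyError(f'unknown dimensions: {sorted(unknown)}')
        return value, kwargs


def select(array, **selection):
    # With dict unpacking in subscripts this could be written as
    # array[**selection]; without it, the dunder is called directly.
    return type(array).__getitem__(array, **selection)


arr = LabeledArray(['time', 'lat'])

# The keys are built at run time from the array's own dimension names.
selection = {name: 0 for name in arr.dims}
picked = select(arr, **selection)
print(picked)
```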
On 2/09/20 5:14 am, Stephan Hoyer wrote:
On Tue, Sep 1, 2020 at 9:59 AM David Mertz <mertz@gnosis.cx>
I think we need this for the same reason why we need **kwargs in normal function calls: it makes it possible to forward dictionaries of arguments to another method.
If we don't have dict unpacking, you'd have to write obj.__getitem__(**kwargs) rather than obj[**kwargs], which is much more verbose and goes against the principle that you shouldn't need to call double-underscore methods in normal Python code.
It's also not even quite correct! As another message posted recently pointed out, if 'other' happens to be a class object, __class_getitem__ should be called instead, if it exists:

    def __getitem__(self, index, **kwds):
        g = None
        if isinstance(other, type):
            g = getattr(other, '__class_getitem__', None)
        if g is None:
            g = other.__getitem__
        return g(index, **kwds)

This is getting pretty tricky and convoluted. (I'm not sure it's completely correct even now, since it doesn't check whether __class_getitem__ is a class method.) It certainly won't remain correct if the rules for indexing method lookup change in the future.

If a[**kwds] is allowed, on the other hand, it's easy to write index forwarding code that is straightforward, obvious and future-proof.

-- Greg
On 01.09.20 17:44, Steven D'Aprano wrote:
(9) Keyword-only subscripts are permitted:
obj[spam=1, eggs=2] # calls type(obj).__getitem__(spam=1, eggs=2)
del obj[spam=1, eggs=2] # calls type(obj).__delitem__(spam=1, eggs=2)
but note that the setter is awkward since the signature requires the first parameter:
obj[spam=1, eggs=2] = value # wants to call type(obj).__setitem__(???, value, spam=1, eggs=2)
Proposed solution: this is a runtime error unless the setitem method gives the first parameter a default, e.g.:
def __setitem__(self, index=None, value=None, **kwargs)
Note that the second parameter will always be present; nevertheless, to satisfy the interpreter, it too will require a default value.
(Editorial comment: this is undoubtedly an awkward and ugly corner case, but I am reluctant to prohibit keyword-only assignment.)
Why does the signature require the first `index` parameter? When I see `obj[spam=1, eggs=2] = value` there's no positional index and so I wouldn't expect one to be passed. Similar to how the following works:

    >>> def foo():
    ...     pass
    ...
    >>> foo(*())

If there's nothing to unpack, nothing will be assigned to any of the parameters.

So the following signature would work with keyword-only subscripts:

    def __setitem__(self, value, **kwargs):

I don't know how the `[] =` operator is translated to `__setitem__` at the implementation level, so perhaps the no-positional-index case would require yet another opcode, but thinking of it in the following way it would definitely be possible:

    obj.__setitem__(*(pos_index + (value,)), **kwargs)

where `pos_index` is the positional index collected from the `[]` operator as a tuple (and if no such index is given it defaults to the empty tuple). This matches the above no-index signature of `__setitem__`. This is also the signature of the corresponding `__getitem__` and `__delitem__` methods.
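The translation described above can be modelled in current Python with an ordinary helper function. This is a sketch under one stated assumption: `pos_index` is taken to be either the empty tuple (no positional index) or a one-element tuple holding the usual index object, so that existing tuple packing is preserved. The helper `emulate_setitem` and both classes are hypothetical.

```python
def emulate_setitem(obj, pos_index, value, **kwargs):
    """Hypothetical model of the proposed `obj[...] = value` translation.

    pos_index is () when no positional index was written, otherwise
    (index,) where index is whatever __setitem__ receives today.
    """
    return type(obj).__setitem__(obj, *(pos_index + (value,)), **kwargs)


class KeywordOnly:
    """Setter with no index parameter at all."""

    def __init__(self):
        self.stored = {}

    def __setitem__(self, value, **kwargs):
        self.stored[tuple(sorted(kwargs.items()))] = value


class Mixed:
    """Setter with the conventional (index, value) parameters."""

    def __init__(self):
        self.stored = {}

    def __setitem__(self, index, value, **kwargs):
        self.stored[index, tuple(sorted(kwargs.items()))] = value


ko = KeywordOnly()
emulate_setitem(ko, (), 'v', spam=1, eggs=2)    # obj[spam=1, eggs=2] = 'v'

m = Mixed()
emulate_setitem(m, ((1, 2),), 'w', spam=1)      # obj[1, 2, spam=1] = 'w'

print(ko.stored, m.stored)
```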
On 2/09/20 3:44 am, Steven D'Aprano wrote:
(9) Keyword-only subscripts are permitted:
obj[spam=1, eggs=2] # calls type(obj).__getitem__(spam=1, eggs=2)
del obj[spam=1, eggs=2] # calls type(obj).__delitem__(spam=1, eggs=2)
An alternative is to pass an empty tuple as the index when there are no positional args. That would make the following less of a special case:
but note that the setter is awkward since the signature requires the first parameter:
obj[spam=1, eggs=2] = value # wants to call type(obj).__setitem__(???, value, spam=1, eggs=2)
Gotchas -------
def __getitem__(self, index, direction='north')
if the caller uses this:
obj[0, 'south']
they will probably be surprised by the method call:
obj.__getitem__((0, 'south'), direction='north')
Solution: best practice suggests that keyword subscripts should be flagged as keyword-only when possible:
def __getitem__(self, index, *, direction='north')
Note that this will only help if the user looks at and takes careful notice of the signature of the __getitem__ method -- it won't actually prevent them from attempting to do obj[0, 'south'] and won't affect the result if they do. A better best practice would be to use a decorator such as the one I posted earlier to convert the packed positional args into real ones. Then they will get the expected result in most cases. -- Greg
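The decorator referred to above is not reproduced in this thread, so the following is only a guess at the general idea, not the code that was posted: a hypothetical `unpack_index` wrapper that spreads a tuple index into separate positional arguments before calling the real method.

```python
import functools


def unpack_index(method):
    """Hypothetical decorator: spread a tuple index into positional args."""

    @functools.wraps(method)
    def wrapper(self, index, **kwargs):
        # A comma-separated subscript arrives as one tuple; unpack it.
        args = index if isinstance(index, tuple) else (index,)
        return method(self, *args, **kwargs)

    return wrapper


class Compass:
    """Hypothetical class that echoes its arguments."""

    @unpack_index
    def __getitem__(self, index, direction='north'):
        return index, direction


obj = Compass()
print(obj[0, 'south'])   # 'south' now reaches `direction`
print(obj[0])            # default direction is used
```

One trade-off of this approach is that an explicitly parenthesised tuple such as `obj[(0, 'south')]` can no longer be told apart from `obj[0, 'south']`.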
On Wed, Sep 02, 2020 at 12:55:43PM +1200, Greg Ewing wrote:
On 2/09/20 3:44 am, Steven D'Aprano wrote:
(9) Keyword-only subscripts are permitted:
obj[spam=1, eggs=2] # calls type(obj).__getitem__(spam=1, eggs=2)
del obj[spam=1, eggs=2] # calls type(obj).__delitem__(spam=1, eggs=2)
An alternative is to pass an empty tuple as the index when there are no positional args. That would make the following less of a special case:
I think that will interfere with people who need to choose their own default for the index. Unifying the behaviour of the getter and the setter only makes sense if you have both; for immutable objects (hence no setter) that need only keyword subscripts, why bother declaring an index you're never going to use when you actually want this signature? __getitem__(self, *, keyword=default) Having to declare a positional index parameter I don't want, simply for consistency with a setitem I'm not even using, might be a tad annoying.
Solution: best practice suggests that keyword subscripts should be flagged as keyword-only when possible:
def __getitem__(self, index, *, direction='north')
Note that this will only help if the user looks at and takes careful notice of the signature of the __getitem__ method -- it won't actually prevent them from attempting to do obj[0, 'south'] and won't affect the result if they do.
People who are determined enough to write buggy code will write buggy code no matter what you do.

Depending on what your object subscript is supposed to actually do, there are ways to fix this:

- make the direction mandatory rather than give it a default;

- require that index is only a single value (say, an int) and raise if you receive a tuple (int, string);

but ultimately the caller is responsible for providing good input to the methods they call. If the caller knows enough that the subscript needs a direction, they should know enough to know that it has to be passed as keyword. We can't expect the interpreter itself to prevent all PEBCAK errors.

-- Steve
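The second fix listed above can be sketched in current Python. The `Compass` class and the wording of its error message are hypothetical; the check simply rejects anything that is not a single int, which catches the accidental `(int, string)` tuple.

```python
class Compass:
    """Hypothetical class whose index must be a single int."""

    def __getitem__(self, index, *, direction='north'):
        if not isinstance(index, int):
            raise TypeError(
                f'index must be an int, not {type(index).__name__}; '
                'pass direction as a keyword'
            )
        return index, direction


obj = Compass()
print(obj[0])

# The mistaken call now fails loudly instead of silently using 'north'.
try:
    obj[0, 'south']
except TypeError as exc:
    raised = True
    print(exc)
else:
    raised = False
```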
On Wed, Sep 2, 2020 at 4:23 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Sep 02, 2020 at 12:55:43PM +1200, Greg Ewing wrote:
On 2/09/20 3:44 am, Steven D'Aprano wrote:
(9) Keyword-only subscripts are permitted:
obj[spam=1, eggs=2] # calls type(obj).__getitem__(spam=1, eggs=2)
del obj[spam=1, eggs=2] # calls type(obj).__delitem__(spam=1, eggs=2)
An alternative is to pass an empty tuple as the index when there are no positional args. That would make the following less of a special case:
I think that will interfere with people who need to choose their own default for the index.
Unifying the behaviour of the getter and the setter only makes sense if you have both; for immutable objects (hence no setter) that need only keyword subscripts, why bother declaring an index you're never going to use when you actually want this signature?
__getitem__(self, *, keyword=default)
Having to declare a positional index parameter I don't want, simply for consistency with a setitem I'm not even using, might be a tad annoying.
The whole backwards compatibility thing is more than a bit annoying, isn't it? So maybe accepting that you'll always get a key, and it may be `()`, isn't the worst part.

OTOH, maybe we can cook up a scheme where we have two API families, `__getitem__/__setitem__` and `__getindex__/__setindex__`. The signature of `__setindex__` has the value first (after `self`). The rules could be something like

- If keywords are present, always use `__getindex__/__setindex__` (fail if they aren't present); if multiple positional indices are present, these become multiple arguments.

- Otherwise, if only `__getindex__/__setindex__` are present, call those, and lose the distinction between `d[1]` and `d[1,]`; `d[1, 2]` becomes two positional arguments.

- Otherwise, if only `__getitem__/__setitem__` are present, call those in a 100% backwards compatible way.

- Otherwise, if both are present, call `__getitem__/__setitem__` if no outer-level comma is present (so for `d[1]` as well as for `d[(1, 2)]`), and call `__getindex__/__setindex__` if an outer-level comma is present (so for `d[1,]` as well as for `d[1, 2]`).

This can be done relatively cleanly with changes at both the bytecode level and the C API level.

I'm not sure I like this better than the "pure `__getitem__/__setitem__`" scheme, because the API duplication is troublesome,

--Guido van Rossum (python.org/~guido)
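The dispatch rules above can be modelled as an ordinary function to see how they interact. This is a rough sketch of the getter side only, under stated assumptions: `subscript_get`, its parameters and the two sample classes are hypothetical, `__getindex__` does not exist in any released Python, and the `outer_comma` flag stands in for information that only the compiler would have.

```python
def subscript_get(obj, positional, keywords=None, outer_comma=False):
    """Hypothetical model of the getter dispatch rules sketched above.

    positional  -- tuple of the expressions written between the brackets
    keywords    -- dict of keyword subscripts, if any
    outer_comma -- True if the subscript had a top-level comma
    """
    keywords = keywords or {}
    cls = type(obj)
    getitem = getattr(cls, '__getitem__', None)
    getindex = getattr(cls, '__getindex__', None)

    if keywords:
        # Rule 1: keywords always go to __getindex__.
        if getindex is None:
            raise TypeError('keyword subscripts require __getindex__')
        return getindex(obj, *positional, **keywords)
    if getindex is not None and getitem is None:
        # Rule 2: d[1] and d[1,] are no longer distinguished.
        return getindex(obj, *positional)
    if getitem is not None and getindex is None:
        # Rule 3: status quo packing, d[1] -> 1, d[1, 2] -> (1, 2).
        return getitem(obj, positional if outer_comma else positional[0])
    if getitem is not None and getindex is not None:
        # Rule 4: the outer-level comma picks the API family.
        if outer_comma:
            return getindex(obj, *positional)
        return getitem(obj, positional[0])
    raise TypeError(f'{cls.__name__!r} object is not subscriptable')


class Both:
    def __getitem__(self, key):
        return ('item', key)

    def __getindex__(self, *args, **kwargs):
        return ('index', args, kwargs)


class Legacy:
    def __getitem__(self, key):
        return ('item', key)


print(subscript_get(Both(), (1,)))                       # models d[1]
print(subscript_get(Both(), (1,), outer_comma=True))     # models d[1,]
print(subscript_get(Both(), (1,), {'k': 2}))             # models d[1, k=2]
print(subscript_get(Legacy(), (1, 2), outer_comma=True)) # models d[1, 2]
```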
participants (8)
- Christopher Barker
- David Mertz
- Dominik Vilsmeier
- Greg Ewing
- Guido van Rossum
- Jonathan Fine
- Stephan Hoyer
- Steven D'Aprano