On Sat, Sep 26, 2020 at 01:43:18AM -0400, Ricky Teachey wrote:
On Fri, Sep 25, 2020 at 11:50 PM Steven D'Aprano <steve@pearwood.info> wrote:
TL;DR:
1. We have to pass a sentinel to the setitem dunder if there is no positional index passed. What should that sentinel be?
Isn't there a problem with this starting assumption?
It's not an assumption, it's a conclusion. I can't see any other clean and *easy* way for this to work with setitem. If you can, I'm listening.

The problem is that there's no simple way to get an optional first positional argument and mandatory second positional argument:

    def __setitem__(self, [index,] value, [*, spam]):

but adding this to the language is out of scope for the PEP. But see below.
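As a small illustration of that constraint (a sketch, not something from the PEP): Python today rejects a signature in which a parameter with a default precedes one without. The source string below is a made-up example, compiled rather than written inline so the rest of the snippet still runs.

```python
# Hypothetical signature: optional index first, mandatory value second.
source = "def __setitem__(self, index=None, value): pass"

rejected = False
try:
    compile(source, "<example>", "exec")
except SyntaxError:
    # A parameter without a default may not follow one with a default.
    rejected = True

print(rejected)
```

The exact wording of the SyntaxError message differs between Python versions, so the sketch only records that the definition was rejected.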
Say that I have item dunder code of the signature that is common today, and I have no interest in providing direct support for kwd args in my item dunders. Say also, kwd unpacking is supported.
Okay. Existing code doesn't change.
If a person unpacks an empty dictionary (something that surely will happen occasionally), and a SENTINEL is provided, this poses a pretty good possibility to lead to unintended behavior in many cases:
Correct, and well-spotted. The solution is: *don't do that*. If you, the caller, do so, that's on your own head. We cannot expect to protect users from their own errors. Don't insert arbitrary unpacking into any function or method call, and that includes subscripts, if you don't know what it will do.

Consider:

    obj = {}
    kwargs = {'spam': 1}
    obj[**kwargs] = value

Right now this is a syntax error, so nobody is doing it. If the PEP is accepted, it will become a TypeError since dict won't support keywords. Either way, nobody is going to get into the habit of doing this in dicts, or lists, or any other class that doesn't support keyword subscripts.

The only way to get something surprising is:

    obj = {}
    kwargs = {}
    obj[**kwargs] = value

which is equivalent to:

    obj[SENTINEL] = value

That's another argument for choosing NotImplemented over None or the empty tuple. We can't expect subscriptable classes to reject a subscript of () or None, since these are legitimate keys and have existing uses, but we can say to class creators "if you care about this, just reject NotImplemented if it is the subscript".

I expect most classes won't bother. I wouldn't expect `dict` to bother either.
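For a class that does care, the check might look something like the sketch below. It assumes the proposal discussed here, where NotImplemented stands in for a missing positional subscript; `Grid` is a hypothetical class, and because `obj[**{}] = value` is not valid syntax today, the call is simulated by invoking the dunder directly.

```python
class Grid:
    """Hypothetical container that accepts keyword subscripts."""

    def __init__(self):
        self._cells = {}

    def __setitem__(self, index, value, **kwargs):
        # Assumption: NotImplemented means "no positional subscript given".
        # With no keywords either, the subscript was effectively empty.
        if index is NotImplemented and not kwargs:
            raise TypeError("a subscript is required")
        self._cells[(index, tuple(sorted(kwargs.items())))] = value


grid = Grid()
grid[1] = "a"  # ordinary positional subscript, works today

# Simulate grid[**{}] = "b" under the proposal.
try:
    grid.__setitem__(NotImplemented, "b")
except TypeError as exc:
    message = str(exc)
```

The comparison uses `is` rather than truth-testing, since NotImplemented should not be evaluated in a boolean context.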
Who knows what the effect of c[SENTINEL] will be for so much existing code out there? Bugs are surely to be created, aren't they? So a lot of people will have to make changes to existing code to handle this with a sentinel. This seems like a big ugly problem to me that needs to be avoided.
If this PEP is accepted, it won't magically insert keyword unpacking into existing code. So it's not going to break existing code. The only code that might be broken is future code, and even then, we can just declare that it's not broken, it's working as designed:

    obj[**{}]  # equivalent to obj[NotImplemented] by definition

If you don't want that, don't do it.

I don't think that's the perfect solution, but I think it is good enough.
Maybe there's a middle way option that could avoid the particular problem outlined above:
# Option 4: don't use a sentinel at all, user provides own (if desired) [...]
How do we write the dunder item to allow for this? Simple: just require that if it is desired to support kwd-only item setting, the writer of the code has to provide a default-- ANY default-- to the value argument in setitem (and this default will never be used).
Simple to say, but I don't think it will be simple to implement.

Currently, the interpreter treats all methods alike. Given some arguments, it binds them to the parameters from left to right, then handles keyword arguments. That's why it can't just skip the first positional argument (the index) and bind to the second (the value from the right hand side).

Currently the interpreter doesn't know or care what method is being called when it packs and unpacks arguments, all method and function calls get unpacked more or less in the same way: positional arguments get bound from the left to the right.

If we were willing to add a second set of rules for parameter binding, we could avoid this problem. Something roughly like this.

For every other function and method:

* bind positional arguments from left to right;
* if too many positional arguments, raise
* handle keyword arguments;
* if too many keyword arguments, or duplicate, raise
* any parameter that doesn't have a value, fetch its default
* if any parameter still doesn't have a value, raise
* call the function

For `__setitem__` only:

* if there is a non-keyword subscript, use the procedure above
* otherwise, bind the RHS value to the second positional parameter
* handle keywords as above
* fetch the default for the first positional argument
* if there is no default, raise
* call the function

I've probably skipped a few steps, but you get the drift. We would need two distinct ways of packing arguments into parameters, one of which is specialised for setitem alone.

Is it worth it? In my opinion, I don't think so, but I'd like to hear from somebody who knows more about the parameter binding code and can comment on how hard this would be. I assume it would be tedious but not impossible.

To me it seems like deploying a giant hammer to squash a tiny insect, but maybe my instinct for the difficulty of this change is way off. Maybe it's trivially easy and we should just do it :-)
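To make the `__setitem__`-only procedure concrete, here is a rough pure-Python simulation. It is only a sketch: the real change would live in the interpreter's argument-binding code, and `bind_setitem`, `_NO_INDEX`, `Table` and `Strict` are hypothetical names invented for this illustration.

```python
import inspect

_NO_INDEX = object()  # hypothetical marker: no positional subscript given


def bind_setitem(method, obj, value, index=_NO_INDEX, /, **keywords):
    """Simulate the special-case binding for obj[...] = value."""
    if index is not _NO_INDEX:
        # A non-keyword subscript exists: use the ordinary procedure.
        return method(obj, index, value, **keywords)
    # Otherwise fetch the default of the first positional parameter
    # after self, and bind the RHS value to the second one.
    params = list(inspect.signature(method).parameters.values())
    index_param = params[1]
    if index_param.default is inspect.Parameter.empty:
        raise TypeError("no positional subscript and no default for the index")
    return method(obj, index_param.default, value, **keywords)


class Table:
    def __init__(self):
        self.log = []

    # The default on value is never used; it only keeps the signature legal.
    def __setitem__(self, index=None, value=None, *, spam=0):
        self.log.append((index, value, spam))


class Strict:
    def __setitem__(self, index, value):
        pass


table = Table()
bind_setitem(Table.__setitem__, table, "x", spam=1)  # like table[spam=1] = "x"
bind_setitem(Table.__setitem__, table, "y", 3)       # like table[3] = "y"

try:
    bind_setitem(Strict.__setitem__, Strict(), "z")  # index has no default
    strict_failed = False
except TypeError:
    strict_failed = True
```

Error handling for surplus or duplicate keywords is left to the ordinary call inside the helper, which is one reason this is far simpler than what the interpreter itself would need.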
I suggest the Ellipsis object as the standard convention:
Ellipsis doesn't make a good choice, because it is heavily used by numpy. In fact it was specifically added to Python for numpy.

--
Steve