PEP 472 -- Support for indexing with keyword arguments
Hello! I'm not sure I'm addressing the right audience here, so please direct me to the appropriate channel if that's the case...

My name is Andras Tantos and I'm working on a Python library describing HW designs. I came across this problem of __getitem__ and co. not supporting kwargs. Apparently this extension was proposed and rejected as PEP 472.

Apart from my use-case, which is arguably a corner-case and not worth modifying the language for, I believe there are two important use-cases that are worth considering with the latest improvements in the language:

1. With the recent type-hint support, the feature could be made way more descriptive if this PEP got implemented. For example, instead of doing the following:

    def func(in: Dict[str, int])

one could write:

    def func(in: Dict[key=str, value=int])

2. It would also make 'generic classes' much cleaner to implement, similar to the way type-hints look. Consider the following code:

    class _Generic(object):
        Specializations = []

        @classmethod
        def __getitem__(cls, *args):
            name = f"Generic_{len(cls.Specializations)}"
            Specialized = type(name, (cls,), {"specials": tuple(args)})
            cls.Specializations.append(Specialized)
            return Specialized

        def __init__(self, value=None):
            self.value = value

        def __str__(self):
            if hasattr(self, "specials"):
                return (f"[{type(self)} - "
                        + ",".join(str(special) for special in self.specials)
                        + f"] - {self.value}")
            else:
                return f"[{type(self)} - GENERIC] - {self.value}"

    Generic = _Generic()
    # g = Generic() - fails because no specialization is given
    s1 = Generic[12]()
    s2 = Generic[42]("Hi!")
    print(s1)
    print(s2)

Running this simple example results in:

    python3 -i python_test.py
    [<class '__main__.Generic_0'> - 12] - None
    [<class '__main__.Generic_1'> - 42] - Hi!

You can see how the specialized parameters got passed, as well as the ones to '__init__'. Obviously, in real code the idea would be to filter generic parameters and set up 'Specialized' with the right set of methods and arguments.

Now, without kwargs support for __getitem__, it's impossible to pass named arguments to the specialization list, which greatly limits the usability of this notation.

I don't know how convincing these arguments and use-cases are for you, but could you advise me about how to start the 'ball rolling' to drum up support for re-activating this PEP?

Thanks again,
Andras Tantos
On Sun, 3 May 2020 14:58:41 -0700 Andras Tantos <andras@tantosonline.com> wrote:
1. With the recent type-hint support, the feature could be made way more descriptive if this PEP got implemented.
For example, instead of doing the following:
def func(in: Dict[str, int])
one could write:
def func(in: Dict[key=str, value=int])
Of course, that's why I originally suggested that `Dict[...]` should be spelled `Dict(...)` instead. Regards Antoine.
I am one of the authors of the PEP. My problem was to deal with natural notation in quantum chemistry mostly. It had no technical purpose, but I still think it would open some interesting options. The PEP was rejected mostly because of lack of interest. On Mon, 4 May 2020 at 00:07, Andras Tantos <andras@tantosonline.com> wrote:
Hello!
I'm not sure I'm addressing the right audience here, so please direct me to the appropriate channel if that's the case...
My name is Andras Tantos and I'm working on a Python library describing HW designs. I came across this problem of __getitem__ and co. not supporting kwargs. Apparently this extension was proposed and rejected as PEP 472.
Apart from my use-case, which is arguably a corner-case and not worth modifying the language for, I believe there are two important use-cases that are worth considering with the latest improvements in the language:
1. With the recent type-hint support, the feature could be made way more descriptive if this PEP got implemented.
For example, instead of doing the following:
def func(in: Dict[str, int])
one could write:
def func(in: Dict[key=str, value=int])
2. It would also make 'generic classes' much cleaner to implement, similar to the way type-hints look. Consider the following code:
    class _Generic(object):
        Specializations = []

        @classmethod
        def __getitem__(cls, *args):
            name = f"Generic_{len(cls.Specializations)}"
            Specialized = type(name, (cls,), {"specials": tuple(args)})
            cls.Specializations.append(Specialized)
            return Specialized

        def __init__(self, value=None):
            self.value = value

        def __str__(self):
            if hasattr(self, "specials"):
                return (f"[{type(self)} - "
                        + ",".join(str(special) for special in self.specials)
                        + f"] - {self.value}")
            else:
                return f"[{type(self)} - GENERIC] - {self.value}"

    Generic = _Generic()
    # g = Generic() - fails because no specialization is given
    s1 = Generic[12]()
    s2 = Generic[42]("Hi!")
    print(s1)
    print(s2)
Running this simple example results in:
    python3 -i python_test.py
    [<class '__main__.Generic_0'> - 12] - None
    [<class '__main__.Generic_1'> - 42] - Hi!
You can see how the specialized parameters got passed as well as the ones to '__init__'. Obviously, in real code the idea would be to filter generic parameters and set up 'Specialized' with the right set of methods and arguments.
Now, without kwargs support for __getitem__, it's impossible to pass named arguments to the specialization list, which greatly limits the usability of this notation.
I don't know how convincing these arguments and use-cases are for you, but could you advise me about how to start the 'ball rolling' to drum up support for re-activating this PEP?
Thanks again, Andras Tantos
-- Kind regards, Stefano Borini
What is the future of this? I looked at type annotations in networkx recently (https://github.com/networkx/networkx/pull/4014), and I wanted to keep things simple, so I proposed and implemented

    Graph[NodeType]

However, I knew that they may ultimately want

    Graph[NodeType, EdgeTypedDict, NodeTypedDict]

but no one is going to want to replace their calls with

    Graph[str, dict[str, Any], dict[str, Any]]

That's way too noisy. This proposal would allow you to have default parameters. But what's the future looking like now? Do we expect to have a type constructor?

    class Graph:
        def T(node_type, edge_type_dict=..., node_type_dict=...) -> a type annotation

And then:

    g: Graph.T(whatever) = Graph(....)

Does that work?

On Friday, July 10, 2020 at 4:20:58 AM UTC-4, Stefano Borini wrote:
I am one of the authors of the PEP. My problem was to deal with natural notation in quantum chemistry mostly. It had no technical purpose, but I still think it would open some interesting options. The PEP was rejected mostly because of lack of interest.
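A minimal runnable sketch of the "type constructor" idea from the message above. The Graph class and its T classmethod here are hypothetical stand-ins, not the real networkx API, and the dict[str, Any] defaults require Python 3.9+:

    from typing import Any, Generic, TypeVar

    N = TypeVar("N")

    class Graph(Generic[N]):
        @classmethod
        def T(cls, node_type, edge_type_dict=dict[str, Any], node_type_dict=dict[str, Any]):
            # The defaults live in an ordinary call signature, which subscript
            # syntax cannot express today; this sketch simply absorbs them and
            # parametrizes the class on the node type alone.
            return cls[node_type]

    g: Graph.T(str) = Graph()   # the annotation expression evaluates to Graph[str]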
Hi All

SUMMARY: This is a longish post. It looks at the idea in general terms, and outlines a way to get the desired semantics (but not syntax) with Python as it is today. And this would be forward compatible with the new syntax, if provided later.

PRESENT: I like the idea of allowing

    >>> d[1, 2, 3, a=4, b=5]

and will explore it further. First, we can already write

    >>> f(1, 2, 3, a=4, b=5)

but that only works for the get operation. For set the present behaviour is

    >>> f(1, 2, 3, a=4, b=5) = None
    SyntaxError: can't assign to function call

and I see no good reason to change that.

Going further, I'd say that allowing both

    >>> d[something] = value
    >>> value = d[something]

is essential to the difference between f(something) and d[something]. Both are expressions, but only one of them can be assigned to. The proposal therefore is to retain this essential difference, and remove what the proposer regards as an inessential difference. (And I also regard this as an inessential difference.)

Now let's use this class

    class Dummy:
        def __getitem__(self, *argv, **kwargs):
            return argv, kwargs

to peek into how

    >>> value = d[something]

works. Here goes:

    >>> d = Dummy()
    >>> d[1]
    ((1,), {})
    >>> d[1, 2]
    (((1, 2),), {})
    >>> d[:]
    ((slice(None, None, None),), {})
    >>> d['a':'b']
    ((slice('a', 'b', None),), {})

We see that the interpreter passes to __getitem__ a single positional argument (which is usually called the key). This is another difference between d[something] and f(something). By the way, the slice syntax items[a:b:c] is so often convenient and widespread that it's been long enabled in Python. I see no good reason to change that.

FUTURE: Let's proceed. We continue to use d = Dummy(). Given that

    >>> key = d[1, 2, 3, a=4, b=5]

is allowed, what should we be able to say about the key? Clearly it should be an instance of a class, and there should be a way of creating such an instance without going via d. Let's call the class K (for key). It is natural to say that

    >>> key_1 = d[1, 2, 3, a=4, b=5]
    >>> key_2 = K(1, 2, 3, a=4, b=5)

should define two K objects that are identical.

Once we have designed and implemented the class K, we can achieve many of the benefits of this proposal within existing Python. Here goes. First syntax:

    >>> value = d[K(1, 2, 3, a=4, b=5)]
    >>> d[K(1, 2, 3, a=4, b=5)] = value

Next, the implementation of such K-mappings. Here, K.set and K.get decorators will help. Something like (not tested):

    class MyMap:
        @K.adjust_get
        def __getitem__(self, x1, x2, x3, a, b):
            pass

where K.adjust_get interfaces between the K-object and the definition of __getitem__.

Aside: here, not tested, is an implementation of K.adjust_get.

    class K:
        @staticmethod
        def adjust_get(fn):
            def __getitem__(self, k):
                return fn(self, *k.argv, **k.kwargs)
            return __getitem__

This introduction of K allows collections to be created that can be used today with the syntax

    >>> value = d[K(1, 2, 3, a=4, b=5)]
    >>> d[K(1, 2, 3, a=4, b=5)] = value

Further, if the K()-less syntax

    >>> value = d[1, 2, 3, a=4, b=5]
    >>> d[1, 2, 3, a=4, b=5] = value

is added to Python, the existing K-collections could continue to be used, with only a change to the decorator K.adjust_get (and also of course K.adjust_set).

I think that is enough for now. I hope it helps some of us, sometimes. And does no harm elsewhere.

-- Jonathan
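A runnable version of the untested sketch above. The adjust_get decorator is as given in the message; K's constructor and the demo __getitem__ body are filled-in assumptions:

    class K:
        def __init__(self, *argv, **kwargs):
            # Assumed constructor: just record the arguments.
            self.argv = argv
            self.kwargs = kwargs

        @staticmethod
        def adjust_get(fn):
            # Adapt an ordinary positional/keyword signature to the
            # single-key __getitem__ protocol.
            def __getitem__(self, k):
                return fn(self, *k.argv, **k.kwargs)
            return __getitem__

    class MyMap:
        @K.adjust_get
        def __getitem__(self, x1, x2, x3, a, b):
            return (x1, x2, x3, a, b)

    m = MyMap()
    print(m[K(1, 2, 3, a=4, b=5)])   # prints (1, 2, 3, 4, 5)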
On Fri, Jul 10, 2020, 6:54 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Hi All
SUMMARY: This is a longish post. It looks at the idea in general terms, and outlines a way to get the desired semantics (but not syntax) with Python as it is today. And this would be forward compatible with the new syntax, if provided later.
This post was filled with inspiring ideas for me. Thank you.
PRESENT: I like the idea of allowing

    >>> d[1, 2, 3, a=4, b=5]

and will explore it further. First, we can already write

    >>> f(1, 2, 3, a=4, b=5)

but that only works for the get operation. For set the present behaviour is

    >>> f(1, 2, 3, a=4, b=5) = None
    SyntaxError: can't assign to function call

and I see no good reason to change that.

Going further, I'd say that allowing both

    >>> d[something] = value
    >>> value = d[something]

is essential to the difference between f(something) and d[something]. Both are expressions, but only one of them can be assigned to.

Here goes. First syntax:

    >>> value = d[K(1, 2, 3, a=4, b=5)]
    >>> d[K(1, 2, 3, a=4, b=5)] = value
My mind instantly went to the idea of using this syntax as a way to write single-line mathematical function definitions:

    f[x, y] = x + y

The example function doesn't even require the suggested K() object, since no kwargs or defaults are used. Of course one would need to instantiate any of these single-line functions using a little bit of boilerplate up top. But this could be when you provide the docstring:

    f = MathFunction("Simple math function")
    f[x, y] = x + y

And calling them would use a different bracket type (parentheses):
    >>> f(1, 2)
    3
...but these are surmountable hurdles.
On Fri, Jul 10, 2020 at 11:52:19AM +0100, Jonathan Fine wrote:
FUTURE: Let's proceed. We continue to use d = Dummy(). Given that

    >>> key = d[1, 2, 3, a=4, b=5]

is allowed, what should we be able to say about the key? Clearly it should be an instance of a class
That's not clear at all. Subscripting is just syntactic sugar for a method call, `__getitem__`, and we already know how to associate positional and keyword arguments to method calls. No special magic class is needed.

Slicing today is somewhat special, using a built-in class to map the parts of the slice to a single parameter of the `__getitem__` method, but that's not how it worked originally. In Python 1 and Python 2, slicing with a single colon calls a dunder method `__getslice__` with two positional arguments:

    seq[1:5]  -->  seq.__getslice__(1, 5)

not a single slice object argument. Only if `__getslice__` doesn't exist is a slice object passed to `__getitem__` instead.

So in principle, if we agreed that the syntax was desirable, we could map keyword arguments in slice syntax to ordinary parameters:

    obj[3, spam=4]   # call obj.__getitem__(3, spam=4)

and likewise for setitem and delitem. There would be difficulty with positional arguments, since they are already parsed as a tuple:

    py> {(1, 2): 999}[1, 2]
    999

so this may rule out adding multiple positional arguments to subscripting syntax. But I don't think there is any backwards compatibility issue with adding keyword arguments.

-- Steven
I believe that one of the most popular Python domains that benefit from "abusing" indexes is data analysis in the numpy/Pandas world. I am not familiar enough with Pandas to make useful speculation on how named indexes could enhance the usage of dataframes; maybe someone more familiar can come up with suggestions on how this syntax could be useful?
On Fri, Jul 10, 2020 at 08:23:09AM -0400, Ricky Teachey wrote:
My mind instantly went to the idea of using this syntax as a way to write single-line mathematical function definitions:
f[x, y] = x + y
This won't work, because the right hand side will be evaluated first. The above is legal syntax today, and roughly equivalent to:

    value = x + y
    index = (x, y)
    f.__setitem__(index, value)

except of course no such local variables value and index are created. But you can see why that cannot be used to define a function: the right hand side is evaluated, and all your `__setitem__` method will see is the value of x + y, not the fact that it is "x + y".

-- Steven
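A tiny demonstration of the point, using a hypothetical Recorder class that just prints what __setitem__ receives:

    # Recorder is a stand-in for illustration only.
    class Recorder:
        def __setitem__(self, index, value):
            print(f"index={index!r}, value={value!r}")

    f = Recorder()
    x, y = 1, 2
    f[x, y] = x + y   # prints: index=(1, 2), value=3 -- the expression "x + y" is gone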
On Fri, Jul 10, 2020 at 2:33 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 10, 2020 at 11:52:19AM +0100, Jonathan Fine wrote:
FUTURE: Let's proceed. We continue to use d = Dummy(). Given that

    >>> key = d[1, 2, 3, a=4, b=5]

is allowed, what should we be able to say about the key? Clearly it should be an instance of a class
That's not clear at all.
Subscripting is just syntactic sugar for a method call, `__getitem__`, and we already know how to associate positional and keyword arguments to method calls. No special magic class is needed. ... There would be difficulty with positional arguments, since they are already parsed as a tuple:
    py> {(1, 2): 999}[1, 2]
    999
so this may rule out adding multiple positional arguments to subscripting syntax. But I don't think there is any backwards compatibility issue with adding keyword arguments.
Jonathan already pointed out that positional arguments are passed as a tuple:

    Here goes:
    >>> d = Dummy()
    >>> d[1]
    ((1,), {})
    >>> d[1, 2]
    (((1, 2),), {})
    >>> d[:]
    ((slice(None, None, None),), {})
    >>> d['a':'b']
    ((slice('a', 'b', None),), {})

    We see that the interpreter passes to __getitem__ a single positional argument (which is usually called the key). This is another difference between d[something] and f(something).
I believe he was saying it would be weird if `d[1, 2]` called `d.__getitem__((1, 2))` but `d[1, 2, x=3]` called `d.__getitem__(1, 2, x=3)`. More generally, if `__getitem__` always receives a single positional argument now, it should probably stay that way for consistency.
Hi

I will clarify my previous post, by making a stronger statement: every Python object is an instance of a class. I wrote:

    Let's proceed. We continue to use d = Dummy(). Given that
    >>> key = d[1, 2, 3, a=4, b=5]
    is allowed, what should we be able to say about the key? Clearly it should be an instance of a class, and there should be a way of creating such an instance without going via d. Let's call the class K (for key).

I'll now expand on this, as it's not clear to everyone that the key should be an instance of a class. So far as I know, every Python object that is accessible to the Python user is an instance of a class (or type). In fact, the type() builtin tells us the object's type:

    >>> [type(obj) for obj in (None, True, (), {}, [], type)]
    [<class 'NoneType'>, <class 'bool'>, <class 'tuple'>, <class 'dict'>, <class 'list'>, <class 'type'>]

Using d = Dummy(), where d's __getitem__ simply returns what it receives, we write, using the proposed new syntax,

    >>> key = d[1, 2, 3, a=4, b=5]

and given this I suggest that

    >>> key2 = K(1, 2, 3, a=4, b=5)

should produce a key2 that is equal to key. Here

    >>> K = type(key)

should provide K.

I hope this helps clarify what I wrote. I also thank Alex Hall for providing helpful clarification, from a different point of view.

-- Jonathan
On Fri, 10 Jul 2020 at 11:54, Jonathan Fine <jfine2358@gmail.com> wrote:
Let's proceed. We continue to use d = Dummy(). Given that

    >>> key = d[1, 2, 3, a=4, b=5]

is allowed, what should we be able to say about the key? Clearly it should be an instance of a class, and there should be a way of creating such an instance without going via d. Let's call the class K (for key).

It is natural to say that

    >>> key_1 = d[1, 2, 3, a=4, b=5]
    >>> key_2 = K(1, 2, 3, a=4, b=5)

should define two K objects that are identical.
Accepted.
Once we have designed and implemented the class K, we can achieve many of the benefits of this proposal within existing Python.
There's a flaw in your extrapolation here - see below.
Here goes. First syntax:

    >>> value = d[K(1, 2, 3, a=4, b=5)]
    >>> d[K(1, 2, 3, a=4, b=5)] = value
Correct, but *only* if the type of d accepts objects of type K in __getitem__. Your arguments are starting to get tricky to follow because it's not clear if you intend "d" to still be an instance of Dummy. If it is, of course, then d[K(1,2,3, a=4, b=5)] is a very different value than d[1, 2, 3, a=4, b=5]. But maybe you'll patch this up. You've not made an explicitly false statement yet.
Next, the implementation of such K-mappings. Here, K.set and K.get decorators will help. Something like (not tested):

    class MyMap:
        @K.adjust_get
        def __getitem__(self, x1, x2, x3, a, b):
            pass

where K.adjust_get interfaces between the K-object and the definition of __getitem__.
I have no idea how this works, or what it even means.
Aside: here, not tested, is an implementation of K.adjust_get.

    class K:
        @staticmethod
        def adjust_get(fn):
            def __getitem__(self, k):
                return fn(self, *k.argv, **k.kwargs)
            return __getitem__
I'm still confused. This code is (of course) valid, but I have no idea how you think it would be used. So it doesn't really explain anything, I'm still not clear what you are trying to say.
This introduction of K allows collections to be created that can be used today with the syntax

    >>> value = d[K(1, 2, 3, a=4, b=5)]
    >>> d[K(1, 2, 3, a=4, b=5)] = value
Yes. Obviously. This is the same mechanism slice() uses.
Further, if the K()-less syntax

    >>> value = d[1, 2, 3, a=4, b=5]
    >>> d[1, 2, 3, a=4, b=5] = value

is added to Python, the existing K-collections could continue to be used, with only a change to the decorator K.adjust_get (and also of course K.adjust_set).
So you're suggesting that, like slice, the d[1, 2, 3, a=4, b=5] syntax could be translated into a K-object. Sure. But that doesn't do any good unless classes implement a __getitem__ that recognises and handles objects of type K. And you've not demonstrated how that would happen, unless you were trying to describe that in your adjust_get discussion.

So as far as I can see, all you've said is that a syntax like d[1, 2, 3, a=4, b=5] could be translated into indexing with a custom object. Well yes, of course it could. And it probably would be. But that's just describing an implementation approach; it doesn't really give any argument for why we'd *want* to do this, or how (or if) we'd extend existing (built in or user defined) types to understand this syntax.

Maybe I'm unusual in that I didn't see any difficulty in implementing this proposal. But as a general principle, I don't tend to worry about "how will this be implemented" when discussing new features. If it's impossible to implement it, that will become obvious in due course :-) What I'm interested in is always "why is this a good idea". And that's where I've yet to see any really compelling use cases in this thread.
I think that is enough for now. I hope it helps some of us, sometimes. And does no harm elsewhere.
It's certainly a useful exposition of how you'd arrive at the implementation mechanism that slice uses, and extend it for this proposal (excluding that confusing adjust_get bit) which may well be useful to some of the readers here. And yes, it certainly doesn't do any harm clarifying it. But I'd recommend focusing on design, not implementation, for now. Paul
Hi Paul

I'm writing in response to one of your comments, all of which are useful. Please forgive me, Paul, for not responding to the others in this message. I wrote:
Next, the implementation of such K-mappings. Here, K.set and K.get decorators will help. Something like (not tested):

    class MyMap:
        @K.adjust_get
        def __getitem__(self, x1, x2, x3, a, b):
            pass

where K.adjust_get interfaces between the K-object and the definition of __getitem__.
About this you wrote:
I have no idea how this works, or what it even means.
With hindsight, I can see your difficulty. The original idea is to extend the syntax so that

    >>> m = MyMap()
    >>> m[1, 2, 3, a=4, b=5] = 'foobar'

is allowed in some future version of Python. Python's syntax already allows:

    >>> m = MyMap()
    >>> m[K(1, 2, 3, a=4, b=5)] = 'foobar'

My idea is to implement this, so that we can better explore the original idea. (In particular, this would allow the idea to be tested in practice, and perhaps acquire a community of users.)

For simplicity, assume MyMap = type(m). For

    >>> m[K(1, 2, 3, a=4, b=5)] = 'foobar'

to execute successfully, MyMap must have a __setitem__ method. Aside: here's an example of failure.

    >>> True[1] = 2
    TypeError: 'bool' object does not support item assignment

Let's assume MyMap does have a __setitem__ method. Extending what you pointed out, the assignment

    >>> m[K(1, 2, 3, a=4, b=5)] = 'foobar'

will then be equivalent to

    >>> MyMap.__setitem__(m, k, 'foobar')

where

    >>> k = K(1, 2, 3, a=4, b=5)

The code I wrote, which wasn't clear enough, sketched an adapter between two points of view (although for get rather than set). The basic idea is that

    >>> MyMap.__setitem__(m, k, 'foobar')

is equivalent to something like

    >>> func(m, 'foobar', 1, 2, 3, a=4, b=5)

where func is defined in the class body of MyMap. Hence the adapter decorator, which is to hide from the author of func many of the details of the implementation of K.

I hope this helps.

-- Jonathan
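By analogy, a runnable sketch of the set-side adapter described above. The name K.adjust_set and the K constructor are assumptions, modelled on adjust_get; only the calling convention comes from the text:

    class K:
        def __init__(self, *argv, **kwargs):
            self.argv = argv
            self.kwargs = kwargs

        @staticmethod
        def adjust_set(fn):
            # Unpack the K key so that func sees ordinary parameters,
            # with the assigned value first.
            def __setitem__(self, k, value):
                return fn(self, value, *k.argv, **k.kwargs)
            return __setitem__

    class MyMap:
        @K.adjust_set
        def __setitem__(self, value, x1, x2, x3, a, b):
            print(f"storing {value!r} at ({x1}, {x2}, {x3}, a={a}, b={b})")

    m = MyMap()
    m[K(1, 2, 3, a=4, b=5)] = 'foobar'   # storing 'foobar' at (1, 2, 3, a=4, b=5)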
Let's assume MyMap does have a __setitem__ method. Extending what you pointed out, the assignment

    >>> m[K(1, 2, 3, a=4, b=5)] = 'foobar'

will then be equivalent to

    >>> MyMap.__setitem__(m, k, 'foobar')

where

    >>> k = K(1, 2, 3, a=4, b=5)
This is confusing to me because `my_mapping[1,2,3] = value` is already valid syntax, equivalent to `my_mapping.__setitem__((1,2,3), value)`. Wouldn't a change to translate the tuple into this K object lead to big problems for a lot of existing code...?
On Fri, 10 Jul 2020 at 15:23, Jonathan Fine <jfine2358@gmail.com> wrote:
With hindsight, I can see your difficulty. The original idea is to extend the syntax so that

    >>> m = MyMap()
    >>> m[1, 2, 3, a=4, b=5] = 'foobar'

is allowed in some future version of Python. Python's syntax already allows:

    >>> m = MyMap()
    >>> m[K(1, 2, 3, a=4, b=5)] = 'foobar'

My idea is to implement this, so that we can better explore the original idea. (In particular, this would allow the idea to be tested in practice, and perhaps acquire a community of users.) [...] The code I wrote, which wasn't clear enough, sketched an adapter between two points of view (although for get rather than set).
OK, thanks. I'm probably not the target for your comment in that case, as (now that you've clarified) the approach you suggest seems straightforward enough to me, but (as I said) looks more at implementation than design. I get what you're saying, that this approach would allow people to try out the idea in current Python, with minimally-intrusive extra syntax needed. So maybe it'll encourage people to try the idea out for their use cases. On Fri, 10 Jul 2020 at 15:32, Ricky Teachey <ricky@teachey.org> wrote:
This is confusing to me because `my_mapping[1,2,3] = value` is already valid syntax, equivalent to `my_mapping.__setitem__((1,2,3), value)`. Wouldn't a change to translate the tuple into this K object lead to big problems for a lot of existing code...?
This is much more the sort of question I think we should be exploring at this stage. You can't change how currently-valid syntax is interpreted, so you get a big change of behaviour between d[1,2] and d[1, 2, k=3]. This is where a real-world use case is essential, to try to explore whether that behaviour change is minor, or a showstopper. There's no way to tell with made up examples. Thanks, Ricky, for pointing this out explicitly. Paul
Hi Ricky

Thank you for your very helpful comment, which Paul has usefully amplified. You wrote:

    This is confusing to me because `my_mapping[1,2,3] = value` is already valid syntax, equivalent to `my_mapping.__setitem__((1,2,3), value)`. Wouldn't a change to translate the tuple into this K object lead to big problems for a lot of existing code...?
Yes it would, if we don't take care to maintain backwards compatibility. At present we have:

    >>> class Dummy:
    ...     def __getitem__(self, *argv, **kwargs):
    ...         return argv, kwargs
    >>> d = Dummy()
    >>> d[1, 2, 3]
    (((1, 2, 3),), {})

This behaviour must be preserved. Only when the new syntax is used should d.__getitem__ get an instance of the new class K. Otherwise, it should get a tuple, exactly as before. (Therefore, it might be good that K(1, 2, 3) returns the tuple (1, 2, 3), rather than an instance of K. As I recall, this can be done by using a suitable K.__new__ method.)

And this would include the use of colon to obtain slices:

    >>> d[0:1, 2:3, 4:5]
    (((slice(0, 1, None), slice(2, 3, None), slice(4, 5, None)),), {})

This solution to backwards compatibility is another reason for using a K.adjust_get decorator, when defining the __getitem__ for a map that relies on the new syntax. That way, if the programmer wants the new feature, the appropriate marshalling of positional (and keyword) arguments is done by a dedicated piece of code.

I hope this helps, and thank you again for your query (and Paul for his constructive critical interest).

-- Jonathan
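A small runnable sketch (the details are assumed) of the K.__new__ trick just mentioned: with no keyword arguments, K() degrades to exactly what today's subscript syntax produces:

    class K:
        def __new__(cls, *argv, **kwargs):
            if not kwargs:
                # No keywords: return the bare key -- a scalar or a plain
                # tuple, matching what d[1] and d[1, 2, 3] pass today.
                return argv[0] if len(argv) == 1 else argv
            self = super().__new__(cls)
            self.argv = argv
            self.kwargs = kwargs
            return self

    print(K(1, 2, 3))             # (1, 2, 3) -- a plain tuple
    print(type(K(1, 2, 3, a=4)))  # <class '__main__.K'>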
On Fri, Jul 10, 2020 at 01:59:06PM +0100, Jonathan Fine wrote:
I wrote:

    Let's proceed. We continue to use d = Dummy(). Given that
    >>> key = d[1, 2, 3, a=4, b=5]
    is allowed, what should we be able to say about the key? Clearly it should be an instance of a class and there should be a way of creating such an instance without going via d. Let's call the class K (for key).
I'll now expand on this, as it's not clear to everyone that the key should be an instance of a class.
Of course it is clear that if `key` *exists at all*, then it will be an instance of a class, and furthermore that there will be ways to instantiate that class independently of subscript syntax. This is fundamental to Python's design.

*If* it exists, though. And I'm not convinced that there's any need for this K class, which seems to be analogous to slice(). Or at least, it needs to be justified, and not just assumed. The alternative model is to treat subscripting analogously to function calls, and bind the arguments to parameters in the method. In that case, your "key" object won't even exist. That's how slicing used to work, and there's no reason why we couldn't do the same thing in the future for any enhancement.

Before getting bogged down in the question of how to implement this, we ought to firstly establish what this is, and whether we need it. Having read your posts repeatedly, I think it is safe to say that you are thinking along the lines of something analogous to the slice() builtin. It might have made your explanation easier to understand if you had made that analogy explicit:

    # Current Python
    myobj[a:b:c]
    --> myobj.__getitem__(slice(a, b, c))

    # Proposed enhancement:
    myobj[a, b, c, spam=d, eggs=e]
    --> myobj.__getitem__(lump(a, b, c, spam=d, eggs=e))

    # and similar for assignment and deletion

where "lump" (for lack of a better name) is a helper object analogous to "slice", but accepting arbitrary positional and keyword arguments. (Note: to be precise, dunder methods aren't looked up on the instance themselves, but on the class. But as shorthand, I'll continue to pretend otherwise.)

As Paul says, this is pretty obvious stuff so far. Also obvious is that we can define our own "lump" class and experiment with it:

    # Much like your Dummy class.
    class lump:
        def __init__(self, *args, **kwargs):
            self.args = args
            self.kwargs = kwargs

In your `myobj.__getitem__` method, we can handle the "lump":

    # myobj's class
    def __getitem__(self, item):
        print(item.args)
        print(item.kwargs)

which is enough for experimentation. You can even alias `K=lump` if you prefer :-)

Some additional issues:

For getters, why not just make the object callable and write `myobj(spam=1)`? We need to justify the use of square brackets.

For setters, if we're going to change the syntax to allow keywords in subscripts, perhaps it is better/easier to just allow function calls on the left hand side of assignments? We need to consider why we are doing this, what we hope to do, and whether square or round brackets are preferable.

For both: there are serious backwards compatibility concerns regarding the use of multiple parameters, since they are currently legal and bound up in a tuple argument:

    # Current Python
    myobj[1, 2]   # calls __getitem__ with a single tuple argument (1, 2)

    # How to distinguish the above from this?
    myobj[1, 2]   # call __getitem__ with two int arguments?

However, there is no such concern regarding keyword arguments.
So far I know, every Python object that is accessible to the Python user is an instance of a class (or type).
Correct. Even type itself is an instance of itself. -- Steven
On Fri, Jul 10, 2020 at 02:48:31PM +0200, Alex Hall wrote:
I believe he was saying it would be weird if `d[1, 2]` called `d.__getitem__((1, 2))` but `d[1, 2, x=3]` called `d.__getitem__(1, 2, x=3)`. More generally, if `__getitem__` always receives a single positional argument now, it should probably stay that way for consistency.
Right -- I already said as much too :-)

I think Jonathan's K class is a red herring (except to the degree that it could be used for experimentation). For backwards compatibility, it would be difficult or impossible to support multiple positional arguments. But we could add keyword-only args to subscripting (if there is a compelling need for them) easily. Existing signatures would stay the same:

    def __getitem__(self, item)
    def __setitem__(self, item, value)

but those who wanted keyword args could add them, with or without defaults:

    def __getitem__(self, item, *, spam=None)
    def __setitem__(self, item, value, *, spam, eggs=None)

I trust the expected behaviour is obvious, but in case it's not:

    myobj[1, 2, spam=3]        # calls __getitem__((1, 2), spam=3)
    myobj[1, 2, spam=3] = 999  # calls __setitem__((1, 2), 999, spam=3, eggs=None)

(And similarly for delitem of course.) I must admit I like the look of this, but I don't know what I would use it for.

-- Steven
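A runnable simulation of that dispatch, calling the dunders directly the way the proposed subscripts would translate (the bracketed spellings in the comments are not valid today):

    class MyObj:
        def __getitem__(self, item, *, spam=None):
            return (item, spam)

        def __setitem__(self, item, value, *, spam, eggs=None):
            print(f"set {item!r} = {value!r} (spam={spam!r}, eggs={eggs!r})")

    myobj = MyObj()
    # myobj[1, 2, spam=3] would become:
    print(myobj.__getitem__((1, 2), spam=3))   # ((1, 2), 3)
    # myobj[1, 2, spam=3] = 999 would become:
    myobj.__setitem__((1, 2), 999, spam=3)     # set (1, 2) = 999 (spam=3, eggs=None)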
On Fri, 10 Jul 2020 at 17:45, Steven D'Aprano <steve@pearwood.info> wrote:
I must admit I like the look of this, but I don't know what I would use it for.
It feels very much like the sort of "here's some syntax that might match someone's mental model of something" that is common in languages that focus on allowing users to build their own DSLs¹ (Lua and Groovy are two examples of the type of language I'm thinking of, although I don't know if either has this particular syntax). Python typically doesn't encourage DSL-style programming, so this type of "syntax looking for a use case" isn't very popular. Paul ¹ DSL = Domain Specific Language, in case anyone isn't familiar with the term.
I have wanted this and suggested it before for use with typing. Defining protocols is obnoxiously verbose for "struct"-like data, and keyword arguments to subscript could help alleviate that. I often want to write type hints like this:

```
def foo(x: Protocol[id=int, name=str]):
    bar(x)
    baz(x)

def bar(x: Protocol[name=str]): ...

def baz(x: Protocol[id=int]): ...
```

So I either need to specify more restrictive types than necessary (which often is not possible because I reuse my functions), or generate a combinatorial number of Protocols. Beyond the obvious annoyances that come with having to generate many protocols, simply naming them is cognitively expensive. I don't need to bind an identifier when declaring a Union or specializing a generic, but if I want to say I have a type with some attribute it MUST be named.
Hi Caleb

You wrote (call it FUTURE):
    def foo(x: Protocol[id=int, name=str]):
        bar(x)
        baz(x)
As you know, at present this causes a SyntaxError. However (call it NOW)

    def foo(x: Protocol[o(id=int, name=str)]):
        bar(x)
        baz(x)

has no syntax error. I'll now state some goals.

1. Define 'o' and Protocol so that NOW gives the semantics you wish for.
2. Extend Python so that FUTURE gives the semantics you wish for.
3. And the NOW syntax continues to work as expected (without changing 'o' and Protocol).
4. And all current use of container[key] continues to work as before.

I believe that it is possible to achieve these goals. My previous posts to this discussion outline some of the key ideas. My next step, when I have time, is to implement and publish general purpose code for the NOW part of this list of goals. I hope this will help you and others.

with best regards

Jonathan
Hello, On Wed, 15 Jul 2020 23:09:42 -0700 Caleb Donovick <donovick@cs.stanford.edu> wrote:
I have wanted this and suggested it before for use with typing.
Defining protocols is obnoxiously verbose for "struct"-like data, and keyword arguments to subscript could help alleviate that. I often want to write type hints like this:
```
def foo(x: Protocol[id=int, name=str]):
    bar(x)
    baz(x)
```
Just write them as:

---
from __future__ import annotations

def foo(x: Protocol(id=int, name=str)):
---

So, if using in annotations is the use case, no new language syntax is required; just update your "Protocol" definition per PEP 563 (https://www.python.org/dev/peps/pep-0563/).

-- Best regards, Paul mailto:pmiscml@gmail.com
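A runnable sketch of that call-based spelling. This Protocol is a hypothetical stand-in factory, not typing.Protocol; with PEP 563 in effect the annotation is kept as a string, so any call expression is allowed:

    from __future__ import annotations

    class Protocol:
        def __init__(self, **fields):
            self.fields = fields

        def __repr__(self):
            args = ", ".join(f"{k}={t.__name__}" for k, t in self.fields.items())
            return f"Protocol({args})"

    def foo(x: Protocol(id=int, name=str)):
        ...

    print(foo.__annotations__["x"])   # the string 'Protocol(id=int, name=str)'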
On Fri, Jul 10, 2020, 12:44 Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 10, 2020 at 02:48:31PM +0200, Alex Hall wrote:
I believe he was saying it would be weird if `d[1, 2]` called `d.__getitem__((1, 2))` but `d[1, 2, x=3]` called `d.__getitem__(1, 2, x=3)`. More generally, if `__getitem__` always receives a single positional argument now, it should probably stay that way for consistency.
Right -- I already said as much too :-)
I think Jonathan's K class is a red herring (except to the degree that it could be used for experimentation). For backwards compatibility, it would be difficult or impossible to support multiple positional arguments. But we could add keyword only args to subscripting (if there is a compelling need for them) easily.
Existing signatures would stay the same:
    def __getitem__(self, item)
    def __setitem__(self, item, value)
but those who wanted keyword args could add them, with or without defaults:
    def __getitem__(self, item, *, spam=None)
    def __setitem__(self, item, value, *, spam, eggs=None)
I trust the expected behaviour is obvious, but in case it's not:
myobj[1, 2, spam=3] # calls __getitem__((1, 2), spam=3)
myobj[1, 2, spam=3] = 999 # calls __setitem__((1, 2), 999, spam=3, eggs=None)
(And similarly for delitem of course.)
I must admit I like the look of this, but I don't know what I would use it for.
This would be EXTREMELY useful for xarray, which uses labelled dimensions. So you can index and slice based on dimension position, but also based on dimension name. It is currently very awkward to index or slice by dimension name, though (particularly for slicing).
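For concreteness, a sketch of the xarray case (assumes xarray and numpy are installed; .isel() is one of xarray's existing spellings for indexing by dimension name, and the bracketed spelling in the comment is the proposal):

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("time", "space"))
    subset = da.isel(time=0, space=slice(0, 2))   # today
    # subset = da[time=0, space=0:2]              # under the proposal (not valid today)
    print(subset.values)                          # [0 1]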
I think it’s a reasonable idea and encourage you to start working on a design for the API and then a PRP. It would help if someone looked into a prototype implementation as well (once a design has been settled on). On Thu, Jul 16, 2020 at 03:31 Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Jul 15, 2020 at 11:09:42PM -0700, Caleb Donovick wrote:
I have wanted this and suggested it before for use with typing.
Was Guido interested or did he reject the idea?
-- Steven
-- Guido (mobile)
Guido wrote:
I think it’s a reasonable idea and encourage you to start working on a design for the API and then a PRP.
In this post, I explore how the new API might interact with dict objects. (I think PRP is a typo for PEP.)

Here is an example of the present behaviour:

    >>> d = dict()
    >>> d[x=1, y=2] = 3
    SyntaxError: invalid syntax

with the syntax error occurring at the first '=' symbol. So what should be the new behaviour of:

    >>> d = dict()
    >>> d[x=1, y=2] = 3

To me, it is clear that we won't get a syntax error. This is because the proposal implies that

    >>> d[x=1, y=2] = 3

executes without error, for some values of 'd'. If these two lines

    >>> d = dict()
    >>> d[x=1, y=2] = 3

execute without error, then I would expect

    >>> len(d)
    1
    >>> d[x=1, y=2]
    3

I would further expect

    >>> k = list(d.keys())[0]

to give an object 'k', with the property

    >>> d[k]
    3

This object 'k' corresponds to the source code fragment x=1, y=2 in d[x=1, y=2], and I would expect 'k' to work as a key in any instance of dict.

Here is something that might help the reader. It is that current Python gives

    >>> d = dict()
    >>> d['x=1', 'y=2'] = 3
    >>> list(d.keys())[0]
    ('x=1', 'y=2')

and we are looking for an analogous object for

    >>> d[x=1, y=2] = 3

To summarize and slightly extend the above: if

    >>> d = dict()
    >>> d[x=1, y=2] = 3

executes without error, then there is an object 'k' such that

    >>> d[x=1, y=2] = 5
    >>> d[k] = 5

are equivalent. (This object 'k' is the key in the collection of (key, value) pairs that are the entries in the dict.)

Aside: The key object, if it exists, has a type. Whether this type is a new type or an existing type is an implementation question. This message is focussed on the user experience.

To conclude: if d[x=1, y=2] is allowed for 'd' a dict, then consistency with the current dict behaviour requires the existence of a key object that corresponds to the fragment 'x=1, y=2'.

Finally, I have tried to be both clear and concise. I hope my contribution helps.

-- Jonathan
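A runnable approximation of these semantics in today's Python, with the key object spelled out explicitly; the frozenset of items is just one hashable stand-in for whatever the real key type would be:

    d = dict()
    k = frozenset({"x": 1, "y": 2}.items())   # stand-in for the key of d[x=1, y=2]
    d[k] = 3
    assert len(d) == 1
    assert d[k] == 3
    assert list(d.keys())[0] == k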
On Fri, Jul 17, 2020 at 11:14:21AM +0100, Jonathan Fine wrote:
So what should be the new behaviour of:

    >>> d = dict()
    >>> d[x=1, y=2] = 3
TypeError: dict subscripting takes no keyword arguments

Just because something is syntactically allowed doesn't mean it has to be given a meaning in all circumstances. Just as not all functions accept keyword arguments:

    py> len(x=1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: len() takes no keyword arguments

so not all subscriptable objects will accept keywords.

-- Steven
Steve and I have different opinions as to what the new behaviour of:

    >>> d = dict()
    >>> d[x=1, y=2] = 3

should be. He prefers that the assignment fail with

    TypeError: dict subscripting takes no keyword arguments

I prefer that the assignment succeed (and hence a new key-value pair is added to 'd').

I think this is an important question to get right. And a pressing question. If we choose unwisely now, it might be difficult to change our mind later (particularly after the feature has been released). To support his opinion, Steven wrote:
Just because something is syntactically allowed doesn't mean it has to be given a meaning in all circumstances.
I completely agree with this statement. And I hope that Steven agrees with the statement: just because something can be semantically forbidden doesn't mean that it should be.

We have an opportunity to improve Python for the benefit of its users. I think it would be helpful to have some concise statements on the benefits of

    >>> d = dict()
    >>> d[x=1, y=2] = 3
    TypeError: dict subscripting takes no keyword arguments

And similarly, concise statements on the benefits of

    >>> d = dict()
    >>> d[x=1, y=2] = 3
    >>> d[x=1, y=2]
    3

By the way, I'd prefer that a problem seen in the one choice is instead expressed as a benefit of the other.

To conclude, I'm pleased that this difference of opinion has emerged sooner rather than later. I hope we have a discussion that leads to an improved shared understanding. I hope this message helps.

-- Jonathan
On Fri, Jul 17, 2020, 6:17 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Guido wrote:
I think it’s a reasonable idea and encourage you to start working on a design for the API and then a PRP.
In this post, I explore how the new API might interact with dict objects. (I think PRP is a typo for PEP.)
...
To summarize and slightly extend the above: if

    >>> d = dict()
    >>> d[x=1, y=2] = 3

executes without error, then there is an object 'k' such that

    >>> d[x=1, y=2] = 5
    >>> d[k] = 5

are equivalent. (This object 'k' is the key in the collection of (key, value) pairs that are the entries in the dict.)
Aside: The key object, if it exists, has a type. Whether this type is a new type or an existing type is an implementation question. This message is focussed on the user experience.
It seems to me that the validity of this key-object paradigm is directly tied to the decision of whether or not to change the get/set item dunder signatures. *If it turns out* the desired way forward is to not change the signature for __getitem__ and __setitem__, only then does a key object exist. Otherwise, I suggest the key object does not exist. If, on the other hand, it were determined we'd rather have **kwargs become part of these signatures, there is no "key object". There is more of an unpacked kwargs, or unpacked namespace, object. So it isn't obvious to me that this object *must be* some key object. It certainly CAN be. But the syntax looks so much like kwargs being supplied to a function that a (currently ephemeral) namespace object, or unpacked kwargs object, might be more of a valid way to think about it:
x = dict(a="foo", b="bar") d[**x] # unpack kwargs here, rather than d[x] d[a="foo", b="bar"]
If one approaches it this way, then just as there isn't some namespace object that you can supply to a function that becomes kwd args in all situations, like:
f(x) == f(a="foo", b="bar") # this obviously doesn't make sense, since a and b are kwargs
...I'd be more than a little surprised if this worked:
d[x] == d[a="foo", b="bar"] # a and b certainly LOOK LIKE kwargs, don't they?
But! On the other hand, the huge difference here is that currently, the signature of the get/set item dunders only accepts a single key argument. This is a very big difference from arbitrary function signatures, which CAN accept kwd args.

So maybe this *key object paradigm* does make sense, but what that REALLY means is we are committing ourselves to maintaining the __getitem__ and __setitem__ signatures and not, for example, letting them accept **kwargs.

On Fri, Jul 17, 2020, 6:49 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 17, 2020 at 11:14:21AM +0100, Jonathan Fine wrote:
So what should be the new behaviour of:

    >>> d = dict()
    >>> d[x=1, y=2] = 3
TypeError: dict subscripting takes no keyword arguments
Just because something is syntactically allowed doesn't mean it has to be given a meaning in all circumstances.
Another way forward could be to expand the syntax, but delay the decision on what it means. It would be a multi-step process:

1. Allow usage of Dict[x=str, y=int] (including a TypeError on assignment, and TypeError for dictionaries, and TypeError for all other mappings).
2. Delay the decision of which paradigm (key-object, or unpacked/namespace) makes sense (a TypeError everywhere else in the meantime).
3. Once it is seen the new syntax is allowed but produces TypeError in all situations other than on `type` objects, let the people tell the steering council what they want the syntax to do in other situations.

So INITIALLY you only teach `type` objects -- and NOT regular dicts or mappings -- how to interpret this syntax:
    Dict[x=str, y=int]     # k, cool
    d[a="foo", b="bar"]    # TypeError
If people later demand that you be able to do this (with real world code examples):
    xy = dict(x=str, y=int)
    Dict[**xy] == Dict[x=str, y=int]
...that tells you people want the namespace/unpacked paradigm. On the other hand, if people start asking (with real world code examples) for some kind of key object so they can do this:
d[x] == d[a="foo", b="bar"]
...that tells you people are thinking about it more as a key-object paradigm. Rick.
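A runnable sketch contrasting the two paradigms discussed above. Neither spelling of d[a="foo", b="bar"] is valid today; the assumed translations are shown as direct calls:

    # Key-object paradigm: signatures unchanged, keywords folded into the key.
    class KeyObjectMap:
        def __getitem__(self, key):
            return ("key object", key)

    # Namespace/kwargs paradigm: keywords land in the signature.
    class KwargsMap:
        def __getitem__(self, key=(), **kwargs):
            return ("key", key, "kwargs", kwargs)

    # d[a="foo", b="bar"] would become one of:
    print(KeyObjectMap().__getitem__(frozenset({"a": "foo", "b": "bar"}.items())))
    print(KwargsMap().__getitem__(a="foo", b="bar"))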
On Fri, Jul 17, 2020, 8:16 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Steve and I have different opinions as to what the new behaviour of:

    >>> d = dict()
    >>> d[x=1, y=2] = 3

should be.
He prefers that the assignment fail with TypeError: dict subscripting takes no keyword arguments
I prefer that the assignment succeed (and hence a new key-value pair is added to 'd').
I definitely agree with Steven. It is obviously *possible* to create some brand new type of object that is a "multi-assignment fragment." But why?! No clearly useful semantics comes to mind for this new object. Well, it would need to be hashable. Lots of things are, though, so it's not like we have nothing to use as dict keys now.

We don't lose anything if we add the feature but initially don't support it for dictionaries. If someone comes up with a really useful reason to have that MultiAssignmentType, it is not usually considered a breaking change to go from "this raises an exception" to "this does something worthwhile" (but obviously, in all such cases, you can artificially construct code that will break without a certain exception).
Fwiw, I'm probably -0 on the feature itself. Someone suggested it could be useful for xarray, but I'm not sure now what that would look like. If someone had an example, I could easily be moved. I'm not against the original suggested use with type annotations, but I also don't really care about it. I don't think 'd[K(1, 2, 3, a=4, b=5)]' is bad as an existing spelling.
On Fri, Jul 17, 2020 at 9:20 AM David Mertz <mertz@gnosis.cx> wrote:
Fwiw, I'm probably -0 on the feature itself. Someone suggested it could be useful for xarray, but I'm not sure now what that would look like. If someone had an example, I could easily be moved.
agreed -- I can imagine the use case, but am not enough of an xarray user to come up with a compelling example.

But if it IS useful for xarray (and maybe Pandas, and who knows what other nifty packages -- maybe a database wrapper?), that doesn't mean it has to be used for Mappings.

Jonathan -- I'm still quite confused about what your proposed syntax with dicts means. Could you maybe both:

- describe it in words, and
- give complete examples -- that is, what the dict looks like before and after the operation, and what is returned.
I'm not against the original suggested use with type annotations, but I also don't really care about it.
me neither, and I'm a little against it -- I really don't like all the typing creeping into python these days ...

-CHB

-- Christopher Barker, PhD

Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Fri, Jul 17, 2020 at 12:30 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Fri, Jul 17, 2020 at 9:20 AM David Mertz <mertz@gnosis.cx> wrote:
Fwiw, I'm probably -0 on the feature itself. Someone suggested it could be useful for xarray, but I'm not sure now what that would look like. If someone had an example, I could easily be moved.
agreed -- I can imagine the use case, but am not enough of an xarray user to come up with a compelling example.
But if it IS useful for xarray (and maybe Pandas, and who knows what other nifty packages -- maybe a database wrapper?), that doesn't mean it has to be used for Mappings.
Jonathan -- I'm still quite confused about what your proposed syntax with dicts means. Could you maybe both:

- describe it in words, and
- give complete examples -- that is, what the dict looks like before and after the operation, and what is returned.
It seemed clear to me that it would assign some key object to the supplied value:
    key_object = K(a=1, b=2)  # where K is some new key object type
    d1 = {key_object: 3}
    d2 = {}
    d2[a=1, b=2] = 3
    assert d1 == d2
It will be interesting to see if my intuition on it lines up with what Jonathan had in mind.
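For concreteness, here is a minimal runnable sketch of that intuition (the class K is a hypothetical stand-in, and since today's syntax can't spell d2[a=1, b=2], the K(...) key is written out explicitly):

    class K:
        # Hypothetical key object: a frozen, hashable bundle of positional
        # and keyword subscripts.
        def __init__(self, *args, **kwargs):
            self._key = (args, frozenset(kwargs.items()))

        def __hash__(self):
            return hash(self._key)

        def __eq__(self, other):
            return isinstance(other, K) and self._key == other._key

    d1 = {K(a=1, b=2): 3}
    d2 = {}
    d2[K(a=1, b=2)] = 3  # stand-in for the proposed d2[a=1, b=2] = 3
    assert d1 == d2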
Jonathan and all! Thanks for picking up on this thread, I'd almost given up hope that anyone would be interested. Then it suddenly blew up :)! Jonathan, your suggestion makes sense as a stop-gap measure for current Python, but I'm unclear on the way forward to the new syntax. When you say we introduce a new syntax where:
    value = d[K(1, 2, 3, a=4, b=5)]
    d[K(1, 2, 3, a=4, b=5)] = value
becomes:
    value = d[1, 2, 3, a=4, b=5]
    d[1, 2, 3, a=4, b=5] = value
somehow the interpreter would need to know that the intent is to create an instance of 'K'. How is that association expressed? Or is the idea that - the same way as slices - there would be a predefined object within the language that is always the thing that gets created in these instances? It appears to me that the current behavior is:

    d[32]           becomes  d.__getitem__(32)
    d[2, 4, "boo"]  becomes  d.__getitem__((2, 4, "boo"))
    d[2:3:"boo"]    becomes  d.__getitem__(slice(2, 3, "boo"))

So then maybe d[2, 4, scare="boo"] could become d.__getitem__({0: 2, 1: 4, "scare": "boo"})? That is of course quite a bit different from the args/kwargs syntax, but I don't think it's invalid, as positional arguments would get an integer key while keyword arguments would get a string key. With this change, of course, your previous (current Python) syntax would make K into a function as opposed to a class:

    def K(*args, **kwargs):
        ret_val = {}
        for idx, value in enumerate(args):
            ret_val[idx] = value
        for key, value in kwargs.items():
            assert isinstance(key, str)
            ret_val[key] = value
        return ret_val
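For instance, a quick check of what this K function produces (the bracketed spelling in the comment is still hypothetical):

    key = K(2, 4, scare="boo")
    assert key == {0: 2, 1: 4, "scare": "boo"}
    # so d[2, 4, scare="boo"] would become d.__getitem__(key)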
Thanks again, Andras
Thank you all, for your useful contributions. I particularly value the insight you've given me regarding the experience that underlies your views. I'll respond to your comments tomorrow. I am able today to respond to one question. Ricky asked if

    key_object = K(a=1, b=2)  # where K is some new key object type
    d1 = {key_object: 3}
    d2 = {}
    d2[a=1, b=2] = 3
    assert d1 == d2

was what I had in mind. The answer is YES, and in some ways better expressed. (And the new key object type is introduced only if necessary for the desired user experience.) The use of dict literals makes the role of the key_object even clearer. For clarity, when we write

    >>> d[1, 2, 3] = 4

the associated key_object, for backwards compatibility, must be (1, 2, 3), which is a tuple. Again for clarity, in today's Python the statements

    >>> d[1, 2, 3] = 4
    >>> d[(1, 2, 3)] = 4

are equivalent. I willingly accept this as a constraint. In my own mind, I require it. Once again, thank you all, and I'll say more tomorrow (about 18 hours from now). -- Jonathan
On Fri, Jul 17, 2020 at 11:49 AM Jonathan Fine <jfine2358@gmail.com> wrote:
I am able today to respond to one question. Ricky asked if

    key_object = K(a=1, b=2)  # where K is some new key object type
    d1 = {key_object: 3}
    d2 = {}
    d2[a=1, b=2] = 3
    assert d1 == d2

was what I had in mind. The answer is YES, and in some ways better expressed. (And the new key object type is introduced only if necessary for the desired user experience.)
Maybe I'm being dense here, but I need more explanation of what this "key object" is, how it works, or what its purpose is. The use of dict literals makes the role of the key_object even clearer.
Not to me :-( For clarity, when we write
    >>> d[1, 2, 3] = 4

the associated key_object, for backwards compatibility, must be (1, 2, 3), which is a tuple.
Yes, and tuples are perfectly valid dict keys, so nothing new here.
Again for clarity, in today's Python the statements

    >>> d[1, 2, 3] = 4
    >>> d[(1, 2, 3)] = 4

are equivalent. I willingly accept this as a constraint. In my own mind, I require it.
Which is due to tuples being created by the commas ... So what would the "key object" be in the proposed case:

    d2[a=1, b=2]

A namedtuple? Or a namedtuple-like object? The other trick here is that in indexes, the comma creates a tuple in the general case; for instance:

    Something[1:2, 3:4]

creates a tuple with two slice objects in it. (Heavily used by numpy.) So if d2[a=1, b=2] creates a single object (which should not be dict-specific, so not a key per se), how would you create a tuple of more than one such object? Maybe:

    Something[(a=5, b=6), (c=7, d=8)]

Anyway, I think a prototype is in order. -CHB

Once again, thank you all, and I'll say more tomorrow (about 18 hours from
now). -- Jonathan
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Fri, Jul 17, 2020, 12:10 David Mertz <mertz@gnosis.cx> wrote:
On Fri, Jul 17, 2020, 8:16 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Steve and I have different opinions as to what the new behaviour of:

    >>> d = dict()
    >>> d[x=1, y=2] = 3

should be.
He prefers that the assignment fail with TypeError: dict subscripting takes no keyword arguments
I prefer that the assignment succeed (and hence a new key-value pair is added to 'd').
I definitely agree with Steven. It is obviously *possible* to create some brand new type of object that is a "multi-assignment fragment." But why?!
No clearly useful semantics comes to mind for this new object. Well, it would need to be hashable. Lots of things are though, so it's not like we have nothing to use as dict keys now.
We don't lose anything if we add the feature but simply don't support it for dictionaries. If someone comes up with a really useful reason to have that MultiAssignmentType, it is not usually considered a breaking change to go from "this raises an exception" to "this does something worthwhile" (but obviously, in all such cases, you can artificially construct code that will break without a certain exception).
I think there are a few different options, with different advantages and disadvantages. Some have been touched on, but I don't think they are the only, or necessarily even best, options, so it is worth looking at them all explicitly:

1. Use keyword args. This has the advantage that it automatically handles putting things into a dict or limiting the potential keys, and will fail reliably if keyword indices aren't handled. But it has a bunch of issues. First, it is inconsistent with the existing indexing, which is a single tuple. Second, it only allows valid identifiers, which may be too limiting. Third, the positions of positional and keyword indices are independent.

2. A single additional argument, containing a tuple of name/value tuples or a dict. This is consistent with the existing indexing, allows arbitrary identifiers, and will fail reliably if not handled. It still has the issue of positional and keyword indices having independent orders.

3. Use the existing tuple, but put keyword indices inside it as single-item dicts or named tuples. This has the advantage of keeping track of all the positions together, but you can already use these as indices so it has more potential for backwards compatibility issues.

4. Use the existing tuple, but put keyword indices in a new class. This keeps track of positions and doesn't have the same backwards compatibility issue, but would probably need either a new built-in or at least something in a stdlib module, and I am not sure where that would go. Care also needs to be taken that it will fail properly in all cases where keyword indices aren't supported.

5. Use an entirely new class for all indices when keyword indices are provided. This still requires a new class, and care has to be taken to make sure it isn't accidentally handled improperly by existing classes.

Overall I think option 2 is the best. It has pretty much no possibility of backwards-incompatibility, can handle the widest variety of values, and doesn't require any new classes or new behavior on the implementation side.
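For what it's worth, a minimal sketch of option 2's convention, assuming keyword indices arrive bundled as a dict in one extra argument (the bracket syntax doesn't exist yet, so the proposed translation is hand-simulated in the call below; the class name is made up):

    class Axes:
        def __getitem__(self, index, kwargs=None):
            # Under option 2, positional indices come in as today's tuple,
            # keyword indices as a single dict (or None).
            return index, (kwargs or {})

    a = Axes()
    # proposed: a[1, 2:3, x=4]  ==>  a.__getitem__((1, slice(2, 3)), {'x': 4})
    assert a.__getitem__((1, slice(2, 3)), {'x': 4}) == ((1, slice(2, 3)), {'x': 4})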
On Fri, Jul 17, 2020 at 12:19 PM David Mertz <mertz@gnosis.cx> wrote:
Fwiw, I'm probably -0 on the feature itself. Someone suggested it could be useful for xarray, but I'm not sure now what that would look like. If someone had an example, I could easily be moved.
Here is what it currently looks like to assign values to indices in xarray (adapted from a tutorial):

    ds["empty"].loc[dict(lon=5, lat=6)] = 10

This could be changed to:

    ds["empty"][lon=5, lat=6] = 10

This becomes an even bigger advantage if we include slicing, which I think we should:

    ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10

to:

    ds["empty"][lon=1:5, lat=6:] = 10
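For reference, a runnable version of the current spelling (assuming xarray and numpy are installed; the Dataset here is a made-up two-dimensional example, not the tutorial's):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({"empty": (("lon", "lat"), np.zeros((4, 4)))},
                    coords={"lon": [1, 3, 5, 7], "lat": [2, 4, 6, 8]})
    ds["empty"].loc[dict(lon=5, lat=6)] = 10                         # single labels
    ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10  # label slices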
On Fri, Jul 17, 2020 at 4:12 PM Todd <toddrjen@gmail.com> wrote:
ds["empty"][lon=1:5, lat=6:] = 10
I agree that looks really nice. I think this is the first suggestion to allow slices on the RHS of keyword indexing. That's the part that was missing in my understanding of how this would help xarray. It's also an additional novelty, somewhat independent of the core proposal.
On 17/07/2020 21:11, Todd wrote:
On Fri, Jul 17, 2020 at 12:19 PM David Mertz <mertz@gnosis.cx> wrote:
Fwiw, I'm probably -0 on the feature itself. Someone suggested it could be useful for xarray, but I'm not sure now what that would look like. If someone had an example, I could easily be moved.
Here is what it currently looks like to assign values to indices in xarray (adapted from a tutorial):
ds["empty"].loc[dict(lon=5, lat=6)] = 10
This could be changed to:
ds["empty"][lon=5, lat=6] = 10
This becomes an even bigger advantage if we include slicing, which I think we should:
ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10
to
ds["empty"][lon=1:5, lat=6:] = 10
Thanks for posting this. I had been really struggling to see a use case that made any kind of sense to me, possibly because I am not a data scientist and have no interest in becoming one. This helped a lot. I particularly like the slice notation; there is a clear win there. -- Rhodri James *-* Kynesim Ltd
On Fri, Jul 17, 2020 at 11:11:17AM -0400, Ricky Teachey wrote:
It seems to me that the validity of this key-object paradigm is directly tied to the decision of whether or not to change the get/set item dunder signatures.
But note that even if we allow keyword args in subscripts, we still don't have to change the signatures of existing objects. If keyword args aren't meaningful to a specific class, it will just change from a SyntaxError to a runtime TypeError.
*If it turns out* the desired way forward is to not change the signature for __getitem__ and __setitem__, only then does a key object exist. If not, I suggest it means the key-object does not exist.
On the other hand, if it were determined we'd rather have **kwargs become part of these signatures, there is no "key object". There is more of an unpacked kwargs, or unpacked namespace, object.
So it isn't obvious to me that this object *must be* some key object. It certainly CAN be. But the syntax looks so much like kwargs being supplied to a function, a (currently ephemeral) namespace object, or unpacked kwargs object, might be more of a valid way to think about it:
Indeed. And we need not require the method to unpack a kwargs dict itself. We can allow (but not require!) keyword parameters in the method signature. Now the interpreter will handle the hard work of matching up keyword arguments to parameters.

    class MyClass:
        def __getitem__(self, item=None, *, x, y=0, **kw):
            # requires keyword arg x
            # optional keyword arg y
            # permits any other arbitrary keyword args
            ...

    instance = MyClass()
    instance[2:3, x=1, y=2, z=3]
    # receives arguments item=slice(2, 3), x=1, y=2, kw={'z': 3}

For backwards-compatibility, there will only ever be a single positional argument passed into the method. That's because comma-separated values in a subscript are already passed as a tuple:

    # this calls __getitem__ with a single tuple argument
    obj[a, b:c, d]  ==>  __getitem__(self, (a, slice(b, c), d))

So that's not going to change (at least not without a long and painful deprecation process). But adding support for keyword arguments requires no changes to any existing class or a new builtin "key object" type.
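Since the bracket spelling above isn't legal yet, the proposed binding can be exercised today by calling the dunder directly (a sketch of the intended semantics, not new syntax):

    class MyClass:
        def __getitem__(self, item=None, *, x, y=0, **kw):
            return item, x, y, kw

    instance = MyClass()
    # proposed: instance[2:3, x=1, y=2, z=3]
    assert instance.__getitem__(slice(2, 3), x=1, y=2, z=3) == (slice(2, 3), 1, 2, {'z': 3})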
But! On the other hand, the huge difference here is that currently, the signature of the get/set item dunders only accepts a single key argument.
To be precise, you can put any signature you like into a `__getitem__` method, but only the first positional argument will be called from subscripting syntax.

    py> class Demo:
    ...     def __getitem__(self, item, more=None, *args, x=0, y=1, **kw):
    ...         return (item, more, args, x, y, kw)
    ...
    py> Demo()['item']
    ('item', None, (), 0, 1, {})

Even if you pass comma-separated values, including slices, they all get packed into a tuple and passed as the first parameter `item`. But there is no need to emulate that for keyword args. They can, and I think should, simply be unpacked into keyword parameters exactly the same as function call syntax does. If, by chance, some class wants Jonathan's "keyword object" semantics, it's easy to get:

    def __getitem__(self, **kwargs):
        ...

All the keyword arguments will be handed to you as a dict. You can then convert it to whatever special keyword object suits your purposes. -- Steven
On Fri, Jul 17, 2020 at 7:21 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 17, 2020 at 11:11:17AM -0400, Ricky Teachey wrote:
...
For backwards-compatibility, there will only ever be a single positional argument passed into the method. That's because comma-separated values in a subscript are already passed as a tuple:
    # this calls __getitem__ with a single tuple argument
    obj[a, b:c, d]  ==>  __getitem__(self, (a, slice(b, c), d))
So that's not going to change (at least not without a long and painful deprecation process). But adding support for keyword arguments requires no changes to any existing class or a new builtin "key object" type.
This strikes me as problematic for having a consistent mental model of how stuff works in Python. I think that for many the difference in the meaning of the syntax between item-getting/setting and function-calling would be... glaring. The left-hand-side syntax below looks pretty much identical in both cases (except for the sugar of the slice object), but it has surprisingly different meanings for the positional arguments:
    a, b, c, d = (1, 2, 3, 4)
    args = (a, slice(b, c), d)

    # LEFT HAND SIDE                       RIGHT HAND SIDE
    obj[a, b:c, d, e=5, f=6]       ==      obj.__getitem__(args, e=5, f=6)
    f(a, slice(b, c), d, e=5, f=6) !=      f(args, d, e=5, f=6)
On the one hand, a fairly experienced person (who is familiar with the history of the item dunders, and a preexisting mental model that they are always being supplied a single positional argument ) should not have too much of a problem understanding WHY these would behave differently. But on the other hand, even as an experienced person, this really messes with my brain, looking at it. It's hard for me to believe this isn't going to be a painful distinction for a large number of people to hold in their head-- especially beginners (but not only beginners). A potentially elegant way around this glaring difference in the meaning of the syntax might be the key-object paradigm Jonathan Fine has suggested. However, that only works if you *disallow mixing together* positional arguments and kwd args inside the [ ]:
d[a="foo", b="bar"] == d[KeyObject(a="foo", b="bar")]
If you specifically desire to mix together positional and kwd arguments, the key-object paradigm isn't nearly as elegant... there's still magical stuff happening to "split off" the positional arguments from the keyword arguments:
    # The positional arguments aren't part of the KeyObject
    d[a, b:c, d, e=5, f=6] == d.__getitem__((a, slice(b, c), d), KeyObject(e=5, f=6))
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way? Even the xarray examples given so far don't seem to make use of this mixture. From my knowledge of pandas I am not sure what the meaning of this would be, either.
But! On the other hand, the huge difference here is that currently, the
signature of the get/set item dunders only accepts a single key argument.
To be precise, you can put any signature you like into a `__getitem__` method, but only the first positional argument will be called from subscripting syntax.
I appreciate you pointing that out; it's a meaningful distinction because it could mean that rewriting the default __getitem__ and __setitem__ signatures -- as you seem to be arguing for over the key-object paradigm -- won't be nearly as painful/breaking a change as it would be otherwise.
Even if you pass comma-separated values, including slices, they all get packed into a tuple and passed as the first parameter `item`.
But there is no need to emulate that for keyword args. They can, and I think should, simply be unpacked into keyword parameters exactly the same as function call syntax does.
Yup. Personally I really like the idea that unpacking some dict x will work the same as unpacking in a function call:
    # Pretty!
    d[**x] == d[a=1, b=2]
    f(**x) == f(a=1, b=2)
Getting to the end here, I guess I'm really just wondering whether mixing positional and kwd args is worth doing. If it isn't, then the key-object paradigm seems like it might be a nicer solution to me, for the sole reason that the mental model gets confused otherwise.
On Sat, Jul 18, 2020 at 12:18:38AM -0400, Ricky Teachey wrote:
On Fri, Jul 17, 2020 at 7:21 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 17, 2020 at 11:11:17AM -0400, Ricky Teachey wrote:
...
For backwards-compatibility, there will only ever be a single positional argument passed into the method. That's because comma-separated values in a subscript are already passed as a tuple:
    # this calls __getitem__ with a single tuple argument
    obj[a, b:c, d]  ==>  __getitem__(self, (a, slice(b, c), d))
So that's not going to change (at least not without a long and painful deprecation process). But adding support for keyword arguments requires no changes to any existing class or a new builtin "key object" type.
This strikes me as problematic for having a consistent mental model of how stuff works in python. I think that for many the difference in the meaning of the syntax between item-getting/setting and function-calling would be... glaring.
Yes, but what are we going to do about it? Break a million existing scripts, applications and libraries that rely on `__getitem__` only receiving a single tuple argument when passed comma-separated values? I don't think the core devs will accept that, I think the numpy devs will object strongly, and I'm pretty sure that the Steering Council will say no. But if you disagree, then feel free to start writing a PEP.

The fact that multiple comma-separated subscripts are passed to the method as a single tuple argument is a historical fact we (almost certainly) cannot change now. But that is orthogonal to how we choose to proceed with keyword arguments. We aren't obliged to repeat the same design. We have a few choices:

(1) There is a minor inconsistency between subscripts and function calls, so let's just forget all about the whole idea. If we cannot agree on a decision, this is the default. (Status quo wins a stalemate.)

(2) Let the Perfect be the enemy of the Good. No compromises! Insist on breaking the entire Python ecosystem for the sake of fixing this minor inconsistency between subscripting and function calls.

(3) Reinforce that inconsistency, and continue to obfuscate the similarities, by handling keyword arguments in the same fashion as comma-separated subscripts. This will require a new builtin "key-object" class, and it will require every class that cares about keyword arguments in their subscripts to parse them themselves. We'll also need to decide how to combine subscripts and keywords:

    obj[a, b:c, x=1]
    # is this a tuple argument:  (a, slice(b, c), key(x=1))
    # or a key argument:         key(a, slice(b, c), x=1)

(4) Or keep the subscript processing as-is, for backwards-compatibility, but pass keyword arguments as normal for functions.

Both (3) and (4) would get the job done, but (3) requires everyone who needs keyword arguments to parse the tuple and/or key object by hand to extract them. Having done something similar in the past (emulating keyword-only arguments in Python 2), I can tell you this is painful. With (4), the interpreter automatically matches up passed keyword arguments to my `__getitem__` parameters, filling in defaults if needed, and I can concentrate on using the arguments, not parsing them.
On the one hand, a fairly experienced person (who is familiar with the history of the item dunders, and a preexisting mental model that they are always being supplied a single positional argument ) should not have too much of a problem understanding WHY these would behave differently.
But on the other hand, even as an experienced person, this really messes with my brain, looking at it. It's hard for me to believe this isn't going to be a painful distinction for a large number of people to hold in their head-- especially beginners (but not only beginners).
I think you are overthinking this. Inside a subscript, multiple positional arguments are collected into a tuple and passed as a single argument. (Vaguely similar to the way `*args` positional arguments are collected.) Why? Because of historical reasons and backwards compatibility. If someone wants to trawl the archives looking for a discussion, I look forward to hearing the result, but we don't need to know the historical reason in order to learn the behaviour.

If you define your getitem like this:

    def __getitem__(self, item, more):

then you'll get a TypeError when you try to subscript, because `more` doesn't get a value. This is already the case! So anyone writing getitem methods already knows that positional arguments aren't handled the same way as function calls. If you give `more` a default, then you won't get an error... but even the tiniest bit of testing will reveal that `item` receives a tuple, and `more` always gets the default.

In practice, anyone writing getitem methods only ever gives it a single argument (aside from self of course :-) so if they add keyword arguments, the most natural way to do so is to make them keyword-only:

    def __getitem__(self, item, *, more, keyword, arguments):

(with or without defaults). Problem solved.
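As a quick check of the existing behaviour described above (a small sketch; the exact error message wording varies by Python version):

    class Demo:
        def __getitem__(self, item, more):
            return item, more

    try:
        Demo()[1]
    except TypeError as e:
        print(e)  # e.g. "__getitem__() missing 1 required positional argument: 'more'"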
A potentially elegant way around this glaring difference in the meaning of the syntax might be the key-object paradigm Jonathan Fine has suggested. However, that only works if you *disallow mixing together* positional arguments and kwd args inside the [ ]:
No, we can still mix them. We just have to decide whether to mix them together into a tuple, or into a key-object:

    obj[a, b:c, x=1, y=2]
    # tuple:       (a, slice(b, c), key(x=1, y=2))
    # key-object:  key(a, slice(b, c), x=1, y=2)

Either way, it means that the getitem method itself has to pull the object (tuple or key-object) apart, parsing keyword arguments, filling in defaults, and dealing with missing values. Why would we choose to do that when the interpreter can do it for us? If you do want to do it yourself, you can always just use `**kwargs` like you would in any other method. Likewise, if you want an atomic "keyword object", just pass your kwargs to something like SimpleNamespace:

    py> from types import SimpleNamespace
    py> kwargs = {'spam': 1, 'eggs': 2}
    py> SimpleNamespace(**kwargs)
    namespace(spam=1, eggs=2)
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way?
That's easy to answer. Positional subscripts represent a key or index or equivalent; keyword arguments can represent *modifiers*. So I might index into a tree:

    tree[18, order='preorder']  # or postorder, inorder

or a two-dimensional array:

    matrix[18, order='row']  # row-column order rather than column-row

I don't think builtin dicts should support this concept, but third-party mappings might allow you to modify what happens if the key already exists:

    # add or replace if the key already exists?
    mapping[key, if_exist='add'] = 5

[...]
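A runnable sketch of that last idea, hand-simulating the proposed spelling by calling the dunder directly (the class and the if_exist modifier are illustrations, not an existing API):

    class AddableDict(dict):
        def __setitem__(self, key, value, *, if_exist='replace'):
            if if_exist == 'add' and key in self:
                value = self[key] + value
            super().__setitem__(key, value)

    m = AddableDict()
    m['spam'] = 5
    # proposed: m['spam', if_exist='add'] = 5
    m.__setitem__('spam', 5, if_exist='add')
    assert m['spam'] == 10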
Getting to the end here, I guess I'm really just wondering whether mixing positional and kwd args is worth doing. If it isn't, then the key-object paradigm seems like might be a nicer solution to me for the sole reason that the mental model gets confused otherwise.
Here is an exercise for you. Let's pretend that function calls existed with the same limitation that Jonathan is suggesting for subscripting. Go through your code and find any method or function that currently uses keyword arguments (that will be nearly all of them, if we include "positional-or-keyword arguments"). Now imagine that instead of receiving named keyword parameters, all of your functions received an opaque namespace "key" object, which you can pretend is just a dict. Re-write your methods to have this signature:

    def method(self, **key):

That's Jonathan's model. If you pass keyword args, they all get packed into a single parameter. Now you get to pull it apart, test for unwanted keywords, deal with missing keywords, assign defaults, etc. Go through the exercise. I have -- I've written Python 2 code that needed to handle keyword-only arguments, and this was the only way to do so. The "only one parameter, which may receive a keyobject" design will have us writing code something like this:

    # I want this:
    def __getitem__(self, item, *, a, b, c, d=0):
        ...

    # but have to write this:
    def __getitem__(self, item):
        # Determine whether we got any keyword arguments.
        if isinstance(item, keyobject):
            keys = item
            item = ()
        elif isinstance(item, tuple):
            # Assuming that all keyword args are at the end;
            # if there could be more than one keyobject, or if
            # they could be anywhere in the tuple, this becomes
            # even more complex. I don't even want to think
            # about that case.
            if item and isinstance(item[-1], keyobject):
                keys = item[-1]
                item = item[:-1]
            else:
                keys = keyobject()
        else:
            # a single, non-tuple subscript: no keywords at all
            keys = keyobject()
        # Now extract the parameters from the key object.
        if 'a' in keys:
            a = keys.pop('a')
        else:
            raise TypeError('missing keyword argument "a"')
        # same for b and c
        d = keys.pop('d', 0)
        # Check for unexpected keywords.
        if keys:
            raise TypeError('unexpected keyword')

(Any bugs in the above are not intentional.) And now finally we can actually use the keyword parameters and write the method. -- Steven
Andras Tantos wrote
Thanks for picking up on this thread, I'd almost given up hope that anyone would be interested. Then it suddenly blew up :)!
to thank all who expressed interest and support. (Andras means 'blew up' in a good way!) And I thank you, Andras. It's nice to be appreciated. I now have more confidence that this discussion will result in changes to Python that will help some users, and hinder none. I'll now turn to next steps. First, something that I have a special responsibility for. Then some technical remarks, on opaque and transparent use of keys. Finally, some suggestions for others.

MY SPECIAL RESPONSIBILITY
=========================

On 16 July I wrote:
I'll now state some goals.
1. Define 'o' and Protocol so that NOW gives the semantics you wish for.
2. Extend Python so that FUTURE gives the semantics you wish for.
3. And the NOW syntax continues to work as expected (without changing 'o' and Protocol).
4. And all current use of container[key] continues to work as before.
I believe that it is possible to achieve these goals. My previous posts to this discussion outline some of the key ideas. My next step, when I have time, is to implement and publish general purpose code for the NOW part of this list of goals.
I hope to have this done by the end of this month. It won't be my immediate priority. I'm attending the online TeX Users Group conference next week, and haven't written my talks yet. http://tug.org/tug2020/

TECHNICAL REMARKS
=================

Here are some remarks related to this goal which, I hope, will help you, both in your discussions and in any responsibilities you choose to take on. My remarks concern the coexistence of what I call the opaque and transparent points of view regarding the syntax

    >>> d[1, 2, x=3, y=4]

A dict accepts any hashable object as a key.

    >>> class A: __hash__ = lambda self: 0
    >>> {A(): 1}
    {<__main__.A object at 0x7f221087fa90>: 1}

My proposal is that

    >>> d[1, 2, x=3, y=4] = 5

results in the call

    >>> dict.__setitem__(d, k, 5)

where k is a hashable object, perhaps K(1, 2, x=3, y=4) for some new class K. Let's call this an OPAQUE use of k = K(1, 2, x=3, y=4). We don't look into k, and the dict class checks that it is hashable.

In some other situations we wish for

    >>> d[1, 2, x=3, y=4] = 5

to result in a call such as

    >>> __setitem__(d, (1, 2), 5, x=3, y=4)

where __setitem__ is a function defined in the implementation of D = type(d). I fully support this goal, although not the implementation details in the example above. It is my opinion that this goal is best achieved by making easier the TRANSPARENT use of k = K(1, 2, x=3, y=4). Here's how it goes. First we write:

    class D:
        @wibble
        def __setitem__(self, val, u, v, x, y):
            pass  # Or do something.

Next, we define wibble. It will be a SIGNATURE CHANGING ADAPTER. Those who know how to make decorators will, I hope, have little difficulty in defining wibble to do what is required. For this exercise, assume that k.argv = (1, 2) and k.kwargs = dict(x=3, y=4). (One possible sketch of wibble appears after this message.)

The main idea is that each class will make an opaque use of the key, unless it uses a signature-changing adapter to enable a transparent use of the key. Thus, by default, key use is opaque, but if a class wishes it can make transparent use. Without examples and working code (which I've promised for the end of the month), this might be hard to understand. However, it is I hope clear enough for now.

SUGGESTIONS FOR OTHERS
======================

Arising out of the discussion, it seems to me there are several different mental models for

    >>> d[...] = x
    >>> x = d[...]
    >>> del d[...]

particularly when keyword arguments are allowed in d[...]. I think clarifying the various mental models would be very helpful. By the way, I don't think any one model will be better than all the others, for all purposes. This is particularly true for beginners and experts, and also when there is already an existing mental model, such as in xarray.

The implementation of the resulting PEP would require changes to user documentation, and in particular to the tutorial.

    https://docs.python.org/3/
    https://docs.python.org/3/tutorial/index.html

I think it would be helpful to review the docs, to find some of the places where they will need change. Some say that to really understand something, you have to explain it to someone else who doesn't share your background. I'm sure there are other useful things to do, such as exploring how all this might help users of xarray. But I'm even less qualified to make suggestions in that area.

I hope this post is helpful, particularly as I spent an hour writing it, and perhaps a similar time thinking about these matters. I think that's enough for now. When I have some code for you to look at, I'll let you know. -- Jonathan
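A minimal sketch of such a signature-changing adapter, under the assumptions above (K, its argv/kwargs attributes, and the name wibble are all placeholders from the description, not an existing API):

    import functools

    class K:
        def __init__(self, *argv, **kwargs):
            self.argv = argv
            self.kwargs = kwargs
        def __hash__(self):
            # hashable, so opaque use as a dict key also works
            return hash((self.argv, frozenset(self.kwargs.items())))

    def wibble(method):
        # Adapt the opaque call __setitem__(self, key, value) into the
        # transparent call method(self, value, *key.argv, **key.kwargs).
        @functools.wraps(method)
        def adapter(self, key, value):
            return method(self, value, *key.argv, **key.kwargs)
        return adapter

    class D:
        @wibble
        def __setitem__(self, val, u, v, x, y):
            print(val, u, v, x, y)

    D()[K(1, 2, x=3, y=4)] = 5   # prints: 5 1 2 3 4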
On 2020-07-18 10:04, Steven D'Aprano wrote:
On Sat, Jul 18, 2020 at 12:18:38AM -0400, Ricky Teachey wrote:
On Fri, Jul 17, 2020 at 7:21 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Jul 17, 2020 at 11:11:17AM -0400, Ricky Teachey wrote:
...
For backwards-compatibility, there will only ever be a single positional argument passed into the method. That's because comma-separated values in a subscript are already passed as a tuple:
# this calls __getitem__ with a single tuple argument obj[a,b:c,d] ==> (1, slice(2, 3), 4)
So that's not going to change (at least not without a long and painful deprecation process). But adding support for keyword arguments requires no changes to any existing class or a new builtin "key object" type.
This strikes me as problematic for having a consistent mental model of how stuff works in python. I think that for many the difference in the meaning of the syntax between item-getting/setting and function-calling would be... glaring.
Yes, but what are we going to do about it?
Break a million existing scripts, applications and libraries that rely on `__getitem__` only receiving a single tuple argument when passed comma-separated values? I don't think the core devs will accept that, I think the numpy devs will object strongly, and I'm pretty sure that the Steering Council will say no.
But if you disagree, then feel free to start writing a PEP.
The fact that multiple comma-separated subscripts are passed to the method as a single tuple argument is a historical fact we (almost certainly) cannot change now. But that is orthogonal to how we choose to proceed with keyword arguments. We aren't obliged to repeat the same design.
We have a few choices:
(1) There is a minor inconsistency between subscripts and function calls, so let's just forget all about the whole idea. If we cannot agree on a decision, this is the default. (Status quo wins a stalemate.)
(2) Let the Perfect be the enemy of the Good. No compromises! Insist on breaking the entire Python ecosystem for the sake of fixing this minor inconsistency between subscripting and function calls.
(3) Reinforce that inconsistency, and continue to obfuscate the similarities, by handling keyword arguments in the same fashion as comma-separated subscripts. This will require a new builtin "key-object" class, and it will require every class that cares about keyword arguments in their subscripts to parse them themselves.
We'll also need to decide how to combine subscripts and keywords:
    obj[a, b:c, x=1]
    # is this a tuple argument:  (a, slice(b, c), key(x=1))
    # or a key argument:         key(a, slice(b, c), x=1)
(4) Or keep the subscript processing as-is, for backwards-compatibility, but pass keyword arguments as normal for functions.
Both (3) and (4) would get the job done, but (3) requires everyone who needs keyword arguments to parse the tuple and/or key object by hand to extract them. Having done something similar in the past (emulating keyword-only arguments in Python 2), I can tell you this is painful.
With (4), the interpreter automatically matches up passed keyword arguments to my `__getitem__` parameters, filling in defaults if needed, and I can concentrate on using the arguments, not parsing them.
[snip]

I haven't followed this thread for a while, but, to me, it seems that the simplest option would be to pass the keyword arguments as a dict:

    obj[a, b:c, x=1] does obj.__getitem__((a, slice(b, c)), dict(x=1))

If there are no keyword arguments, then there's no dict. This does mean that __getitem__ could be called with 1 or 2 arguments and __setitem__ could be called with 2 or 3 arguments, so it would be advisable to make the additional argument optional:

    def __getitem__(self, args, kwargs=None):

Additionally, __setitem__ would look a little odd:

    def __setitem__(self, args, value, kwargs=None):

It would raise a TypeError if there were keyword arguments but __getitem__, etc., didn't accept any.
On Sat, Jul 18, 2020 at 05:30:40PM +0100, MRAB wrote:
I haven't followed this thread for a while, but, to me, it seems that the simplest option would be to pass the keyword arguments as a dict:
What are you going to do with that keyword argument dict?

Most use-cases I can think of will have to unpack the dict into named parameters. (Just as the interpreter does for us, in function calls.)

When I'm writing functions, for every one use of `**kwargs`, I have about a hundred uses of named parameters. I'm pretty sure most people are similar. I don't think that keyword args in subscripts will be different. I'm pretty sure that nearly everyone will want to unpack the `**kwargs` into named parameters nearly all of the time.

So why force them to do the unpacking themselves when the interpreter already has all the machinery to do it?

If you want kwargs to collect arbitrary keyword arguments, you can just declare your getitem method (and setitem and delitem if needed) to take `**kwargs`, and the interpreter will oblige.

If you want no keyword arguments at all, you don't have to change a thing. Your getitem (etc) methods have no keyword parameters, so using keywords in the subscript will fail with TypeError.
obj[a, b:c, x=1] does obj.__getitem__((a, slice(b, c)), dict(x=1))
I don't think that is either simpler or more useful than a straightforward binding of arguments to parameters, just as function calls already do:

    obj[a, b:c, x=1] ==> obj.__getitem__((a, slice(b, c)), x=1)

It's unfortunate that positional subscripts are bundled together into a tuple. I have resisted calling that design a "mistake" because I don't know the reason for the design. There was probably a good reason for it, back in the ancient history of Python when getslice and getitem were unified. But I am sure that it will be a horrible mistake to emulate that decision for keyword arguments. If anyone wants or needs their keyword arguments to be bundled into a single kwargs parameter, you can have it. All you need do is declare your method with a `**kwargs` parameter, and the interpreter will do the rest. -- Steven
Yes please. FWIW, IIRC the “bundle values in a single parameter” predates the demise of __getslice__. It probably was meant for dict keys primarily (no surprise there). The bundling would have been easier for the C API — __getitem__ is as old as Python there. On Sat, Jul 18, 2020 at 10:49 Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jul 18, 2020 at 05:30:40PM +0100, MRAB wrote:
I haven't followed this thread for a while, but, to me, it seems that the simplest option would be to pass the keyword arguments as a dict:
What are you going to do with that keyword argument dict?
Most use-cases I can think of will have to unpack the dict into named parameters. (Just as the interpreter does for us, in function calls.)
When I'm writing functions, for every one use of `**kwargs`, I have about a hundred uses of named parameters. I'm pretty sure most people are similar. I don't think that keyword args in subscripts will be different. I'm pretty sure that nearly everyone will want to unpack the `**kwargs` into named parameters nearly all of the time.
So why force them to do the unpacking themselves when the interpreter already has all the machinery to do it?
If you want kwargs to collect arbitrary keyword arguments, you can just declare your getitem method (and setitem and delitem if needed) to take `**kwargs`, and the interpreter will oblige.
If you want no keyword arguments at all, you don't have to change a thing. Your getitem (etc) methods have no keyword parameters, so using keywords in the subscript will fail with TypeError.
obj[a, b:c, x=1] does obj.__getitem__((a, slice(b, c)), dict(x=1))
I don't think that is either simpler or more useful than a straight- forward binding of arguments to parameters, just as function calls already do:
obj[a, b:c, x=1] ==> obj.__getitem__((a, slice(b, c)), x=1)
It's unfortunate that positional subscripts are bundled together into a tuple. I have resisted calling that design a "mistake" because I don't know the reason for the design. There was probably a good reason for it, back in the ancient history of Python when getslice and getitem were unified. But I am sure that it will be a horrible mistake to emulate that decision for keyword arguments.
If anyone wants or needs their keyword arguments to be bundled into a single kwargs parameter, you can have it. All you need do is declare your method with a `**kwargs` parameter, and the interpreter will do the rest.
-- Steven
-- --Guido (mobile)
On Sat, Jul 18, 2020 at 1:43 PM Guido van Rossum <guido@python.org> wrote:
Yes please.
Yes to what, exactly? -CHB FWIW, IIRC the “bundle values in a single parameter” predates the demise of
__getslice__. It probably was meant for dict keys primarily (no surprise there). The bundling would have been easier for the C API — __getitem__ is as old as Python there.
On Sat, Jul 18, 2020 at 10:49 Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jul 18, 2020 at 05:30:40PM +0100, MRAB wrote:
I haven't followed this thread for a while, but, to me, it seems that the simplest option would be to pass the keyword arguments as a dict:
What are you going to do with that keyword argument dict?
Most use-cases I can think of will have to unpack the dict into named parameters. (Just as the interpreter does for us, in function calls.)
When I'm writing functions, for every one use of `**kwargs`, I have about a hundred uses of named parameters. I'm pretty sure most people are similar. I don't think that keyword args in subscripts will be different. I'm pretty sure that nearly everyone will want to unpack the `**kwargs` into named parameters nearly all of the time.
So why force them to do the unpacking themselves when the interpreter already has all the machinery to do it?
If you want kwargs to collect arbitrary keyword arguments, you can just declare your getitem method (and setitem and delitem if needed) to take `**kwargs`, and the interpreter will oblige.
If you want no keyword arguments at all, you don't have to change a thing. Your getitem (etc) methods have no keyword parameters, so using keywords in the subscript will fail with TypeError.
obj[a, b:c, x=1] does obj.__getitem__((a, slice(b, c)), dict(x=1))
I don't think that is either simpler or more useful than a straight- forward binding of arguments to parameters, just as function calls already do:
obj[a, b:c, x=1] ==> obj.__getitem__((a, slice(b, c)), x=1)
It's unfortunate that positional subscripts are bundled together into a tuple. I have resisted calling that design a "mistake" because I don't know the reason for the design. There was probably a good reason for it, back in the ancient history of Python when getslice and getitem were unified. But I am sure that it will be a horrible mistake to emulate that decision for keyword arguments.
If anyone wants or needs their keyword arguments to be bundled into a single kwargs parameter, you can have it. All you need do is declare your method with a `**kwargs` parameter, and the interpreter will do the rest.
-- Steven
-- --Guido (mobile)
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Sat, Jul 18, 2020 at 9:11 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Sat, Jul 18, 2020 at 1:43 PM Guido van Rossum <guido@python.org> wrote:
Yes please.
Yes to what, exactly?
-CHB
FWIW, IIRC the “bundle values in a single parameter” predates the demise
of __getslice__. It probably was meant for dict keys primarily (no surprise there). The bundling would have been easier for the C API — __getitem__ is as old as Python there.
On Sat, Jul 18, 2020 at 10:49 Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jul 18, 2020 at 05:30:40PM +0100, MRAB wrote:
I haven't followed this thread for a while, but, to me, it seems that the simplest option would be to pass the keyword arguments as a dict:
What are you going to do with that keyword argument dict?
Most use-cases I can think of will have to unpack the dict into named parameters. (Just as the interpreter does for us, in function calls.)
When I'm writing functions, for every one use of `**kwargs`, I have about a hundred uses of named parameters. I'm pretty sure most people are similar. I don't think that keyword args in subscripts will be different. I'm pretty sure that nearly everyone will want to unpack the `**kwargs` into named parameters nearly all of the time.
So why force them to do the unpacking themselves when the interpreter already has all the machinery to do it?
If you want kwargs to collect arbitrary keyword arguments, you can just declare your getitem method (and setitem and delitem if needed) to take `**kwargs`, and the interpreter will oblige.
If you want no keyword arguments at all, you don't have to change a thing. Your getitem (etc) methods have no keyword parameters, so using keywords in the subscript will fail with TypeError.
obj[a, b:c, x=1] does obj.__getitem__((a, slice(b, c)), dict(x=1))
Not to this... I don't think that is either simpler or more useful than a straight-
forward binding of arguments to parameters, just as function calls already do:
obj[a, b:c, x=1] ==> obj.__getitem__((a, slice(b, c)), x=1)
But to this.
It's unfortunate that positional subscripts are bundled together into a
tuple. I have resisted calling that design a "mistake" because I don't know the reason for the design. There was probably a good reason for it, back in the ancient history of Python when getslice and getitem were unified. But I am sure that it will be a horrible mistake to emulate that decision for keyword arguments.
If anyone wants or needs their keyword arguments to be bundled into a single kwargs parameter, you can have it. All you need do is declare your method with a `**kwargs` parameter, and the interpreter will do the rest.
These two paragraphs make it clear what Steven was proposing. I am supporting him in this.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
On 17.07.20 22:11, Todd wrote:
On Fri, Jul 17, 2020 at 12:19 PM David Mertz <mertz@gnosis.cx <mailto:mertz@gnosis.cx>> wrote:
Fwiw, I'm probably -0 on the feature itself. Someone suggested it could be useful for xarray, but I'm not sure now what that would look like. If someone had an example, I could easily be moved.
Here is what it currently looks like to assign values to indices in xarray (adapted from a tutorial):
ds["empty"].loc[dict(lon=5, lat=6)] = 10
This could be changed to:
ds["empty"][lon=5, lat=6] = 10
This becomes an even bigger advantage if we include slicing, which I think we should:
ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10
But this looks unnecessarily complicated. Why can't xarray allow the following:

    ds["empty"]["lon", 1:5, "lat", 3:] = 10

which looks very close to the proposed syntax below. Not that I'm against the proposal, but I think that any use case involving *only* keyword arguments isn't a very strong one, because it can easily be solved that way without a change to existing syntax. Only when positional and keyword arguments are mixed does it become difficult to distinguish (for both the reader and the method).
to
ds["empty"][lon=1:5, lat=6:] = 10
On Sun, Jul 19, 2020 at 6:35 PM Dominik Vilsmeier <dominik.vilsmeier@gmx.de> wrote:
But this looks unnecessarily complicated. Why can't xarray allow the following:
ds["empty"]["lon", 1:5, "lat", 3:] = 10
which looks very close to the proposed syntax below. Not that I'm against the proposal, but I think that any use case involving *only* keyword arguments isn't a very strong one, because it can easily be solved that way without a change to existing syntax.
Xarray already allows positional slices in multiple dimensions. The existing dict weirdness is to have a way to introduce the named dimensions. But suppose that the array in question had 'altitude' as its first listed dimension. I think nowadays we can write:

    arr.loc[50:60, 1:5, 3:]

if we only reference dimensions by number, not by name. Under the "commas separate keys from values" rule this would be difficult:

    arr.loc[50:60, "lon", 1:5, "lat", 3:]

Yes, I can imagine a rule like "if it is a slice that wasn't preceded by a string, treat it as positional; otherwise, if it is a string, treat it as a key, and treat the next thing after a string as the slice value corresponding to that key." That seems more error-prone and harder to grok than the potential:

    arr.loc[50:60, lon=1:5, lat=3:]

where you'd still just have to know that "axis 0 is altitude" for your particular array.
On 7/19/2020 6:49 PM, David Mertz wrote:
On Sun, Jul 19, 2020 at 6:35 PM Dominik Vilsmeier <dominik.vilsmeier@gmx.de <mailto:dominik.vilsmeier@gmx.de>> wrote:
But this looks unnecessarily complicated. Why can't xarray allow the following:
ds["empty"]["lon", 1:5, "lat", 3:] = 10
which looks very close to the proposed syntax below. Not that I'm against the proposal but I think that any use case involving *only* keyword arguments isn't a very strong one, because it can easily be solved that way without a change to existing syntax.
Xarray already allows positional slices in multiple dimensions. The existing dict weirdness is to have a way to introduce the named dimensions. But suppose that the array in question had 'altitude' as its first listed dimension. I think nowadays we can write:
arr.loc[50:60, 1:5, 3:]
if we only reference dimensions by number, not by name. Under the "commas separate keys from values" rule this would be difficult:
arr.loc[50:60, "lon", 1:5, "lat", 3:]
Yes, I can imagine a rule like "if it is a slice that wasn't preceded by a string, treat it as positional; otherwise, if it is a string, treat it as a key, and treat the next thing after a string as the slice value corresponding to that key."

That seems more error-prone and harder to grok than the potential:
arr.loc[50:60, lon=1:5, lat=3:]
Where you'd still just have to know that "axis 0 is altitude" of your particular array.
In addition, what if you actually wanted:

    arr.loc["lon", "lon", 1:5, "lat", 3:]

That is: what if you have a string argument whose value happens to be the name of one of your "named" parameters? Eric
On Fri, Jul 17, 2020 at 9:22 PM Ricky Teachey <ricky@teachey.org> wrote:
    # The positional arguments aren't part of the KeyObject
    d[a, b:c, d, e=5, f=6] == d.__getitem__((a, slice(b, c), d), KeyObject(e=5, f=6))
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way?
Even the xarray examples given so far don't seem to make use of this mixture. From my knowledge of pandas I am not sure what the meaning of this would be, either.
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt in to approximate indexing, e.g., something like:

    array.loc[longitude, latitude, method='nearest', tolerance=0.001]

(This use case is actually already mentioned in PEP 472, as "an optional contextual to the indexing": https://www.python.org/dev/peps/pep-0472/#use-cases)
On Sun, Jul 19, 2020 at 9:53 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Fri, Jul 17, 2020 at 9:22 PM Ricky Teachey <ricky@teachey.org> wrote:
    # The positional arguments aren't part of the KeyObject
    d[a, b:c, d, e=5, f=6] == d.__getitem__((a, slice(b, c), d), KeyObject(e=5, f=6))
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way?
Even the xarray examples given so far don't seem to make use of this mixture. From my knowledge of pandas I am not sure what the meaning of this would be, either.
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt-in to approximate indexing, e.g., something like: array.loc[longitude, latitude, method='nearest', tolerance=0.001]
I had to stare at this for a good 30 seconds before I realized that this wasn't a function/method call. Except for the square brackets instead of parentheses, it would be. Honestly, this whole idea smells to me like just wanting another type of function call with different semantics. IMHO the above example would be better spelled as:

    array.loc.get(longitude, latitude, method='nearest', tolerance=0.001)

Pros: Much more obvious, perfectly legal today, zero backward compatibility issues, probably the way many libraries with such functionality are doing it now.
Cons: A few extra characters (depending on the method name; here it's only four) and a couple of taps on the Shift key (but you're already used to that).

Whereas for the indexing example:

Pros: Explicit about indexing a collection and returning an item (but a good method name like "get" also has this quality).
Cons: Not immediately obvious to an uninitiated reader of the code, new syntax, code using it is not compatible with Python 3.9 or earlier.

As for setting a value, the indexing approach has a bit more value (since a function call can't be an lvalue), but you can just change the method name from "get" to "set" and add an extra parameter to be the value to set, and you're done. The pros and cons still apply.

TL;DR: -1 all the way. Just use ordinary methods.
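To make the method-based spelling concrete, a toy sketch (the Loc class and its get/set names are made up for illustration, not an existing pandas/xarray API):

    class Loc:
        # A toy label-indexed store using get/set methods instead of [].
        def __init__(self):
            self._data = {}

        def _key(self, pos, modes):
            return (pos, tuple(sorted(modes.items())))

        def set(self, *pos, value, **modes):
            self._data[self._key(pos, modes)] = value

        def get(self, *pos, **modes):
            return self._data[self._key(pos, modes)]

    loc = Loc()
    loc.set(5, 6, value=10, method='nearest')
    assert loc.get(5, 6, method='nearest') == 10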
On Sun, Jul 19, 2020 at 10:24:42PM -0400, Jonathan Goble wrote:
IMHO the above example would be better spelled as:
array.loc.get(longitude, latitude, method='nearest', tolerance=0.001)
One of the use-cases for this is for type hints, where you cannot use method calls. So that's right out. Minor issues with "use a method call" include that it makes using slices difficult, as slice syntax is not legal in a method call, and that item assignment and deletion are awkward (you have to use a setter and deleter method, like some sort of ~~caveman~~ Java developer, instead of writing what you mean). -- Steven
On Sun, Jul 19, 2020 at 10:25 PM Jonathan Goble <jcgoble3@gmail.com> wrote:
On Sun, Jul 19, 2020 at 9:53 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Fri, Jul 17, 2020 at 9:22 PM Ricky Teachey <ricky@teachey.org> wrote:
    # The positional arguments aren't part of the KeyObject
    d[a, b:c, d, e=5, f=6] == d.__getitem__((a, slice(b, c), d), KeyObject(e=5, f=6))
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way?
Even the xarray examples given so far don't seem to make use of this mixture. From my knowledge of pandas I am not sure what the meaning of this would be, either.
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt-in to approximate indexing, e.g., something like: array.loc[longitude, latitude, method='nearest', tolerance=0.001]
I had to stare at this for a good 30 seconds before I realized that this wasn't a function/method call. Except for the square brackets instead of parentheses, it would be.
Honestly, this whole idea smells to me like just wanting another type of function call with different semantics.
IMHO the above example would be better spelled as:
array.loc.get(longitude, latitude, method='nearest', tolerance=0.001)
Pros: Much more obvious, perfectly legal today, zero backward compatibility issues, probably the way many libraries with such functionality are doing it now. Cons: A few extra characters (depending on the method name; here it's only four) and a couple of taps on the Shift key (but you're already used to that).
Whereas for the indexing example:
Pros: Explicit about indexing a collection and returning an item (but a good method name like "get" also has this quality). Cons: Not immediately obvious to an uninitiated reader of the code, new syntax, code using it is not compatible with Python 3.9 or earlier.
As for setting a value, the indexing approach has a bit more value (since a function call can't be an lvalue), but you can just change the method name from "get" to "set" and add an extra parameter to be the value to set, and you're done. The pros and cons still apply.
TL;DR: -1 all the way. Just use ordinary methods.
Another Con for the function call approach, you don't get to use the nice slicing syntax. Compare:
    array.loc.get(slice(None), slice(None), method='nearest', tolerance=0.001)
    array.loc[:, :, method='nearest', tolerance=0.001]
You also can't delete using a function call:
    del array.loc[:, :, method='nearest', tolerance=0.001]
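(To make the deletion point concrete, a tiny runnable sketch; the Grid class is a hypothetical container, not a real library:)

    class Grid:
        # Hypothetical container: deletion only works through indexing.
        def __init__(self):
            self._cells = {(0, 0): "origin"}

        def __delitem__(self, key):
            del self._cells[key]

    g = Grid()
    del g[0, 0]        # fine: the del statement accepts a subscript
    # del g.get(0, 0)  # SyntaxError: cannot delete function call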
On Sun, Jul 19, 2020 at 10:27 PM Jonathan Goble <jcgoble3@gmail.com> wrote:
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt-in to approximate indexing, e.g., something like:

    array.loc[longitude, latitude, method='nearest', tolerance=0.001]
I had to stare at this for a good 30 seconds before I realized that this wasn't a function/method call. Except for the square brackets instead of parentheses, it would be. Honestly, this whole idea smells to me like just wanting another type of function call with different semantics. IMHO the above example would be better spelled as: array.loc.get(longitude, latitude, method='nearest', tolerance=0.001)
The problem is that with Pandas, Xarray, and other data frame/data array libraries, using slices is typical. Continuing with the example:

    arr.loc[45:46, 69:70, altitude=300:400, tolerance=0.0001, projection="WGS1984"]

There's no reason to tack a function onto the .loc accessor; it could just be a method on the array/data frame itself, so there are no extra characters. But doing this with parentheses would need a different new feature: allowing slices directly as arguments.

That said, pandas.IndexSlice and numpy.s_ both provide accessors to allow passing slices more easily. So this is possible now:

    arr.get(I[45:46], I[69:70], altitude=I[300:400], tolerance=0.0001, projection="WGS1984")

... I mean, assuming someone writes an appropriate .get() method for their favorite data array/frame library. But Python itself has everything needed.
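(A runnable sketch of that workaround: numpy.s_ really does produce plain slice objects, while the DataArray class and its .get() method are assumptions for illustration:)

    import numpy as np

    I = np.s_            # I[45:46] is simply slice(45, 46, None)

    class DataArray:
        # Hypothetical array-like class with the suggested .get() method.
        def get(self, *slices, **modes):
            return slices, modes

    arr = DataArray()
    print(arr.get(I[45:46], I[69:70], tolerance=0.0001))
    # ((slice(45, 46, None), slice(69, 70, None)), {'tolerance': 0.0001})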
On Sun, Jul 19, 2020 at 9:53 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Fri, Jul 17, 2020 at 9:22 PM Ricky Teachey <ricky@teachey.org> wrote:
# The positional arguments aren't part of the KeyObject
d[a, b:c, d, e=5, f=6] == d.__getitem__((a, slice(b, c), d), KeyObject(e=5, f=6))
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way?
Even the xarray examples given so far don't seem to make use of this mixture. From my knowledge of pandas I am not sure what the meaning of this would be, either.
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt-in to approximate indexing, e.g., something like: array.loc[longitude, latitude, method='nearest', tolerance=0.001]
Thanks to those who pointed out that using kwargs to specify modes of key management is an obvious use case. Additionally I agree it is extremely likely that most use cases will involve specific kwd args, as Steven argues: On Sat, Jul 18, 2020 at 1:49 PM Steven D'Aprano <steve@pearwood.info> wrote:
...
Most use-cases I can think of will have to unpack the dict into named parameters. (Just as the interpreter does for us, in function calls.)
When I'm writing functions, for every one use of `**kwargs`, I have about a hundred uses of named parameters. I'm pretty sure most people are similar. I don't think that keyword args in subscripts will be different. I'm pretty sure that nearly everyone will want to unpack the `**kwargs` into named parameters nearly all of the time.
So why force them to do the unpacking themselves when the interpreter already has all the machinery to do it?
And Guido and Christopher Barker agree: On Sun, Jul 19, 2020 at 12:46 AM Guido van Rossum <guido@python.org> wrote:
On Sat, Jul 18, 2020 at 9:11 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Sat, Jul 18, 2020 at 1:43 PM Guido van Rossum <guido@python.org> wrote:
Yes please.
Yes to what, exactly?
-CHB
Not to this...
I don't think that is either simpler or more useful than a straightforward binding of arguments to parameters, just as function calls already do:
obj[a, b:c, x=1] ==> obj.__getitem__((a, slice(b, c)), x=1)
But to this.
It's unfortunate that positional subscripts are bundled together into a tuple. I have resisted calling that design a "mistake" because I don't know the reason for the design. There was probably a good reason for it, back in the ancient history of Python when getslice and getitem were unified. But I am sure that it will be a horrible mistake to emulate that decision for keyword arguments.
If anyone wants or needs their keyword arguments to be bundled into a single kwargs parameter, you can have it. All you need do is declare your method with a `**kwargs` parameter, and the interpreter will do the rest.
These two paragraphs make it clear what Steven was proposing. I am supporting him in this.
So it's a very good point and I support the idea. Steven replied to my concern about the inconsistent mental model: On Sat, Jul 18, 2020 at 5:09 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Jul 18, 2020 at 12:18:38AM -0400, Ricky Teachey wrote:
This strikes me as problematic for having a consistent mental model of how stuff works in python. I think that for many the difference in the meaning of the syntax between item-getting/setting and function-calling would be... glaring.
Yes, but what are we going to do about it?
...
We have a few choices:
...
(3) Reinforce that inconsistency, and continue to obfuscate the similarities, by handling keyword arguments in the same fashion as comma-separated subscripts. This will require a new builtin "key-object" class, and it will require every class that cares about keyword arguments in their subscripts to parse them themselves.
We'll also need to decide how to combine subscripts and keywords:
    obj[a, b:c, x=1]   # is this a tuple argument (a, slice(b, c), key(x=1))
                       # or a key argument key(a, slice(b, c), x=1)?
So it's worth reiterating that parsing of the keyword arguments can be accomplished relatively easily using a decorator that groups the arguments into some sort of key object, as Jonathan Fine is suggesting. On Sat, Jul 18, 2020 at 7:50 AM Jonathan Fine <jfine2358@gmail.com> wrote:
...
In some other situations we wish for

    >>> d[1, 2, x=3, y=4] = 5

to result in a call such as

    >>> __setitem__(d, (1, 2), 5, x=3, y=4)

where __setitem__ is a function defined in the implementation of D = type(d).
I fully support this goal, although not the implementation details in the example above. It is my opinion that this goal is best achieved by making easier TRANSPARENT use of

    k = K(1, 2, x=3, y=4)
Here's how it goes. First we write:

    class D:
        @wibble
        def __setitem__(self, val, u, v, x, y):
            pass  # Or do something.
Next, we define wibble. It will be a SIGNATURE CHANGING ADAPTER. Those who know how to make decorators will, I hope, have little difficulty in defining wibble to do what is required. For this exercise, assume that k.argv = (1, 2), and k.kwargs = dict(x=3, y=4).
The main idea is that each class will make an opaque use of the key, unless it uses a signature changing adapter to enable a transparent use of the key. Thus, by default key use is opaque, but if a class wishes it can make transparent use.
Without examples and working code (which I've promised for the end of the month), this might be hard to understand. However this is I hope clear enough for now.
So anyone who wants to define their arguments ungrouped (i.e., as separate positional args or as kwd names) can group them into a key object the way the dunder methods currently expect, like this:

    class C:
        @groupify
        def __getitem__(self, pos1, pos2="foo", *, kwd1, kwd2="bar"):
            ...
            return ...

        @groupify
        def __setitem__(self, value, pos1, pos2="foo", *, kwd1, kwd2="bar"):
            ...

Such a decorator could be added to the standard lib. I'm not a C expert, but I'm assuming it could be written in C so that it is as fast as the regular function call machinery. The function generated by groupify/wibble would have the same signature as current __getitem__ or __setitem__ methods, and the positional key argument would be passed the KeyObject, or K, instance. The decorator machinery would break apart the K object and pass it into the decorated function the way Guido, Christopher B, Steven D, et al. desire. No change to the current signatures needed, and no extremely odd mental model to deal with.

I do not know if this is the best way forward or not. But I think the pros and cons ought to be thoroughly considered as an option. I don't feel qualified to do all the weighing, but it seems to make a lot of sense to me.
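(A rough pure-Python sketch of what such a decorator could look like; the K class and groupify here are assumptions, not standard-library names, and a real version would need much more careful signature handling. Indexing with K(...) stands in for the proposed bracket syntax:)

    from functools import wraps

    class K:
        # Hypothetical key object bundling positional and keyword parts.
        def __init__(self, *argv, **kwargs):
            self.argv = argv
            self.kwargs = kwargs

    def groupify(method):
        # Signature-changing adapter: unpack a K instance (or a plain key)
        # into ordinary parameters before calling the method.
        @wraps(method)
        def wrapper(self, key, *rest):   # *rest carries the value for __setitem__
            if isinstance(key, K):
                return method(self, *rest, *key.argv, **key.kwargs)
            args = key if isinstance(key, tuple) else (key,)
            return method(self, *rest, *args)
        return wrapper

    class C:
        @groupify
        def __getitem__(self, pos1, pos2="foo", *, kwd1="baz", kwd2="bar"):
            return (pos1, pos2, kwd1, kwd2)

    c = C()
    print(c[K(1, 2, kwd1="hello")])   # (1, 2, 'hello', 'bar')
    print(c[1, 2])                    # (1, 2, 'baz', 'bar')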
On 20/07/2020 06:11, Ricky Teachey wrote:
On Sun, Jul 19, 2020 at 9:53 PM Stephan Hoyer <shoyer@gmail.com> wrote:
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt-in to approximate indexing, e.g., something like: array.loc[longitude, latitude, method='nearest', tolerance=0.001]
Thanks to those who pointed out that using kwargs to specify modes of key management is an obvious use case.
Ironically that example pushes me back to -1. It may look a lot like xarray and pandas working, but that just means it should be in xarray and/or pandas. It doesn't feel at all right for regular Python to be muddling indices and modal parameters to something that isn't a function call like that.

-- Rhodri James, Kynesim Ltd
On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Ironically that example pushes me back to -1. It may look a lot like xarray and pandas working, but that just means it should be in xarray and/or pandas.
after following most of this discussion, I'm still not sure what we'd get with keywords in indexing.

But I do think it would be nice if we could use slice syntax in other places. That would allow things like xarray and pandas to use slices in regular function calls. Here's an example from the xarray docs:

    da.isel(space=0, time=slice(None, 2))

wouldn't that be nice as:

    da.isel(space=0, time=:2)

or:

    da.sel(time=slice("2000-01-01", "2000-01-02"))

could be:

    da.sel(time="2000-01-01":"2000-01-02")

As far as I can tell, slicing syntax is currently a syntax error in these cases, and any others I thought to test. Is there a reason to not allow syntax for creating a slice object to be used anywhere (or more places, anyway)?

By the way, I just noticed this note in the xarray docs:

    """Note: We would love to be able to do indexing with labeled dimension names inside brackets, but unfortunately, Python does not yet support indexing with keyword arguments like da[space=0]"""

So they would like it :-)

-CHB

--
Christopher Barker, PhD

Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
I am unsure of the process if there's interest. Should I revise the PEP and create a new one? On Tue, 21 Jul 2020 at 06:29, Christopher Barker <pythonchb@gmail.com> wrote:
On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Ironically that example pushes me back to -1. It may look a lot like xarray and pandas working, but that just means it should be in xarray and/or pandas.
after following most of this discussion, I'm still not sure what we'd get with keywords in indexing.
But I do think it would be nice if we could use slice syntax in other places. That would allow things like xarray and pandas to use slices in regular function calls. Here's an example from the xarray docs:
da.isel(space=0, time=slice(None, 2))
wouldn't that be nice as:
da.isel(space=0, time=:2)
or:
da.sel(time=slice("2000-01-01", "2000-01-02"))
could be:
da.sel(time="2000-01-01":"2000-01-02")
As far as I can tell, slicing syntax is currently a syntax error in these cases, and any others I thought to test.
Is there a reason to not allow syntax for creating a slice object to be used anywhere (Or more places, anyway)?
By the way, I just noticed this note in the xarray docs:
"""Note: We would love to be able to do indexing with labeled dimension names inside brackets, but unfortunately, Python does yet not support indexing with keyword arguments like da[space=0] """ So they would like it :-)
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-- Kind regards, Stefano Borini
On Mon, 20 Jul 2020 at 04:27, Jonathan Goble <jcgoble3@gmail.com> wrote:
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt-in to approximate indexing, e.g., something like: array.loc[longitude, latitude, method='nearest', tolerance=0.001]
I had to stare at this for a good 30 seconds before I realized that this wasn't a function/method call. Except for the square brackets instead of parentheses, it would be.
Honestly, this whole idea smells to me like just wanting another type of function call with different semantics.
IMHO the above example would be better spelled as:
array.loc.get(longitude, latitude, method='nearest', tolerance=0.001)
Pros: Much more obvious, perfectly legal today, zero backward compatibility issues, probably the way many libraries with such functionality are doing it now. Cons: A few extra characters (depending on the method name; here it's only four) and a couple of taps on the Shift key (but you're already used to that).
Cons:
- cannot assign to method call
- cannot use slicing syntax

With PEP:

    array[lon=0:10, lat=0:10, method="nearest", tolerance=0.001] = 42

Without PEP (syntactically valid, although not valid xarray API):

    array[K(lon=slice(0, 10), lat=slice(0, 10), method="nearest", tolerance=0.001)] = 42

Gerrit.
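(A minimal sketch that makes the "without PEP" spelling above actually executable; both the K key object and the Array class are illustrative assumptions:)

    class K:
        # Illustrative key object carrying only keyword parts.
        def __init__(self, **kwargs):
            self.kwargs = kwargs

    class Array:
        # Toy container: a real library would map the slices to its axes.
        def __setitem__(self, key, value):
            print("assigning", value, "at", key.kwargs)

    array = Array()
    array[K(lon=slice(0, 10), lat=slice(0, 10),
            method="nearest", tolerance=0.001)] = 42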
On Sat, 18 Jul 2020 at 18:31, MRAB <python@mrabarnett.plus.com> wrote:
[snip] I haven't followed this thread for a while, but, to me, it seems that the simplest option would be to pass the keyword arguments as a dict:
obj[a, b:c, x=1] does obj.__getitem__((a, slice(b, c)), dict(x=1))
If there are no keyword arguments, then there's no dict.
Could the entire argument be turned into a namedtuple?

    obj[a, b:c]

does

    obj.__getitem__((a, slice(b, c)))

or

    obj.__getitem__(namedtuple(_0=a, _1=slice(b, c)))

and

    obj[a, b:c, d=e:f]

does

    obj.__getitem__(namedtuple(_0=a, _1=slice(b, c), d=slice(e, f)))

It would seem a namedtuple is a more natural extension of the current tuple, fully backward compatible (for sane code), while allowing for keywords that are valid identifiers (which should be an acceptable limitation). The other restriction would be that the keyword-indexing cannot use all-numeric identifiers prepended by an underscore.

Gerrit.
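(A sketch of this namedtuple strategy, with make_key as a hypothetical helper. Note that collections.namedtuple normally rejects leading-underscore field names such as _0, so rename=True is needed, which hints at the "_n magic field" awkwardness raised later in the thread:)

    from collections import namedtuple

    def make_key(*args, **kwargs):
        # Positional parts become _0, _1, ... fields; keywords keep their names.
        fields = {f"_{i}": a for i, a in enumerate(args)}
        fields.update(kwargs)
        # rename=True is required because namedtuple rejects field names
        # starting with an underscore; the renamed fields land on _0, _1, ...
        Key = namedtuple("Key", fields, rename=True)
        return Key(*fields.values())

    k = make_key(1, slice(2, 3), d=slice(4, 5))
    print(k._0, k._1, k.d)   # 1 slice(2, 3, None) slice(4, 5, None)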
On Tue, Jul 21, 2020, 5:48 AM Gerrit Holl <gerrit.holl@gmail.com> wrote:
On Sat, 18 Jul 2020 at 18:31, MRAB <python@mrabarnett.plus.com> wrote:
[snip] I haven't followed this thread for a while, but, to me, it seems that the simplest option would be to pass the keyword arguments as a dict:
obj[a, b:c, x=1] does obj.__getitem__((a, slice(b, c)), dict(x=1))
If there are no keyword arguments, then there's no dict.
Could the entire argument be turned into a namedtuple?
The original rejected PEP had some good reasons for not going with a namedtuple. I haven't read it in a while, but I suggest considering them. The biggest one being that a new class would have to be created and instantiated for every indexing operation.
You have to find a core dev who is willing to act as a Sponsor. I recommend asking Steven d’Aprano (but I do not know if he’s interested). Until then, hash out the precise spec for the idea here. Coming up with a solid motivation is also important. On Tue, Jul 21, 2020 at 01:15 Stefano Borini <stefano.borini@gmail.com> wrote:
I am unsure of the process if there's interest. Should I revise the PEP and create a new one?
On Tue, 21 Jul 2020 at 06:29, Christopher Barker <pythonchb@gmail.com> wrote:
On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Ironically that example pushes me back to -1. It may look a lot like xarray and pandas working, but that just means it should be in xarray and/or pandas.
after following most of this discussion, I'm still not sure what we'd get with keywords in indexing.
But I do think it would be nice if we could use slice syntax in other places. That would allow things like xarray and pandas to use slices in regular function calls. Here's an example from the xarray docs:
da.isel(space=0, time=slice(None, 2))
wouldn't that be nice as:
da.isel(space=0, time=:2)
or:
da.sel(time=slice("2000-01-01", "2000-01-02"))
could be:
da.sel(time="2000-01-01":"2000-01-02")
As far as I can tell, slicing syntax is currently a syntax error in these cases, and any others I thought to test.
Is there a reason to not allow syntax for creating a slice object to be used anywhere (or more places, anyway)?
By the way, I just noticed this note in the xarray docs:
"""Note: We would love to be able to do indexing with labeled dimension
names inside brackets, but unfortunately, Python does yet not support indexing with keyword arguments like da[space=0]
""" So they would like it :-)
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-- Kind regards,
Stefano Borini
-- --Guido (mobile)
On Mon, 2020-07-20 at 22:27 -0700, Christopher Barker wrote:
On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Ironically that example pushes me back to -1. It may look a lot like xarray and pandas working, but that just means it should be in xarray and/or pandas.
after following most of this discussion, I'm still not sure what we'd get with keywords in indexing.
But I do think it would be nice if we could use slice syntax in other places. That would allow things like xarray and pandas to use slices in regular function calls. Here's an example from the xarray docs:
da.isel(space=0, time=slice(None, 2))
wouldn't that be nice as:
da.isel(space=0, time=:2)
or:
da.sel(time=slice("2000-01-01", "2000-01-02"))
could be:
da.sel(time="2000-01-01":"2000-01-02")
As far as I can tell, slicing syntax is currently a syntax error in these cases, and any others I thought to test.
Is there a reason to not allow syntax for creating a slice object to be used anywhere (Or more places, anyway)?
By the way, I just noticed this note in the xarray docs:
"""Note: We would love to be able to do indexing with labeled dimension names inside brackets, but unfortunately, Python does yet not support indexing with keyword arguments like da[space=0] """
This would be the thing I would think of first when indexing with keywords. But there are a few points about named dimensions:

First, using it for named dimensions means you don't actually need to mix it with normal tuple indexing; mixing both seems rather confusing? (I cannot think of how to interpret it.)

Second, using keyword arguments for indexing `mode=` or `method=` switches, as Stephan Hoyer mentioned, also seems neat. But I am worried that the two potential uses clash way too much, and my gut feeling is to prefer the labeled use (which is why I would be extremely hesitant to add mode-switching things to NumPy or pandas). I might rather prefer mode switching to be spelled as:

    temperature.loc(method="nearest")[longitude=longs, latitude=lats]

even if that has to create an intermediate indexer object (twice, since `.loc` here is also an index helper object). (This means axis labels must be strings; that is likely no issue, but should maybe be mentioned.)

Thus, for most containers, my knee-jerk reaction would be to discourage the use of keywords in indexing for mode switching. But some of the use-cases seemed more like class factories, for which there is no clash of these two concepts/applications.

That said, labeled dimensions/axes do seem like nice syntax with quite a bit of potential to me. Even with 3 dimensions, remembering whether your coordinate order was x,y,z or z,x,y or z,y,x can be annoying (especially if you mix in a 1-D dataset with only a z axis).

Cheers,

Sebastian
So they would like it :-)
-CHB
On Tue, Jul 21, 2020, 12:14 PM Sebastian Berg
First, using it for named dimensions, means you don't actually need to mix it with normal tuple indexing, mixing both seems rather confusing?
temperature.loc(method="nearest")[longitude=longs, latitude=lats]
I probably don't disagree on the API for xarray or similar library. But that choice would be up to the library, not Python developers. Whatever way keyword arguments are passed to .__getitem__()[*], they are available for the class to do whatever it likes with them.

[*] In my opinion, passing the keyword arguments using anything other than the standard **kws style sounds crazy. We have perfectly good ways to do that for every other function or method. Don't invent something new and different that only works with .__getitem__().
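(The "call first, then index" spelling quoted above can already be prototyped for the mode-switching half; keyword indexing itself remains illegal. A minimal sketch, assuming hypothetical Loc and Temperature classes:)

    class Loc:
        # Illustrative indexer: each call returns a new Loc carrying modes.
        def __init__(self, owner, **modes):
            self.owner = owner
            self.modes = modes

        def __call__(self, **modes):
            return Loc(self.owner, **{**self.modes, **modes})

        def __getitem__(self, key):
            return ("lookup", key, self.modes)

    class Temperature:
        @property
        def loc(self):
            return Loc(self)

    t = Temperature()
    print(t.loc(method="nearest")[3, 5])
    # ('lookup', (3, 5), {'method': 'nearest'})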
On Tue, Jul 21, 2020 at 9:15 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-07-20 at 22:27 -0700, Christopher Barker wrote:
On Mon, Jul 20, 2020 at 3:17 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
Ironically that example pushes me back to -1. It may look a lot like xarray and pandas working, but that just means it should be in xarray and/or pandas.
after following most of this discussion, I'm still not sure what we'd get with keywords in indexing.
But I do think it would be nice of we could use slice syntax in other places. That would allow things like xarray and pandas to use slices in regular function calls. here's an example from the xarray docs:
da.isel(space=0, time=slice(None, 2))
wouldn't that be nice as:
da.isel(space=0, time=:2)
or:
da.sel(time=slice("2000-01-01", "2000-01-02"))
could be:
da.sel(time="2000-01-01":"2000-01-02")
As far as I can tell, slicing syntax is currently a syntax error in these cases, and any others I thought to test.
Is there a reason to not allow syntax for creating a slice object to be used anywhere (Or more places, anyway)?
By the way, I just noticed this note in the xarray docs:
"""Note: We would love to be able to do indexing with labeled dimension names inside brackets, but unfortunately, Python does yet not support indexing with keyword arguments like da[space=0] """
This would be the thing I would think of first when indexing with keywords. But, there are a few points about named dimensions:
First, using it for named dimensions, means you don't actually need to mix it with normal tuple indexing, mixing both seems rather confusing? (I cannot think of how to interpret it)
Second, using keyword arguments for indexing `mode=` or `method=` switches, as Stephan Hoyer mentioned, also seems neat. But I am worried that the two potential uses clash way too much, and my gut feeling is to prefer the labeled use (which is why I would be extremely hesitant to add mode-switching things to NumPy or pandas). I might rather prefer mode switching to be spelled as:
temperature.loc(method="nearest")[longitude=longs, latitude=lats]
even if that has to create an intermediate indexer object (twice, since `.loc` here is also an index helper object). (This means axis labels must be strings, that is likely no issue, but should maybe be mentioned.)
Thus, for most containers, my knee jerk reaction would be to discourage the use of keywords in indexing for mode switching. But some of the use-cases seemed more like class factories, for which there is no clash of these two concepts/applications.
That said, labeled dimensions/axis do seem like nice syntax with quite a bit of potential to me, even with 3 dimensions, remembering whether your coordinate order was x,y,z or z,x,y or z,y,x can be annoying (especially if you mix in a 1-D dataset with only a z axis).
For what it's worth, I (as the original author of xarray) totally agree with both Sebastian and Christopher. For indexing labeled arrays, the most compelling use-case is cleaner syntax for creating slice() objects along with keyword arguments for dimension names. I don't particularly care whether that's spelled with [] or (), e.g.,

    da.sel(time="2000-01-01":"2000-01-02")

or

    da.loc[time="2000-01-01":"2000-01-02"]

neither of which is currently valid syntax.

The further advantages of supporting keyword arguments in __getitem__/__setitem__ would be:

1. We wouldn't need separate methods for positional vs keyword argument indexing. Currently, xarray has both .loc[] and .sel().

2. We could support matching syntax with keyword arguments in assignment. This is mostly relevant for inexperienced Python users, who will try something like "da.sel(x=0) = value" and encounter a SyntaxError. (This does come up with some regularity, because xarray's target audience includes scientists who often aren't experienced programmers.)
Relevant to Jonathan's suggestion for a "key-object", Marco has been performing some experiments on an immutable dict: https://mail.python.org/archives/list/python-ideas@python.org/message/K7CRVW... -- Steven
On Sat, Jul 18, 2020, 00:21 Ricky Teachey <ricky@teachey.org> wrote:
This raises a question that needs to be answered, then: what would be the utility of mixing together positional and kwd arguments in this way?
Even the xarray examples given so far don't seem to make use of this mixture. From my knowledge of pandas I am not sure what the meaning of this would be, either.
For xarray an index can have a position, a label, or both. So being able to mix positional and label-based indexing is important. I just didn't include that in the examples because I was focusing on the advantage of label-based indexing.
On Tue, Jul 21, 2020 at 2:05 PM David Mertz <mertz@gnosis.cx> wrote:
On Tue, Jul 21, 2020, 12:14 PM Sebastian Berg
First, using it for named dimensions, means you don't actually need to mix it with normal tuple indexing, mixing both seems rather confusing?
temperature.loc(method="nearest")[longitude=longs, latitude=lats]
[*] In my opinion, passing the keyword argument using anything other than the standard **kws style sounds crazy. We have perfectly good ways to do that for every other function or method. Don't invent something new and different that only works with .__getitem__().
It would also be possible to pass __getitem__ exactly one other argument, a dict with all the named arguments. So instead of

    __getitem__(inds, **kwargs)

simply

    __getitem__(inds, kwargs)

"kwargs" would be the same in both cases, a dict with the key/value pairs. In most cases a dict is what people are going to want anyway; it avoids having to unpack and then repack the dict. But it would make it slightly harder to add parameters to indexing, although I am skeptical of that use-case to begin with. It would also make it feasible to use labels that aren't valid variable names, although it may not be desirable to support that.
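(A sketch of this variant; since bracket syntax cannot carry keywords today, the dunder is called directly to show the intended shape, and the Table class is hypothetical:)

    class Table:
        # Hypothetical semantics: the interpreter would pass one plain dict.
        def __getitem__(self, inds, kwargs=None):
            kwargs = kwargs or {}
            return inds, kwargs

    t = Table()
    # Proposed: t[1, 2, x=3]  would become  t.__getitem__((1, 2), {'x': 3})
    print(t.__getitem__((1, 2), {"x": 3}))   # ((1, 2), {'x': 3})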
As I said I would, I've now published a package on PyPI to support our exploration and development of this idea. You'll find more information on this new thread: Subject: Package kwkey and PEP 472 -- Support for indexing with keyword arguments URL: https://mail.python.org/archives/list/python-ideas@python.org/thread/ZAELJR4...
On Fri, 17 Jul 2020 at 20:58, Christopher Barker <pythonchb@gmail.com> wrote:
So what would the “key object” be I the proposed case:
d2[a=1, b=2]
A namedtuple? Or namedtuple-like object?
This was already discussed in PEP-472 as the "namedtuple" strategy, but among the various negative points were the relative lack of maturity of the namedtuple and the _n magic field that we assigned to it. I kind of feel they are still relevant, but I might be wrong. The PEP is 5 years old. -- Kind regards, Stefano Borini
On Mon, 20 Jul 2020 at 03:26, Jonathan Goble <jcgoble3@gmail.com> wrote:
One use case that comes up in xarray and pandas is support for indicating indexing "modes". For example, when indexing with floating point numbers it's convenient to be able to opt-in to approximate indexing, e.g., something like: array.loc[longitude, latitude, method='nearest', tolerance=0.001]
I had to stare at this for a good 30 seconds before I realized that this wasn't a function/method call. Except for the square brackets instead of parentheses, it would be.
Honestly, this whole idea smells to me like just wanting another type of function call with different semantics.
I agree, and in fact it was a very weak point. The main point, in my opinion, would be axis naming, whatever the term "axis" means. In the case of an array, it can be obvious. In the case of the starting mail, an "axis" is basically a degree of freedom of customisation of a typing class. -- Kind regards, Stefano Borini
participants (24)
- Alex Hall
- Andras Tantos
- Antoine Pitrou
- Caleb Donovick
- Christopher Barker
- David Mertz
- Dominik Vilsmeier
- Eric V. Smith
- Gerrit Holl
- Guido van Rossum
- Joao S. O. Bueno
- Jonathan Fine
- Jonathan Goble
- MRAB
- Neil Girdhar
- Paul Moore
- Paul Sokolovsky
- Rhodri James
- Ricky Teachey
- Sebastian Berg
- Stefano Borini
- Stephan Hoyer
- Steven D'Aprano
- Todd