__iter__(), keys(), and the mapping protocol

This has been bouncing around in my head for a while regarding the requisite keys() method on mappings: how come the ** unpacking operator, a built-in language feature, relies on a non-dunder to operate?

Requiring that classes implement keys(), a method whose name is totally undistinguished, in order to conform to the mapping protocol feels like a design running counter to Python's norm of using dunders for everything "hidden". I am not sure whether it feels dirty to anybody else, however.

Interestingly, the docs already say <https://docs.python.org/3/reference/datamodel.html#object.__iter__> that *[f]or mappings, [__iter__()] should iterate over the keys of the container*, but of course this is not enforced in any way at present.

So, then, how about enforcing it? Should __iter__(), for the reasons above, replace the current purpose of keys() in mappings?

I'm not properly equipped at the moment to mess around with CPython (sorry), but I assume at a minimum this would entail either replacing all instances of PyMapping_Keys() with PyObject_GetIter() or alternatively changing PyMapping_Keys() to call the latter.

Does it sound like a reasonable change overall?

Eli
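For readers who want to see the behaviour in question, here is a minimal sketch. The class name KeysOnly and the function fn are made up for illustration. It assumes current CPython behaviour, where ** only needs keys() and __getitem__() and never looks for __iter__():

```python
class KeysOnly:
    """Hypothetical class: defines keys() and __getitem__() but no __iter__()."""

    def __init__(self, **data):
        self._data = data

    def keys(self):
        # ** calls this "public" method to discover the keys.
        return self._data.keys()

    def __getitem__(self, key):
        # ** then looks each key up with obj[key].
        return self._data[key]


def fn(**kwargs):
    return kwargs


result = fn(**KeysOnly(a=1, b=2))    # unpacking into a call
merged = {**KeysOnly(a=1), "c": 3}   # unpacking into a dict display
print(result)  # {'a': 1, 'b': 2}
print(merged)  # {'a': 1, 'c': 3}
```

Note that the class above never defines __iter__(), which is exactly the asymmetry the message is asking about.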

I can see custom mapping types where iterating over keys() would be trivial but items() could be expensive. I could use that as an argument, but I don't have to. The keys() method is part of the mapping API, just like index() and count() are part of the sequence API. To be treated like a mapping everywhere, Python requires that you define* a keys() method, so why not use it? I don't see anything wrong with Python using "public" methods in this context.

* If you use ABCs, then you don't need to define keys(), but that's a tangent.
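The footnote about ABCs can be illustrated with a small sketch. The class name Env and its contents are made up for illustration. Subclassing collections.abc.Mapping supplies keys() as a mixin method once __getitem__(), __iter__() and __len__() are defined:

```python
from collections.abc import Mapping


class Env(Mapping):
    """Hypothetical read-only mapping; keys() is inherited from Mapping."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


env = Env({"host": "localhost", "port": 8080})
keys = list(env.keys())   # keys() comes from the ABC, built on __iter__()
unpacked = dict(**env)    # ** works because keys() and __getitem__() exist
print(keys)      # ['host', 'port']
print(unpacked)  # {'host': 'localhost', 'port': 8080}
```

In this arrangement __iter__() already drives keys(), which is close to what the original proposal wanted, but only for classes that opt in through the ABC.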

I don't think switching to __iter__ will cause dict(a_list) to produce anything other than what it does now: a traceback if the list is anything but a list of pairs. I think if we were to go forward with switching map-unpacking to __iter__, it would produce confusing mappings like the ones you show in your example. I don't think it's a good idea to switch to __iter__, or even to make a dunder method for keys. The dunder methods are dunder methods because they are not generally directly useful. I don't see a major problem with having the mapping API call keys(); this is not a next()/__next__() situation, where the method is not generally directly useful.

On Wed, Sep 12, 2018 at 4:42 PM Michael Selik <mike@selik.org> wrote:
You want to have a Mapping that does not supply a keys method? What use case motivated your proposal?
Yes, my proposal was to consider allowing __iter__() to subsume keys() entirely, for the reasons outlined in my second email -- which I'm just realizing was an accidental "reply one" to Alex Walters rather than a "reply all", yikes! Here it is, duplicated: Ahh, no, I phrased my question a bit badly -- I'm not proposing that the
However, Serhiy's example of {**[0, 2, 1]} is so plainly irreconcilable -- somewhat embarrassing for me to have missed it -- that I'm now reluctantly no longer in favor. (Well, really, I'm tempted to say *why not?*, but I do see that it wouldn't be a good thing overall.)

And I still kind of feel that there should be a dunder involved somewhere in this, but nowhere near strongly enough to dispute that *"[t]he dunder methods are dunder methods because they are not generally directly useful. [There doesn't seem to be] a major problem with having the mapping api call keys() [...]"*, as it's reasonable and rationalizes the current system well enough.

Thank you for bearing with me ;)

Eli
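To make the {**[0, 2, 1]} objection concrete, here is a small sketch. The helper unpack_via_iter is hypothetical: it only emulates what ** might do if it iterated the object instead of calling keys(), and it is not how CPython behaves today.

```python
def unpack_via_iter(obj):
    # Hypothetical stand-in for a ** that iterates obj instead of calling obj.keys().
    return {key: obj[key] for key in obj}


seq = [0, 2, 1]

# Each list element would be treated as a key and then used as an index.
via_iter = unpack_via_iter(seq)
print(via_iter)  # {0: 0, 2: 1, 1: 2}

# Today the same unpacking is rejected, because list has no keys() method.
try:
    {**seq}
except TypeError as exc:
    current_error = str(exc)
    print("TypeError:", current_error)
```

The emulated result is a dict that silently maps list elements to whatever sits at those indexes, which is the kind of confusing mapping the replies warn about.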

Someone wrote:

> Granted, my only strong argument is that the ** unpacking operator depends on this method to do its job, and it's currently alone amongst Python's operators in depending on a non-dunder to do so

I like this argument. And I think it's important. Here are some background facts.

    class dict(object)
     |  dict() -> new empty dictionary
     |  dict(mapping) -> new dictionary initialized from a mapping object's
     |      (key, value) pairs
     |  dict(iterable) -> new dictionary initialized as if via:
     |      d = {}
     |      for k, v in iterable:
     |          d[k] = v
     |  dict(**kwargs) -> new dictionary initialized with the name=value pairs
     |      in the keyword argument list.  For example:  dict(one=1, two=2)

    >>> list(zip('abc', range(3)))
    [('a', 0), ('b', 1), ('c', 2)]
    >>> dict(list(zip('abc', range(3))))
    {'b': 1, 'a': 0, 'c': 2}
    >>> dict(zip('abc', range(3)))
    {'b': 1, 'a': 0, 'c': 2}
    >>> dict(**zip('abc', range(3)))
    TypeError: type object argument after ** must be a mapping, not zip
    >>> dict(**list(zip('abc', range(3))))
    TypeError: type object argument after ** must be a mapping, not list

Now for my opinions. (Yours might be different.)

First, it is my opinion that it is not reasonable to insist that the argument after ** must be a mapping. All that is required to construct a dictionary is a sequence of (key, value) pairs. The dict(iterable) construction proves that point.

Second, relaxing the ** condition is useful. Consider the following.

    >>> class NS: pass
    >>> ns = NS()
    >>> ns.a = 3
    >>> ns.b = 5
    >>> ns.__dict__
    {'b': 5, 'a': 3}
    >>> def fn(**kwargs): return kwargs
    >>> fn(**ns)
    TypeError: fn() argument after ** must be a mapping, not NS
    >>> fn(**ns.__dict__)
    {'b': 5, 'a': 3}

The Zen of Python says

    Namespaces are one honking great idea -- let's do more of those!

I see many advantages in using a namespace to build up the keyword arguments for a function call. For example, it could do data validation (of both keys/names and values).
And once we have the namespace, used for this purpose, I find it natural to call it like so

    >>> fn(**ns)

I don't see any way to do this, other than defining NS.keys and NS.__getitem__. But why should Python itself force me to expose ns.__dict__ in that way? I don't want my users getting a value via ns[key].

By the way, in JavaScript the constructs obj.aaa and obj['aaa'] are always equivalent.

POSTSCRIPT: Here are some additional relevant facts.

    >>> fn(**dict(ns))
    TypeError: 'NS' object is not iterable
    >>> def tmp(self): return iter(self.__dict__.items())
    >>> NS.__iter__ = tmp
    >>> fn(**dict(ns))
    {'b': 5, 'a': 3}
    >>> list(ns)
    [('b', 5), ('a', 3)]

I find allowing f(**dict(ns)) but forbidding f(**ns) to be a restriction of functionality that removes, rather than adds, value.

Perhaps (I've not thought it through), *args and **kwargs should be treated as special contexts, just as bool(obj) calls obj.__bool__ if available.
https://docs.python.org/3.3/reference/datamodel.html#object.__bool__

In other words, have *args call __star__ if available, and **kwargs call __starstar__ if available. But I've not thought this through.

-- 
Jonathan
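As a rough sketch of the route described above (defining keys() and __getitem__() on the namespace), something like the following would work today. The class name KwargsNS and its validation rule are invented for illustration. It does carry the cost the message objects to, since ns[key] becomes available to users as well:

```python
class KwargsNS:
    """Hypothetical namespace that validates names and supports ** unpacking."""

    def __setattr__(self, name, value):
        # Example validation rule (an assumption): reject private-looking names.
        if name.startswith("_"):
            raise AttributeError(f"private names are not allowed: {name!r}")
        super().__setattr__(name, value)

    def keys(self):
        return vars(self).keys()

    def __getitem__(self, key):
        # Required by **, but it also exposes ns[key] to every caller.
        return vars(self)[key]


def fn(**kwargs):
    return kwargs


ns = KwargsNS()
ns.a = 3
ns.b = 5
result = fn(**ns)
print(result)  # {'a': 3, 'b': 5}
```

If exposing item access is unacceptable, passing vars(ns) explicitly, as in fn(**vars(ns)), keeps the class itself free of mapping methods.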

On 13/09/2018 at 10:07, Jonathan Fine wrote:
It's most likely what we'd want to achieve by unpacking a dataclass (or at least, in my opinion). I'm not sure about the internals and the weight of such a feature, but I guess a toy implementation would just be, whenever we should raise a TypeError because the variable is not a mapping, to check whether it's a dataclass instance, and if so, call asdict on it, and return its result. I'm not sure I'm not off-topic though...
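The toy implementation described here cannot be written inside ** itself without changing the interpreter, but the same idea can be sketched as an explicit helper. The names as_kwargs, Point and fn are made up for illustration; dataclasses.asdict and dataclasses.is_dataclass are the real standard-library functions.

```python
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class Point:
    x: int
    y: int


def as_kwargs(obj):
    """Hypothetical helper: convert dataclass instances, pass anything else through."""
    # is_dataclass() is also true for the class itself, so exclude types.
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    return obj


def fn(**kwargs):
    return kwargs


from_dataclass = fn(**as_kwargs(Point(1, 2)))
from_dict = fn(**as_kwargs({"z": 3}))
print(from_dataclass)  # {'x': 1, 'y': 2}
print(from_dict)       # {'z': 3}
```

One design point worth keeping in mind: asdict() recurses into nested dataclasses and copies their values, so it may do more work than a shallow field lookup would.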

Hi Brice

Good comment. I liked it. Not badly off-topic I think, because it looks to be an interesting work-around for the original problem.

You wrote
How about writing

    fn( ** stst( data_class_obj ) )

where stst() does whatever it is you consider to be the right thing. My suggestion would be something like

    def stst(obj):
        method = getattr(obj, '__stst', None)
        if method:
            return method()
        else:
            return obj

And then it's your responsibility to add an '__stst' attribute to your data classes.

-- 
Jonathan
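A runnable variant of the stst() idea might look like the sketch below. It makes two assumptions that are not in the message. First, the hook is spelled __stst__ with trailing underscores, because a method written as __stst inside a class body is name-mangled and would not be found by a lookup of the string '__stst'. Second, the Config and Mangled classes are invented for illustration.

```python
class Mangled:
    def __stst(self):
        # Inside a class body this name is stored as _Mangled__stst.
        return {"a": 1}


class Config:
    """Hypothetical class opting in to the (also hypothetical) __stst__ hook."""

    def __init__(self, host, port):
        self.host = host
        self.port = port

    def __stst__(self):
        return {"host": self.host, "port": self.port}


def stst(obj):
    # A default of None keeps getattr() from raising AttributeError.
    method = getattr(obj, "__stst__", None)
    return method() if callable(method) else obj


def fn(**kwargs):
    return kwargs


print(hasattr(Mangled(), "__stst"))          # False
print(hasattr(Mangled(), "_Mangled__stst"))  # True
via_hook = fn(**stst(Config("localhost", 8080)))
passthrough = fn(**stst({"x": 1}))
print(via_hook)     # {'host': 'localhost', 'port': 8080}
print(passthrough)  # {'x': 1}
```

Names with leading and trailing double underscores are conventionally reserved for the language, so a real project would probably choose a different spelling for the hook.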

participants (6)
- Alex Walters
- Brice Parent
- Elias Tarhini
- Jonathan Fine
- Michael Selik
- Serhiy Storchaka