I am trying to implement convenient dataclass unpacking using the `**` operator. So far I have found five different ways to do it, but not a single good one. I am confused by the implementation of the unpacking operator.

When I try to unpack any custom class, I get the error:

`type object argument after ** must be a mapping, not MyClass`

OK, nothing special. I need to use `collections.abc.Mapping`, right? Then I need to implement `__getitem__`, `__iter__` and `__len__`. Not a problem. But in addition I get `keys`, `items` and `values`. Hey, I don't need them. I don't need the full mapping functionality; I only need the double asterisk to work.

Right, we have duck typing! We throw out `abc.Mapping`. What do we need to implement? `__getitem__` and `keys`. Wtf, `keys`?

I am looking at the Python data model: https://docs.python.org/3/reference/datamodel.html There are many operators there, and they depend on special double-underscore methods. Hmm, I don't see the unpacking operators there; that's strange. And why `keys`? Because historically it comes from `dict`? I think a dependency on `__iter__` would be preferable and more expected than a dependency on an ordinary, user-space method named `keys`. Actually, `items()` would be even more predictable.

But this is not the end. Overriding `__getitem__` is often used for additional validation. I think `__iter__` and `keys` should only return valid keys, so we should not need to check them again when unpacking. At the very least, we should be able to control this.

And finally: `Mapping` keys can be of `Any` type, while keys unpacked with `**` into a function call must be `str`. Some `Mapping`s can be unpackable, and some `Unpack` objects can be mappings.

My suggestion:

* Add a new `collections.abc.Unpack` abstract base class for `**` unpacking.
* Add a new special method, something like:

```
import collections.abc

def __unpack__(self):
    if isinstance(self, collections.abc.Mapping):
        # `Mapping` and `dict` would really override this method.
        keys = self.keys()  # or return self.items()?
    else:
        keys = iter(self)
    return ((k, self[k]) for k in keys)
```

* Update the implementation of the unpacking operator to use the `__unpack__` method.

As a result:

* We can make a class unpackable without the extra `Mapping` functionality.
* We can control the unpacking process separately.
* We get rid of dependencies on user-space method names.
* I think we make the behavior more predictable.

What do you think about it?
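As the post says, the current `**` does not require a real `collections.abc.Mapping`: CPython looks up `keys()` and then indexes the object with each key. A minimal sketch of a dataclass that relies on exactly those two methods (`Point` and `describe` are hypothetical names used only for illustration):

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int

    def keys(self):
        # ** calls keys() to learn which names to unpack...
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        # ...and then fetches each value with obj[key].
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)


def describe(x, y):
    return f"x={x}, y={y}"


p = Point(1, 2)
print(describe(**p))  # x=1, y=2
print({**p})          # {'x': 1, 'y': 2}
```

This works today, but it is exactly what the post objects to: the class has to expose an ordinary, non-dunder `keys()` method.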
My first thought is that for dataclasses, you can use the asdict() function, and you're done.

But sure -- why not make it more generic. It does seem like ** could be usable with any iterable that returns pairs of objects. However the trick is that when you iterate a dict, you get the keys, not the items, which makes me think that the only thing you should *need* is an items() method that returns an iterable (of pairs of objects).

Indeed, if I make a subclass of Mapping, and define everything but items() with dummy methods, then it does work, so only items() is being used.

So all we need to do is not check for a Mapping, but rather simply try to call `.items()`, which seems in the spirit of Python duck typing anyway.

And sequence unpacking seems to be only calling `__iter__` -- so why not something similar for ** unpacking?

```
In [35]: class Seq:
    ...:     def __iter__(self):
    ...:         return iter([3, 4])
    ...:

In [36]: s = Seq()

In [37]: x, y = s

In [38]: x
Out[38]: 3

In [39]: y
Out[39]: 4
```

But in the spirit of Chesterton’s Fence: Why DOES the unpacking operator type check for Mapping?

-CHB

On Sat, Dec 26, 2020 at 6:18 AM Anton Abrosimov <abrosimov.a.a@gmail.com> wrote:
What do you think about it?
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2HMRGJ...
Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
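CHB's first suggestion can be sketched with the standard library's `dataclasses.asdict()`, which builds a plain `dict` from a dataclass instance and therefore works with `**` today. `Point` and `describe` are hypothetical names; note that `asdict()` recurses into nested dataclasses and deep-copies the field values, so it is not free for large objects.

```python
from dataclasses import asdict, dataclass


@dataclass
class Point:
    x: int
    y: int


def describe(x, y):
    return f"x={x}, y={y}"


p = Point(1, 2)
# asdict() returns a real dict, so ** needs no extra methods on Point.
print(describe(**asdict(p)))  # x=1, y=2
```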
On 27/12/20 10:15 am, Christopher Barker wrote:
It does seem like ** could be usable with any iterable that returns pairs of objects. However the trick is that when you iterate a dict, you get the keys, not the items, which makes me think that the only thing you should *need* is an items() method that returns an iterable (of pairs of objects).
It seems to me it would be more fundamental to use iteration to get the keys and indexing to get the corresponding values. You're only relying on dunder methods then. -- Greg
On 2020-12-26 16:34, Greg Ewing wrote:
It seems to me it would be more fundamental to use iteration to get the keys and indexing to get the corresponding values. You're only relying on dunder methods then.
This is what I was thinking as well. I don't like the idea of relying on a non-dunder method like .keys() to implement syntax like ** unpacking. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Sat, Dec 26, 2020 at 4:35 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
It seems to me it would be more fundamental to use iteration to get the keys and indexing to get the corresponding values. You're only relying on dunder methods then.
And that seems to work as well: I made a "Mapping" with a do-nothing .items(), but a functional __iter__ and __getitem__, and that works too. Though it seems that .items() could be more efficient, if it's there. Maybe not significant, but still ...

In any case, I'd like to see the protocol be duck-typed, like, as far as I know, every other protocol in Python, rather than doing actual type checking. I'd love to hear why it currently checks for Mapping, rather than simply calling the methods it needs.

-CHB
-- Christopher Barker, PhD
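One way to see which methods `**` actually calls is to log them. The sketch below (the `Recorder` class and `collect` function are hypothetical) defines `keys()`, `__getitem__`, `items()` and `__iter__`. As far as I can tell from CPython's behaviour, only `keys()` and then `__getitem__` are used, and no `isinstance(obj, Mapping)` check is involved. That is consistent with the experiment above: a `Mapping` subclass inherits a `keys()` that is built on `__iter__`.

```python
class Recorder:
    """Hypothetical class that logs which methods ** unpacking calls."""

    def __init__(self):
        self.calls = []

    def keys(self):
        self.calls.append("keys")
        return ["x", "y"]

    def __getitem__(self, key):
        self.calls.append(f"__getitem__({key})")
        return key.upper()

    def items(self):
        self.calls.append("items")
        return []

    def __iter__(self):
        self.calls.append("__iter__")
        return iter(())


def collect(**kwargs):
    return kwargs


r = Recorder()
print(collect(**r))  # {'x': 'X', 'y': 'Y'}
print(r.calls)       # ['keys', '__getitem__(x)', '__getitem__(y)']
```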
On Sun, Dec 27, 2020 at 01:34:02PM +1300, Greg Ewing wrote:
It seems to me it would be more fundamental to use iteration to get the keys and indexing to get the corresponding values. You're only relying on dunder methods then.
I think if we were designing mapping protocols now, that would be an excellent idea, but we aren't, we have decades of history from `dict` behind us. And protocols from dict use `keys()` and getitem. E.g. update.

I think it would be confusing to have dict protocols use keys and double star use items, so I think it would be better to follow the API of dict.update:

```
D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
```

So to have an object E usable with double star dict unpacking, perhaps it needs to either:

* have a keys() method which iterates over keys (assumed to all be strings), and `__getitem__`; or
* support iteration, yielding (key, value) pairs.

-- Steve
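The two branches of the `dict.update` rule quoted above can be sketched in pure Python. `update_like_dict`, `Pairs` and `KeysAndItem` are hypothetical names. The sketch calls `other.keys()` explicitly, which is what CPython's `dict.update` does for non-dict arguments that have a `keys` attribute, and it is compared against the real method at the end.

```python
def update_like_dict(target, other=(), **kwargs):
    """Hypothetical pure-Python sketch of the rule in help(dict.update)."""
    if hasattr(other, "keys"):
        # Mapping-like argument: ask for the keys, then index for the values.
        for k in other.keys():
            target[k] = other[k]
    else:
        # Anything else must be an iterable of (key, value) pairs.
        for k, v in other:
            target[k] = v
    # Keyword arguments are applied last.
    for k in kwargs:
        target[k] = kwargs[k]
    return target


class Pairs:
    def __iter__(self):
        return iter([("a", 1), ("b", 2)])


class KeysAndItem:
    def keys(self):
        return ["c"]

    def __getitem__(self, key):
        return 3


mine = update_like_dict({}, Pairs())
update_like_dict(mine, KeysAndItem(), e=5)

real = {}
real.update(Pairs())
real.update(KeysAndItem(), e=5)

print(mine)          # {'a': 1, 'b': 2, 'c': 3, 'e': 5}
print(mine == real)  # True
```

Under Steve's suggestion, `**` would accept both kinds of argument; today only the first branch applies, so `{**Pairs()}` raises `TypeError`.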
On 2020-12-26 18:00, Steven D'Aprano wrote:
I think if we were designing mapping protocols now, that would be an excellent idea, but we aren't, we have decades of history from `dict` behind us. And protocols from dict use `keys()` and getitem. E.g. update.
What do you mean by "protocols from dict"? What are these protocols?

-- Brendan Barnwell
On Sat, Dec 26, 2020 at 06:09:42PM -0800, Brendan Barnwell wrote:
On 2020-12-26 18:00, Steven D'Aprano wrote:
I think if we were designing mapping protocols now, that would be an excellent idea, but we aren't, we have decades of history from `dict` behind us. And protocols from dict use `keys()` and getitem. E.g. update.
What do you mean by "protocols from dict"? What are these protocols?
"And protocols from dict use `keys()` and getitem. E.g. update." The dict in-place union assignment operator also uses the same protocol:
```
>>> class A:
...     def keys(self):
...         return iter('abc')
...     def __getitem__(self, key):
...         return key.upper()
...
>>> d = {}
>>> d |= A()
>>> d
{'a': 'A', 'b': 'B', 'c': 'C'}
```
(Regular union operator does not, it requires an actual dict.) There may be others. I know I have written code that followed the same interface as update, although I don't have it easily at hand. -- Steve
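A runnable version of Steve's example that also checks his parenthetical remark about the plain `|` operator. It assumes Python 3.9 or later, where PEP 584 added the dict union operators; the class `A` is the one from the example.

```python
class A:
    def keys(self):
        return iter("abc")

    def __getitem__(self, key):
        return key.upper()


d = {}
d |= A()  # in-place union accepts anything dict.update() accepts
print(d)  # {'a': 'A', 'b': 'B', 'c': 'C'}

try:
    {} | A()  # dict.__or__ returns NotImplemented for non-dict operands
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```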
On 2020-12-26 18:44, Steven D'Aprano wrote:
I think if we were designing mapping protocols now, that would be an excellent idea, but we aren't, we have decades of history from `dict` behind us. And protocols from dict use `keys()` and getitem. E.g. update.
What do you mean by "protocols from dict"? What are these protocols? "And protocols from dict use `keys()` and getitem. E.g. update."
If I understand you right, that's not a protocol, that's just the behavior of the dict type specifically. As far as I can tell, it's not even documented behavior, so it doesn't constrain anything.

-- Brendan Barnwell
On Sat, Dec 26, 2020 at 06:52:46PM -0800, Brendan Barnwell wrote:
On 2020-12-26 18:44, Steven D'Aprano wrote:
I think if we were designing mapping protocols now, that would be an excellent idea, but we aren't, we have decades of history from `dict` behind us. And protocols from dict use `keys()` and getitem. E.g. update.
What do you mean by "protocols from dict"? What are these protocols? "And protocols from dict use `keys()` and getitem. E.g. update."
If I understand you right, that's not a protocol, that's just the behavior of the dict type specifically. As far as I can tell, it's not even documented behavior, so it doesn't constrain anything.
Yes it is documented: help(dict.update) and it was intentionally the inspiration for the behaviour of dict augmented assignment. If you want to argue it's not a protocol, just an interface, okay, it's an interface. That's a difference that makes no difference. -- Steve
On 2020-12-26 21:02, Steven D'Aprano wrote:
Yes it is documented:
help(dict.update)
and it was intentionally the inspiration for the behaviour of dict augmented assignment.
I see. It's rather disturbing that that isn't mentioned in the docs on python.org.
If you want to argue it's not a protocol, just an interface, okay, it's an interface. That's a difference that makes no difference.
No, it does make a difference. What you're describing is the interface to a single existing type. A protocol is a framework that defines behavior for USER-DEFINED types to hook into, as the descriptor protocol lets you define __get__ or the iterator protocol lets you define __iter__. The fact that dict uses a method with a particular name to do this or that should not constrain the creation of future protocols that define behavior for methods to be defined in user-created classes.

That said. . . I'm starting to wonder why not just create a new dunder called __items__ and have dict alias that to .items(). Then the **-unpacking protocol could use that and everything would be fine, right?

-- Brendan Barnwell
On Sun, Dec 27, 2020 at 4:30 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
That said. . . I'm starting to wonder why not just create a new dunder called __items__ and have dict alias that to .items(). Then the **-unpacking protocol could use that and everything would be fine, right?
+0.95. If we could borrow the time machine and create this protocol way back in Python's past, I think it'd be the best. The semantics of maintaining backward compatibility MAY complicate things... but then again, iterator protocol is a bit complicated too, and everyone's fine with just defining __iter__. ChrisA
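Brendan's `__items__` idea, with Chris's backward-compatibility concern, might look like the sketch below. `__items__` is only a proposal in this thread, not an existing Python protocol, so `double_star_items` is a hypothetical pure-Python stand-in for what the interpreter would do. The fallback branch keeps today's `keys()` + `__getitem__` behaviour, so existing mappings such as `dict` keep working even without an alias.

```python
from dataclasses import dataclass, fields


def double_star_items(obj):
    """Hypothetical emulation of ** if the proposed __items__ dunder existed.

    This is NOT current Python behaviour; it only illustrates the idea.
    """
    # Like other dunders, look the method up on the type, not the instance.
    items = getattr(type(obj), "__items__", None)
    if items is not None:
        return list(items(obj))
    # Backward-compatible fallback: today's keys() + __getitem__ rule.
    if hasattr(obj, "keys"):
        return [(k, obj[k]) for k in obj.keys()]
    raise TypeError(f"{type(obj).__name__!r} object is not a mapping")


@dataclass
class Point:
    x: int
    y: int

    def __items__(self):
        return ((f.name, getattr(self, f.name)) for f in fields(self))


print(dict(double_star_items(Point(1, 2))))  # {'x': 1, 'y': 2}
print(dict(double_star_items({"a": 1})))     # {'a': 1}
```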
On Sat, Dec 26, 2020 at 09:18:09PM -0800, Brendan Barnwell wrote:
Yes it is documented:
help(dict.update)
and it was intentionally the inspiration for the behaviour of dict augmented assignment.
I see. It's rather disturbing that that isn't mentioned in the docs on python.org.
"Disturbing"? It's an oversight, not a conspiracy :-) Just submit a PR and I'm sure it will be accepted.
If you want to argue it's not a protocol, just an interface, okay, it's an interface. That's a difference that makes no difference.
No, it does make a difference. What you're describing is the interface to a single existing type.
It's part of the MutableMapping ABC, although you need to read the source code of that (or the docstring) to see it: https://github.com/python/cpython/blob/3.9/Lib/_collections_abc.py -- Steve
On Sun, Dec 27, 2020 at 11:36 AM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
It seems to me it would be more fundamental to use iteration to get the keys and indexing to get the corresponding values. You're only relying on dunder methods then.
But that would mean that a lot of iterables would look like mappings when they're not. Consider:
```
>>> def naive_items(x):
...     return [(key, x[key]) for key in x]
...
>>> naive_items(range(9, -1, -1))
[(9, 0), (8, 1), (7, 2), (6, 3), (5, 4), (4, 5), (3, 6), (2, 7), (1, 8), (0, 9)]
```
ChrisA
On 2020-12-26 18:03, Chris Angelico wrote:
But that would mean that a lot of iterables would look like mappings when they're not. Consider:
```
>>> def naive_items(x):
...     return [(key, x[key]) for key in x]
...
>>> naive_items(range(9, -1, -1))
[(9, 0), (8, 1), (7, 2), (6, 3), (5, 4), (4, 5), (3, 6), (2, 7), (1, 8), (0, 9)]
```
I don't see that as a major problem. It is no more "surprising" than doing something like list('abc') and getting ['a', 'b', 'c']. If you do {**range(9, -1, -1)} you may get a result that looks strange or isn't useful, but as long as the result is consistent with the protocol, that's fine. Just don't use **-unpacking with ranges if you don't want to.

-- Brendan Barnwell
On Sun, Dec 27, 2020 at 1:15 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-12-26 18:03, Chris Angelico wrote:
But that would mean that a lot of iterables would look like mappings when they're not. Consider:
```
>>> def naive_items(x):
...     return [(key, x[key]) for key in x]
...
>>> naive_items(range(9, -1, -1))
[(9, 0), (8, 1), (7, 2), (6, 3), (5, 4), (4, 5), (3, 6), (2, 7), (1, 8), (0, 9)]
```
I don't see that as a major problem. It is no more "surprising" than doing something like list('abc') and getting ['a', 'b', 'c']. If you do {**range(9, -1, -1)} you may get a result that looks strange or isn't useful, but as long as the result is consistent with the protocol, that's fine. Just don't use **-unpacking with ranges if you don't want to.
Perhaps, but that means you can't raise TypeError for anything iterable. Instead, you'd have to raise ValueError, because it could potentially be valid. Are mappings really just iterables with indexing (which most iterables support), or are they distinctly different? Remember, most things iterate over their *values*, but a dict iterates over its *keys*. On the plus side, you COULD convert some objects into sparse lists by basically just dictifying them:
```
>>> {**range(5)}  # with this proposal
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
```
But on the downside, you could only do this if they iterate the 'right way', which most iterables won't. It would be MUCH more useful if there were a dedicated way to request the valid keys or items, and use that instead. Then you could convert *any* iterable that way. At the moment, you can create an actual dict from any iterable by enumerating, and that will give the correct items:
```
>>> def smarter_items(x):
...     return list(dict(enumerate(x)).items())
...
>>> smarter_items(range(9, -1, -1))
[(0, 9), (1, 8), (2, 7), (3, 6), (4, 5), (5, 4), (6, 3), (7, 2), (8, 1), (9, 0)]
>>> smarter_items(range(10, 20))
[(0, 10), (1, 11), (2, 12), (3, 13), (4, 14), (5, 15), (6, 16), (7, 17), (8, 18), (9, 19)]
```
If you want to dictify something, that'd be the more normal way to do it, IMO. Instead of something with lots of false positives, wouldn't it be cleaner to have a protocol that specifically returns the equivalent of dict.items()? ChrisA
On Sun, Dec 27, 2020 at 1:45 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 27/12/20 3:03 pm, Chris Angelico wrote:
But that would mean that a lot of iterables would look like mappings when they're not.
In the context of ** you're expecting a mapping, not a sequence.
Exactly; and under the proposal I was replying to, ANY object would look like a mapping if you can iterate over it and subscript it - which means all sequences look like mappings, just broken ones. ChrisA
26.12.20 13:23, Anton Abrosimov wrote:
I am looking at the Python data model: https://docs.python.org/3/reference/datamodel.html There are many operators there, and they depend on special double-underscore methods. Hmm, I don't see the unpacking operators there; that's strange. And why `keys`? Because historically it comes from `dict`? I think a dependency on `__iter__` would be preferable and more expected than a dependency on an ordinary, user-space method named `keys`. Actually, `items()` would be even more predictable.
See thread "Add __keys__ or __items__ protocol". https://mail.python.org/archives/list/python-ideas@python.org/thread/A3CK7Y2...
0. I believe that the `dict` behavior needs to be frozen. Changing it would break a lot of existing code; it's too much damage.
0.1. Yes, `keys` is not a good name for internal use, but that's okay.
0.2. If I want to make a class look like a `dict`, I understand that I will get `keys`, `items`... This is what I expect.
0.3. When I work with dicts in my code, I have a choice: I can use the default `dict`, or I can create my own dict-like class and implement different behavior.
0.4. The `other[k] for k in other.keys()` default behaviour in `dict.update(other: Dict)` is a different, big question about the degree of damage. Basically, I can use `dict.update(dict.items())`.

Back to the stars:

1. `*`, `**` are operators, but behaviorally they are methods or functions. I think this is the main problem.
1.1. Python operators (mostly?) use their dunder methods to control their behavior.
1.2. The unpacking operators are nailed to specific objects and their behavior, like a function or method. As a result, we lose control over them.
2. `*` is nailed to `Iterable`, not so bad.
2.1. It uses the `__iter__` method. I can implement any behaviour.
2.2. I only see one problem. I can't realize any other behavior for iterating and unpacking inside a custom class.
2.3. A new special method for unpacking is a better idea. By default, this method should return `self.__iter__`. This will give control and not break existing code.
3. `**` is nailed to `dict`. I think this is the fundamental problem.
3.1. `dict` is a good choice for the DEFAULT `kwargs` container. But `dict` is excessive for `**`. One method that returns an iterator is enough.
3.2. `**` uses an implementation like `kwargs[k] for k in kwargs.keys()`. I have no control over this behavior.
3.3. I am forced to implement excessive functionality.
3.4. I must use the user-space name `keys()`.
3.5. I cannot implement unpacking inside the class independently of `keys` and `__getitem__`.
4. Which I think we can do:
4.1. Make `*` and `**` operators with their own methods.
4.2. Implement `return self.__iter__()` as the default behavior of `*`.
4.3. Create a new implementation of the `**` operator expecting `Iterator[Tuple[key, val]]`.
4.4. Implement `return ((k, self[k]) for k in self.keys())` as the specific behaviour of `dict`.
4.5. Create a `collections.abc` abstract base class with an abstract double-star unpacking method.
4.6. Update PEP 448.
On Sun, Dec 27, 2020 at 02:05:38PM -0000, Anton Abrosimov wrote:
1. `*`, `**` are operators, but behaviorally they are methods or functions. I think this is the main problem.
No they aren't operators. They aren't in the operator precedence table, and they don't have dunders associated with them:

https://docs.python.org/3/reference/expressions.html#operator-precedence

Nor can you use them in places you can use arbitrary expressions:

```
>>> a, b, c = *(1, 2, 3)
  File "<stdin>", line 1
SyntaxError: can't use starred expression here
```
2. `*` is nailed to `Iterable`, not so bad. 2.1. It uses the `__iter__` method. I can implement any behaviour. 2.2. I only see one problem. I can't realize any other behavior for iterating and unpacking inside a custom class.
You contradict yourself: "I can implement any behaviour" "I can't realize any other behaviour ..." Which is correct?
2.3. A new special method for unpacking is a better idea. By default, this method should return `self.__iter__`. This will give control and not break existing code.
You already have control. If you want to use iterator unpacking on an object, make it an iterator. What are you trying to do that you want to use iterator unpacking on something but not make it an iterator?

Without a concrete example of why you need:

- to use iterator unpacking on something that isn't an iterator;
- and mapping unpacking on something that isn't a mapping;

all I can say is that this is needless over-generalization. As far as I can see, everything in this thread is YAGNI.

-- Steve
Steven D'Aprano wrote:
No they aren't operators. They aren't in the operator precedence table, and they don't have dunders associated with them:

https://docs.python.org/3/reference/expressions.html#operator-precedence

Nor can you use them in places you can use arbitrary expressions:

```
>>> a, b, c = *(1, 2, 3)
  File "<stdin>", line 1
SyntaxError: can't use starred expression here
```
Hmm... PEP 448 -- Additional Unpacking Generalizations:
This PEP proposes extended usages of the * iterable unpacking operator and ** dictionary unpacking operators to allow unpacking in more positions, an arbitrary number of times, and in additional circumstances. Specifically, in function calls, in comprehensions and generator expressions, and in displays.
Steven D'Aprano wrote:
You contradict yourself: "I can implement any behaviour" "I can't realize any other behaviour ..." Which is correct?
I apologize for my English; I meant that I cannot implement the following behavior inside the class:

```
class MyClass:
    def __iter__(self):
        return iter(self.items_for_iteration)

    def __unpack__(self):
        return self.items_for_unpack
```

I have to make a separate method and have to rely on the user of the class.

Steven D'Aprano wrote:
What are you trying to do that you want to use iterator unpacking on something but not make it an iterator?
How can I implement an unpackable dataclass without littering it with non-dunder methods?
On Mon, Dec 28, 2020 at 09:06:40AM -0000, Anton Abrosimov wrote:
I apologize for my English; I meant that I cannot implement the following behavior inside the class:
```
class MyClass:
    def __iter__(self):
        return iter(self.items_for_iteration)

    def __unpack__(self):
        return self.items_for_unpack
```

I have to make a separate method and have to rely on the user of the class.
Ah, now I understand what you mean: you want iteration and iterator unpacking to do different things:

```
obj = MyClass()
list(obj)  # iteration
# --> returns a b c d
print(*obj)  # iterator unpacking
# --> returns x y z
```

You can't do that, just like you can't make these different:

```
items = list(obj)  # iteration

items = [item for item in obj]  # iteration in a comprehension

items = []
for item in obj:  # iteration in a for-loop
    items.append(item)
```

And that is a **good thing** because it would be confusing and horrible if iteration over an object was different depending on how you iterate over it.

We're not going to invent new dunder methods:

```
def __iter__(self): ...
def __forloop__(self): ...
def __comprehension__(self): ...
```

so that they can be different, and I don't think we should invent a new dunder method `__unpack__` so it can be different from iteration. Iterator unpacking is just a form of iteration.

-- Steve
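Steve's point that iterator unpacking is just another form of iteration can be checked by counting calls to `__iter__`. The `Counted` class and the `collect` helper are hypothetical names for this sketch:

```python
class Counted:
    """Hypothetical iterable that counts how often __iter__ is called."""

    def __init__(self, items):
        self.items = list(items)
        self.iter_calls = 0

    def __iter__(self):
        self.iter_calls += 1
        return iter(self.items)


def collect(*args):
    return list(args)


obj = Counted("abc")
results = [
    list(obj),               # iteration
    [item for item in obj],  # iteration in a comprehension
    collect(*obj),           # iterator unpacking in a call
]
loop = []
for item in obj:             # iteration in a for-loop
    loop.append(item)
results.append(loop)

print(all(r == ["a", "b", "c"] for r in results))  # True
print(obj.iter_calls)                                # 4
```

All four forms go through the same `__iter__`, which is why there is no separate hook for unpacking.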
Steven D'Aprano wrote:
so that they can be different, and I don't think we should invent a new dunder method __unpack__ so it can be different from iteration. Iterator unpacking is just a form of iteration.
I agree with that. List unpacking is not a problem for me. The only thought: if `*` is an operator, as PEP 448 says, then there must be a method for it. The `**` behavior makes me sad.
On Sat, Dec 26, 2020 at 11:23:16AM -0000, Anton Abrosimov wrote:
I am trying to implement convenient dataclass unpacking using the `**` operator. Now I have 5 different ways to do it, but not a single good one. I'm confused by the implementation of the unpacking operator.
So when I try to unpack any custom class, I get the error:
`type object argument after ** must be a mapping, not MyClass`
OK, nothing special. I need to use `collections.abc.Mapping`, right? Now I need to implement: `__getitem__`, `__iter__`, `__len__`. Not a problem. But additionally I get: `keys`, `items`, `values`. Hey, I don't need them. I don't need the full mapping functionality. I only need the double asterisk to work.
Why do you want something that isn't a mapping to be usable with mapping unpacking? Does it *really* hurt you to provide mapping methods when you get them for free? Just inherit from Mapping. -- Steve
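A minimal sketch of what "just inherit from Mapping" looks like for a dataclass: you write only the three abstract methods Anton listed (`__getitem__`, `__iter__`, `__len__`), and `keys()`, `items()`, `values()`, `get()` and `__contains__` arrive as mixins. The `Point2D` fields and the `dataclasses.fields()`-based lookup are assumptions for illustration, not code from the thread.

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields


@dataclass
class Point2D(Mapping):
    x: int
    y: int

    # The three abstract methods Mapping requires.
    def __getitem__(self, key):
        if key in self._field_names():
            return getattr(self, key)
        raise KeyError(key)

    def __iter__(self):
        return iter(self._field_names())

    def __len__(self):
        return len(self._field_names())

    def _field_names(self):
        return [f.name for f in fields(self)]


def demo(**kwargs):
    return kwargs


p = Point2D(1, 2)
print(demo(**p))         # {'x': 1, 'y': 2}
print(list(p.items()))   # mixin method: [('x', 1), ('y', 2)]
```

The trade-off Anton raises is visible here: the mixins (`keys`, `items`, `values`, `get`) become public API of `Point2D` and of every subclass.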
Steven D'Aprano wrote:
Why do you want something that isn't a mapping to be usable with mapping unpacking?
I think a mapping is not only the `abc.Mapping` class. What about `Iterator[Tuple[str, int]]`?

```
@dataclass
class MyMap:
    x: int
    y: int
```

Is this a "mapping"? In Python I can use `/` as a path separator: `pathlib.Path.cwd() / 'my_dir'`. I can control the behavior of my class. But I only have one way to unpack the object, and not a perfect one. `dict.update()` gives more freedom. Steven D'Aprano wrote:
Does it really hurt you to provide mapping methods when you get them for free? Just inherit from Mapping.
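```
# first.py:
from collections.abc import Mapping
from dataclasses import dataclass

@dataclass
class Point2D(Mapping):
    ...

# second.py
from dataclasses import dataclass
from first import Point2D

@dataclass
class Point3D(Point2D):
    ...
```

Now I have to think about unnecessary public methods.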
On Mon, Dec 28, 2020 at 7:45 PM Anton Abrosimov <abrosimov.a.a@gmail.com> wrote:
Steven D'Aprano wrote:
Why do you want something that isn't a mapping to be usable with mapping unpacking?
I think a mapping is not only the `abc.Mapping` class.
What about: `Iterator[Tuple[str, int]]`

```
@dataclass
class MyMap:
    x: int
    y: int
```

Is this a "mapping"? In Python I can use `/` as a path separator: `pathlib.Path.cwd() / 'my_dir'`. I can control the behavior of my class. But I only have one way to unpack the object, and not a perfect one. `dict.update()` gives more freedom.
Steven D'Aprano wrote:
Does it really hurt you to provide mapping methods when you get them for free? Just inherit from Mapping.
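```
# first.py:
from collections.abc import Mapping
from dataclasses import dataclass

@dataclass
class Point2D(Mapping):
    ...

# second.py
from dataclasses import dataclass
from first import Point2D

@dataclass
class Point3D(Point2D):
    ...
```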
Now I have to think about unnecessary public methods.
Allow me to rephrase what I *think* you're arguing here, and you can tell me if I'm close to the mark. Given an object of a custom class C, you can make it usable as "x, y, z = C()" or "f(*C())" or anything else by defining __iter__, and in all ways that object will be iterable, unpackable, etc. Given the same object, how can you ensure that it can be used as "f(**C())"? What about in "{}.update(C())"? Or "dict(C())"? Is there a single well-defined protocol that allows you to make your object usable in all mapping-like contexts? If that's not what your point is, ignore this post :) ChrisA
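For the duck-typed case, a quick sketch answering Chris's three contexts: a class (the name `KeysAndGetItem` is hypothetical) that defines only `keys()` and `__getitem__` is accepted by `f(**C())`, `{}.update(C())` and `dict(C())` alike in CPython.

```python
class KeysAndGetItem:
    """Defines only keys() and __getitem__, nothing else from Mapping."""

    def __init__(self, **data):
        self._data = data

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


def f(**kwargs):
    return kwargs


c = KeysAndGetItem(x=1, y=2)

via_call = f(**c)       # mapping unpacking in a call
via_update = {}
via_update.update(c)    # dict.update
via_dict = dict(c)      # dict constructor

print(via_call == via_update == via_dict == {'x': 1, 'y': 2})  # True
```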
Chris Angelico wrote:
Allow me to rephrase what I think you're arguing here, and you can tell me if I'm close to the mark.
You're close to the mark. :) Chris Angelico wrote:
Given an object of a custom class C, you can make it usable as "x, y, z = C()" or "f(*C())" or anything else by defining __iter__, and in all ways that object will be iterable, unpackable, etc.
Yes, the only small problem:

```
class MyClass:
    def __iter__(self):
        return self.items_for_iteration

    def __unpack__(self):
        return self.items_for_unpack
```

This doesn't hurt. Chris Angelico wrote:
Given the same object, how can you ensure that it can be used as "f(**C())"? What about in "{}.update(C())"? Or "dict(C())"? Is there a single well-defined protocol that allows you to make your object usable in all mapping-like contexts?
Look at the example:

```
seq = zip('abc', 'xyz')
d = {}
d.update(seq)
print(d)  # {'a': 'x', 'b': 'y', 'c': 'z'}

def f(**kwargs):
    print(kwargs)

seq = zip('abc', 'xyz')
f(**seq)  # TypeError: f() argument after ** must be a mapping, not zip
```

`dict.update` does the job. I think a single protocol is:

* `x, y, z = C()`, `[i for i in C()]`, ...: call `C.__iter__()`
* `f(C())`: get C instance.
* `f(*C())`: call `C.__unpack_args__(self) -> Iterator[Any]:`
* `f(**C())`: call `C.__unpack_kwargs__(self) -> Iterator[Tuple[str, Any]]:`
On Mon, Dec 28, 2020 at 08:44:07AM -0000, Anton Abrosimov wrote:
Steven D'Aprano wrote:
Why do you want something that isn't a mapping to be usable with mapping unpacking?
I think a mapping is not only the `abc.Mapping` class.
You don't have to inherit from Mapping for this to work. Double-star unpacking already supports duck-typing:
```
>>> class MyMapping:
...     def keys(self):
...         return iter('abc')
...     def __getitem__(self, key):
...         if key in ('a', 'b', 'c'):
...             return key.upper()
...         raise KeyError
...
>>> def demo(**kwargs):
...     print(kwargs)
...
>>> import collections.abc
>>> issubclass(MyMapping, collections.abc.Mapping)
False
>>> demo(**MyMapping())
{'a': 'A', 'b': 'B', 'c': 'C'}
```
So we already support duck-typing here. We can't use the fallback iteration interface:
```
>>> class MyOtherMapping:
...     def __iter__(self):
...         return zip('xyz', 'XYZ')
...
>>> demo(**MyOtherMapping())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __main__.demo() argument after ** must be a mapping, not MyOtherMapping
```
so that's a possible enhancement: dict unpacking should(?) fall back on iteration of (key, value) pairs just as dict.update does. I would cautiously support adding that as an enhancement.
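Until such a fallback exists, a workaround sketch: `dict()` already applies the `dict.update` rule (use `keys()` if the object has it, otherwise iterate `(key, value)` pairs), so wrapping the object in `dict()` gives `**` something it accepts, at the cost of building an intermediate dict. `MyOtherMapping` is Steve's class from above; `demo` here returns its kwargs instead of printing them.

```python
class MyOtherMapping:
    """Iterable of (key, value) pairs, with no keys() method."""

    def __iter__(self):
        return zip('xyz', 'XYZ')


def demo(**kwargs):
    return kwargs


try:
    demo(**MyOtherMapping())  # ** looks for keys() and rejects this object
except TypeError:
    print("** rejected MyOtherMapping")

# dict() falls back to iterating (key, value) pairs, like dict.update.
print(demo(**dict(MyOtherMapping())))  # {'x': 'X', 'y': 'Y', 'z': 'Z'}
```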
What about: `Iterator[Tuple[str, int]]`
I said I would *cautiously* support that, because there are some subtle issues to do with exceptions, but we can worry about that later.
```
@dataclass
class MyMap:
    x: int
    y: int
```
Is this "mapping"?
No:
```
>>> issubclass(MyMap, collections.abc.Mapping)
False
```
It doesn't even duck-type as a mapping. It does not support len, keys or subscripting:
```
>>> obj = MyMap(2, 3)
>>> len(obj)
TypeError: object of type 'MyMap' has no len()
>>> obj.keys()
AttributeError: 'MyMap' object has no attribute 'keys'
>>> obj['x']
TypeError: 'MyMap' object is not subscriptable
```
so it is certainly not a mapping. -- Steve
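For a plain dataclass like `MyMap`, one standard-library route that needs no mapping methods at all is `dataclasses.asdict()`, which returns an ordinary dict that `**` accepts. A minimal sketch; note that `asdict()` recurses into nested dataclasses, lists, tuples and dicts and deep-copies other values, so it can do more work than a simple unpack needs.

```python
import dataclasses


@dataclasses.dataclass
class MyMap:
    x: int
    y: int


def demo(**kwargs):
    return kwargs


obj = MyMap(2, 3)
# asdict() builds a new, ordinary dict, so ** accepts it.
print(demo(**dataclasses.asdict(obj)))  # {'x': 2, 'y': 3}
```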
On Mon, Dec 28, 2020 at 3:54 AM Steven D'Aprano <steve@pearwood.info> wrote:
Steven D'Aprano wrote:
Why do you want something that isn't a mapping to be usable with mapping unpacking?
I think a mapping is not only the `abc.Mapping` class.
You don't have to inherit from Mapping for this to work. Double-star unpacking already supports duck-typing:
```
>>> class MyMapping:
...     def keys(self):
...         return iter('abc')
...     def __getitem__(self, key):
...         if key in ('a', 'b', 'c'):
...             return key.upper()
...         raise KeyError
...
>>> def demo(**kwargs):
...     print(kwargs)
...
```
Thanks! This would have short-circuited this conversation if it had come earlier. The OP and I took the error message at its word, and also, in experimenting, didn't happen on the right part of the Mapping API that needed to be supported. I don't know about the OP, but all I wanted was a clear definition of the part of the API needed to support **, and apparently it's a keys() method that returns an iterator of the keys, and a __getitem__ that then returns the values associated with those keys. Which is fine. Though frankly, I would rather have had it use .items() -- seems more efficient to me, and you do need both the keys and the values, and items() is just as much part of the Mapping API as keys. But there is an argument that the ** operator should be able to be supported only with dunder methods -- which could be done if it used the iterator protocol to get the keys, rather than the keys() method, which does not appear to work now. Though to be fair, all you need to do to get that is add a __len__ and derive from Mapping. And to the OP's question: a decorator that makes a Mapping from a dataclass would be pretty easy to write. -CHB
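To pin down exactly which methods `**` touches, here is an instrumented sketch (the `Recorder` class is hypothetical) that logs every method call. In CPython, `keys()` is called once and `__getitem__` once per key; `__iter__` and `__len__` are present but never consulted.

```python
class Recorder:
    """Logs which methods ** unpacking actually calls."""

    def __init__(self):
        self.calls = []

    def keys(self):
        self.calls.append("keys")
        return iter(["a", "b"])

    def __getitem__(self, key):
        self.calls.append(f"getitem {key}")
        return key.upper()

    def __iter__(self):  # defined, but not used by ** unpacking
        self.calls.append("iter")
        return iter([])

    def __len__(self):  # likewise unused
        self.calls.append("len")
        return 0


def demo(**kwargs):
    return kwargs


r = Recorder()
print(demo(**r))  # {'a': 'A', 'b': 'B'}
print(r.calls)    # ['keys', 'getitem a', 'getitem b']
```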
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Mon, Dec 28, 2020 at 12:15 PM Christopher Barker <pythonchb@gmail.com> wrote:
I don't know about the OP, but all I wanted was a clear definition of the part of the API needed to support **, and apparently it's a keys() method that returns an iterator of the keys, and a __getitem__ that then returns the values associated with those keys. Which is fine.
Though frankly, I would rather have had it use .items() -- seems more efficient to me, and you do need both the keys and the values, and items() is just as much part of the Mapping API as keys.
There may be a (small) performance issue with that -- items() requires creating a tuple object for each key/value pair. Anyway, of course it's too late to change. And there are probably other "protocols" that check for the presence of keys and __getitem__(). Also, in a sense keys() is more fundamental -- deriving keys() from items() would be backwards (throwing away the values -- imagine a data type that stores the values on disk).
But there is an argument that the ** operator should be able to be supported only with dunder methods -- which could be done if it used the iterator protocol to get the keys, rather than the keys() method, which does not appear to work now. though to be fair, all you need to do to get that is add a __len__ and derive from Mapping.
If we had to do it all over from scratch we would probably design mappings and sequences to be distinguishable using dunders only. But it's about 31 years too late for that. And looking at the mess JavaScript made of this (sequences are mappings with string keys "0", "1" and so on), I'm pretty happy with how Python did this. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Mon, Dec 28, 2020 at 12:33 PM Guido van Rossum <guido@python.org> wrote:
On Mon, Dec 28, 2020 at 12:15 PM Christopher Barker <pythonchb@gmail.com> wrote:
Though frankly, I would rather have had it use .items() -- seems more efficient to me, and you do need both the keys and the values, and items() is just as much part of the Mapping API as keys.
There may be a (small) performance issue with that -- items() requires creating a tuple object for each key/value pair.
it does look like items() is a tad faster (dict with 1000 items), but not enough to matter:

```
In [61]: %timeit {k: d[k] for k in d.keys()}
112 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [62]: %timeit {k: v for k, v in d.items()}
92.6 µs ± 1.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Anyway, of course it's too late to change. And there are probably other
"protocols" that check for the presence of keys and __getitem__(). Also, in a sense keys() is more fundamental -- deriving keys() from items() would be backwards (throwing away the values -- imagine a data type that stores the values on disk).
Does there need to be a single defined "protocol" for a mapping (other than the ABC)? -- that is, would **unpacking be able to use .items() and keys() be used in other contexts? And why does ** unpacking need to check at all (LBYL)? Couldn't it simply do something like:

```
{k: d[k] for k in d}
```

Sure, there could occasionally be a Sequence for which that would happen to work (like a range object for instance), but then it would be unlikely to result in the expected result anyway -- just like many other uses of duck typing. Or not, and it could still be correct. But as you say -- too late to change now anyway. To the OP: you suggested that you had, I think, four ways to make a dataclass "unpackable", but none were satisfactory. How about this decorator:

```
import dataclasses

def make_mapping(cls):
    def __getitem__(self, key):
        if key in self.__dataclass_fields__:
            return self.__dict__[key]
        else:
            raise KeyError(key)

    def keys(self):
        return self.__dataclass_fields__.keys()

    cls.__getitem__ = __getitem__
    cls.keys = keys
    return cls

@make_mapping
@dataclasses.dataclass
class Point:
    x: int
    y: int

p = Point(1, 2)
print(p)
print({**p})
print(dict(p))
```

-CHB

Side Question: when should one use __dict__ vs vars() vs getattr()? All three work in this case, but I'm never quite sure which is preferred, and why.
-- Christopher Barker, PhD
On Mon, Dec 28, 2020 at 11:36:55PM -0800, Christopher Barker wrote:
Side Question: when should one use __dict__ vs vars() vs getattr()? All three work in this case, but I'm never quite sure which is preferred, and why.
`vars(obj)` is defined as returning `obj.__dict__` so technically it probably doesn't matter, but I feel that as a matter of aesthetics and future-proofing, we should avoid direct use of dunders whenever practical. Do you prefer to write `mylist.__len__()` over `len(mylist)`? Then you will probably prefer `obj.__dict__` over `vars(obj)` too :-) -- Steve
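A small sketch of the equivalence Steve describes (the `Plain` and `Slotted` classes are made up for illustration): for an ordinary instance, `vars(obj)` hands back the very same dict object as `obj.__dict__`; for an instance with `__slots__` and no `__dict__`, both spellings fail, `vars()` with TypeError and attribute access with AttributeError.

```python
class Plain:
    def __init__(self):
        self.a = 1


class Slotted:
    __slots__ = ('a',)

    def __init__(self):
        self.a = 1


p = Plain()
print(vars(p) is p.__dict__)  # True: literally the same dict object

s = Slotted()
try:
    vars(s)
except TypeError:
    print("vars() raises TypeError")
try:
    s.__dict__
except AttributeError:
    print("__dict__ raises AttributeError")
```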
On Tue, Dec 29, 2020 at 8:11 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Mon, Dec 28, 2020 at 11:36:55PM -0800, Christopher Barker wrote:
Side Question: when should one use __dict__ vs vars() vs getattr()? All three work in this case, but I'm never quite sure which is preferred, and why.
`vars(obj)` is defined as returning `obj.__dict__` so technically it probably doesn't matter, but I feel that as a matter of aesthetics and future-proofing, we should avoid direct use of dunders whenever practical.
To the contrary, vars() is something I added to the language for the benefit of REPL users (like dir()), and other usages look suspect to me. I find that using `__dict__` is more direct about the purpose, and also it is the prevailing style.
Do you prefer to write `mylist.__len__()` over `len(mylist)`? Then you will probably prefer `obj.__dict__` over `vars(obj)` too :-)
Not a valid analogy. -- --Guido van Rossum (python.org/~guido)
On Tue, Dec 29, 2020 at 09:02:10AM -0800, Guido van Rossum wrote:
On Tue, Dec 29, 2020 at 8:11 AM Steven D'Aprano <steve@pearwood.info> wrote:
To the contrary, vars() is something I added to the language for the benefit of REPL users (like dir()), and other usages look suspect to me. I find that using `__dict__` is more direct about the purpose, and also it is the prevailing style.
I cannot argue with your historical perspective on this, and I agree that `vars()` is not as well known as I believe it should be. So you are right on the bare facts. I still think you are wrong on the aesthetics :-) Your comment also shines some light on why `vars()` with no argument returns `locals()`, which otherwise seems strange to me. Nevertheless, I have to ask what could possibly be "suspect" about using vars() programmatically? It isn't like dir(). dir() is certainly a convenience function for interactive use, and is documented as returning "the most relevant, rather than complete, information". This note is repeated again later in the docs: "Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases." On the other hand, `vars()` has no such warnings. There's no wiggle-room: it either returns the instance `__dict__` or it raises TypeError. So aside from the difference in exceptions (AttributeError versus TypeError) I don't think that there is any possible difference between direct attribute access and the output of vars(). Am I wrong? Speaking of slots, I've often been annoyed that there is no abstraction that hides the difference between instances that use a dict as symbol table, and those that use slots. (And those that use both.) If you need an object's symbol table, for introspection or otherwise, you're out of luck if it uses slots. There doesn't seem to be any way to handle these two implementations in the same way, and objects with both slots and a dict can give surprising results if naive code expects `__dict__` to be the symbol table:
```
>>> class A:
...     __slots__ = ('spam', '__dict__')
...
>>> obj = A()
>>> obj.spam = True
>>> 'spam' in obj.__dict__
False
>>> obj.__dict__.update(spam=False)
>>> obj.spam
True
```
So there's no way to get an object's symbol table in an implementation-independent way. Whether you use `obj.__dict__` or `vars(obj)` it only gives you the symbol table for objects that use a dict. There's nothing that works for objects that use slots. So far I've worked around this in an ad-hoc fashion by testing for `__slots__` and treating that case as special, but it would be nice to ignore the implementation details and just have a "symbol table" object to work with. What do you think?
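A rough sketch of the kind of ad-hoc workaround Steve describes (the helper name `symbol_table` is hypothetical, not an existing API): copy `__dict__` if there is one, then walk the MRO collecting `__slots__` names and read the assigned ones with `getattr()`. Slots are applied second because slot descriptors are data descriptors and therefore win over same-named `__dict__` entries, which is exactly what Steve's `A` example shows. It ignores name-mangled private slots and properties, so treat it as illustrative only.

```python
def symbol_table(obj):
    """Best-effort dict of an instance's attributes from __dict__ and __slots__."""
    # Start from the instance dict, if there is one.
    table = dict(getattr(obj, '__dict__', {}))
    # Slot descriptors take precedence over __dict__, so apply them second.
    for cls in reversed(type(obj).__mro__):
        slots = cls.__dict__.get('__slots__', ())
        if isinstance(slots, str):  # __slots__ = 'name' is also legal
            slots = (slots,)
        for name in slots:
            if name in ('__dict__', '__weakref__'):
                continue
            try:
                table[name] = getattr(obj, name)
            except AttributeError:  # slot declared but never assigned
                pass
    return table


class A:
    __slots__ = ('spam', '__dict__')


obj = A()
obj.spam = True
obj.__dict__.update(spam=False, eggs=1)
print(obj.spam)           # True: the slot wins
print(symbol_table(obj))  # {'spam': True, 'eggs': 1}
```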
Do you prefer to write `mylist.__len__()` over `len(mylist)`? Then you will probably prefer `obj.__dict__` over `vars(obj)` too :-)
Not a valid analogy.
I think it is. Apart from a matter of taste, what part of the analogy do you feel is invalid? -- Steve
On Wed, Dec 30, 2020 at 5:01 PM Steven D'Aprano <steve@pearwood.info> wrote:
Speaking of slots, I've often been annoyed that there is no abstraction that hides the difference between instances that use a dict as symbol table, and those that use slots. (And those that use both.)
<snip> it would be nice to
ignore the implementation details and just have a "symbol table" object to work with. What do you think?
I think that would be great -- and wonder if vars() could be extended to do that?
Do you prefer to write `mylist.__len__()` over `len(mylist)`? Then you
will probably prefer `obj.__dict__` over `vars(obj)` too :-)
Not a valid analogy.
I think it is. Apart from a matter of taste, what part of the analogy do you feel is invalid?
For my part, I think the difference is that when you are working with .__dict__ you are doing meta-programming, for which poking around in the dunders makes perfect sense. -CHB -- Christopher Barker, PhD
On Wed, Dec 30, 2020 at 5:01 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Dec 29, 2020 at 09:02:10AM -0800, Guido van Rossum wrote:
On Tue, Dec 29, 2020 at 8:11 AM Steven D'Aprano <steve@pearwood.info> wrote:
To the contrary, vars() is something I added to the language for the benefit of REPL users (like dir()), and other usages look suspect to me. I find that using `__dict__` is more direct about the purpose, and also it is the prevailing style.
I cannot argue with your historical perspective on this, and I agree that `vars()` is not as well known as I believe it should be.
So you are right on the bare facts. I still think you are wrong on the aesthetics :-)
Your comment also shines some light on why `vars()` with no argument returns `locals()`, which otherwise seems strange to me.
This is the first hint that vars() is not what it seems.

Nevertheless, I have to ask what could possibly be "suspect" about using vars() programmatically? It isn't like dir().
But it is dir()'s cousin, and the fact that its meaning hasn't changed doesn't mean they couldn't: Christopher Barker is already asking for such a change. This is the second hint.
dir() is certainly a convenience function for interactive use, and is documented as returning "the most relevant, rather than complete, information".
This note is repeated again later in the docs: "Because dir() is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases."
On the other hand, `vars()` has no such warnings. There's no wiggle-room: it either returns the instance `__dict__` or it raises TypeError. So aside from the difference in exceptions (AttributeError versus TypeError) I don't think that there is any possible difference between direct attribute access and the output of vars(). Am I wrong?
That's just because vars() didn't turn out to be so useful, or perhaps because people didn't think of improving it when dir() was improved. I suspect that if you look at the original docs, dir() and vars() had pretty similar documentation. (And according to PEP 361, `__dir__` was originally added in 3.0 -- though it seems to have been backported to 2.x at some point.)
Speaking of slots, I've often been annoyed that there is no abstraction that hides the difference between instances that use a dict as symbol table, and those that use slots. (And those that use both.) If you need an object's symbol table, for introspection or otherwise, you're out of luck if it uses slots.
There doesn't seem to be any way to handle these two implementations in the same way, and objects with both slots and a dict can give surprising results if naive code expects `__dict__` to be the symbol table:
```
>>> class A:
...     __slots__ = ('spam', '__dict__')
...
>>> obj = A()
>>> obj.spam = True
>>> 'spam' in obj.__dict__
False
>>> obj.__dict__.update(spam=False)
>>> obj.spam
True
```
So there's no way to get an object's symbol table in an implementation-independent way. Whether you use `obj.__dict__` or `vars(obj)` it only gives you the symbol table for objects that use a dict. There's nothing that works for objects that use slots.
So far I've worked around this in an ad-hoc fashion by testing for `__slots__` and treating that case as special, but it would be nice to ignore the implementation details and just have a "symbol table" object to work with. What do you think?
That you're providing an excellent argument to evolve vars() independently from `__dict__`. :-)
Do you prefer to write `mylist.__len__()` over `len(mylist)`? Then you will probably prefer `obj.__dict__` over `vars(obj)` too :-)
Not a valid analogy.
I think it is. Apart from a matter of taste, what part of the analogy do you feel is invalid?
len() is an important abstraction for containers, and its usage deserves a short name (just like unary minus and abs() for numbers). This is crucial even though you have to use its "true name" (https://xkcd.com/2381/) to define the implementation for a particular class. OTOH, `__dict__` is the opposite of an abstraction -- whether you spell it vars(x) or `x.__dict__`, the matter remains that you're looking inside the implementation of the object, and what you get is not an abstraction -- as you've pointed out for `__slots__`, and as is also apparent for e.g. namedtuple or fundamental objects like numbers, strings and built-in container types. By making the dominant spelling `__dict__`, we remind people that when they use this, they're tinkering with the implementation. Don't get me wrong, this can be fun and useful, but it's not a clean abstraction. The more fundamental abstraction is getattr()/setattr(), since it is supported by *all* objects, not just most classes. -- --Guido van Rossum (python.org/~guido)
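To illustrate Guido's point that getattr() is the abstraction that works across implementations, a minimal sketch (`PointNT`, `PointSlots` and the `read_attrs` helper are hypothetical names): neither a namedtuple instance nor a slotted instance has a per-instance `__dict__`, so `vars()` would raise TypeError on both, yet reading attributes through `getattr()` works for each.

```python
from collections import namedtuple

PointNT = namedtuple('PointNT', ['x', 'y'])


class PointSlots:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


def read_attrs(obj, names):
    """Hypothetical helper: read named attributes via getattr()."""
    return {name: getattr(obj, name) for name in names}


for obj in (PointNT(1, 2), PointSlots(1, 2)):
    # No instance __dict__ here, but getattr() still works.
    print(hasattr(obj, '__dict__'), read_attrs(obj, ['x', 'y']))
# prints "False {'x': 1, 'y': 2}" once for each object
```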
len() is an important abstraction for containers, and its usage deserves a short name (just like unary minus and abs() for numbers). This is crucial even though you have to use its "true name" (https://xkcd.com/2381/) to define the implementation for a particular class.
OTOH, `__dict__` is the opposite of an abstraction -- whether you spell it vars(x) or `x.__dict__`, the matter remains that you're looking inside the implementation of the object, and what you get is not an abstraction -- as you've pointed out for `__slots__`, and as is also apparent for e.g. namedtuple or fundamental objects like numbers, strings and built-in container types.
By making the dominant spelling `__dict__`, we remind people that when they use this, they're tinkering with the implementation. Don't get me wrong, this can be fun and useful, but it's not a clean abstraction. The more fundamental abstraction is getattr()/setattr(), since it is supported by *all* objects, not just most classes.
Can I suggest that a missing component is lsattr(), a function similar to dir() but with a guarantee that everything it returns will succeed if used for a getattr call -- possibly without a guarantee that it will list everything that will successfully return from getattr. For example, if there is a custom getattr that does a case-insensitive search, so that getattr for spam, Spam and SPAM all return spam, lsattr would only return spam; but if there is a local attribute eggs that getattr excludes, so would lsattr. It would also be nice if it had a flag (possibly defaulting to true) to exclude all entries with a leading underscore. (Personally I think that this would be a handy option for dir() as well.) Steve (Gadget) Barnes
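A minimal sketch of the proposed `lsattr()` (the name and the `public_only` flag follow the proposal; nothing like it exists in Python today): start from `dir()`, keep only names for which `getattr()` actually succeeds, and optionally drop names with a leading underscore. Because it builds on `dir()`, it inherits dir()'s "most relevant, rather than complete" caveat, which fits the proposal's lack of a completeness guarantee. Note that calling `getattr()` on every name will run any properties.

```python
def lsattr(obj, public_only=True):
    """List attribute names of obj that getattr() can actually retrieve."""
    names = []
    for name in dir(obj):
        if public_only and name.startswith('_'):
            continue
        try:
            getattr(obj, name)
        except AttributeError:
            continue  # listed by dir() but not retrievable
        names.append(name)
    return names


class Demo:
    __slots__ = ('ham', 'eggs')

    def __init__(self):
        self.ham = 1  # 'eggs' is declared but never set


print(lsattr(Demo()))  # ['ham']: dir() also lists 'eggs', but getattr fails
```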
On Mon, Dec 28, 2020 at 11:36 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Mon, Dec 28, 2020 at 12:33 PM Guido van Rossum <guido@python.org> wrote:
On Mon, Dec 28, 2020 at 12:15 PM Christopher Barker <pythonchb@gmail.com> wrote:
Though frankly, I would rather have had it use .items() -- seems more efficient to me, and you do need both the keys and the values, and items() is just as much part of the Mapping API as keys.
There may be a (small) performance issue with that -- items() requires creating a tuple object for each key/value pair.
it does look like items() is a tad faster (dict with 1000 items), but not enough to matter:
In [61]: %timeit {k: d[k] for k in d.keys()} 112 µs ± 1.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [62]: %timeit {k: v for k, v in d.items()} 92.6 µs ± 1.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Interesting -- thanks for taking up the challenge. I still suspect that if we ran the corresponding benchmark at the C level, the first form would win, but it's a matter of hashing twice vs. creating a tuple -- both of which have the wazoo optimized out of them so it could go either way. There are many surprises possible here (e.g. long ago someone found that `s.startswith('x')` is slower than `s[:1] == 'x'` and the reason is the name lookup for `startswith`!).
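For anyone who wants to rerun the comparison, a sketch using the standard `timeit` module. The dict contents are an assumption (the original 1000-item dict was not shown), and absolute timings vary by machine and Python version, so no numbers are claimed here.

```python
import timeit

# Assumed setup: a 1000-item dict with string keys.
d = {str(i): i for i in range(1000)}


def via_keys():
    return {k: d[k] for k in d.keys()}


def via_items():
    return {k: v for k, v in d.items()}


assert via_keys() == via_items()  # same result; only the speed differs

t_keys = timeit.timeit(via_keys, number=1000)
t_items = timeit.timeit(via_items, number=1000)
print(f"keys() + d[k]: {t_keys:.3f}s   items(): {t_items:.3f}s")
```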
Anyway, of course it's too late to change. And there are probably other
"protocols" that check for the presence of keys and __getitem__(). Also, in a sense keys() is more fundamental -- deriving keys() from items() would be backwards (throwing away the values -- imagine a data type that stores the values on disk).
Does there need to be a single defined "protocol" for a mapping (other than the ABC)? -- that is, would **unpacking be able to use .items() and keys() be used in other contexts?
And why does ** unpacking need to check at all (LBYL) couldn't it simply do something like:
{k: d[k] for k in d}
sure, there could occasionally be a Sequence for which that would happen to work (like a range object for instance), but then it would be unlikely to result in the expected result anyway -- just like many other uses of Duck typing. Or not, and it could still be correct.
I don't understand why LBYL is considered such an anti-pattern. It helps produce much clearer error messages in this case for users who are exploring this feature, and distinguishing *early* between sequences and mappings is important for that. Long ago we decided that the distinctive feature is that mappings have a `keys()` method whereas sequences don't (and users who add a `keys()` method to a sequence are just asking for trouble). So that's what we use. -- --Guido van Rossum (python.org/~guido)
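A short sketch of why adding `keys()` is "asking for trouble" (both class names are hypothetical): `dict()` treats an object without `keys()` as an iterable of `(key, value)` pairs, but as soon as `keys()` appears it switches to `keys()` plus `__getitem__`, even though iteration itself is unchanged.

```python
class Pairs:
    """Iterable of (key, value) pairs; no keys() method."""

    def __iter__(self):
        return iter([('a', 1), ('b', 2)])


class PairsWithKeys(Pairs):
    """Same iteration, but keys() makes dict() treat it as a mapping."""

    def keys(self):
        return ['a', 'b']

    def __getitem__(self, key):
        return key * 3


print(dict(Pairs()))          # {'a': 1, 'b': 2}: iterated as pairs
print(dict(PairsWithKeys()))  # {'a': 'aaa', 'b': 'bbb'}: keys() + __getitem__
```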
On 2020-12-29 10:30, Guido van Rossum wrote:
I don't understand why LBYL is considered such an anti-pattern. It helps produce much clearer error messages in this case for users who are exploring this feature, and distinguishing *early* between sequences and mappings is important for that. Long ago we decided that the distinctive feature is that mappings have a `keys()` method whereas sequences don't (and users who add a `keys()` method to a sequence are just asking for trouble). So that's what we use.
I think what is confusing to me is that it was not ever clear to me that such a decision was ever made. I can't find anything on python.org/docs that explicitly says that that is how a mapping defines its mapping behavior, let alone that that specifically defines unpacking behavior. It seems the real issue here is one of documentation. My view tends to be that python.org/docs is the ultimate authority on the language's behavior. Unfortunately it sometimes doesn't work that way, and I've been meaning to make a post on this list about it, which hopefully I will do at some point. But like, important stuff like "** unpacking is implemented with this method" should not be buried in a PEP, nor a docstring, and certainly not a source-code comment. As far as I can see, on python.org/docs all we have is that the Mapping abc lists keys() as a mixin method, but nowhere actually says that those methods are also what is used to implement syntactic things like **-unpacking. I see this as a major oversight and I think the docs should be updated more comprehensively when PEPs are approved or decisions like the one you describe are made, so that the actual, real documentation --- not the PEPs! --- is always reflective and in fact definitional of the language's behavior. In other words, what the language is accountable to is the documentation on python.org/docs and nothing else. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
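The duck-typed contract discussed in this thread can at least be written down as a static type. A sketch using `typing.Protocol` (Python 3.8+); the name `SupportsKeysAndGetItem` is borrowed from typeshed's private `_typeshed` stubs and redefined here, so treat it as an assumption rather than a public runtime API. `runtime_checkable` only checks that the two methods exist, not their signatures or behavior.

```python
from typing import Iterable, Protocol, TypeVar, runtime_checkable

K = TypeVar('K')
V_co = TypeVar('V_co', covariant=True)


@runtime_checkable
class SupportsKeysAndGetItem(Protocol[K, V_co]):
    """The two methods ** unpacking, dict() and dict.update() look for."""

    def keys(self) -> Iterable[K]: ...

    def __getitem__(self, key: K) -> V_co: ...


class OnlyTwoMethods:
    def keys(self):
        return ['x']

    def __getitem__(self, key):
        return 42


# runtime_checkable only checks that the methods exist.
print(isinstance(OnlyTwoMethods(), SupportsKeysAndGetItem))  # True
print(isinstance({}, SupportsKeysAndGetItem))                # True
print(isinstance([1, 2], SupportsKeysAndGetItem))            # False: no keys()
```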
On Tue, Dec 29, 2020 at 11:18 AM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-12-29 10:30, Guido van Rossum wrote:
Long ago we decided that the distinctive feature is that mappings have a `keys()` method whereas sequences don't
It seems the real issue here is one of documentation.
Exactly. The challenge here is that Python is dynamic and has evolved (a lot!) over the years. So there are dunders, and protocols, and ABCs, and they all overlap a bit in purpose. And "protocols" seem to be the least clearly specified. Indeed, the "iteration protocol" is well known and talked about, but not well documented in the standard docs: the only actual reference to it I could find is in the C API docs. __iter__ and __next__ are referred to a lot, but I don't see the protocol described anywhere.
And there are various protocols described in the C API docs, including the Mapping protocol: https://docs.python.org/3/c-api/mapping.html But I *think* that's really about the C API -- not what we are talking about here. Also interestingly, it says: "Note that it returns 1 for Python classes with a __getitem__() method" -- so it's not checking for keys() here.
There IS a discussion of the iterator protocol (without using that word) here: https://docs.python.org/3/library/stdtypes.html#iterator-types under "Iterator Types" -- which is a bit odd, as strictly speaking, it's not a type, but where else would it go? So maybe we could add some text to the Mapping Type section on that page: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict Maybe something along the lines of: "When a Mapping type is required in python (e.g. dict.update, ** unpacking), .keys() and __getitem__ are the minimum required to meet the Mapping protocol." Otherwise, the Mapping ABC is clearly defined, but not, in fact, required in most of the contexts that expect a Mapping.
The other place we could perhaps improve the docs is in the error message for **: "TypeError: 'str' object is not a mapping". Perhaps it could say something like: "TypeError: 'str' object is not a mapping. An object must have a .keys() method to be used in this context" or some such.
As far as I can see, on python.org/docs all we have is that the Mapping abc lists keys() as a mixin method, but nowhere actually says that those methods are also what is used to implement syntactic things like **-unpacking.
Indeed, and it also lists .items() among others, but it would be good to specify that it is keys() that is used to determine if an object *is* a Mapping. I see this as a fundamental clash between the concept of an "ABC" and duck typing -- an ABC defines everything that *could* be needed, whereas duck typing requires only what is actually needed in a given context. But if that's the case, it should be documented. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
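The contrast CHB draws between an ABC and duck typing can be made concrete. A sketch with hypothetical class names: subclassing `collections.abc.Mapping` requires `__getitem__`, `__iter__` and `__len__`, and in return supplies `keys()`, `items()`, `values()` and more as mixins. A class with only `keys()` and `__getitem__` is not a `Mapping` according to `isinstance`, yet it unpacks identically.

```python
from collections.abc import Mapping


class Settings(Mapping):
    """Hypothetical Mapping subclass: the ABC demands three abstract methods."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


class KeysOnly:
    """Hypothetical duck-typed class: only what ** unpacking actually needs."""

    def __init__(self, **values):
        self._values = dict(values)

    def keys(self):
        return self._values.keys()

    def __getitem__(self, key):
        return self._values[key]


s = Settings(debug=True, level=3)
k = KeysOnly(debug=True, level=3)

# The ABC supplies items(), keys(), values(), get(), __contains__, __eq__ ...
print(sorted(s.items()))          # [('debug', True), ('level', 3)]
print(isinstance(s, Mapping))     # True

# The duck-typed class is not a Mapping, yet both unpack the same way.
print(isinstance(k, Mapping))     # False
print({**s} == {**k})             # True
```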
On 2020-12-29 12:44, Christopher Barker wrote:
On Tue, Dec 29, 2020 at 11:18 AM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-12-29 10:30, Guido van Rossum wrote:
Long ago we decided that the distinctive feature is that mappings have a `keys()` method whereas sequences don't
It seems the real issue here is one of documentation.
Exactly. The challenge here is that Python is dynamic and has evolved (a lot!) over the years. So there are dunders, and protocols, and ABCs, and they all overlap a bit in purpose. And "protocols" seem to be the least clearly specified. Indeed, the "iteration protocol" is well known and talked about, but not well documented in the standard docs:
No offense to anyone here, but I don't really see that as a valid excuse. Documentation can evolve too. It's fine if the protocols and so on overlap, but the nature of that overlap needs to be documented. To my mind, every time a change is made to Python behavior there must be a corresponding change to the main documentation describing the change. I would go so far as to say that the lack of such documentation updates should be a blocker on the release of the feature. Features without complete documentation in a straightforward place on python.org/docs (not the PEPs!) should not be shipped. I can understand really old stuff like dict having undocumented innards, when it was just Guido tinkering on his own, but stuff that dates from the era of a wider development team and widespread use of Python (like the iterator protocol) faces a higher standard. Is there any ongoing project to bring the official documentation up to date with regard to the kinds of issues you describe?
Maybe something along the lines of:
"When a Mapping type is required in python (e.g. dict.update, ** unpacking), .keys() and __getitem__ are the minimum required to meet the Mapping protocol."
Even that isn't sufficiently explicit, to my mind. It needs to be something more like: When an object is unpacked with ** unpacking, its .keys() method is called to get the available keys, and its __getitem__ is called to obtain the value for each key. That should be somewhere where it describes the unpacking syntax and semantics. If dict.update internally uses .keys() and that's relevant for stuff like subclassing dict, that should be in the dict documentation. In other words, every operation that uses .keys() needs to be defined to use .keys() where that operation is defined. Or, if there is a "dict protocol" (like maybe the Mapping ABC) then there should be a separate doc page about that protocol, which gives a COMPLETE list of all methods that are part of the protocol along with a COMPLETE list of the language features that make use of that protocol.
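Brendan's proposed wording can be checked empirically. A sketch assuming CPython 3.x (other implementations may differ); `Traced` and `show` are hypothetical names. `Traced` logs each call made during `**` unpacking. In current CPython, `keys()` is called once and then `__getitem__` once per key, in the order `keys()` returned them. This is an observation of the implementation, not a guarantee quoted from the docs.

```python
class Traced:
    """Hypothetical class that records which methods ** unpacking calls."""

    def __init__(self, data):
        self._data = dict(data)
        self.calls = []

    def keys(self):
        self.calls.append("keys")
        return list(self._data)

    def __getitem__(self, key):
        self.calls.append(f"getitem {key!r}")
        return self._data[key]


def show(**kwargs):
    return kwargs


t = Traced({"a": 1, "b": 2})
result = show(**t)
print(result)    # {'a': 1, 'b': 2}
print(t.calls)   # ['keys', "getitem 'a'", "getitem 'b'"]
```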
Otherwise, the Mapping ABC is clearly defined, but not, in fact, required in most of the contexts that expect a Mapping.
Again, to my mind, the Mapping ABC (and most of the other ABCs there) are not clearly defined, because the semantics of the methods and when they are invoked by other parts of Python's machinery (e.g., syntax) are not defined. In other words, it's not enough to say that .keys() "is part of the Mapping ABC"; you have to actually give a complete list of situations when it's going to be implicitly called, and update that list if needed as the language evolves. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Wed, Dec 30, 2020 at 8:49 AM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
To my mind, every time a change is made to Python behavior there must be a corresponding change to the main documentation describing the change. I would go so far as to say that the lack of such documentation updates should be a blocker on the release of the feature. Features without complete documentation in a straightforward place on python.org/docs (not the PEPs!) should not be shipped.
It's not that simple. As one example, consider zipapp: https://docs.python.org/3/whatsnew/3.5.html#whatsnew-zipapp It existed for a long time before that, and technically WAS documented, but nobody knew about it. Would you say that Python would have been better off leaving this feature languishing on the bug tracker for lack of adequate documentation? What does "adequate" even mean? (Or to use your term, "complete"? Perhaps even harder to define.) Documentation bugs are bugs to be fixed, just like any other. Rather than complaining that this should have prevented the feature from being released, just propose an improvement. Where should this be documented? How should it be worded? (I'd contribute actual wording myself, but I don't know the details of the feature well enough. TBH, I only ever use ** with actual dictionaries.) ChrisA
On 2020-12-29 13:54, Chris Angelico wrote:
Documentation bugs are bugs to be fixed, just like any other. Rather than complaining that this should have prevented the feature from being released, just propose an improvement. Where should this be documented? How should it be worded?
Well, I did propose it, for this particular case, in a part of the message you didn't quote. But I think ultimately to do it right the documentation needs a structural overhaul. The way it is now tries to separate "language reference" and "library reference", where the former seems to mainly be describing syntax, with other stuff shunted off into opaquely-titled sections like "data model". But I think that is confusing. So, to give an example, the iterator protocol should be documented right where the `for` statement is documented, and it should be explicitly framed as "this is the definition of what the `for` statement does", not something separate. The unpacking "protocol" (or whatever we call it) should be documented right where function call syntax is documented. __getitem__ should be documented right where the [] syntax is documented. The descriptor protocol should be documented right where attribute access syntax is documented. And so on. In other words, the "language reference" should be more or less a catalog of syntactic features with in-situ definitions of their semantics as the invocation of a particular protocol. I think the way Python handles these things as a language is one of its greatest features. Basically Python is a sort of syntactic framework on which you can hang various implementations by overriding dunders, and that's very powerful. So it's unfortunate that that isn't used as the main organizing principle of the documentation. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Tue, Dec 29, 2020 at 02:20:05PM -0800, Brendan Barnwell wrote:
So, to give an example, the iterator protocol should be documented right where the `for` statement is documented, and it should be explicitly framed as "this is the definition of what the `for` statement does", not something separate.
Hmmm, that's going to be a bit confusing for people who want to understand iter() and next() but don't care about for loops directly.
The unpacking "protocol" (or whatever we call it) should be documented right where function call syntax is documented.
That's going to be a bit obscure for people who want to understand unpacking: a, b, *extras, z = values
__getitem__ should be documented right where the [] syntax is documented.
That's a bit confusing for people who know the method name but don't know what syntax uses it. And it's a lot confusing for people who just want to know what subscripting means but don't care about the dunder methods used to implement it.
The descriptor protocol should be documented right where attribute access syntax is documented.
And that's quite hostile to those who want a simple high-level overview of what attribute access does without having a low-level and very advanced feature, descriptors, shoved down their throat.
Brendan, I applaud your zeal in wanting to fix the Python docs, but things which are no-brainer improvements to *you* will be no-brainer regressions to others. In particular, the Python docs aren't a dead-tree book, they are a hypertext document. There's no need to have features documented "right where" related features are documented, because that leads to needing everything documented in one place:
- subscripting `[]` needs to have mappings and sequences documented right there;
- mappings need to have hash documented right there;
- sequences need to have integers and the `__index__` dunder documented right there;
- integers need to have the numeric tower documented right there;
- which needs ABCs to be documented right there;
and because everything in Python is connected to everything else, the entire language and std lib needs to be on one page.
I totally agree with you that the connections between features, and detailed information about protocols and dunders can sometimes be hard to find, but I think that the solution is not a complete redesign of the docs, but more hyperlinks and See Alsos. -- Steve
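For readers unfamiliar with the chain Steven describes, here is a short sketch of how `[]` dispatches; `Slot` and `Registry` are hypothetical names. For any object, `obj[key]` calls `type(obj).__getitem__`, and built-in sequences such as `list` accept any index object that defines `__index__`, both for plain indexing and for slices.

```python
import operator


class Slot:
    """Hypothetical int-like object: __index__ lets it act as a sequence index."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n


class Registry:
    """Hypothetical container: obj[key] is routed to type(obj).__getitem__."""

    def __getitem__(self, key):
        return f"looked up {key!r}"


letters = ["a", "b", "c"]
print(letters[Slot(1)])            # b  (list indexing calls Slot.__index__)
print(letters[Slot(0):Slot(2)])    # ['a', 'b']  (slice bounds use __index__ too)
print(operator.index(Slot(2)))     # 2
print(Registry()["anything"])      # looked up 'anything'
```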
On 2020-12-29 16:18, Steven D'Aprano wrote:
On Tue, Dec 29, 2020 at 02:20:05PM -0800, Brendan Barnwell wrote:
So, to give an example, the iterator protocol should be documented right where the `for` statement is documented, and it should be explicitly framed as "this is the definition of what the `for` statement does", not something separate.
Hmmm, that's going to be a bit confusing for people who want to understand iter() and next() but don't care about for loops directly.
With regard to this and your other comments, I can see that that could be useful in some cases. But the important points are: 1) if the for statement uses the iterator protocol (which it does), the part of the documentation where the for statement is defined absolutely must link to the docs on the iterator protocol; and 2) the details of how `for` uses the protocol must be explained in full in the docs for `for`.
I totally agree with you that the connections between features, and detailed information about protocols and dunders can sometimes be hard to find, but I think that the solution is not a complete redesign of the docs, but more hyperlinks and See Alsos.
I would go slightly further than that though. It's not just "see alsos". It can be hyperlinks, but what needs to be made clear is that the hyperlinks are to DEFINITIONAL information where that is the case. In other words, the `for` statement docs need to say that the operation of the `for` statement is DEFINED by the iterator protocol. Now yes, if that definition has many nooks and crannies we can factor those out into a separate page. But the way that `for` uses the iterator should indeed be spelled out. For instance, right now it says: "An iterator is created for the result of the expression_list. The suite is then executed once for each item provided by the iterator, in the order returned by the iterator. Each item in turn is assigned to the target list using the standard rules for assignments (see Assignment statements), and then the suite is executed." What I'm saying is it should say something like: "An iterator (link to iterator protocol here) is created by calling iter() on the result of the expression_list. For each loop iteration, next() is called on the iterator, then the resulting value is assigned to the target list using the standard rules for assignment (link to assignment here), and then the suite is executed." Now, you're right that's a bit different from how it maybe sounded in my message before. I'm not necessarily saying that the entire definition of next and iter needs to be in there. But what I am saying is that the nuts and bolts of how `for` uses the protocol DO need to be in there. Things like "returned by the iterator" (which is not clear in the terminology of the iterator protocol) should be replaced by things like "returned by calling next() on the iterator" or "returned by the iterator's __next__ method". My point here is really more about the docs on other parts of Python rather than the documentation of the iterator protocol itself (or other such protocols).
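The wording Brendan proposes for `for` can be sketched as a plain function. This is a rough desugaring under stated simplifications: `run_for` is a hypothetical helper, `break` and `continue` are left out, and CPython actually implements the loop in bytecode (`GET_ITER` / `FOR_ITER`) rather than like this. It shows `iter()` called once and `next()` called until `StopIteration`, with the optional `else` suite run on normal exit.

```python
def run_for(iterable, body, orelse=None):
    """Hypothetical helper: a rough desugaring of `for x in iterable: body(x)`.

    iter() is called once, then next() until StopIteration; the optional
    `else` suite runs on normal exit. break/continue are not modelled.
    """
    iterator = iter(iterable)          # calls type(iterable).__iter__
    while True:
        try:
            item = next(iterator)      # calls type(iterator).__next__
        except StopIteration:
            break
        body(item)                     # stands in for target assignment + suite
    if orelse is not None:
        orelse()


seen = []
run_for("abc", seen.append, orelse=lambda: seen.append("done"))
print(seen)   # ['a', 'b', 'c', 'done']
```

Note that `body(item)` sits outside the `try`, so a `StopIteration` raised by the loop body is not mistaken for the end of iteration.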
You're right that they need to be integrated, but it's not solely a matter of providing links; I think there needs to be considerably more detail about the precise way in which syntactic features make use of these protocols. (Or in some cases less detail may be needed; for instance, right now the `for` docs include a little note about "modifying the sequence" while it is being iterated over; this doesn't belong there because the for statement doesn't care about sequences, but only about the iterator protocol.) For the same reason, I'm less concerned about people who want to know about iter() and next() themselves. That information is nice to have, and it should be there in an easily-findable section of the docs for sure. But `for` is a basic language construct, and it's in the documentation of those basic language constructs where we need that clarity about how they interact with the protocols. I don't think this would require a complete redesign of the docs, because sure, a lot of it could stay the same, but what I meant by "overhaul" is that some restructuring (not just changing of wording) is necessary. For instance, protocols like iterator, descriptor, etc., which, as you say, may be used in various ways, really need their own section if we're going to be referring to them via hyperlinks. But more than that, I think a lot of the documentation on basic stuff like loops, attribute access, even mathematical operators, needs to be reconceptualized so that it couches the entire description in terms of the protocols that are actually used. It's not just a few links here and there. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On 30/12/20 11:20 am, Brendan Barnwell wrote:
So, to give an example, the iterator protocol should be documented right where the `for` statement is documented, and it should be explicitly framed as "this is the definition of what the `for` statement does",
Not sure about that -- there are other syntactic situations that use the iterator protocol, e.g. unpacking assignments. But there should at least be links to it from all the places it's used. -- Greg
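Greg's point that other syntax relies on the same protocol is easy to see. A sketch with a hypothetical `Countdown` class that defines only `__iter__`: assignment unpacking and `*` in a call both consume it through `__iter__`. `**` unpacking ignores `__iter__` entirely and looks for `keys()`, so it fails.

```python
class Countdown:
    """Hypothetical iterable: defines __iter__ only, no keys() or __len__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


first, *middle, last = Countdown(5)      # assignment unpacking uses __iter__
print(first, middle, last)               # 5 [4, 3, 2] 1
print(max(*Countdown(3)))                # 3  (* in a call also iterates)

try:
    {**Countdown(3)}                     # ** wants keys(), not __iter__
except TypeError as exc:
    print(exc)
```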
Brendan Barnwell writes:
So, to give an example, the iterator protocol should be documented right where the `for` statement is documented, and it should be explicitly framed as "this is the definition of what the `for` statement does", not something separate.
But for the students I teach, that's not what the `for` statement "does", it's an implementation detail. The `for` statement "iterates iterables", they know what those are because they're documented as "iterable", and that's all they need to know. I agree that the word "iterable" should be linked to its definition (which is *part* of the iterator protocol: calling `iter` on an iterable produces an iterator). But the implementations of `for` and sequence unpacking should IMO be described as *applications of* the iterator protocol where that protocol is documented. (It could be immediately preceding or following the documentation of `for`, but it's conceptually separate.)
I think the way Python handles these things as a language is one of its greatest features. Basically Python is a sort of syntactic framework on which you can hang various implementations by overriding dunders, and that's very powerful. So it's unfortunate that that isn't used as the main organizing principle of the documentation.
Even in the language reference, I disagree. It should be layered, so that people who need to know the syntax and external semantics of `for` -- which is complicated enough, since it includes `else`, `continue`, and `break` -- get *only* that in the section documenting `for`, with cross-references to implementation for people who are writing implementations. Much of what you want can be achieved by appropriate sequencing of sections, for people like you (and me!) who read language references front to back. But I believe making it the main organizing principle is going to overwhelm both students and teachers of the language.
On Tue, Dec 29, 2020 at 1:51 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
they all overlap a bit in purpose. And "protocols" seem to be the least clearly specified. Indeed, the "iteration protocol" is well known and talked about, but not well documented in the standard docs:
No offense to anyone here, but I don't really see that as a valid excuse.
It was an explanation, not an excuse -- this is an open source project -- you have better ideas for documentation, submit a PR -- or process, there are places to contribute there as well.
Documentation can evolve too. It's fine if the protocols and so on overlap, but the nature of that overlap needs to be documented.
And I made some suggestions for that -- but duck typing is tricky to document; it's by definition not clearly defined.
Is there any ongoing project to bring the official documentation up to date with regard to the kinds of issues you describe?
not that I know of -- great place to volunteer, though.
Maybe something along the lines of:
"When a Mapping type is required in python (e.g. dict.update, ** unpacking), .keys() and __getitem__ are the minimum required to meet the Mapping protocol."
Even that isn't sufficiently explicit, to my mind. It needs to be something more like:
When an object is unpacked with ** unpacking, its .keys() method is called to get the available keys, and its __getitem__ is called to obtain the value for each key.
Ah, but that's different -- I was attempting to document the "mapping protocol" such as it is, not ** behavior. And where THAT should be documented isn't clear to me, but maybe here: https://docs.python.org/3/reference/expressions.html?highlight=unpacking where it says: "A double asterisk ** denotes dictionary unpacking. Its operand must be a mapping." With "mapping" being a link to the Glossary. So either there, or somewhere linked to from there, could be a definition of what a minimal "mapping" is.
If dict.update internally uses .keys() and that's relevant for stuff like subclassing dict, that should be in the dict documentation.
That's actually the one place I could find where it IS documented.
In other words, every operation that uses .keys() needs to be defined to use .keys() where that operation is defined. Or, if there is a "dict protocol" (like maybe the Mapping ABC) then there should be a separate doc page about that protocol, which gives a COMPLETE list of all methods that are part of the protocol
This is what I'm getting at: there is an informal protocol for "reading" a mapping, and it should be documented. But the COMPLETE list is simply .keys() and .__getitem__ :-)
along with a COMPLETE list of the language features that make use of that protocol.
That is pretty much impossible -- that's kind of the point of a protocol -- it can be used in arbitrary places in arbitrary code. would you expect a COMPLETE list of all the language features that use the iteration protocol?
Otherwise, the Mapping ABC is clearly defined, but not, in fact, required in most of the contexts that expect a Mapping.
you have to actually give a complete list of situations when it's going to be implicitly called, and update that list if needed as the language evolves.
I don't "have" to do anything, nor does anyone else contributing to the development or documentation of Python. But tone aside, I think you're pushing pretty hard for something that doesn't really fit Python -- again, Duck typing is not a set of hard and fast rules. If you have specific ideas for how the documentation can be improved, by all means make them. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
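CHB notes that `dict.update` is where the `keys()` rule is actually written down. A sketch of that branch, assuming CPython 3.x; `Pairs` and `Lookup` are hypothetical names. An argument with a `keys()` method is read through `keys()` and `__getitem__`. An argument without one is treated as an iterable of key/value pairs.

```python
class Pairs:
    """Hypothetical class with __iter__ but no keys(): update() sees key/value pairs."""

    def __iter__(self):
        return iter([("a", 1), ("b", 2)])


class Lookup:
    """Hypothetical class with keys(): update() calls keys() and __getitem__."""

    def keys(self):
        return ["a", "b"]

    def __getitem__(self, key):
        return key.upper()

    def __iter__(self):
        # Never reached by dict.update, because keys() is present.
        raise RuntimeError("update() should not iterate this object")


d = {}
d.update(Pairs())
print(d)      # {'a': 1, 'b': 2}

d.update(Lookup())
print(d)      # {'a': 'A', 'b': 'B'}
```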
On 2020-12-29 15:01, Christopher Barker wrote:
along with a COMPLETE list of the language features that make use of that protocol.
That is pretty much impossible -- that's kind of the point of a protocol -- it can be used in arbitrary places in arbitrary code. would you expect a COMPLETE list of all the language features that use the iteration protocol?
Yes, but perhaps not as complete as you thought I meant. :-) What I mean by "language features" here is basically syntactic features that IMPLICITLY invoke the protocol. Certainly arbitrary code can use a protocol in the sense that any function can call some other function that eventually winds up using the protocol. Also, code can explicitly invoke the iterator protocol, by, e.g., calling obj.__next__(), but that's not a problem because you can easily look up how the object defines __next__. The magic part is when you have something like `for x in obj`, which includes not even any indirect references (i.e., in called functions) to `__iter__` or `__next__`, yet winds up calling them. And it's because that is magic that we need to make it very explicit. So by a complete list of language features that use iteration I would mean. . . well, this is why we need to do this, to make sure we get them all! :-) But the idea is that all usages of the iterator protocol should funnel through a small number of entrypoints --- that's the point of a protocol, in some sense. So the list would be: `for` (including comprehensions), the builtin functions `next` and `iter` because they're sort of a manual crank of the protocol. . . is there anything else? Maybe `yield` because it implicitly creates objects that implement the protocol? That's the kind of "complete list" I mean. I'm not including things like library functions that happen to iterate over stuff. The whole point of the iterator protocol is that it defines iteration, so every such function can say "this function iterates over obj" (with a link to the protocol!) and that's enough. But cases where syntax implicitly invokes the protocol, those are the ones that must be exhaustively listed. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
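Brendan's "complete list" of syntax that implicitly cranks the iterator protocol can at least be probed. A sketch assuming CPython 3.x, with a hypothetical `CountingIterable` that counts calls to `__iter__`. Each numbered line triggers exactly one `__iter__` call without any visible `iter()`. The probe is illustrative, not exhaustive. Besides the sites named in the thread, it covers a few more: assignment unpacking, `*` in calls, and `in` falling back to iteration when `__contains__` is absent.

```python
class CountingIterable:
    """Hypothetical iterable that counts how often Python calls __iter__ implicitly."""

    def __init__(self, items):
        self.items = list(items)
        self.iter_calls = 0

    def __iter__(self):
        self.iter_calls += 1
        return iter(self.items)


def add3(x, y, z):
    return x + y + z


def relay(source):
    yield from source                  # yield from calls iter(source)


obj = CountingIterable([1, 2, 3])

for _ in obj:                          # 1. for statement
    pass
squares = [x * x for x in obj]         # 2. comprehension
a, b, c = obj                          # 3. assignment unpacking
total = add3(*obj)                     # 4. * in a call
found = 2 in obj                       # 5. `in`, falling back to iteration
relayed = list(relay(obj))             # 6. yield from

print(obj.iter_calls)                  # 6
```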
On Thu, Dec 31, 2020 at 3:13 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2020-12-29 15:01, Christopher Barker wrote:
along with a COMPLETE list of the language features that make use of that protocol.
That is pretty much impossible -- that's kind of the point of a protocol -- it can be used in arbitrary places in arbitrary code. would you expect a COMPLETE list of all the language features that use the iteration protocol?
Yes, but perhaps not as complete as you thought I meant. :-) What I mean by "language features" here is basically syntactic features that IMPLICITLY invoke the protocol.
Certainly arbitrary code can use a protocol in the sense that any function can call some other function that eventually winds up using the protocol. Also, code can explicitly invoke the iterator protocol, by, e.g., calling obj.__next__(), but that's not a problem because you can easily look up how the object defines __next__. The magic part is when you have something like `for x in obj`, which includes not even any indirect references (i.e., in called functions) to `__iter__` or `__next__`, yet winds up calling them. And it's because that is magic that we need to make it very explicit.
That seems pretty reasonable actually. That sort of list wouldn't mention __setstate__, for instance, but it would include everything that can be invoked in some way that doesn't look like a function call. So that would include object lifecycle methods (__new__, __init__, __del__), all your operator functions (__{,r,i}add__ etc), and everything that is necessary to make something behave as if it were some type of thing (__call__ to be callable, __iter__ to be iterable, __enter__/__exit__ to be, uhh, "withable"?). This would be quite a large list, though, and it would have to cope with odd edge cases like this:

    >>> class Foo:
    ...     def __getitem__(self, x):
    ...         if x in range(10): return x * x
    ...         raise IndexError
    ...
    >>> list(Foo())
    [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

It's fully iterable even though it has no __iter__ method. But this would be a useful piece of reference documentation - all dunders (and any non-dunders) called by the language itself. It'd be very detailed and not useful to most people, but could be of interest. Is there any way to search the CPython source code for all uses of dunders, without getting hits for all the places where they're defined? ChrisA
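ChrisA's edge case as a runnable script, plus two related checks. The `collections.abc` documentation notes that `isinstance(obj, Iterable)` does not detect classes that iterate via `__getitem__`. The sketch confirms that, and also that `in` falls back to the same index-until-`IndexError` loop.

```python
from collections.abc import Iterable


class Foo:
    """ChrisA's example: no __iter__, only a __getitem__ that raises IndexError."""

    def __getitem__(self, x):
        if x in range(10):
            return x * x
        raise IndexError(x)


print(list(Foo()))                  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(49 in Foo())                  # True: `in` falls back to the same indexing loop
print(50 in Foo())                  # False: the loop stops at IndexError
print(isinstance(Foo(), Iterable))  # False: the ABC only looks for __iter__
```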
On 30/12/20 9:44 am, Christopher Barker wrote:
So there are dunders, and protocols, and ABCs, and they all overlap a bit in purpose. And "protocols" seem to be the least clearly specified.
there are various protocols described in the C API docs, including the Mapping protocol:
https://docs.python.org/3/c-api/mapping.html
But I *think* that's really about the C API -- not what we are talking about here.
The library reference doesn't seem to use the terms "sequence protocol" and "mapping protocol". It talks about "sequence types" and "mapping types", but doesn't use the word "protocol" in relation to them. Nor is there any clear list of methods that a type needs to provide in order to be considered a sequence or mapping. There are lists of operations supported by the built-in container types, but those are fairly extensive, and it's not obvious which ones are vital. So are the "sequence protocol" and "mapping protocol" really mythical beasts that don't really exist? Are they more akin to the infamous "file-like object" which is whatever it needs to be for the purpose at hand? Guido has since said that the ABCs are intended to be definitive, but the docs don't really make that clear either. (And the ABC doc page talks about "APIs", not "protocols"!) -- Greg
On Tue, Dec 29, 2020 at 4:24 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
The library reference doesn't seem to use the terms "sequence protocol" and "mapping protocol". It talks about "sequence types" and "mapping types", but doesn't use the word "protocol" in relation to them.
The only one I've really seen used fairly universally is the "iterator protocol" -- though in the docs, it's under "Iterator Types": https://docs.python.org/3/library/stdtypes.html?highlight=iterable#iterator-... But then, under that: "The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:" So that's (I think) the only place it's documented and called a protocol.
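The two methods that page names as the iterator protocol fit in a few lines. `__iter__()` returns the iterator itself. `__next__()` returns the next item or raises `StopIteration`. `Ticker` is a hypothetical name used only for this sketch.

```python
class Ticker:
    """Hypothetical iterator: the two methods that form the iterator protocol."""

    def __init__(self, stop):
        self.current = 0
        self.stop = stop

    def __iter__(self):
        # Iterators return themselves, so they can be used wherever an
        # iterable is expected (e.g. directly in a for statement).
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        value = self.current
        self.current += 1
        return value


t = Ticker(3)
print(list(t))   # [0, 1, 2]
print(list(t))   # []  (an exhausted iterator stays exhausted)
```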
Nor is there any clear list of methods that a type needs to provide in order to be considered a sequence or mapping.
Isn't that what the ABCs are? The trick here, in this context, is that something doesn't need to be a fully functioning Mapping to be unpacked. But there are a handful of places where a subset of the Mapping API is needed (apparently .keys() and __getitem__, with a particular relationship). So it is good to (a) be consistent about that -- which I think is the case today, and (b) document it -- which is not the case. Given that where the iteration protocol is described is under built in types, it makes sense to me to put it there -- maybe call it the "mapping unpacking protocol" (better ideas welcome!).
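CHB's "particular relationship" between `keys()` and `__getitem__` can be demonstrated, along with the separate rule that keyword arguments must be strings. A sketch assuming CPython 3.x, with hypothetical classes. Every key returned by `keys()` is passed to `__getitem__`, so a `KeyError` propagates instead of the key being skipped. A dict display `{**obj}` accepts non-string keys, while a call rejects them with `TypeError`.

```python
class Weird:
    """Hypothetical class whose keys() promises a key that __getitem__ rejects."""

    def keys(self):
        return [1, "missing"]

    def __getitem__(self, key):
        if key == 1:
            return "one"
        raise KeyError(key)


class IntKeys:
    """Hypothetical class with consistent keys() / __getitem__ but int keys."""

    def keys(self):
        return [1, 2]

    def __getitem__(self, key):
        return key * 10


print({**IntKeys()})          # {1: 10, 2: 20}: dict displays accept any hashable key

try:
    dict(**IntKeys())         # calls require string keys
except TypeError as exc:
    print("call:", exc)

try:
    {**Weird()}               # a key that __getitem__ rejects is not skipped
except KeyError as exc:
    print("display:", repr(exc))
```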
There are lists of operations supported by the built-in container types, but those are fairly extensive, and it's not obvious which ones are vital.
there's more than that -- there's the ABCs. But anyway, the problem with "vital" is the core of duck typing -- what's vital depends on the context -- We can only (and should only) define that for a few places where they are used in core Python -- is there anything else other than Mapping unpacking that we should identify and document?
So are the "sequence protocol" and "mapping protocol" really mythical beasts that don't really exist? Are they more akin to the infamous "file-like object" which is whatever it needs to be for the purpose at hand?
See above -- I don't think there's a Sequence protocol at all. And the "mapping protocol" that at least I've been talking about is in a particular context -- unpacking, or somewhat more generally, when you want to be able to access all the items in a Mapping, but not do anything else with it.
Guido has since said that the ABCs are intended to be definitive, but the docs don't really make that clear either. (And the ABC doc page talks about "APIs", not "protocols"!)
I think the ABCs are definitive -- but what are they definitive of? They are certainly not minimal protocols. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 2020-12-30 19:26, Christopher Barker wrote:
The trick here, in this context, is that something doesn't need to be a fully functioning Mapping to be unpacked.
But there are a handful of places where a subset of the Mapping API is needed (apparently .keys() and __getitem__, with a particular relationship). So it is good to (a) be consistent about that -- which I think is the case today, and (b) document it -- which is not the case.
Given that where the iteration protocol is described is under built in types, it makes sense to me to put it there -- maybe call it the "mapping unpacking protocol" (better ideas welcome!).
I don't think that is a good place for it, because it is not a builtin type. It is a protocol which some builtin types happen to implement. As I mentioned before, the place I would put it is in the documentation for the `for` statement, because the most common reason to implement the iterator protocol is to create an object that works with the `for` statement, and because the protocol describes iteration as a mechanism (not any individual type) so is most naturally placed with the description of that mechanism. Steven D'Aprano suggested that the protocol docs could be somewhere else and linked to from places that rely on it, and I think that idea has merit as well. But if we do that, I would say the place where the iterator protocol and other such protocols are documented should be a separate section of the documentation for something like "protocols which implement Python concepts" (although of course that's a terrible name). I think the current placement of this information about the protocol (in the builtin types section) is terrible. The builtin types section is the place for the kind of see-also thing that Steven mentioned --- give the documentation for lists with their methods and so on, and say "lists implement the iterator protocol (link)". -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
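For reference, the iterator protocol as the `for` statement uses it comes down to two methods: `for` calls `iter(obj)`, which calls `__iter__`, and then calls `next()` (that is, `__next__`) until `StopIteration` is raised. A minimal sketch; `Countdown` and `CountdownIterator` are illustrative names.

```python
class Countdown:
    """Iterable: each for loop gets a fresh iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # for calls iter(obj), which calls this method.
        return CountdownIterator(self.start)


class CountdownIterator:
    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators are themselves iterable.
        return self

    def __next__(self):
        # for keeps calling next() until StopIteration.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


for n in Countdown(3):
    print(n)  # 3, then 2, then 1
```

Splitting the iterable from its iterator is what lets the same `Countdown` be looped over more than once.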
On Tue, Dec 29, 2020 at 10:30 AM Guido van Rossum <guido@python.org> wrote:
Interesting -- thanks for taking up the challenge. I still suspect that if we ran the corresponding benchmark at the C level, the first form would win,
I was thinking that might be the case -- but either way, there is little difference, and at least for ** unpacking, it's probably mostly used for small dicts anyway.
Does there need to be a single defined "protocol" for a mapping (other than the ABC)? -- that is, would **unpacking be able to use .items() and keys() be used in other contexts?
And why does ** unpacking need to check at all (LBYL)? Couldn't it simply do something like:
{k: d[k] for k in d}
I don't understand why LBYL is considered such an anti-pattern. It helps produce much clearer error messages in this case for users who are exploring this feature, and distinguishing *early* between sequences and mappings is important for that.
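The `{k: d[k] for k in d}` alternative and Guido's objection can be compared with two pure-Python approximations. Both function names are hypothetical, and the LBYL version only approximates the check CPython performs in C (its real error message differs in detail). The telling case is a list of small integers, which the EAFP version silently accepts as if it were a mapping.

```python
def unpack_lbyl(obj):
    """Approximation of today's **: look for keys() before doing anything."""
    if not hasattr(obj, "keys"):
        raise TypeError(
            f"argument after ** must be a mapping, not {type(obj).__name__}"
        )
    return {k: obj[k] for k in obj.keys()}


def unpack_eafp(obj):
    """The suggested alternative: just iterate and index."""
    return {k: obj[k] for k in obj}


print(unpack_lbyl({"a": 1}), unpack_eafp({"a": 1}))  # {'a': 1} {'a': 1}

# A sequence is rejected early, with a message that names the problem.
try:
    unpack_lbyl([1, 0])
except TypeError as exc:
    print(exc)  # argument after ** must be a mapping, not list

# The same sequence "works" under EAFP: k=1 gives obj[1] == 0, k=0 gives obj[0] == 1.
print(unpack_eafp([1, 0]))  # {1: 0, 0: 1}
```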
Fair enough, though in this case, it's producing a not-quite-clear error message, whereas simply trying to call keys() would reflect the actual error.
I was thinking about this a bit more, and realized that this pattern is used (at least) in the dict constructor and dict.update() -- but in both of those cases, they can take either a mapping or an iterable of (key, value) pairs. So it is required to make a distinction, and looking for keys() is as good a way as any (maybe the best, certainly well established). (And if you pass an object with a keys() method but no __getitem__ into dict(), you get "TypeError: 'MinMap' object is not subscriptable" -- nothing about it needing to be a Mapping.)
But for **, which only supports Mappings, maybe there is no need to check for keys() -- it is clearly defined that iter(a_mapping) iterates over the keys, so that part should work, and if the __getitem__ doesn't work appropriately, then that's not really different from passing an iterable that doesn't produce valid (key, value) pairs to dict().
But this is all theoretical -- it's established, and a bit better docs should clear up the confusion.
One more note on the docstrings: dict.update() says: "... If E is present and has a .keys() method..." which nicely defines what is actually required. Whereas dict() says: "...dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs..." without saying anything about how it determines whether it's a mapping. So maybe that could be made a bit clearer as well.
I also just noticed something else in the docs -- in typing, there is a Protocol type -- maybe we could/should pre-define a mapping protocol type? Or maybe a MinimalMapping ABC, analogous to the Iterable ABC -- though I have no idea what to call it that would be clear :-)
-CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
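The `typing.Protocol` idea could look roughly like the sketch below. `UnpackableMapping` is a hypothetical name, not something `typing` or `collections.abc` provides, and a `runtime_checkable` protocol only checks that the methods exist, not how they behave. The sketch also reproduces the `MinMap` case: `dict()` sees `keys()`, commits to the mapping path, and then fails on the first subscription.

```python
from collections.abc import Iterable
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class UnpackableMapping(Protocol):
    """Hypothetical protocol: the surface ** and dict() actually use."""

    def keys(self) -> Iterable[Any]: ...

    def __getitem__(self, key: Any) -> Any: ...


class MinMap:
    """Has keys() but no __getitem__."""

    def keys(self):
        return ["a", "b"]


class Ok(MinMap):
    def __getitem__(self, key):
        return key.upper()


print(isinstance(Ok(), UnpackableMapping))      # True
print(isinstance(MinMap(), UnpackableMapping))  # False
print(isinstance({}, UnpackableMapping))        # True

try:
    dict(MinMap())  # keys() exists, so dict() takes the mapping path...
except TypeError as exc:
    print(exc)  # ...and fails when it tries to subscript the object
print(dict(Ok()))  # {'a': 'A', 'b': 'B'}
```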
On Tue, Dec 29, 2020 at 10:30:16AM -0800, Guido van Rossum wrote: [Christopher]
Does there need to be a single defined "protocol" for a mapping (other than the ABC)? -- that is, would **unpacking be able to use .items() and keys() be used in other contexts?
Yes, I think there should be one protocol, or interface if you prefer, for something to be considered dict-like. If some operations expect objects to quack like a dict, then it would be annoying if other related operations expect them to swim like a dict. The status quo right now is:
(1) The dict constructor, dict.update, and dict union assignment `|=` all support the double-barrelled interface:
- try keys() and `__getitem__`
- otherwise fall back onto direct iteration.
(2) Dict unpacking `**` only supports the keys/getitem interface.
My guess is that's an oversight, not a deliberate difference in behaviour. I think that those four operations should operate in a consistent manner. Otherwise, we would be okay if we can write an object that quacks like a dict via keys/getitem, since that is supported by all four, but suppose you can't and have to use the fallback operation. So you dutifully add the direct iteration API, and now your object works in the dict constructor etc., but not with dict unpacking. So you have to *also* add the items() method you suggest, an otherwise unnecessary second way of doing the same thing.
This is annoying and error-prone: being two methods, there is a risk that they will diverge in behaviour, or that documentation will diverge, and now users of your object need to worry about whether to use direct iteration or items(), and you know that people will keep asking what's the difference between them. It wouldn't be a disaster if we went this way, but it would add unnecessary friction.
There is however a third possibility: extend the dict interface by adding a second fallback. I think we're stuck with keeping the existing order as first and second attempts, but we can tack items() at the end:
- try keys() and `__getitem__`
- try direct iteration
- otherwise, try items()
I don't hate this, but honestly I think it is YAGNI.
[Guido]
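The lookup order described above can be approximated in pure Python. `dict_like_pairs` and its `try_items` flag are hypothetical; the first two steps mirror what `dict()` does today, and the third is only the proposed (and, per Steven, probably unneeded) `items()` fallback. `Pairs` shows the asymmetry: it works with `dict()` but not with `**`.

```python
def dict_like_pairs(obj, *, try_items=False):
    """Approximate dict(obj), optionally with the proposed items() step."""
    # Step 1: keys() plus __getitem__.
    if hasattr(obj, "keys"):
        return {k: obj[k] for k in obj.keys()}
    # Step 2: direct iteration over (key, value) pairs.
    try:
        pairs = iter(obj)
    except TypeError:
        # Step 3 (proposal only): fall back to items().
        if try_items and hasattr(obj, "items"):
            return {k: v for k, v in obj.items()}
        raise
    return {k: v for k, v in pairs}


class Pairs:
    """Iterable of (key, value) pairs, with no keys() method."""

    def __iter__(self):
        return iter([("a", 1), ("b", 2)])


class ItemsOnly:
    """Only items(); neither keys() nor __iter__."""

    def items(self):
        return [("a", 1), ("b", 2)]


print(dict(Pairs()), dict_like_pairs(Pairs()))  # both {'a': 1, 'b': 2}
try:
    {**Pairs()}  # ** only accepts the keys/getitem interface
except TypeError as exc:
    print("TypeError:", exc)
print(dict_like_pairs(ItemsOnly(), try_items=True))  # {'a': 1, 'b': 2}
```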
I don't understand why LBYL is considered such an anti-pattern. It helps produce much clearer error messages in this case for users who are exploring this feature, and distinguishing *early* between sequences and mappings is important for that. Long ago we decided that the distinctive feature is that mappings have a `keys()` method whereas sequences don't (and users who add a `keys()` method to a sequence are just asking for trouble). So that's what we use.
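A small demonstration of why adding `keys()` to a sequence is "asking for trouble": `dict()` checks for `keys()` first, so the same list of pairs is interpreted completely differently once the method is present. `KeyedList` is an illustrative name.

```python
class KeyedList(list):
    """A sequence that (unwisely) also grows a keys() method."""

    def keys(self):
        return range(len(self))


pairs = [("a", 1), ("b", 2)]
print(dict(pairs))             # {'a': 1, 'b': 2}: treated as (key, value) pairs
print(dict(KeyedList(pairs)))  # {0: ('a', 1), 1: ('b', 2)}: treated as a mapping
print({**KeyedList(pairs)})    # {0: ('a', 1), 1: ('b', 2)}
```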
I remember learning that EAFP was preferred over LBYL back when I started, in the Python 1.5 days. I think the main reasons were to encourage people to duck-type, and to avoid Time Of Check To Time Of Use errors, e.g. when opening files. But although EAFP was preferred, there are definitely times when LBYL is better, and I think that calling LBYL "unPythonic" is a hyper-correction. -- Steve
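A generic illustration of the time-of-check-to-time-of-use issue mentioned above (not code from the thread; both function names are hypothetical). The LBYL version leaves a window between `os.path.exists()` and `open()` in which the file can disappear; the EAFP version performs a single operation and handles the failure.

```python
import os
import tempfile


def read_lbyl(path):
    # Look before you leap: the file may vanish between the check and open().
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


def read_eafp(path):
    # Easier to ask forgiveness: one operation, no check/use gap.
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    print(read_eafp(path))  # None: the file does not exist yet
    with open(path, "w") as f:
        f.write("hello")
    print(read_eafp(path), read_lbyl(path))  # hello hello
```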
2020-12-28 Christopher Barker <pythonchb@gmail.com> dixit:
I don't know about the OP, but all I wanted was a clear definition of the part of the API needed to support **, and apparently it's a keys() method that returns an iterator of the keys, and a __getitem__ [...]
To be more precise: an *iterable* of the keys -- not necessarily an *iterator*; it can be, for example, a list or string:
    >>> def fun(**kwargs):print(kwargs)
    ...
    >>> class C:
    ...     def keys(self): return list('abc')
    ...     def __getitem__(self, key): return 42
    ...
    >>> c = C()
    >>> fun(**c)
    {'a': 42, 'c': 42, 'b': 42}
And, even, the `keys()` method does not need to be defined on the class level -- it can be a callable attribute of an *instance*:
    >>> class D:
    ...     def __getitem__(self, key): return 42
    ...
    >>> d = D()
    >>> d.keys = lambda: 'abc'
    >>> fun(**d)
    {'a': 42, 'c': 42, 'b': 42}
Cheers, *j
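Two related details, sketched with illustrative class names: `keys()` may even return a generator, and the resulting keys must be strings only when `**` feeds a function call. A dict display such as `{**obj}` accepts any hashable keys, which matches the point raised earlier in the thread that unpacking into keyword arguments is stricter than mappings in general.

```python
class Gen:
    def keys(self):
        # Any iterable of keys works, including a generator.
        yield from ("x", "y")

    def __getitem__(self, key):
        return len(key)


class IntKeys:
    def keys(self):
        return [1, 2]

    def __getitem__(self, key):
        return key * 10


def fun(**kwargs):
    return kwargs


print(fun(**Gen()))   # {'x': 1, 'y': 1}
print({**IntKeys()})  # {1: 10, 2: 20}: dict displays accept non-string keys

try:
    fun(**IntKeys())  # keyword argument names must be strings
except TypeError as exc:
    print("TypeError:", exc)
```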
So, I’m ready to admit that I was mistaken in considering `**` an operator. Therefore, my further reasoning and suggestions were not correct. If I had the right to vote, I would vote for `dict.update()`-like behaviour. Thanks for your attention and clarification. I think it is worth creating a separate thread about the correct conversion of `dataclass -> dict`. I will create it soon.
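As a pointer for that `dataclass -> dict` discussion, here is one possible sketch, not a conclusion of the thread. The mixin name `UnpackableMixin` is hypothetical. It gives a dataclass the `keys()`/`__getitem__` pair so that `**` performs a shallow conversion, in contrast to `dataclasses.asdict()`, which recurses into nested dataclasses and copies containers.

```python
from dataclasses import asdict, dataclass, fields


class UnpackableMixin:
    """Hypothetical mixin: enables ** on a dataclass via keys()/__getitem__."""

    def keys(self):
        # Only declared dataclass fields are exposed as keys.
        return [f.name for f in fields(self)]

    def __getitem__(self, key):
        # Reject anything that is not a field, like a real mapping would.
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)


@dataclass
class Inner:
    a: int


@dataclass
class Outer(UnpackableMixin):
    inner: Inner
    tags: list


o = Outer(Inner(1), ["x"])
shallow = {**o}
print(shallow)    # {'inner': Inner(a=1), 'tags': ['x']}
print(asdict(o))  # {'inner': {'a': 1}, 'tags': ['x']}
# ** shares the original objects; asdict() builds new ones.
print(shallow["tags"] is o.tags, asdict(o)["tags"] is o.tags)  # True False
```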
Participants (11): Anton Abrosimov, Brendan Barnwell, Chris Angelico, Christopher Barker, Greg Ewing, Guido van Rossum, Jan Kaliszewski, Serhiy Storchaka, Stephen J. Turnbull, Steve Barnes, Steven D'Aprano