Proposed class for collections: dynamicdict
I have implemented a class in the C code of the collections module which has similar behavior to the defaultdict class. This class, dynamicdict, supplies values for keys that have not yet been added using a default factory callable, but unlike defaultdict, the missing key itself is passed to the callable. This code can be seen here: https://github.com/swfarnsworth/cpython/blob/3.8/Modules/_collectionsmodule....

While this does introduce a lot of redundant code, I'm not sure how it could be done without copying the implementation of the defaultdict class and adjusting how the default factory is called. For example, I don't believe that it's possible to support both behaviors within the defaultdict class without breaking backwards compatibility or adding another parameter to the constructor for the defaultdict class.

I don't know if a PEP is needed for this to be added to the standard library, or if it's something that anyone other than myself wants to see added, but I would be happy to further explain the concept, implementation, potential use cases, or anything else that might work towards the adoption of this feature.

Regards,
Steele
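For readers who want to try the described behavior without building the C patch, here is a rough pure-Python approximation. It is only a sketch: the name `DynamicDictSketch` is made up for illustration, and the choice to store the computed value (as defaultdict does) is an assumption based on the "similar behavior to defaultdict" description, not something checked against the linked C code.

```python
class DynamicDictSketch(dict):
    """Pure-Python approximation of the proposed dynamicdict (hypothetical name)."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Per-instance factory, mirroring defaultdict's attribute of the same name.
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls this hook only when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        # Unlike defaultdict, the missing key is passed to the factory.
        value = self.default_factory(key)
        # Assumption: the value is stored, as defaultdict does.
        self[key] = value
        return value


squares = DynamicDictSketch(lambda key: key * key)
print(squares[4])     # the factory is called with the key 4
print(dict(squares))  # the computed value has been stored
```

With no factory, a missing key raises KeyError just as a plain dict would.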
On Apr 10, 2020, at 15:53, Steele Farnsworth <swfarnsworth@gmail.com> wrote:
I have implemented a class in the C code of the collections module which has similar behavior to the defaultdict class. This class, dynamicdict, supplies values for keys that have not yet been added using a default factory callable, but unlike defaultdict, the missing key itself is passed to the callable. This code can be seen here: https://github.com/swfarnsworth/cpython/blob/3.8/Modules/_collectionsmodule....
That sounds like exactly the functionality for dict.__missing__, which is already there. Why can’t you just subclass dict and override that? Is it a readability issue, or a performance issue, or do you need to have a factory function that’s different per instance or accessible as an attribute or some other feature exactly like defaultdict, or…? There might well be a good answer to one of these—although unless that good answer is performance, it seems like this should be a 5-line class in Python, not a custom C class. (Keep in mind that defaultdict was added somewhere around 2.4 or 2.5, while __missing__ has only been there since somewhere around 2.7/3.3. I’ll bet it would be different if it were invented today.)
On Fri, Apr 10, 2020 at 06:02:25PM -0700, Andrew Barnert via Python-ideas wrote:
(Keep in mind that defaultdict was added somewhere around 2.4 or 2.5, while __missing__ has only been there since somewhere around 2.7/3.3. I’ll bet it would be different if it were invented today.)
Both `__missing__` and `defaultdict` were added in version 2.5. https://docs.python.org/2/library/stdtypes.html#dict https://docs.python.org/2/library/collections.html#defaultdict-objects -- Steven
I have built this data structure countless times. So I am in favor.
Why can’t you just subclass dict and override that?
Because TypeError: multiple bases have instance lay-out conflict is one of my least favorite errors.

Perhaps `__missing__` could be a first class part of the getitem protocol, instead of a `dict` specific feature. So that
```
r = x[key]
```
means:
```
try:
    r = x.__getitem__(key)
except KeyError as e:
    # should we also catch IndexError?
    try:
        missing = x.__missing__
    except AttributeError:
        raise e from None
    r = missing(key)
```
Obviously this would come at some performance cost for non dict mappings so I don't know if this would fly.

So instead maybe there could be a standard decorator to get the same behavior?
```
from functools import wraps

def usemissing(getitem):
    @wraps(getitem)
    def wrapped(self, key):
        try:
            return getitem(self, key)
        except KeyError as e:
            try:
                missing = self.__missing__
            except AttributeError:
                raise e from None
            return missing(key)
    return wrapped
```
Alternatively, it could be implemented as part of one of the ABCs, maybe something like:
```
from abc import abstractmethod
from collections.abc import Mapping

class MissingMapping(Mapping):
    # Could also give MissingMapping its own metaclass
    # and do the modification of __getitem__ there.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.__getitem__ = usemissing(cls.__getitem__)

    @abstractmethod
    def __missing__(self, key):
        pass
```
Caleb Donovick

On Fri, Apr 10, 2020 at 6:39 PM Steven D'Aprano <steve@pearwood.info> wrote:
-- Steven _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/HARDO2... Code of Conduct: http://python.org/psf/codeofconduct/
On Apr 13, 2020, at 18:44, Caleb Donovick <donovick@cs.stanford.edu> wrote: I have built this data structure countless times. So I am in favor.
Maybe you can give a concrete example of what you need it for, then? I think that would really help the proposal. Especially if your example needs a per-instance rather than per-class factory function.
Why can’t you just subclass dict and override that?
Because TypeError: multiple bases have instance lay-out conflict is one of my least favorite errors.
But defaultdict, being a subclass of dict, has the same problem in the same situations, and (although I haven’t checked) I assume the same is true for the OP’s dynamicdict.
Perhaps `__missing__` could be a first class part of the getitem protocol, instead of a `dict` specific feature. So that
```
r = x[key]
```
means:
```
try:
    r = x.__getitem__(key)
except KeyError as e:
    # should we also catch IndexError?
    try:
        missing = x.__missing__
    except AttributeError:
        raise e from None
    r = missing(key)
```
Obviously this would come at some performance cost for non dict mappings so I don't know if this would fly.
Besides performance, I don’t think it fits with Guido’s conception of the protocols as being more minimal than the builtin types (e.g., set has not just a & operator, but also an intersection method that takes 0 or more arbitrary iterables; the set protocol has no such method, so collections.abc.Set neither specifies nor provides an intersection method). It’s a bit muddy of a conception at the edges, but I think this goes over the line, and may have been explicitly thought about and rejected for the same reason as Set.intersection.

On the other hand, none of that is an argument of any kind against your method decorator:
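The set example can be checked directly. The sketch below uses a made-up `ListSet` class (not from the thread) that implements only the three abstract methods of collections.abc.Set; the `&` operator then comes from the ABC's mixin methods, while `intersection` exists only on the builtin type.

```python
from collections.abc import Set


class ListSet(Set):
    """Tiny Set implementation backed by a list (illustrative only)."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


a = ListSet([1, 2, 3])
b = ListSet([2, 3, 4])

common = a & b                          # __and__ is a mixin method of the ABC
print(sorted(common))                   # [2, 3]
print(hasattr(a, "intersection"))       # False: not part of the protocol
print(hasattr(set(), "intersection"))   # True: builtin-only convenience
```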
So instead maybe there could be a standard decorator to get the same behavior?
```
from functools import wraps

def usemissing(getitem):
    @wraps(getitem)
    def wrapped(self, key):
        try:
            return getitem(self, key)
        except KeyError as e:
            try:
                missing = self.__missing__
            except AttributeError:
                raise e from None
            return missing(key)
    return wrapped
```
This seems like a great idea, although maybe it would be easier to use as a class decorator rather than a method decorator. Either this:

    def usemissing(cls):
        missing = cls.__missing__
        getitem = cls.__getitem__
        def __getitem__(self, key):
            try:
                return getitem(self, key)
            except KeyError:
                return missing(self, key)
        cls.__getitem__ = __getitem__
        return cls

Or this:

    def usemissing(cls):
        getitem = cls.__getitem__
        def __getitem__(self, key):
            try:
                return getitem(self, key)
            except KeyError:
                return type(self).__missing__(self, key)
        cls.__getitem__ = __getitem__
        return cls

This also preserves the usual class-based rather than instance-based lookup for most special methods (including __missing__ on dict subclasses).

The first one has the advantage of failing at class decoration time rather than at first missing lookup time if you forget to include a __missing__, but it has the cost that (unlike a dict subclass) you can’t monkeypatch __missing__ after construction time. So I think I’d almost always prefer the first, but the second might be a better fit for the stdlib anyway?

I think either the method decorator or the class decorator makes sense for the stdlib. The only question is where to put it. Either fits in nicely with things like cached_property and total_ordering in functools. I’m not sure people will think to look for it there, as opposed to in collections or something else in the Data Types chapter in the docs, but that’s true of most of functools, and at least once people discover it (from a Python-list or StackOverflow question or whatever) they’ll learn where it is and be able to use it easily, just like the rest of that module.

It’s simple, but something many Python programmers couldn’t write for themselves, or would get wrong and have a hard time debugging, and it seems like the most flexible and least obtrusive way to do it. (It does still need actual motivating examples, though. Historically, the bar seems to be lower for new decorators in functools than new classes in collections, but it’s still not no bar…)

Additionally (I’m a lot less sure if this one belongs in the stdlib like @usemissing, but if you were going to put this on PyPI as a mappingtools or collections2 or more-functools or whatever) you could have an @addmissing(missingfunc) decorator, to handle cases where you want to adapt some third-party mapping type without modifying the code or subclassing:

    from sometreelib import SortedDict
    def missing(self, key):
        # ... whatever ...
    SortedDict = addmissing(missing)(SortedDict)

And if the common use cases are the same kinds of trivial functions as defaultdict, you could also do this:

    @addmissing(lambda self, key: key)
    class MyDict…
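The thread never spells out `addmissing`, so here is one way it might look, written as a sketch on top of the second class-decorator variant. The implementation details are assumptions: it patches the class in place (which will not work for types that reject attribute assignment, such as builtins), and `PairMapping` is a made-up stand-in for a third-party mapping.

```python
from collections.abc import Mapping


def addmissing(missingfunc):
    """Hypothetical class decorator: install missingfunc as __missing__ and
    make __getitem__ fall back to it on KeyError."""
    def decorator(cls):
        getitem = cls.__getitem__

        def __getitem__(self, key):
            try:
                return getitem(self, key)
            except KeyError:
                # Look the hook up on the type, as special methods usually are.
                return type(self).__missing__(self, key)

        cls.__missing__ = missingfunc
        cls.__getitem__ = __getitem__
        return cls
    return decorator


@addmissing(lambda self, key: key)
class PairMapping(Mapping):
    """Read-only mapping over a list of (key, value) pairs (illustrative)."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self):
        return (k for k, _ in self._pairs)

    def __len__(self):
        return len(self._pairs)


m = PairMapping([("a", 1)])
print(m["a"])     # 1, found normally
print(m["zzz"])   # zzz, supplied by the fallback
print("zzz" in m) # True, see the note below
```

One side effect worth knowing about: the mixin `__contains__` and `get` that collections.abc.Mapping provides are written in terms of `self[key]`, so with this sketch they also see the fallback value. A dict subclass behaves differently, since `dict.get` and `in` never consult `__missing__`.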
Alternatively, it could be implemented as part of one of the ABCs maybe something like:
```
from abc import abstractmethod
from collections.abc import Mapping

class MissingMapping(Mapping):
    # Could also give MissingMapping its own metaclass
    # and do the modification of __getitem__ there.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.__getitem__ = usemissing(cls.__getitem__)

    @abstractmethod
    def __missing__(self, key):
        pass
```
Presumably you’d also want to precompose a MutableMissingMapping ABC. Most user mappings are mutable, and I suspect that’s even more true for those that need __missing__, given that most uses of defaultdict are things like building up a multidict without knowing all the keys in advance.

As for the implementation, I think __init_subclass__ makes more sense than a metaclass (presumably a subclass of ABCMeta). Since mixins are all about composing, often with multiple inheritance, it’s hard to add metaclasses without interfering with user subclasses. (Or even with your own future: imagine if someone realizes MutableMapping needs its own metaclass, and MissingMapping already has one; now there’s no way to write MutableMissingMapping.) Composing ABCs with non-ABC mixins is already more of a pain than would be ideal, and I think a new submetaclass would make it worse.

But at any rate, I’m not sure this is a good idea. None of the other ABCs hide methods like this. For example, Mapping will give you a __contains__ if you don’t have one, but if you do write one that does things differently, yours overrides the default; here, there’d be no way to override the default __getitem__ to do things differently, because that would just get wrapped and replaced. That isn’t unreasonable behavior for a mixin in general, but I think it is confusing for an ABC/mixin hybrid in collections.abc.

Also, a protocol or ABC is something you can check for compliance (at runtime, or statically in mypy, or just in theory even if not in the actual code); is there ever any point in asking whether an object complies with MissingMapping? It’s something you can use an object as, but is there any way you can use a MissingMapping differently from a Mapping? So I think it’s not an ABC because it’s not a protocol.
Of course you could just make it a pure mixin that isn’t an ABC (and maybe isn’t in collections.abc), which also solves all of the problems above (except the optional one with the metaclass, but you already avoided that). But at that point, are there any advantages over the method or class decorator?
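To make the "wrapped and replaced" concern concrete, the sketch below reuses the `usemissing` method decorator and the `MissingMapping` draft quoted earlier in this message. The `Upper` and `Strict` classes are made up for illustration; `Strict` tries to opt out of the fallback by writing its own `__getitem__`, and the `__init_subclass__` hook wraps it anyway.

```python
from abc import abstractmethod
from collections.abc import Mapping
from functools import wraps


def usemissing(getitem):
    @wraps(getitem)
    def wrapped(self, key):
        try:
            return getitem(self, key)
        except KeyError as e:
            try:
                missing = self.__missing__
            except AttributeError:
                raise e from None
            return missing(key)
    return wrapped


class MissingMapping(Mapping):
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs for every subclass, so every __getitem__ gets wrapped.
        cls.__getitem__ = usemissing(cls.__getitem__)

    @abstractmethod
    def __missing__(self, key):
        pass


class Upper(MissingMapping):
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __missing__(self, key):
        return key.upper()


class Strict(Upper):
    # Intended to restore plain KeyError behavior...
    def __getitem__(self, key):
        return self._data[key]


s = Strict({"a": 1})
print(s["a"])  # 1
print(s["b"])  # B: the override was wrapped, so __missing__ still runs
```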
I've implemented the class as a stand-alone module here: https://github.com/swfarnsworth/dynamicdict It could in theory be made significantly more concise if `defdict_type` were the base for this class instead of `PyDict_Type`. On Tue, Apr 14, 2020 at 1:32 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
Besides performance, I don’t think it fits with Guido’s conception of the protocols as being more minimal than the builtin types (e.g., set has not just a & operator, but also an intersection method that takes 0 or more arbitrary iterables; the set protocol has no such method, so collections.abc.Set neither specifies nor provides an intersection method). It’s a bit muddy of a conception at the edges, but I think this goes over the line, and may have been explicitly thought about and rejected for the same reason as Set.intersection.
Making __missing__ a first class part of how __getitem__ works seems more analogous to __getattr__ and __getattribute__ than the intersection method. Is there any other dunder that is only implicitly called on builtin types?

I mostly agree with everything else you said.
On Apr 15, 2020, at 14:08, Caleb Donovick <donovick@cs.stanford.edu> wrote:
Besides performance, I don’t think it fits with Guido’s conception of the protocols as being more minimal than the builtin types (e.g., set has not just a & operator, but also an intersection method that takes 0 or more arbitrary iterables; the set protocol has no such method, so collections.abc.Set neither specifies nor provides an intersection method). It’s a bit muddy of a conception at the edges, but I think this goes over the line, and may have been explicitly thought about and rejected for the same reason as Set.intersection.
Making __missing__ a first class part of how __getitem__ works seems more analogous to __getattr__ and __getattribute__ than the intersection method.
That’s a good point, but…
Is there any other dunder that is only implicitly called on builtin types?
Actually, IIRC, __getattr__ is itself another example. Python never calls __getattr__, only __getattribute__. However, object.__getattribute__ (and type.__getattribute__) calls __getattr__ on failure, the exact same way dict.__getitem__ calls __missing__ on failure. (Obviously there are differences in practice, e.g., every object is-a object but not every mapping is-a dict, and it’s pretty rare to implement __getattribute__ without super… but if you do implement __getattribute__ without super, or do the equivalent in a C extension class, __getattr__ doesn’t get called, just like __missing__; I’m pretty sure the docs explain this somewhere.)

But there are fallbacks that really are a “first class part of the protocol” as you intended: iteration falls back to __getitem__, construction calls __init__ if __new__ returns an instance, addition tries __radd__ before and/or after __add__, etc., and that’s all definitely part of the protocol, baked into the interpreter itself, not just part of how some class implements the protocol.

So your proposal is perfectly coherent and reasonable (at least if you change it from “like __getattr__” to “like __radd__” or something), and this is really just a footnote. But I think I’m still against it, both for the “protocols as minimal as possible” reason, and because there’s no obvious way to make it a first class part of the protocol for mappings without making it a first class part of the protocol for sequences (because at the level of the interpreter, and the syntax, they’re both the same subscription protocol).

As a side note to the footnote, check out the pickle protocol. IIRC, the fallback from __reduce_ex__ to __reduce__ is a first-class part of the protocol (like iteration, construction, and arithmetic), but the fallback from __getnewargs__ or __getstate__ is just part of some builtin types’ implementations (like attribution and subscription).
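Two of the interpreter-level fallbacks named above, and the contrast with `__missing__`, can be seen in a few lines. This is only an illustrative sketch; the class names `Countdown`, `Meters` and `Echo` are made up.

```python
class Countdown:
    """Defines only __getitem__, no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)
        return 3 - index


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
print(list(Countdown()))  # [3, 2, 1]


class Meters:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        return Meters(other + self.value)


# int.__add__ returns NotImplemented for a Meters operand, so the
# interpreter tries the reflected method on the right-hand operand.
total = 5 + Meters(2)
print(total.value)  # 7


class Echo(dict):
    def __missing__(self, key):
        return key


e = Echo()
print(e["x"])      # x: dict.__getitem__ consults __missing__
print(e.get("x"))  # None: dict.get does not, the hook belongs to one method
```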
On Mon, Apr 13, 2020 at 06:43:53PM -0700, Caleb Donovick wrote:
Why can’t you just subclass dict and override that?
Because TypeError: multiple bases have instance lay-out conflict is one of my least favorite errors.
For the benefit of the people on this list who aren't as familiar with your code as you are, could you explain how that comment is relevant?

I don't like "multiple bases have instance lay-out conflict" errors either. But I don't get them from subclassing dict.

py> class MyDict(dict):
...     def __missing__(self, key):
...         return (key, None)
...
py> MyDict()[999]
(999, None)

Works fine.

Normally, if I get that error, it's a sign that I'm probably using too much multiple inheritance and not enough composition, or even that I should step away from the OO paradigm and reconsider whether or not I actually need a hybrid list+float object :-)

So can you please explain how this exception is relevant, and how the proposal here will fix it?

defaultdict is subject to the same "lay-out conflict" issue, so from my naive point of view, this proposal won't help you avoid the error.

py> from collections import defaultdict
py> class Weird(int, defaultdict):
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: multiple bases have instance lay-out conflict
Perhaps `__missing__` could be a first class part of the getitem protocol, instead of a `dict` specific feature. So that

```
r = x[key]
```

means:

```
try:
    r = x.__getitem__(key)
except KeyError as e:
    # should we also catch IndexError?
    try:
        missing = x.__missing__
    except AttributeError:
        raise e from None
    r = missing(key)
```
I don't see how this is much different from what dicts already do. You can't add a `__missing__` attribute to literal dicts (or lists, tuples, etc) since they don't take attributes.

{}.__missing__ = lambda key: key

fails, so you have to subclass anyway.
Obviously this would come at some performance cost for non dict mappings so I don't know if this would fly.
Giving every single dict a `__missing__` member, for the sake of the rare instance that needs one, would come at a performance and memory cost too.

It would be nice if we could work out what the problem we are trying to solve is before we jump straight into solutions mode.

To my mind, the naive problem "if `d[key]` fails, call a method with `key` as argument" already has a solution. If the `__missing__` dunder isn't a solution, I don't know what the problem is.

-- Steven
construction calls __init__ if __new__ returns an instance
Actually type's __call__ method does that, although that doesn't help my point at all...

Your point about there being no method to perform the dispatch is good. To get what I want without interpreter changes there would need to be some sort of auxiliary getitem method which would perform the protocol. Or alternatively MissingMapping (or some other mixin) could define __getitem__ and require __getitem_impl__ and __missing__. As this wouldn't require any changes to anything, I think it might be the best solution I have proposed.
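A rough sketch of what such a mixin might look like follows. Only the names MissingMapping, __getitem_impl__, and __missing__ come from the message above; the choice of collections.abc.Mapping as the base, the decision to make __missing__ optional, and the Lengths example class are all assumptions made for illustration.

```python
from collections.abc import Mapping


class MissingMapping(Mapping):
    """Hypothetical mixin: subclasses provide __getitem_impl__ (and __missing__)."""

    def __getitem__(self, key):
        try:
            return self.__getitem_impl__(key)
        except KeyError:
            # Look the hook up on the type, as is done for other dunders.
            missing = getattr(type(self), "__missing__", None)
            if missing is None:
                raise  # no hook defined: propagate the original KeyError
            return missing(self, key)


class Lengths(MissingMapping):
    """Example subclass: unknown keys map to their own length."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem_impl__(self, key):
        return self._data[key]

    def __missing__(self, key):
        return len(key)

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


lengths = Lengths({"a": 100})
stored = lengths["a"]        # found by __getitem_impl__
computed = lengths["abcd"]   # KeyError caught, __missing__ supplies the value
```

With this sketch stored is 100 and computed is 4, and nothing outside the mixin has to change.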
I don't see how this is much different from what dicts already do.
I was suggesting making __missing__ part of what a subscript expression in a load context (__getitem__) means so that user defined Mappings could use it. I've gotten fairly off topic I might make a new thread for my idea.

Caleb
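To make the difference under discussion concrete, here is a small sketch of current behaviour. DictBased and MappingBased are invented names for this example. Both classes define __missing__, but only the dict subclass ever has it called, because the lookup of __missing__ lives inside dict.__getitem__ rather than in the subscript expression itself.

```python
from collections.abc import Mapping


class DictBased(dict):
    def __missing__(self, key):
        return f"default for {key!r}"


class MappingBased(Mapping):
    """A user-defined mapping that keeps its items in a private dict."""

    def __init__(self):
        self._items = {}

    def __getitem__(self, key):
        return self._items[key]  # plain dict lookup; raises KeyError

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __missing__(self, key):
        # Never reached today: the subscription machinery does not look for it.
        return f"default for {key!r}"


from_dict = DictBased()["spam"]

try:
    MappingBased()["spam"]
except KeyError:
    mapping_raised = True
else:
    mapping_raised = False
```

from_dict ends up as "default for 'spam'", while the Mapping subclass raises KeyError, so mapping_raised is True.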
On Apr 16, 2020, at 11:03, Caleb Donovick <donovick@cs.stanford.edu> wrote:
Or alternatively MissingMapping (or some other mixin) could define __getitem__ and require __getitem_impl__ and __missing__. As this wouldn't require any changes to anything I think it might be the best solution I have proposed.
I think this makes more sense as a decorator (either a class decorator that wraps __getitem__, or a method decorator that you apply to __getitem__). A mixin just complicates all the multiple inheritance issues. Does it inherit from Mapping, or does it expect you to inherit both, and to get the order right to satisfy ABCMeta? What about MutableMapping? Would it ever need to coexist with other mixins with similar issues? Also, does it try to lift your docstring and signature up to the public method, or does it just provide generic ones? All of these issues become trivial with a decorator.

Somewhere I have some code for a set of class decorators that help implementing mappings and sequences: you provide basic __fooitem__ methods, and it wraps them with methods that do all the extra stuff dict, tuple, and list do. IIRC, the mutable sequence stuff is the only place where it gets complicated, but there may be others I haven’t remembered. I can dig this up if you’re interested. Maybe it’s even worth cleaning up and posting to PyPI, or even proposing for somewhere in the stdlib (collections or functools?).
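One possible shape for the class-decorator variant is sketched below. The decorator name with_missing and the Defaults class are hypothetical, and this is not the code referred to in the message; it only illustrates how wrapping __getitem__ sidesteps the inheritance questions and keeps the original docstring via functools.wraps.

```python
import functools


def with_missing(cls):
    """Hypothetical class decorator: make cls.__getitem__ fall back to __missing__."""
    original = cls.__getitem__

    @functools.wraps(original)  # copies the docstring and sets __wrapped__
    def __getitem__(self, key):
        try:
            return original(self, key)
        except KeyError:
            missing = getattr(type(self), "__missing__", None)
            if missing is None:
                raise  # no hook defined: propagate the original KeyError
            return missing(self, key)

    cls.__getitem__ = __getitem__
    return cls


@with_missing
class Defaults:
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        """Return the stored value for key."""
        return self._data[key]

    def __missing__(self, key):
        return key.upper()


d = Defaults({"a": 1})
stored = d["a"]     # original __getitem__ succeeds
fallback = d["b"]   # KeyError caught, __missing__ returns "B"
```

Because the decorator does not add a base class, it should work the same whether or not the decorated class inherits from Mapping or MutableMapping.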
On Apr 16, 2020, at 12:26, Andrew Barnert <abarnert@yahoo.com> wrote:
Somewhere I have some code for a set of class decorators that help implementing mappings and sequences: you provide basic __fooitem__ methods, and it wraps them with methods that do all the extra stuff dict, tuple, and list do. IIRC, the mutable sequence stuff is the only place where it gets complicated, but there may be others I haven’t remembered. I can dig this up if you’re interested. Maybe it’s even worth cleaning up and posting to PyPI, or even proposing for somewhere in the stdlib (collections or functools?).
Ok, I found it, and it’s in better shape than I thought it was. (It even passes all the relevant tests from the stdlib test suite.) So I posted it on https://github.com/collectionhelpers in case anyone wants to play with it. If there’s any demand for turning it into a PyPI package, I can do that, but otherwise I won’t bother. Anyway, if you look at the code, most of it is devoted to the mutable sequence __setitem__, but making mapping __getitem__ handle __missing__ wasn’t quite as trivial as you’d expect (it breaks the __contains__ and get methods you inherit from Mapping…). Whether that counts as an argument for or against any version of the various proposals in this thread, I’m not sure.
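The breakage mentioned above can be reproduced with a few lines. This is a sketch with an invented class name (Echoing), not the code from the linked repository, and it assumes the cause is that the mixin methods `__contains__` and `get` inherited from collections.abc.Mapping are written in terms of `self[key]` and rely on it raising KeyError.

```python
from collections.abc import Mapping


class Echoing(Mapping):
    """Hypothetical mapping whose __getitem__ falls back to __missing__."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        try:
            return self._data[key]
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        return key

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = Echoing({"a": 1})
absent_is_member = "zzz" in m      # Mapping.__contains__ never sees a KeyError
default_ignored = m.get("zzz", 0)  # Mapping.get never returns its default
keys = list(m)                     # iteration still reports only real keys
```

With this class absent_is_member is True and default_ignored is "zzz", even though keys is just ["a"]. A real implementation would presumably need to override `__contains__` and `get` so that they bypass the fallback.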
So, going back to the original post, dynamicdict is definitely something I've reimplemented myself multiple times, essentially exactly as the pseudo-code in your C comment describes, just with the factory being required to be not None (because I explicitly didn't want it to accept the 'no-factory' case):

```python
class dynamic_defaultdict(dict):
    def __init__(self, default_factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._default_factory = default_factory

    def __missing__(self, key):
        # Store the computed value so the factory runs only once per key.
        self[key] = value = self._default_factory(key)
        return value
```

But I actually have used this quite a bit without significant issues and don't fully understand the layout issue described. It seems to work with dict equality as well, but I'm also super new to this list so maybe I'm just naive and don't understand the nuance.

Cheers,
-Jeff Edwards
I have often wanted this! I use defaultdict only occasionally, and every time I do, I start out thinking it behaves the way you describe dynamicdict. Then I read the docs again and decide I didn't actually want defaultdict to start with, and roll something different.
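The mismatch between expectation and behaviour described here can be seen in a short comparison. KeyFactoryDict below is a hypothetical pure-Python stand-in for the proposed dynamicdict, written only for this illustration; whether the real proposal stores the computed value exactly this way is an assumption.

```python
from collections import defaultdict


class KeyFactoryDict(dict):
    """Hypothetical stand-in for dynamicdict: the factory receives the key."""

    def __init__(self, default_factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        value = self[key] = self.default_factory(key)
        return value


# defaultdict calls its factory with no arguments, so int() works...
counts = defaultdict(int)
counts["a"] += 1

# ...but a factory that needs the key, such as len, fails with TypeError.
try:
    defaultdict(len)["abc"]
except TypeError:
    key_not_passed = True
else:
    key_not_passed = False

# The key-passing variant hands "abc" to len and stores the result.
lengths = KeyFactoryDict(len)
abc_length = lengths["abc"]
```

Here counts["a"] is 1, key_not_passed is True, and abc_length is 3, with the entry stored in lengths afterwards.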
participants (6)
- Andrew Barnert
- Caleb Donovick
- David Mertz
- Jeff Edwards
- Steele Farnsworth
- Steven D'Aprano