[Python-ideas] OrderedCounter and OrderedDefaultDict

Andrew Barnert abarnert at yahoo.com
Sat Oct 17 04:08:15 CEST 2015


Actually, forget all that; it's even simpler.

At least in recent 3.x, the only thing wrong with inheriting from both types, assuming you put OrderedDict first, is the __init__ signature. So:

    class OrderedDefaultDict(OrderedDict, defaultdict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory

More importantly, because __missing__ support is built into dict, despite the confusing docs for defaultdict, you don't really need defaultdict at all here:

    class OrderedDefaultDict(OrderedDict):
        def __init__(self, default_factory=None, *a, **kw):
            OrderedDict.__init__(self, *a, **kw)
            self.default_factory = default_factory
        def __missing__(self, key):
            self[key] = value = default_factory()
            return value

And either of these should work with 2.5+ (according to https://docs.python.org/2/library/stdtypes.html#dict that's when dict.__missing__ was added).

Sent from my iPhone

> On Oct 16, 2015, at 18:57, Andrew Barnert <abarnert at yahoo.com> wrote:
> 
> Sorry, sent too soon…
> 
> Sent from my iPhone
> 
>> On Oct 16, 2015, at 18:39, Andrew Barnert via Python-ideas <python-ideas at python.org> wrote:
>> 
>>> On Oct 16, 2015, at 03:36, Ian Foote <ian at feete.org> wrote:
>>> 
>>> I recently wanted to use an OrderedCounter and an OrderedDefaultDict. After a bit of googling I discovered that OrderedCounter is quite easy to implement:
>> 
>> This is already in the docs for OrderedDict, so it shouldn't have taken any googling.
>>> from collections import Counter, OrderedDict
>>> 
>>> class OrderedCounter(Counter, OrderedDict):
>>>      'Counter that remembers the order elements are first seen'
>>>      def __repr__(self):
>>>          return '%s(%r)' % (self.__class__.__name__,
>>>                             OrderedDict(self))
>>>      def __reduce__(self):
>>>          return self.__class__, (OrderedDict(self),)
>>> 
>>> from https://rhettinger.wordpress.com/2011/05/26/super-considered-super/
>>> 
>>> Unfortunately an OrderedDefaultDict did not seem so easy. I did find http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python which suggests:
>> 
>> Most of the trickiness here is handling None as a factory, which is only useful for subclasses that implement a custom __missing__
> 
> … or that requires two-stage initialization, where it can initialize the base with None and then assign the factory later. Still not all that common, but it is a separate reason for the extra complexity besides a custom __missing__ that doesn't need the factory.
> 
>> To make something that works like you'd expect except for that part is a lot simpler.
> 
> Off the top of my head (untested, since I'm on my phone):
> 
> class DefaultOrderedDict(OrderedDict):
>     def __init__(self, default_factory, *a, **kw):
>         super().__init__(*a, **kw)
>         self.default_factory = default_factory
>     def __getitem__(self, key):
>         try:
>             return super().__getitem__(key)
>         except KeyError:
>             return self.__missing__(key)
>     def __missing__(self, key):
>         self[key] = value = self.default_factory()
>         return value
>     def __repr__(self):
>         return '{}({}, {})'.format(type(self).__name__, self.default_factory, dict.__repr__(self))
> 
> In fact, if you aren't worried about subclassing, or about being able to directly call or pass around a bound__missing__, you can make this even simpler.
> 
> I may be wrong about this being picklable/copyable without any extra help; if so, it still doesn't need the whole complicated "treat a None factory as if it doesn't exist" part.
> 
> Also, I think the SO recipe works in Python 2.4+, or whichever first added OrderedDict, while this requires 3.1 (although it should be obvious how to backport it, make sure that OrderedDict isn't an old-style class…).
> 
> Anyway, other than those issues, it seems pretty obvious, and I don't think there's any part of that functionality that's easy to get wrong. Handling a None factory in __missing__ and pickle/copy is easy to get wrong, but if you aren't going to subclass this and therefore aren't going to try to build that part, you're not going to build it wrong.
> 
>> And I'm not sure a docs recipe that's mostly code that's unnecessary for most uses, hiding the important and simple part, would be a great idea.
>>> from collections import OrderedDict, Callable
>>> 
>>> class DefaultOrderedDict(OrderedDict):
>>>     # Source: http://stackoverflow.com/a/6190500/562769
>>>     def __init__(self, default_factory=None, *a, **kw):
>>>         if (default_factory is not None and
>>>            not isinstance(default_factory, Callable)):
>>>             raise TypeError('first argument must be callable')
>>>         OrderedDict.__init__(self, *a, **kw)
>>>         self.default_factory = default_factory
>>> 
>>>     def __getitem__(self, key):
>>>         try:
>>>             return OrderedDict.__getitem__(self, key)
>>>         except KeyError:
>>>             return self.__missing__(key)
>>> 
>>>     def __missing__(self, key):
>>>         if self.default_factory is None:
>>>             raise KeyError(key)
>>>         self[key] = value = self.default_factory()
>>>         return value
>>> 
>>>     def __reduce__(self):
>>>         if self.default_factory is None:
>>>             args = tuple()
>>>         else:
>>>             args = self.default_factory,
>>>         return type(self), args, None, None, self.items()
>>> 
>>>     def copy(self):
>>>         return self.__copy__()
>>> 
>>>     def __copy__(self):
>>>         return type(self)(self.default_factory, self)
>>> 
>>>     def __deepcopy__(self, memo):
>>>         import copy
>>>         return type(self)(self.default_factory,
>>>                           copy.deepcopy(self.items()))
>>> 
>>>     def __repr__(self):
>>>         return 'OrderedDefaultDict(%s, %s)' % (self.default_factory,
>>>                                                OrderedDict.__repr__(self))
>>> 
>>> This seems to me both easy to get wrong and hard to discover. It also risks getting out of sync with updates to collections.
>>> I'm wondering if this is sufficient justification to add OrderedCounter and OrderedDict to collections, either directly or as recipes.
>> 
>> Since one of them is already a recipe and the other one would probably not be useful as one, I don't think that's a great idea.
>> 
>> Adding OrderedDefaultDict to the module itself might be useful. Or, alternatively, put it in PyPI or ActiveState and add a link to it from the docs? (I'm pretty sure there are already implementations in both places, actually. Also, it would be nice to be able to point at a third-party implementation that works in 2.6+/3.2+ or whatever even if it only appears in the 3.6 docs.)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20151016/b1602a47/attachment-0001.html>


More information about the Python-ideas mailing list