Sorry, sent too soon…

Sent from my iPhone

On Oct 16, 2015, at 18:39, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:

On Oct 16, 2015, at 03:36, Ian Foote <ian@feete.org> wrote:

I recently wanted to use an OrderedCounter and an OrderedDefaultDict. After a bit of googling I discovered that OrderedCounter is quite easy to implement:

This is already in the docs for OrderedDict, so it shouldn't have taken any googling.
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
     'Counter that remembers the order elements are first seen'
     def __repr__(self):
         return '%s(%r)' % (self.__class__.__name__,
                            OrderedDict(self))
     def __reduce__(self):
         return self.__class__, (OrderedDict(self),)

from https://rhettinger.wordpress.com/2011/05/26/super-considered-super/

Unfortunately an OrderedDefaultDict did not seem so easy. I did find http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-in-python which suggests:

Most of the trickiness here is handling None as a factory, which is only useful for subclasses that implement a custom __missing__

… or that requires two-stage initialization, where it can initialize the base with None and then assign the factory later. Still not all that common, but it is a separate reason for the extra complexity besides a custom __missing__ that doesn't need the factory.

To make something that works like you'd expect except for that part is a lot simpler.

Off the top of my head (untested, since I'm on my phone):

class DefaultOrderedDict(OrderedDict):
    def __init__(self, default_factory, *a, **kw):
        super().__init__(*a, **kw)
        self.default_factory = default_factory
    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self.__missing__(key)
    def __missing__(self, key):
        self[key] = value = self.default_factory()
        return value
    def __repr__(self):
        return '{}({}, {})'.format(type(self).__name__, self.default_factory, dict.__repr__(self))

In fact, if you aren't worried about subclassing, or about being able to directly call or pass around a bound__missing__, you can make this even simpler.

I may be wrong about this being picklable/copyable without any extra help; if so, it still doesn't need the whole complicated "treat a None factory as if it doesn't exist" part.

Also, I think the SO recipe works in Python 2.4+, or whichever first added OrderedDict, while this requires 3.1 (although it should be obvious how to backport it, make sure that OrderedDict isn't an old-style class…).

Anyway, other than those issues, it seems pretty obvious, and I don't think there's any part of that functionality that's easy to get wrong. Handling a None factory in __missing__ and pickle/copy is easy to get wrong, but if you aren't going to subclass this and therefore aren't going to try to build that part, you're not going to build it wrong.

And I'm not sure a docs recipe that's mostly code that's unnecessary for most uses, hiding the important and simple part, would be a great idea.
from collections import OrderedDict, Callable

class DefaultOrderedDict(OrderedDict):
    # Source: http://stackoverflow.com/a/6190500/562769
    def __init__(self, default_factory=None, *a, **kw):
        if (default_factory is not None and
           not isinstance(default_factory, Callable)):
            raise TypeError('first argument must be callable')
        OrderedDict.__init__(self, *a, **kw)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return OrderedDict.__getitem__(self, key)
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        if self.default_factory is None:
            args = tuple()
        else:
            args = self.default_factory,
        return type(self), args, None, None, self.items()

    def copy(self):
        return self.__copy__()

    def __copy__(self):
        return type(self)(self.default_factory, self)

    def __deepcopy__(self, memo):
        import copy
        return type(self)(self.default_factory,
                          copy.deepcopy(self.items()))

    def __repr__(self):
        return 'OrderedDefaultDict(%s, %s)' % (self.default_factory,
                                               OrderedDict.__repr__(self))

This seems to me both easy to get wrong and hard to discover. It also risks getting out of sync with updates to collections.
I'm wondering if this is sufficient justification to add OrderedCounter and OrderedDict to collections, either directly or as recipes.

Since one of them is already a recipe and the other one would probably not be useful as one, I don't think that's a great idea.

Adding OrderedDefaultDict to the module itself might be useful. Or, alternatively, put it in PyPI or ActiveState and add a link to it from the docs? (I'm pretty sure there are already implementations in both places, actually. Also, it would be nice to be able to point at a third-party implementation that works in 2.6+/3.2+ or whatever even if it only appears in the 3.6 docs.)