data:image/s3,"s3://crabby-images/1b26f/1b26ffc44bedb22116f7982400582a73afc6fa38" alt=""
I recently wanted to use an OrderedCounter and an OrderedDefaultDict. After a bit of googling I discovered that OrderedCounter is quite easy to implement: from collections import Counter, OrderedDict class OrderedCounter(Counter, OrderedDict): 'Counter that remembers the order elements are first seen' def __repr__(self): return '%s(%r)' % (self.__class__.__name__, OrderedDict(self)) def __reduce__(self): return self.__class__, (OrderedDict(self),) from https://rhettinger.wordpress.com/2011/05/26/super-considered-super/ Unfortunately an OrderedDefaultDict did not seem so easy. I did find http://stackoverflow.com/questions/6190331/can-i-do-an-ordered-default-dict-... which suggests: |fromcollections importOrderedDict,CallableclassDefaultOrderedDict(OrderedDict):# Source: http://stackoverflow.com/a/6190500/562769def__init__(self,default_factory=No... isnotNoneandnotisinstance(default_factory,Callable)):raiseTypeError('first argument must be callable')OrderedDict.__init__(self,*a,**kw)self.default_factory =default_factory def__getitem__(self,key):try:returnOrderedDict.__getitem__(self,key)exceptKeyError:returnself.__missing__(key)def__missing__(self,key):ifself.default_factory isNone:raiseKeyError(key)self[key]=value =self.default_factory()returnvalue def__reduce__(self):ifself.default_factory isNone:args =tuple()else:args =self.default_factory,returntype(self),args,None,None,self.items()defcopy(self):returnself.__copy__()def__copy__(self):returntype(self)(self.default_factory,self)def__deepcopy__(self,memo):importcopy returntype(self)(self.default_factory,copy.deepcopy(self.items()))def__repr__(self):return'OrderedDefaultDict(%s, %s)'%(self.default_factory,OrderedDict.__repr__(self))| This seems to me both easy to get wrong and hard to discover. It also risks getting out of sync with updates to collections. I'm wondering if this is sufficient justification to add OrderedCounter and OrderedDict to collections, either directly or as recipes. Thanks, Ian F
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Oct 16, 2015, at 03:36, Ian Foote <ian@feete.org> wrote:
I recently wanted to use an OrderedCounter and an OrderedDefaultDict. After a bit of googling I discovered that OrderedCounter is quite easy to implement:
This is already in the docs for OrderedDict, so it shouldn't have taken any googling.
Most of the trickiness here is handling None as a factory, which is only useful for subclasses that implement a custom __missing__. To make something that works like you'd expect except for that part is a lot simpler. And I'm not sure a docs recipe that's mostly code that's unnecessary for most uses, hiding the important and simple part, would be a great idea.
Since one of them is already a recipe and the other one would probably not be useful as one, I don't think that's a great idea. Adding OrderedDefaultDict to the module itself might be useful. Or, alternatively, put it in PyPI or ActiveState and add a link to it from the docs? (I'm pretty sure there are already implementations in both places, actually. Also, it would be nice to be able to point at a third-party implementation that works in 2.6+/3.2+ or whatever even if it only appears in the 3.6 docs.)
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
Sorry, sent too soon… Sent from my iPhone
… or that requires two-stage initialization, where it can initialize the base with None and then assign the factory later. Still not all that common, but it is a separate reason for the extra complexity besides a custom __missing__ that doesn't need the factory.
To make something that works like you'd expect except for that part is a lot simpler.
Off the top of my head (untested, since I'm on my phone): class DefaultOrderedDict(OrderedDict): def __init__(self, default_factory, *a, **kw): super().__init__(*a, **kw) self.default_factory = default_factory def __getitem__(self, key): try: return super().__getitem__(key) except KeyError: return self.__missing__(key) def __missing__(self, key): self[key] = value = self.default_factory() return value def __repr__(self): return '{}({}, {})'.format(type(self).__name__, self.default_factory, dict.__repr__(self)) In fact, if you aren't worried about subclassing, or about being able to directly call or pass around a bound__missing__, you can make this even simpler. I may be wrong about this being picklable/copyable without any extra help; if so, it still doesn't need the whole complicated "treat a None factory as if it doesn't exist" part. Also, I think the SO recipe works in Python 2.4+, or whichever first added OrderedDict, while this requires 3.1 (although it should be obvious how to backport it, make sure that OrderedDict isn't an old-style class…). Anyway, other than those issues, it seems pretty obvious, and I don't think there's any part of that functionality that's easy to get wrong. Handling a None factory in __missing__ and pickle/copy is easy to get wrong, but if you aren't going to subclass this and therefore aren't going to try to build that part, you're not going to build it wrong.
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
Actually, forget all that; it's even simpler. At least in recent 3.x, the only thing wrong with inheriting from both types, assuming you put OrderedDict first, is the __init__ signature. So: class OrderedDefaultDict(OrderedDict, defaultdict): def __init__(self, default_factory=None, *a, **kw): OrderedDict.__init__(self, *a, **kw) self.default_factory = default_factory More importantly, because __missing__ support is built into dict, despite the confusing docs for defaultdict, you don't really need defaultdict at all here: class OrderedDefaultDict(OrderedDict): def __init__(self, default_factory=None, *a, **kw): OrderedDict.__init__(self, *a, **kw) self.default_factory = default_factory def __missing__(self, key): self[key] = value = default_factory() return value And either of these should work with 2.5+ (according to https://docs.python.org/2/library/stdtypes.html#dict that's when dict.__missing__ was added). Sent from my iPhone
data:image/s3,"s3://crabby-images/291c0/291c0867ef7713a6edb609517b347604a575bf5e" alt=""
On 17.10.2015 04:08, Andrew Barnert via Python-ideas wrote:
Not saying that inheritance is a bad thing but to me It seems to me that ordering and default values should be orthogonal aspects of the standard dict. Just as Sandi described it here: https://www.youtube.com/watch?v=29MAL8pJImQ Best, Sven
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Wed, Oct 21, 2015 at 5:14 AM, Sven R. Kunze <srkunze@mail.de> wrote:
Yeah... that video is absolutely correct... for Ruby. In Python, you can use multiple inheritance to do that in the exactly obvious way. class EchoRandomHouse(EchoHouse, RandomHouse): pass When I got to the bit in that video where she says that inheritance paints you into a corner, I went and reimplemented her example code in Python, and it's flawless. It doesn't even matter which order you put the two superclasses, as there's no conflict. ChrisA
data:image/s3,"s3://crabby-images/9a238/9a238b21f3d2d309d792173fd87dbe82d234e23d" alt=""
On 10/20/2015 03:40 PM, Chris Angelico wrote:
I think you missed her real point, which applies both to Python and Ruby. In her presentation, it's obscured a bit by too much discussion of "one side or the other" code duplication (which can be "solved" with multiple inheritance). It takes her a couple minutes more to get to the real point, which starts at the slide "inheritance is for specialization, not for sharing code." One symptom of the problem is that using multiple inheritance this way doesn't scale: the number of leaf subclasses you need grows geometrically with the number of behavior knobs. Composition with strategy objects doesn't have this issue. Carl
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Wed, Oct 21, 2015 at 8:56 AM, Carl Meyer <carl@oddbird.net> wrote:
Sure, but a dictionary with a default handler _is_ a form of specialization - as is a dictionary that preserves order. Both of them behave absolutely identically to a regular dict when you set something, retrieve it, iterate over them, etc, etc, etc. She recommends a massive superclass that's capable of any form of injection; using inheritance allows the dict type to be broadly unaware of the modified dictionaries that exist. Suppose I write a SortedDict that behaves exactly the way any other dictionary does, but when you iterate over its keys, you get them in order. I should be able to do this without tampering with the original dict type. In the composition model, I would need to appeal for an "IterationOrder" feature to be added to the base dict; using inheritance, it's as simple as: class SortedDict(dict): def __iter__(self): yield from sorted(self.keys()) To me, this is the biggest benefit of inheritance: you do NOT have to predict what someone might want to do. I can subclass someone else's object and change how it works.
Sure it does... but nobody needs to know about _all_ the leaf subclasses. How many subclasses of object are there in Python?
len(object.__subclasses__()) 122
And that in a bare interactive Py3, without importing anything fancy. ('import decimal' raises that to 159, for instance.) Composition has its place, don't get me wrong. But inheritance isn't the attractive nuisance she makes it out to be. It works just fine. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Wed, Oct 21, 2015 at 09:50:21AM +1100, Chris Angelico wrote:
I think this is wrong. I think that in general you need to have a fairly good idea of the internal workings of the class before you can inherit from it. Otherwise, how do you know what methods need to be overwritten? Take dict. One nuisance with inheriting from dict is that it isn't sufficient to override __setitem__, you also have to override update and clear as well. And possibly others -- the exact set of which methods depend on which other methods are not documented in the dict API. Given an arbitrary class, how can you possibly tell which methods you need to override, or even which methods are *safe* to override? [...]
Composition has its place, don't get me wrong. But inheritance isn't the attractive nuisance she makes it out to be. It works just fine.
It doesn't work "just fine", it has risks and annoyances, and multiple inheritance even more so. Classes have problems. See Jack Diederich's talk "Stop Writing Classes", or at least stop writing *stupid* classes: http://eev.ee/blog/2013/03/03/the-controller-pattern-is-awful-and-other-oo-h... As far as multiple inheritance, there are real issues with MI that aren't solved by coming up with a nifty solution to the diamond problem. See Michele Simionato's series of essays on super, multiple inheritance, mixins and traits: http://www.artima.com/weblogs/viewpost.jsp?thread=246488 Clearly classes are useful, but they aren't an unalloyed good thing, and inheriting from them has its issues. -- Steve
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Tue, Oct 20, 2015 at 6:50 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Right. Chris's thinking recalls the reason why inheritance at some became popular (I guess it was in the '90s). Steven explains (in the part that I've cut) why many experts have soured on it quite a bit. Personally, I happen to think that inheritance is often useful when a number of classes are designed together (typically all at once and belonging to the same library) and also occasionally when a base class is explicitly and carefully designed to be inherited (there are beautiful things you can do with the template pattern, for example). But inheriting from an implementation that wasn't designed with your use case in mind is often asking for trouble -- if not now, then in the future when a refactored implementation is released. You might interject, that's the fault of the implementation refactoring -- they didn't properly think about interface compatibility. But while it's usually easy enough to keep an interface compatible where it's just the user calling methods on the implementation, the "interface" presented by subclassing is much more complex -- you would have to specify exactly which method's implementation calls which other method, and you'd also have to ensure that the object is in a sane state when it calls that other method, because it *could* be the case that the latter is overridden by a subclass. It's terribly fragile, and better avoided. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/291c0/291c0867ef7713a6edb609517b347604a575bf5e" alt=""
On 21.10.2015 05:36, Guido van Rossum wrote:
Yep. It's because experience shows that the usefulness is limited when it comes to orthogonal aspects. Furthermore, I can remember changing an algorithm's design **because** of the lack of such built-in datastructure. That's always a bad sign.
That's a pretty good observation.
Maybe, that's one reason why people hesitate to write their own OrderDefaultDict or DefaultOrderedDict. It's just not their responsibility. Best, Sven
data:image/s3,"s3://crabby-images/83003/83003405cb3e437d91969f4da1e4d11958d94f27" alt=""
On 2015-10-20 18:50, Steven D'Aprano wrote:
I've always considered that (and related problems) to be one of Python's warts. And, as you say, it's a documentation flaw. It's true that it's easy to write classes with poorly documented APIs, which makes them hard to extend because you don't know how they work. But I don't think that means subclassing is not a good idea. It means writing classes with clearly-specified APIs is a good idea. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
data:image/s3,"s3://crabby-images/980d1/980d1e4a110b86a06fe535e4d8377768d2e2398b" alt=""
Chris Angelico <rosuav@gmail.com> writes:
Even considering that OrderedDict (or a _proper_ SortedDict - other languages' equivalent class doesn't require hashable keys because it stores items as a sorted list) requires a complete overhaul of how the dictionary stores items? Really, from an implementation standpoint, it seems like a bad idea to even inherit in this case.
data:image/s3,"s3://crabby-images/291c0/291c0867ef7713a6edb609517b347604a575bf5e" alt=""
On 21.10.2015 00:50, Chris Angelico wrote:
She recommends a massive superclass that's capable of any form of injection
Nope. She does not. The "superclass" is not "massive" at all. It is even slimmer as orthogonal aspects are refactored out into separate entities. In fact, it makes it even easier to test and maintain these separate aspects (the core dev should be interested in that). Furthermore, it's, of course, up to debate which aspects should be injectable and which are not. Just imagine, I would need to use several orderings and/or a default value for missing keys: normal_dict = dict() ordered_dict = dict(order=dict.order_by_insert) sorted_dict = dict(order=sorted) sorted_default_dict = dict(order=sorted, default=int) How many subclasses am I supposed to write, maintain and upgrade (in case Guido rewrites his precious dict implementation)? I would even allege that for sufficiently large teams and projects, there are *multiple* implementations with the same intent. Please, no. Best, Sven PS: the instances above are real-world examples, I remember requiring during the course of the last year.
data:image/s3,"s3://crabby-images/1cc95/1cc950bfec4765637944fceea914eb174b6fd0d4" alt=""
Just want to say that I'm happy to see that lately the disadvantages of inheritance (which are already known for a very long time) are getting more attention. It's not bad by definition, but there's so much truth in Sandy her talk and I think for many Python projects, we went way too far into "abusing" inheritance. Actually, it's a bit unfortunate that we made inheritance so user friendly and powerful in Python that for many people it became the logical way to extend or reuse some code. Jonathan 2015-10-21 19:41 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Oct 21, 2015, at 11:25, Jonathan Slenders <jonathan@slenders.be> wrote:
One of the most important things people have learned about OO over the past two decades is that subtyping, implementation extension, and implementation mixins, and interface-extending mixins (think collections.abc.Sequence adding count for you) are all different things. Compared to the other languages in existence at the time of Python 1.x or even the 2.2/2.3 changeover, it's hard to fault Python. The fact that it can easily be used to write bad code is a little unfortunate, but the fact that it can also easily be used to write good code, when other languages either don't allow some things to be expressed, force them to be expressed in clumsy or limited ways, or force you to misuse inappropriate features instead more than makes up for it. In particular, any language that has fixed structure layouts and vtables makes is much more unfriendly to both kinds of mixins, and makes interface subtyping clumsy; Python has no such problems. Yeah, maybe we could design something better in 2015, but based on the knowledge people had at the time?
data:image/s3,"s3://crabby-images/291c0/291c0867ef7713a6edb609517b347604a575bf5e" alt=""
On 21.10.2015 21:27, Andrew Barnert wrote:
True. Just because everything is possible does not necessarily mean one should do it. @Jonathan Inheritance has its disadvantages. The bigger the class hierarchy, the uglier it becomes. (We have this with the django view class system.) I see the same coming for the dict discussion if people insist on having separate leaf classes for all possible orthogonal aspects. That does not scale and it's hard to test ( --> @core-devs ). Something that's become clearer and clearer to me is that *nobody could have known this in advance*. It seemed like a good idea to start with several different base classes for each aspect (not knowing that they could be combine later). But now, people start to realize they might need these different aspects in a single implementation. So, it's the time to think about how to solve it in a scalable and maintainable way. Multiple inheritance might be a solution, composition is another. This time I think composition is the better one. Best, Sven PS: Funny coincidence, Sandi uses the aspect "order" for the house. The same as a dict could have an aspect "order".
data:image/s3,"s3://crabby-images/e2594/e259423d3f20857071589262f2cb6e7688fbc5bf" alt=""
On 10/21/2015 1:41 PM, Sven R. Kunze wrote:
The dict class itself is, in a sense, a poor example for this discussion. It is a critical part of Python's infrastructure, involved in a large fraction of executed statements. It therefore needs to be as fast as possible. For CPython, this means a heavily optimized C implementation that disallows injection and that takes shortcuts like bypassing method lookups. This makes the usefulness of subclassing limited. Of course, before 2.2, dict (and other built-in types) could not even be subclassed. UserDict is the 'user base dict class', meant to be subclassed by users. A solution to UserDict being too slow could be a C accelerator that did not bypass method lookups. A revised UserDict could be designed for injection in the way being discussed. -- Terry Jan Reedy
data:image/s3,"s3://crabby-images/291c0/291c0867ef7713a6edb609517b347604a575bf5e" alt=""
On 22.10.2015 03:11, Terry Reedy wrote:
The discussion *is* about dict: normal_dict = dict() ordered_dict = dict(order=dict.order_by_insert) sorted_dict = dict(order=sorted) sorted_default_dict = dict(order=sorted, default=int) Why couldn't dict() or{} redirect to the super-fast built-in C-implementation whereas dict(order=sorted, default=int) redirects to some more feature rich one? As you see, I for one don't see a contradiction between performance and features. Best, Sven
data:image/s3,"s3://crabby-images/2658f/2658f17e607cac9bc627d74487bef4b14b9bfee8" alt=""
Carl Meyer wrote:
I'm not sure there's any content in that statement. In a duck-typed language, sharing code is the *only* reason to use inheritance. Also, what is "specialisation" anyway? Any time you have two objects with the same interface, and mostly-similar but slightly different behaviours, you could regard one as being a specialisation of the other. Whether they share any code or not is an implementation detail. I think the real issue is that if you specialise something by inheriting and overriding random methods, you make your class fragile with respect to changes in the base class. If the author of the base class changes something undocumented about the way its methods interact with each other, your class may break. With pluggable behaviours, there is a well-defined interface between the main class and the classes representing the behaviours, so that sort of thing is much less likely to happen. You could get the same result with inheritance by designating some methods as "designed to be overridden", and documenting how they interact with the rest of the class. Effectively you are then defining an interface *within* the class and between the class and its subclasses. This is really just another implementation of pluggable behaviours where the plugging mechanism consists of overriding one or more methods. So to summarise, I would say that the issue here isn't really composition vs. inheritance, but structured vs. ad-hoc behaviour modification. I wouldn't say that one is always better than the other. Sometimes you need to do things that the author of a class didn't anticipate, and then the ability to override things in an ad-hoc manner is very useful. But I think the speaker is right to point out that doing *everything* that way can lead to a lot of trouble. -- Greg
data:image/s3,"s3://crabby-images/2dd36/2dd36bc2d30d53161737124e2d8ace2b4b4ce052" alt=""
On Fri, Oct 16, 2015 at 9:08 PM, Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
Thanks! - [ ] Could/should maybe either of these make it into the standard library, that would save a fair amount of copying. .. Great in combination w/ dict views: https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Nov 13, 2015, at 20:46, Wes Turner <wes.turner@gmail.com> wrote:
You could say the same about everything in the recipes, every one-liner like "def identity(x): return x", and so on. But most such things aren't used that often, and there's a cost to putting them in the stdlib—more for people to learn and remember, more code to be maintained, bigger downloads, etc. So just saying "maybe it would save some copying" isn't an argument for adding it to the stdlib. Adding OrderedDefaultDict as a docs recipe (as OrderedCounter already is) might be worth doing. Reorganizing the docs a bit to make it more obvious (by better highlighting __missing__, and making it clear that it's a method of dict rather than something special about defaultdict) seems even more likely to be worth it. If someone has an idea of how it should read, just file a docs bug and submit a patch and see if whoever's in charge of that area says it needs to come back here.
.. Great in combination w/ dict views: https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects
Of course, and that works out of the box.
data:image/s3,"s3://crabby-images/2dd36/2dd36bc2d30d53161737124e2d8ace2b4b4ce052" alt=""
This seems to keep a consistent __init__ signature with OrderedDict (by .pop()-ing 'default_factory' from kwargs instead of specifying as a positionalkwarg): class OrderedDefaultDict(OrderedDict): def __init__(self, *a, **kw): default_factory = kw.pop('default_factory', self.__class__) OrderedDict.__init__(self, *a, **kw) self.default_factory = default_factory def __missing__(self, key): self[key] = value = self.default_factory() return value I've added a few tests (as well as to_json, and _repr_json_ https://gist.github.com/westurner/be22dba8110be099a35e/c1a3a7394e401d4742df0... (Without this fix, json.loads(output_json, object_pairs_hook=OrderedDefaultDict) doesn't seem to work). On Fri, Nov 13, 2015 at 11:17 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
participants (14)
-
Andrew Barnert
-
Brendan Barnwell
-
Carl Meyer
-
Chris Angelico
-
Chris Barker - NOAA Federal
-
Greg Ewing
-
Guido van Rossum
-
Ian Foote
-
Jonathan Slenders
-
Random832
-
Steven D'Aprano
-
Sven R. Kunze
-
Terry Reedy
-
Wes Turner