On 07/18/2009 08:45 PM, Raymond Hettinger wrote: First of all, thanks for the input. I was hoping you would read and comment on this.
Starting with 2.7 and 3.1, we do have a Counter object that has some multiset capabilities -- I want to see how that fares in the real world before entertaining a proposal for another multiset/bag variant.
I'm not really a fan of the Counter class, I think that it has some fundamental flaws. The biggest problem that I see is that it doesn't behave like a collection with multiple elements, when you iterate over it you only get the unique elements and len(Counter) returns the number of unique elements instead of the number of items in the multiset. Then there's the fact that non-positive counts are allowed:
c = Counter() c['a'] += 1 c['a'] -= 1 'a' in c True
This isn't a collection, it's something else - a counter. I'm not saying it's not useful, but it's not a multiset. Why I think it should be removed is that `defaultdict(int)` now provides exactly the same functionality.
With respect to ordered sets, I started by posting an efficient recipe on ASPN so that we could see how it gets used. Though I'm a fan of the idea in general, I've found few use cases and have noted nearly zero uptake on the recipe (so the OrderedSet idea may just be YAGNI). The wording of your proposal indicates that it is motivated by a desire to complete the grid of possible variants (frozen/unfrozen, itemized/multivalued, ordered/unordered, etc), but it would have been better to focus on use cases (existing needs that aren't being met by the current toolsets).
After posting the pure Python recipe for ordered sets, my plan had been to wait to see how it was received. If there was an active interest (none so far), I've already worked out the steps needed to extend existing C code to make the built-in sets remember their insertion order (no need for a separate type, just make the regular sets remember their order) -- this can be done with very little impact on performance.
Your implementation of ordered sets also doesn't work for me. I want to be able to access elements by index, take slices of the list, just be able to use it as a list in general. I think it might be a good addition to the current set class, it's basically an improved set.
There are a couple of problems here. Unifying many collections under a single API has the unfortunate side-effect of freezing development on those collections (it becomes hard to change one without changing them all). Another issue is that the API doesn't seem comfortable to me -- when I want a set or list or dict, I just say so -- I don't want to go through a menu of all of the possible variants. Also, it is not like those variants are readily substitutable for one another -- if your code relies on hashability, then you don't really have the option of switching the mutability flag. To me, this factory function is like a comp sci exercise in hyper-generalization -- it doesn't make any real-world code any cleaner.
I'm not as gung ho about the ABCs as I was before. But, the way I see it is that the collection factory/ABC would simplify some situations, especially for new programmers. They wouldn't have to learn the difference between bags, sets, lists and setlists right away. They would just have an easy way to create a collection that provided a basic API, maybe just add, remove, contains and count methods. All the classes are still available by name to those that want them, and that is how I would probably use them.
Sorry for the sour notes. You're definitely showing insight into the relationship between various collection classes but I don't think the ideas represent good, use-case driven language development. The proposals are all too theoretically based; instead, the evolution of the collections module needs to follow a slower path, one that lets the existing tools mature. And tools like OrderedSet need to have some early life as a recipe to see how people use it and to learn from their experiences.
I agree that I need use-cases and will go through my old code and post some in the near future. I also think that use cases will follow if the option is provided, that it is the responsibility of the language developers to push innovation and a certain paradigm. I am firmly convinced that there need to be bag and setlist classes included at least in the collections module, if not the standard library. Python is supposed to be *batteries included* and I see these as glaring omissions. If my arguments were too theoretical it's because I'm a mathematician at heart and am more apt to be convinced by a theoretical argument as opposed to a utilitarian one. -Michael Lenzen