[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists
Michael Lenzen
m.lenzen at gmail.com
Sun Jul 19 22:01:47 CEST 2009
On 07/18/2009 08:45 PM, Raymond Hettinger wrote:
First of all, thanks for the input. I was hoping you would read and
comment on this.
>
> Starting with 2.7 and 3.1, we do have a Counter object that has some
> multiset capabilities -- I want to see how that fares in the real world
> before entertaining a proposal for another multiset/bag variant.
>
I'm not really a fan of the Counter class, I think that it has some
fundamental flaws. The biggest problem that I see is that it doesn't
behave like a collection with multiple elements, when you iterate over
it you only get the unique elements and len(Counter) returns the number
of unique elements instead of the number of items in the multiset. Then
there's the fact that non-positive counts are allowed:
>>> c = Counter()
>>> c['a'] += 1
>>> c['a'] -= 1
>>> 'a' in c
True
This isn't a collection, it's something else - a counter. I'm not
saying it's not useful, but it's not a multiset. Why I think it should
be removed is that `defaultdict(int)` now provides exactly the same
functionality.
> With respect to ordered sets, I started by posting an efficient recipe
> on ASPN so that we could see how it gets used. Though I'm a fan of the
> idea in general, I've found few use cases and have noted nearly zero
> uptake on the recipe (so the OrderedSet idea may just be YAGNI). The
> wording of your proposal indicates that it is motivated by a desire to
> complete the grid of possible variants (frozen/unfrozen,
> itemized/multivalued, ordered/unordered, etc), but it would have been
> better to focus on use cases (existing needs that aren't being met by
> the current toolsets).
>
> After posting the pure Python recipe for ordered sets, my plan had been
> to wait to see how it was received. If there was an active interest
> (none so far), I've already worked out the steps needed to extend
> existing C code to make the built-in sets remember their insertion order
> (no need for a separate type, just make the regular sets remember their
> order) -- this can be done with very little impact on performance.
Your implementation of ordered sets also doesn't work for me. I want to
be able to access elements by index, take slices of the list, just be
able to use it as a list in general. I think it might be a good
addition to the current set class, it's basically an improved set.
>
> There are a couple of problems here. Unifying many collections under a
> single API has the unfortunate side-effect of freezing development on
> those collections (it becomes hard to change one without changing them
> all). Another issue is that the API doesn't seem comfortable to me --
> when I want a set or list or dict, I just say so -- I don't want to go
> through a menu of all of the possible variants. Also, it is not like
> those variants are readily substitutable for one another -- if your code
> relies on hashability, then you don't really have the option of
> switching the mutability flag. To me, this factory function is like a
> comp sci exercise in hyper-generalization -- it doesn't make any
> real-world code any cleaner.
I'm not as gung ho about the ABCs as I was before. But, the way I see
it is that the collection factory/ABC would simplify some situations,
especially for new programmers. They wouldn't have to learn the
difference between bags, sets, lists and setlists right away. They
would just have an easy way to create a collection that provided a basic
API, maybe just add, remove, contains and count methods. All the
classes are still available by name to those that want them, and that is
how I would probably use them.
> Sorry for the sour notes. You're definitely showing insight into the
> relationship between various collection classes but I don't think the
> ideas represent good, use-case driven language development. The
> proposals are all too theoretically based; instead, the evolution of the
> collections module needs to follow a slower path, one that lets the
> existing tools mature. And tools like OrderedSet need to have some early
> life as a recipe to see how people use it and to learn from their
> experiences.
>
I agree that I need use-cases and will go through my old code and post
some in the near future. I also think that use cases will follow if the
option is provided, that it is the responsibility of the language
developers to push innovation and a certain paradigm.
I am firmly convinced that there need to be bag and setlist classes
included at least in the collections module, if not the standard
library. Python is supposed to be *batteries included* and I see these
as glaring omissions. If my arguments were too theoretical it's because
I'm a mathematician at heart and am more apt to be convinced by a
theoretical argument as opposed to a utilitarian one.
-Michael Lenzen
More information about the Python-ideas
mailing list