[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists

Michael Lenzen m.lenzen at gmail.com
Sun Jul 19 22:01:47 CEST 2009


On 07/18/2009 08:45 PM, Raymond Hettinger wrote:

First of all, thanks for the input.  I was hoping you would read and 
comment on this.

>
> Starting with 2.7 and 3.1, we do have a Counter object that has some
> multiset capabilities -- I want to see how that fares in the real world
> before entertaining a proposal for another multiset/bag variant.
>

I'm not really a fan of the Counter class, I think that it has some 
fundamental flaws.  The biggest problem that I see is that it doesn't 
behave like a collection with multiple elements, when you iterate over 
it you only get the unique elements and len(Counter) returns the number 
of unique elements instead of the number of items in the multiset.  Then 
there's the fact that non-positive counts are allowed:

 >>> c = Counter()
 >>> c['a'] += 1
 >>> c['a'] -= 1
 >>> 'a' in c
True

This isn't a collection, it's something else - a counter.  I'm not 
saying it's not useful, but it's not a multiset.  Why I think it should 
be removed is that `defaultdict(int)` now provides exactly the same 
functionality.

> With respect to ordered sets, I started by posting an efficient recipe
> on ASPN so that we could see how it gets used. Though I'm a fan of the
> idea in general, I've found few use cases and have noted nearly zero
> uptake on the recipe (so the OrderedSet idea may just be YAGNI). The
> wording of your proposal indicates that it is motivated by a desire to
> complete the grid of possible variants (frozen/unfrozen,
> itemized/multivalued, ordered/unordered, etc), but it would have been
> better to focus on use cases (existing needs that aren't being met by
> the current toolsets).
>
> After posting the pure Python recipe for ordered sets, my plan had been
> to wait to see how it was received. If there was an active interest
> (none so far), I've already worked out the steps needed to extend
> existing C code to make the built-in sets remember their insertion order
> (no need for a separate type, just make the regular sets remember their
> order) -- this can be done with very little impact on performance.

Your implementation of ordered sets also doesn't work for me.  I want to 
be able to access elements by index, take slices of the list, just be 
able to use it as a list in general.  I think it might be a good 
addition to the current set class, it's basically an improved set.

>
> There are a couple of problems here. Unifying many collections under a
> single API has the unfortunate side-effect of freezing development on
> those collections (it becomes hard to change one without changing them
> all). Another issue is that the API doesn't seem comfortable to me --
> when I want a set or list or dict, I just say so -- I don't want to go
> through a menu of all of the possible variants. Also, it is not like
> those variants are readily substitutable for one another -- if your code
> relies on hashability, then you don't really have the option of
> switching the mutability flag. To me, this factory function is like a
> comp sci exercise in hyper-generalization -- it doesn't make any
> real-world code any cleaner.

I'm not as gung ho about the ABCs as I was before.  But, the way I see 
it is that the collection factory/ABC would simplify some situations, 
especially for new programmers.  They wouldn't have to learn the 
difference between bags, sets, lists and setlists right away.  They 
would just have an easy way to create a collection that provided a basic 
API, maybe just add, remove, contains and count methods.  All the 
classes are still available by name to those that want them, and that is 
how I would probably use them.

> Sorry for the sour notes. You're definitely showing insight into the
> relationship between various collection classes but I don't think the
> ideas represent good, use-case driven language development. The
> proposals are all too theoretically based; instead, the evolution of the
> collections module needs to follow a slower path, one that lets the
> existing tools mature. And tools like OrderedSet need to have some early
> life as a recipe to see how people use it and to learn from their
> experiences.
>

I agree that I need use-cases and will go through my old code and post 
some in the near future.  I also think that use cases will follow if the 
option is provided, that it is the responsibility of the language 
developers to push innovation and a certain paradigm.

I am firmly convinced that there need to be bag and setlist classes 
included at least in the collections module, if not the standard 
library.  Python is supposed to be *batteries included* and I see these 
as glaring omissions.  If my arguments were too theoretical it's because 
I'm a mathematician at heart and am more apt to be convinced by a 
theoretical argument as opposed to a utilitarian one.

-Michael Lenzen



More information about the Python-ideas mailing list