[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists

Raymond Hettinger python at rcn.com
Sun Jul 19 03:45:01 CEST 2009


On Jul 17, 2009, at 6:54 PM, Michael Lenzen wrote:

> In a nutshell, I want to add 2 new classes (and respective frozen  
> varients) to the collections module, namely a bag (multiset) class  
> and a setlist (ordered set/unique list) class.  I know this has been  
> floated before, but I think it merits more consideration.

FWIW, my goal for the collections module is to keep it relatively  
small and not to include every variant someone can think of.   We've  
already rejected red-black trees, pairing heaps, and blist (which is a  
list variant that allows fast insertions).

Starting with 2.7 and 3.1, we do have a Counter object that has some  
multiset capabilities -- I want to see how that fares in the real  
world before entertaining a proposal for another multiset/bag variant.

With respect to ordered sets, I started by posting an efficient recipe  
on ASPN so that we could see how it gets used.  Though I'm a fan of  
the idea in general, I've found few use cases and have noted nearly  
zero uptake on the recipe (so the OrderedSet idea may just be  
YAGNI).    The wording of your proposal indicates that it is motivated  
by a desire to complete the grid of possible variants (frozen/ 
unfrozen, itemized/multivalued, ordered/unordered, etc), but it would  
have been better to focus on use cases (existing needs that aren't  
being met by the current toolsets).

After posting the pure Python recipe for ordered sets, my plan had  
been to wait to see how it was received.  If there was an active  
interest (none so far), I've already worked out the steps needed  to  
extend existing C code to make the built-in sets remember their  
insertion order (no need for a separate type, just make the regular  
sets remember their order) -- this can be done with very little impact  
on performance.

So that leaves your proposed extensions to the abstract base classes.   
My issue with those is that collections ABCs suffer from the same  
problem as the itertools module -- the more of them you add, the  
harder it is to learn and remember which ones to use (it is possible  
to go wild with collections abstractions and then find that zero value  
has been added).  When developing the ABCs, Guido rejected the idea of  
adding intermediate layers of abstraction (IIRC, he documented his  
rationale in the PEP).

My own thought on the subject is that the collections ABC development  
needs to be somewhat restrained.  They are new and have not yet been  
widely adopted.  The community is just now gaining experience with  
them and those experiences are vital for making informed choices on  
how or whether to grow the collections ABCs.

Also, the collections ABCs that we already have have not been fully  
exercised or debugged.  The ones we have need to be shaken-out before  
additional efforts are undertaken to extend them (or to create  
intermediate layers of abstraction).

 >  The last addition but the first I will list, because I think it is  
the coolest
 > and provides the most functionality, is the new function I propose
 > adding to collections.  The function is simply called 'collection'  
and has
 > the following signature:
 >
 > collection(iterable=None, mutable=False, ordered=False, unique=False)

There are a couple of problems here.  Unifying many collections under  
a single API has the unfortunate side-effect of freezing development  
on those collections (it becomes hard to change one without changing  
them all).  Another issue is that the API doesn't seem comfortable to  
me -- when I want a set or list or dict, I just say so -- I don't want  
to go through a menu of all of the possible variants.  Also, it is not  
like those variants are readily substitutable for one another -- if  
your code relies on hashability, then you don't really have the option  
of switching the mutability flag.  To me, this factory function is  
like a comp sci exercise in hyper-generalization -- it doesn't make  
any real-world code any cleaner.

Sorry for the sour notes.  You're definitely showing insight into the  
relationship between various collection classes but I don't think the  
ideas represent good, use-case driven language development.   The  
proposals are all too theoretically based; instead, the evolution of  
the collections module needs to follow a slower path, one that lets  
the existing tools mature.  And tools like OrderedSet need to have  
some early life as a recipe to see how people use it and to learn from  
their experiences.


Raymond



More information about the Python-ideas mailing list