[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists
Raymond Hettinger
python at rcn.com
Sun Jul 19 03:45:01 CEST 2009
On Jul 17, 2009, at 6:54 PM, Michael Lenzen wrote:
> In a nutshell, I want to add 2 new classes (and respective frozen
> varients) to the collections module, namely a bag (multiset) class
> and a setlist (ordered set/unique list) class. I know this has been
> floated before, but I think it merits more consideration.
FWIW, my goal for the collections module is to keep it relatively
small and not to include every variant someone can think of. We've
already rejected red-black trees, pairing heaps, and blist (which is a
list variant that allows fast insertions).
Starting with 2.7 and 3.1, we do have a Counter object that has some
multiset capabilities -- I want to see how that fares in the real
world before entertaining a proposal for another multiset/bag variant.
With respect to ordered sets, I started by posting an efficient recipe
on ASPN so that we could see how it gets used. Though I'm a fan of
the idea in general, I've found few use cases and have noted nearly
zero uptake on the recipe (so the OrderedSet idea may just be
YAGNI). The wording of your proposal indicates that it is motivated
by a desire to complete the grid of possible variants (frozen/
unfrozen, itemized/multivalued, ordered/unordered, etc), but it would
have been better to focus on use cases (existing needs that aren't
being met by the current toolsets).
After posting the pure Python recipe for ordered sets, my plan had
been to wait to see how it was received. If there was an active
interest (none so far), I've already worked out the steps needed to
extend existing C code to make the built-in sets remember their
insertion order (no need for a separate type, just make the regular
sets remember their order) -- this can be done with very little impact
on performance.
So that leaves your proposed extensions to the abstract base classes.
My issue with those is that collections ABCs suffer from the same
problem as the itertools module -- the more of them you add, the
harder it is to learn and remember which ones to use (it is possible
to go wild with collections abstractions and then find that zero value
has been added). When developing the ABCs, Guido rejected the idea of
adding intermediate layers of abstraction (IIRC, he documented his
rationale in the PEP).
My own thought on the subject is that the collections ABC development
needs to be somewhat restrained. They are new and have not yet been
widely adopted. The community is just now gaining experience with
them and those experiences are vital for making informed choices on
how or whether to grow the collections ABCs.
Also, the collections ABCs that we already have have not been fully
exercised or debugged. The ones we have need to be shaken-out before
additional efforts are undertaken to extend them (or to create
intermediate layers of abstraction).
> The last addition but the first I will list, because I think it is
the coolest
> and provides the most functionality, is the new function I propose
> adding to collections. The function is simply called 'collection'
and has
> the following signature:
>
> collection(iterable=None, mutable=False, ordered=False, unique=False)
There are a couple of problems here. Unifying many collections under
a single API has the unfortunate side-effect of freezing development
on those collections (it becomes hard to change one without changing
them all). Another issue is that the API doesn't seem comfortable to
me -- when I want a set or list or dict, I just say so -- I don't want
to go through a menu of all of the possible variants. Also, it is not
like those variants are readily substitutable for one another -- if
your code relies on hashability, then you don't really have the option
of switching the mutability flag. To me, this factory function is
like a comp sci exercise in hyper-generalization -- it doesn't make
any real-world code any cleaner.
Sorry for the sour notes. You're definitely showing insight into the
relationship between various collection classes but I don't think the
ideas represent good, use-case driven language development. The
proposals are all too theoretically based; instead, the evolution of
the collections module needs to follow a slower path, one that lets
the existing tools mature. And tools like OrderedSet need to have
some early life as a recipe to see how people use it and to learn from
their experiences.
Raymond
More information about the Python-ideas
mailing list