[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists
m.lenzen at gmail.com
Sat Jul 18 03:54:15 CEST 2009
In a nutshell, I want to add 2 new classes (and respective frozen
varients) to the collections module, namely a bag (multiset) class and a
setlist (ordered set/unique list) class. I know this has been floated
before, but I think it merits more consideration.
Everything I'm writing here is available with better formatting here:
The latest version of my implementation is here:
I look at the basic data structures of Python like this (ignoring
mutability which I will mention later) :
| Unordered | Ordered |
Non-Unique | | list |
Unique | set | |
Python is missing 2 basic collections. First, an unordered non-unique
collection, a bag aka multiset, the simplest collection there is. You
can't even really call them a data structure because there's no
structure to them (which is their beauty). Secondly it's missing an
ordered unique collection, an ordered set or unique list, what I'm
calling a setlist.
That is the scope of when I started out, of course it grew and grew.
The last addition but the first I will list, because I think it is the
coolest and provides the most functionality, is the new function I
propose adding to collections. The function is simply called
'collection' and has the following signature:
collection(iterable=None, mutable=False, ordered=False, unique=False)
What it does is return a collection (instantiated from iterable if
passed) with the passed properties. This will only be possible with
then new classes I propose, because they don't exist yet.
New ABCs (abstract base classes)
After starting to study the ABCs built in to Python I came up with 2
more ABCs that I think should be included:
Collection - implements Sized, Iterable and Container. Sequence, Set
and Mapping would then all inherit Collection instead of the three
aforementioned classes. Built-in classes that would be subclasses of
Collection are set, frozenset, list and tuple. Collection would provide
* _from_iterable(cls, it) - This is a classmethod that I took from
Set, for convenience of returning new instantiations
* __getitem__(key, value) - This is an abstract method. Every
Collection would have to make its elements accessible. Of course set and
frozenset do not currently implement this method so they will have to be
changed, more later.
Mutable - I think there should be one metaclass for all mutable types
just like there is currently Iterable, Sized and Container. We could
then check instanceof(obj, Mutable). MutableSequence, MutableSet and
MutableMapping would inherit Mutable, or we might be able to just do
away with them altogether. There are 3 abstractmethods defined:
* pop(self) - This is the simplest to implement, all mutable classes
* __setitem__(self, key, value) - This is already implemented for
list and dict but would have to be defined for set, more later
* __delitem__(self, key) Same as above.
Extending set to fit the ABCs
I mentioned before that sets don't currently fit into my model. Here are
my propositions for the three methods that would need to be implemented.
* set.__setitem__(self, elem, value) - Set whether or not elem is in
the set based on the boolean evaluation of value. s[elem] = 1 would call
s.add(elem) and s[elem] = False would call s.remove(elem), appropriately
raising a KeyError if elem not in s. The reason I chose this way is that
the only thing you can set about an element of a set is whether or not
it is present.
* __getitem__(self, item) - Simply returns item in self. Just like a
bag returns the multiplicity of an item when you get it, this returns
the number of times item appears in self. This also corresponds to the
definition of setitem.
* __delitem__(self, item) - This is the simplest case, just make it
an alias for set.remove(item)
I think this is the most controversial part of my proposition, but the
more I think about it, the more I like it. s aren't reserved for one
single purpose in Python, lists restrict the index to an integer while
dicts allow any Hashable object. list allows slicing inside the s
while it is meaningless for a dict.
Finally, the New Classes
These are the 2 new Collections (4 if you count the frozen variants) to
fill in the holes I mentioned in the beginning.
bags - These would be true multisets, AKA unordered tuples. See the
wikipedia page on Multisets for more info. The benefits of a bag vs a
list are that order doesn't matter for equality testing and that
searching for an item is constant time. The latter is the real convincer
for me. I think this should replace the Counter class in the
collections module which doesn't provide anything more than
setlist - AKA unique list, ordered set, bijection int->hashable, and
ordered collection of unique elements. I find myself often creating a
list of unique elements, then an inverse dict to look up the location
quickly or test inclusion. I chose the name setlist because it's not a
set xor a list, it's both. Then it occurred to me that bands' setlists
are a perfect example of this data structure, an ordered collection of
unique elements, real bands that is,
More information about the Python-ideas