[Python-Dev] collections module

Raymond Hettinger python at rcn.com
Fri Jan 9 00:32:52 EST 2004


I would like to establish a new module for some collection classes.

The first type would be a bag, modeled after one of Smalltalk's
key collection classes (similar to MultiSets in C++ and bags in
Objective C).

The second type would be a high speed queue implemented using
linked list blocks like we used for itertools.tee().  The 
interface could be as simple as push, pop, and len.  Each
operation ought to run about as fast a list assignment (meaning
that it beats the heck out of the list.pop(0), list.append(data)
version).

This module could also serve as the "one obvious place to put it"
for other high performance datatypes that might be added in the
future (such as fibheaps or some such).

Guido wanted this to be discussed by a number of people.  While
a rich set of collection classes have proven their worth in other
contexts, it could be that more is not better for python.  One the
language's elegances is the ability to do just about anything with
just lists and dicts.  If you have too many mutable containers to 
choose from, the choice of which to use becomes less obvious.  

My thoughts are that the existence of the array module, for example, 
has never impaired anyone's ability to learn or use lists and dicts.  

Also, no one reaches for a bag unless they are counting things.  

Likewise, queues are in a mental concept space distinct from dicts, 
sets, and such.  My only worry with queues is differentiating
them from Queue objects which have builtin synchronization for 
use in threads.

Besides, having bag and queue in a separate module means that they
needn't enter anyone's consciousness until a need arises.

What do you guys think?


Raymond Hettinger



>>> from collections import bag
>>> b = bag('banana')
>>> b.sortedbycount()
[('a', 3), ('n', 2), ('b', 1)]
>>> list(b)
['a', 'a', 'a', 'b', 'n', 'n']
>>> b.asSet()
set(['a', 'b', 'n'])




More information about the Python-Dev mailing list