[Python-Dev] collections module

Sat Jan 10 07:07:51 EST 2004

Raymond Hettinger wrote:

> I truly don't understand your answer.  The bag and queue collections
> have been tried and true in many languages.  

I'm not so sure about a queue class, but I do believe that bag classes
got into libraries only for academic reasons such as symmetry. In
particular, they got into Smalltalk because of that reason, and
everybody copies it because Smalltalk has it. This has to stop :-)

> Do I need to turn this into a PEP, take it to comp.lang.python,
> cheerlead for it, and make the total effort ten times harder than it
> needs to be?

If everybody else thinks bags are a great idea, go ahead. YAGNI.

> Or am I missing something basic like it being detrimental to the
> language in some way?

They are perhaps detrimental because they violate the principle

There should be one-- and preferably only one --obvious way to do it.

For queues, this is debatable, as the obvious way to do it
(list.append, list.pop(0)) is inefficient if the list is long.

In the specific case of bags, I think a different generalization
would be more useful. The only use case of bags is when you want
to count things, so the current pattern is

# count them
d = {}
for t in things:
   d[t] = d.get(t, 0) + 1
# print them sorted by item
for k, v in sorted(d.iteritems()):
   print k, v
# print them sorted by frequency
for k, v in sorted(d.iteritems(), key=operator.itemgetter(1)):
   print k,v

With a bag, this could be "simplified" to

d = Bag()
for t in things:
   d.add(t)
# print them sorted by item
for k, v in sorted(d.iteritems()):
   print k, v
# print them sorted by frequency
for k, v in d.sortedByFrequency():
   print k,v

I don't think this is reasonably simpler.

What's worse: this type does not support a very similar activity,
namely collecting similar items

d = {}
for t in things:
   k = category(d) # e.g. if t are files, one may collect them
                   # by size class (empty, <1k, <10k, <100k, ...)
   try:
     d[k].append(t)
   except KeyError:
     d[k] = [t]

Both applications could be implemented if dictionaries had
the notion of a default value, which would be generated
by a callable (whether with or without argument is debatable)

# count things
d = {}
d.defaultmaker = lambda k:0
for t in things:
   d[t] += 1

# sort things by category
d = {}
d.defaultmaker = lambda k:[]
for t in things:
   d[category(t)].append(t)

Having a key in the lambda is probably better, as it would
also allow lazily-filled dictionaries.

Regards,
Martin