[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists

Sun Jul 19 10:49:09 CEST 2009

On Sun, 19 Jul 2009 05:47:28 pm Mike Meyer wrote:

> > A dict or set having a count method would make no sense either.
> > This has nothing to do with polymorphism.
>
> Actually, count on a dict makes *lots* of sense. I can think of a lot
> of uses for counting the number of occurrences of values in a
> dictionary, mostly revolving around using dictionaries to implement
> sparse arrays of various kinds.

But dicts don't allow you to look up on values, you look up on keys.

dict.count(value) might be useful -- I doubt that there are "a lot of 
uses", but there are probably a few -- but it's hardly a fundamental 
operation on a dict. I'd argue it would be better as a function 
count_value(mapping, value) rather than a method. That way, you can 
make it polymorphic on the first argument, without requiring that all 
mappings inherit from each other.

Arguably you might have dict.count(key) as a dict method, but since that 
can only return 0 or 1, it's just another way of spelling

key in dict

which is faster, shorter and easier to read.

> As for sets, the "count" method is obviously useful on multisets. 

But not for sets, where it is just another way of spelling "element in 
set".

> And 
> it takes on reasonable values for the subset of multisets with unique
> values. Yes, it's either 1 or 0, and yes, it's isomorphic to testing
> membership, but it makes sense. An algorithm that uses count on
> multisets should work just fine on sets - that's what polymorphism
> buys for you. 

That can't be true, because their behaviour is different and they 
therefore aren't interchangeable. If obj is either a set or a multiset, 
the state of obj after:

obj.add(element)
obj.add(element)
obj.remove(element)

will be different.

An algorithm that expects obj to be empty will fail if obj is a 
multiset, and an algorithm that expects obj will not be empty will fail 
if it is a set. Either way, they aren't interchangeable.

> Saying that "count" doesn't make sense on sets is a lot 
> like saying "length" and "width" don't make sense on squares, since
> those are equal - which would mean code written to handle rectangles
> wouldn't work on squares, unless they were crammed into rectangle
> objects.

Funny you should mention that. Squares and rectangles is just another 
example of the Circle And Ellipse Problem:

http://www.c2.com/cgi/wiki?CircleAndEllipseProblem

You can't expect to use squares everywhere you use rectangles. Assuming 
mutable objects:

rectangle.width = 2*rectangle.height
assert rectangle.width == 2*rectangle.height

will work, but:

square.width = 2*square.height

if problematic. There are two "reasonable" approaches to it:

(1) keep the invariant width==height by setting both the width and the 
height to twice the existing height, which will then cause:

assert square.width == 2*square.height

to fail;  or

(2) treat the square as if it were a rectangle, and stretch the object 
as requested, but that will break the invariant that squares are, well, 
square.

Either way, you break something.

Note that the problem is slightly different if you use immutable 
objects. Now you can't stretch a square into a rectangular shape, but 
how do you create your objects? Both of:

square(width=5, height=10)
rectangle(width=5)

will fail. (Although rectangle could make height optional, and default 
to the same value as width. However, that's also problematic: "In the 
face of ambiguity, refuse the temptation to guess".)

> Part of the problem is that Python's collections types are a
> collection (no pun intended) of things that have shown themselves to
> be useful in practice.  This has resulted in types that have nearly
> identical sets of operations on them, but the operations have
> different names, so the types - which would be polymorphic if they'd
> been designed to work together - aren't polymorphic. 

They wouldn't be polymorphic, because they aren't polymorphic. The 
following is a good example of why they're not polymorphic, but some 
algorithms are agnostic to the choice of data structure:

> I get bit by 
> this fairly often. Most recently spending last friday afternoon
> auditing code to make sure I'd turned all the list methods into the
> appropriate set methods in a graph class's node walking method.

Lists are ordered, sets are not. You can *insert* or *append* to a list, 
but not to a set, because sets aren't ordered.

True, one could artificially pretend that sets are polymorphic to lists 
by using misleading method names:

set.add -> set.append

and do-nothing methods like set.sort() and set.reverse(). But apart from 
satisfying people who have read too many OO books, what would be the 
point?

-- 
Steven D'Aprano