[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists
Steven D'Aprano
steve at pearwood.info
Sun Jul 19 10:49:09 CEST 2009
On Sun, 19 Jul 2009 05:47:28 pm Mike Meyer wrote:
> > A dict or set having a count method would make no sense either.
> > This has nothing to do with polymorphism.
>
> Actually, count on a dict makes *lots* of sense. I can think of a lot
> of uses for counting the number of occurrences of values in a
> dictionary, mostly revolving around using dictionaries to implement
> sparse arrays of various kinds.
But dicts don't allow you to look up on values, you look up on keys.
dict.count(value) might be useful -- I doubt that there are "a lot of
uses", but there are probably a few -- but it's hardly a fundamental
operation on a dict. I'd argue it would be better as a function
count_value(mapping, value) rather than a method. That way, you can
make it polymorphic on the first argument, without requiring that all
mappings inherit from each other.
Arguably you might have dict.count(key) as a dict method, but since that
can only return 0 or 1, it's just another way of spelling
key in dict
which is faster, shorter and easier to read.
> As for sets, the "count" method is obviously useful on multisets.
But not for sets, where it is just another way of spelling "element in
set".
> And
> it takes on reasonable values for the subset of multisets with unique
> values. Yes, it's either 1 or 0, and yes, it's isomorphic to testing
> membership, but it makes sense. An algorithm that uses count on
> multisets should work just fine on sets - that's what polymorphism
> buys for you.
That can't be true, because their behaviour is different and they
therefore aren't interchangeable. If obj is either a set or a multiset,
the state of obj after:
obj.add(element)
obj.add(element)
obj.remove(element)
will be different.
An algorithm that expects obj to be empty will fail if obj is a
multiset, and an algorithm that expects obj will not be empty will fail
if it is a set. Either way, they aren't interchangeable.
> Saying that "count" doesn't make sense on sets is a lot
> like saying "length" and "width" don't make sense on squares, since
> those are equal - which would mean code written to handle rectangles
> wouldn't work on squares, unless they were crammed into rectangle
> objects.
Funny you should mention that. Squares and rectangles is just another
example of the Circle And Ellipse Problem:
http://www.c2.com/cgi/wiki?CircleAndEllipseProblem
You can't expect to use squares everywhere you use rectangles. Assuming
mutable objects:
rectangle.width = 2*rectangle.height
assert rectangle.width == 2*rectangle.height
will work, but:
square.width = 2*square.height
if problematic. There are two "reasonable" approaches to it:
(1) keep the invariant width==height by setting both the width and the
height to twice the existing height, which will then cause:
assert square.width == 2*square.height
to fail; or
(2) treat the square as if it were a rectangle, and stretch the object
as requested, but that will break the invariant that squares are, well,
square.
Either way, you break something.
Note that the problem is slightly different if you use immutable
objects. Now you can't stretch a square into a rectangular shape, but
how do you create your objects? Both of:
square(width=5, height=10)
rectangle(width=5)
will fail. (Although rectangle could make height optional, and default
to the same value as width. However, that's also problematic: "In the
face of ambiguity, refuse the temptation to guess".)
> Part of the problem is that Python's collections types are a
> collection (no pun intended) of things that have shown themselves to
> be useful in practice. This has resulted in types that have nearly
> identical sets of operations on them, but the operations have
> different names, so the types - which would be polymorphic if they'd
> been designed to work together - aren't polymorphic.
They wouldn't be polymorphic, because they aren't polymorphic. The
following is a good example of why they're not polymorphic, but some
algorithms are agnostic to the choice of data structure:
> I get bit by
> this fairly often. Most recently spending last friday afternoon
> auditing code to make sure I'd turned all the list methods into the
> appropriate set methods in a graph class's node walking method.
Lists are ordered, sets are not. You can *insert* or *append* to a list,
but not to a set, because sets aren't ordered.
True, one could artificially pretend that sets are polymorphic to lists
by using misleading method names:
set.add -> set.append
and do-nothing methods like set.sort() and set.reverse(). But apart from
satisfying people who have read too many OO books, what would be the
point?
--
Steven D'Aprano
More information about the Python-ideas
mailing list