[Python-ideas] Proposal for extending the collections module - bags / multisets, ordered sets / unique lists

Sun Jul 19 17:53:43 CEST 2009

On Sun, 19 Jul 2009 18:49:09 +1000
Steven D'Aprano <steve at pearwood.info> wrote:

> On Sun, 19 Jul 2009 05:47:28 pm Mike Meyer wrote:
> 
> > > A dict or set having a count method would make no sense either.
> > > This has nothing to do with polymorphism.
> >
> > Actually, count on a dict makes *lots* of sense. I can think of a lot
> > of uses for counting the number of occurrences of values in a
> > dictionary, mostly revolving around using dictionaries to implement
> > sparse arrays of various kinds.
> 
> But dicts don't allow you to look up on values, you look up on keys.

So? I wasn't talking about looking up values, I was talking about
counting them. Of course, lists don't do that very well either - you
can only look up two values, or maybe it's one. And that's just
another thing that makes changing an implementations from lists to dicts

> dict.count(value) might be useful -- I doubt that there are "a lot of 
> uses", but there are probably a few -- but it's hardly a fundamental 
> operation on a dict. I'd argue it would be better as a function 
> count_value(mapping, value) rather than a method. That way, you can 
> make it polymorphic on the first argument, without requiring that all 
> mappings inherit from each other.

That would be fine - except that count_values(list, value) is another
way of spelling list.count(value).

> Arguably you might have dict.count(key) as a dict method,

You might argue that, but I wasn't.

> > As for sets, the "count" method is obviously useful on multisets. 
> But not for sets, where it is just another way of spelling "element in 
> set".

True. And if you know you're working with sets, that's the preferred
spelling.

> 
> > And 
> > it takes on reasonable values for the subset of multisets with unique
> > values. Yes, it's either 1 or 0, and yes, it's isomorphic to testing
> > membership, but it makes sense. An algorithm that uses count on
> > multisets should work just fine on sets - that's what polymorphism
> > buys for you. 
> 
> That can't be true, because their behaviour is different and they 
> therefore aren't interchangeable. If obj is either a set or a multiset, 
> the state of obj after:
>
> obj.add(element)
> obj.add(element)
> obj.remove(element)
> 
> will be different.

That just means the two classes don't always behave the same, which is
why we give them different names.

> An algorithm that expects obj to be empty will fail if obj is a 
> multiset, and an algorithm that expects obj will not be empty will fail 
> if it is a set. Either way, they aren't interchangeable.

So? An algorithm that doesn't depend on that behavior will still
work. Yes, multisets and sets are different types, so not all
algorithms that work on one will work properly on the other. Making
them polymorphic will make the set of algorithms that work on both
larger, which helps with not repeating yourself.

> > Saying that "count" doesn't make sense on sets is a lot 
> > like saying "length" and "width" don't make sense on squares, since
> > those are equal - which would mean code written to handle rectangles
> > wouldn't work on squares, unless they were crammed into rectangle
> > objects.
> 
> Funny you should mention that. Squares and rectangles is just another 
> example of the Circle And Ellipse Problem:
> 
> http://www.c2.com/cgi/wiki?CircleAndEllipseProblem
> 
> You can't expect to use squares everywhere you use rectangles.

I would never argue that you can.  If there were identical, we
wouldn't have two names for them.

> Assuming mutable objects:

That's an *excellent* place to start if you're going to show they're
different kinds of objects! Ditto for sets vs. multisets. It's their
*values* that determine which class they belong to, so changing their
values is the easiest way to break them.

> Note that the problem is slightly different if you use immutable 
> objects. Now you can't stretch a square into a rectangular shape, but 
> how do you create your objects? Both of:

But only slightly - you have to know what type of object you're
creating if you're going to create them. 

This is a well-know - and fairly well-analyzed - problem. Created
types need to be contravariant instead of covariant.
> > Part of the problem is that Python's collections types are a
> > collection (no pun intended) of things that have shown themselves to
> > be useful in practice.  This has resulted in types that have nearly
> > identical sets of operations on them, but the operations have
> > different names, so the types - which would be polymorphic if they'd
> > been designed to work together - aren't polymorphic. 
> They wouldn't be polymorphic, because they aren't polymorphic. The 
> following is a good example of why they're not polymorphic, but some 
> algorithms are agnostic to the choice of data structure:

They aren't polymorphic because python has chosen names that make them
non-polymorphic.

> > I get bit by 
> > this fairly often. Most recently spending last friday afternoon
> > auditing code to make sure I'd turned all the list methods into the
> > appropriate set methods in a graph class's node walking method.
> Lists are ordered, sets are not. You can *insert* or *append* to a list, 
> but not to a set, because sets aren't ordered.

Of course I can insert and append to a set. Just because the set loses
the ordering information doesn't mean the element wasn't added to the
set (or not, depending).

> True, one could artificially pretend that sets are polymorphic to lists 
> by using misleading method names:
> 
> set.add -> set.append
> 
> and do-nothing methods like set.sort() and set.reverse(). But apart from 
> satisfying people who have read too many OO books, what would be the 
> point?

Saving time for people who are used to having a well-designed
hierarchy of collections by letting them not waste time dealing with
name differences over implementation details. Things like repeating
myself by writing foo_for_sets and foo_for_dicts and foo_for_lists, or
changing method names, or mapping between two representations so I can
use code that chose one set of names, or dealing with the headaches of
maintaining two copies of a structure.

Python has been moving this direction for a long time - int/long
unification, lists being acceptable most places that tuples are,
adding iterator interfaces to the collection types - all are things
that make using those types in a polymorphic manner easier. This is
just another step in that direction.

	<mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org