terminological obscurity

Michael Chermside mcherm at mcherm.com
Mon May 24 11:02:51 EDT 2004


Grant Edwards writes:
> I think the fact that Python lists can be heterogeneous is one
> of the most brilliantly useful things in the language, but
> apparently we're not supposed to use lists like that.  Since
> tuples aren't mutable, I'm completely at a loss as to how we're
> supposed to deal with mutable heterogeneous sequences.

I'm afraid that I may have been a little misleading by simply
tossing in the term "heterogeneous sequence" without making it
clear just what I meant. Guido's quote: "Tuples are for heterogeneous
data, list are for homogeneous data. Tuples are *not* read-only
lists." suffers from the same problem. Earlier in this discussion,
David Eppstein used the term "heterogeneity of purpose" as contrasted
with "heterogeneity of type", which I think captures it quite nicely,
but I tend to think very concretely, so I will try to elaborate with
examples to illustrate how I (and, I think, Guido as well) am using
the term.

The simplest example of a homogeneous list would be a big list of
numbers -- perhaps the amounts of the transactions on a bank account.
Add them all up, and you get the current balance. This list is
clearly homogeneous.

A list of Transaction objects might be homogeneous too, but what if
you have subclasses of transaction? Then the list might contain
DepositTransaction objects, WithdrawalTransaction objects, even a
few InterestEarnedTransaction objects. The list now contains several
different types of objects, so is it now heterogeneous?

No... this list is still a homogeneous list -- the important point
is that all of the things in the list are Transactions... even if
some are subclass instances. There are probably some properties
that are common to all Transactions: perhaps they all have a "date"
field, and all have a getDescription() method. Then we can go
through the list and sort it by date (since all items in the list
have a date), or assemble a statement by invoking getDescription()
on each Transaction. We can do this because we know that every
object is a Transaction... if someone stuck an int in the list,
then we could no longer do these things. A homogeneous list doesn't
need to contain objects that are all exactly the same type -- it
doesn't even require that the objects all share a common superclass
(although they usually do). It merely has to contain only objects
with certain
properties, and what you can do to items in the list (without
typechecking first) is determined by what properties you know the
objects all share.
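
To make the Transaction example concrete, here is a rough sketch.
The class bodies, the "amount" attribute, and the sample data are
my own inventions for illustration; only the "date" field and the
getDescription() method come from the description above.

```python
from datetime import date

# Hypothetical classes; only "date" and getDescription() are taken
# from the discussion, everything else is assumed for illustration.
class Transaction:
    def __init__(self, when, amount):
        self.date = when
        self.amount = amount

    def getDescription(self):
        return "%s %s %.2f" % (
            self.date.isoformat(), type(self).__name__, self.amount)

class DepositTransaction(Transaction):
    pass

class WithdrawalTransaction(Transaction):
    pass

class InterestEarnedTransaction(Transaction):
    pass

transactions = [
    WithdrawalTransaction(date(2004, 5, 20), -40.00),
    DepositTransaction(date(2004, 5, 3), 100.00),
    InterestEarnedTransaction(date(2004, 5, 31), 0.25),
]

# Every item has a date, so sorting by date needs no typechecking.
transactions.sort(key=lambda t: t.date)

# Every item has getDescription(), so one loop builds the statement.
statement = [t.getDescription() for t in transactions]

# Every item has an amount, so adding them up gives the balance.
balance = sum(t.amount for t in transactions)
```

Three different classes are in the list, yet the sort, the loop,
and the sum never ask which class an item belongs to. That is what
makes the list homogeneous in the sense I mean.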

An extreme example would be a list maintained by a memory manager.
This might contain ints, strings, and user-defined classes all in
one list. But the list is homogeneous because all we care about
is that it contains only objects that have been allocated by the
memory manager. In this particular case, there's very little that
the objects in the list share... but they all have a reference
count and a memory location, and that's really all that our memory
manager cares about.
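
A toy version of that idea, assuming CPython (where id() happens to
be a memory address and sys.getrefcount() exposes the reference
count). The Widget class and the contents of the list are made up
for the example; this is not how a real memory manager is written.

```python
import sys

class Widget:  # hypothetical user-defined class
    pass

# Stand-in for "everything the memory manager has allocated".
allocated = [12345, "some string", Widget()]

# The only properties we rely on are ones every object has:
# an identity (an address in CPython) and a reference count.
report = [(type(obj).__name__, id(obj), sys.getrefcount(obj))
          for obj in allocated]
```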

So by now, I've probably convinced you that *every* collection is
homogeneous, and thus removed all meaning from the words
"homogeneous" and "heterogeneous". Not so... it depends (as David
suggested) more on how you plan to USE the list than on what the
types are. Again, I'll try to find an example.

Suppose that I have the following list: [Dog, 'Fido', 2, 14]. The
first item is some subclass of Animal, the second is an animal's
name, the third is that animal's age in (real) years, and the
fourth is the animal's age in "dog-years" (or whatever the
appropriate animal is). I could have other examples of lists like
this: [Dog, 'Rover', 4, 28] or [Bunny, 'Mopsie', 2, 32]. These
lists are heterogeneous. It wouldn't make any sense to use a for
loop to go through the items in the list... there's no feature
that we care about which is common to all the items in the list.
On the other hand, it IS sensible to take several lists like
this and examine the 3rd item in each one. A heterogeneous list
of this kind is very useful for passing around several values at
once (it might, for instance, be used to return multiple values
from a function) -- if it got much fancier than this example then
we'd probably be better off using a class.
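
Here is roughly what working with such records looks like. The
Animal, Dog, and Bunny classes are empty placeholders, oldest() is
a hypothetical helper, and I have written the records as tuples,
though indexing and unpacking behave the same way for lists.

```python
class Animal:
    pass

class Dog(Animal):
    pass

class Bunny(Animal):
    pass

# Each record is (species, name, age in real years, age in "animal-years").
pets = [
    (Dog, 'Fido', 2, 14),
    (Dog, 'Rover', 4, 28),
    (Bunny, 'Mopsie', 2, 32),
]

# Examining the 3rd item of each record is meaningful...
real_ages = [record[2] for record in pets]

# ...and so is pulling one record apart by position.
species, name, age, scaled_age = pets[1]

def oldest(records):
    """Return two values at once: the name and real age of the oldest pet."""
    _species, pet_name, pet_age, _scaled = max(records, key=lambda r: r[2])
    return pet_name, pet_age
```

Notice that nothing here loops over the four items of a single
record; each position means something different, so the position
is what we use.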

So now that I've gone on at great length about what I mean by
"homogeneous items" and "heterogeneous items", let's return to
that statement that Guido made. Part of the issue seems to be
mutability, so we'll include that too, thus giving us four
possible things you might want:

    1. a mutable heterogeneous collection
    2. an immutable heterogeneous collection
    3. a mutable homogeneous collection
    4. an immutable homogeneous collection

Guido observed that heterogeneous collections are used mostly
just to pass around a bunch of related objects (e.g., function
arguments, multiple return values, values from a stat call).
Because you're using the tuple just to pass them around, there's
rarely any need for mutability. So he created the 'tuple' type
expressly for handling case #2, and provided NO way of handling
case #1. Then he observed that homogeneous collections are
typically built up one item at a time, which requires
mutability. So he created the 'list' type for handling case #3.
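
A small sketch of the two typical uses. divmod() and os.stat() are
standard; the choice of the current directory and the "squares"
list are just things I picked for the example.

```python
import os

# Case #2: a fixed bundle of related values handed back as a unit.
quotient, remainder = divmod(17, 5)

# os.stat() also hands back a tuple-like bundle of unrelated facts
# about a file; here we pick out just the size.
info = os.stat(os.curdir)
size = info.st_size

# Case #3: a homogeneous collection, built up one item at a time.
squares = []
for n in range(5):
    squares.append(n * n)
```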

Guido's statement:
  "Tuples are for heterogeneous data, list are for homogeneous
  data. Tuples are *not* read-only lists."
is intended to point out that, although lists are mutable and
tuples are not, whether a collection is homogeneous or not
is a MORE IMPORTANT distinction than whether it will be changed.

So now let's suppose you're dealing with a real coding problem
and you need to store a collection. What to do? Well, first,
figure out whether your collection is homogeneous or
heterogeneous (this is basically a question about what you plan
to DO with the collection, not what kinds of objects are stored
in it). Then (if you want) you can also think about whether you
need mutability. So you'll find yourself in one of the 4
situations enumerated above. If you're in case #2 or #3, then
use a tuple or list (respectively) -- that's what the types
were designed for.

If you find yourself in case #1, then I suggest you take a
careful look at your design. Should you be using a class here
instead?
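
For instance, if the animal records from earlier needed to change
over time, something like this hypothetical Pet class would
probably serve better than a mutable list. The attribute names and
the "scale" idea (7 dog-years per real year) are my own assumptions.

```python
class Pet:
    """Hypothetical stand-in for a mutable [species, name, age, ...] list."""

    def __init__(self, species, name, age, scale):
        self.species = species
        self.name = name
        self.age = age      # age in real years
        self.scale = scale  # "animal-years" per real year

    def scaled_age(self):
        return self.age * self.scale

    def birthday(self):
        self.age += 1

fido = Pet('Dog', 'Fido', 2, 7)
fido.birthday()  # mutate by name instead of by fido_list[2] += 1
```

The fields now have names, and the derived value (the scaled age)
can no longer drift out of step with the real age.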

And if you're in case #4, then normally I'd suggest just using
a list... the fact that it CAN be mutated doesn't mean that you
HAVE to. But perhaps for performance reasons (or some other
reason) you'd prefer to use a tuple. That's OK, it's just not
the use tuples were designed for, so you'll find that tuple may
be missing a few convenience methods for working with homogeneous
collections. If that bothers you, just use list.
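
One way case #4 might look in practice: build the collection as a
list, then freeze it. The prime-number data and the "cache" dict
are invented for the example; hashability is just one possible
"other reason" for wanting the tuple.

```python
# Build as a list, since homogeneous collections tend to grow
# one item at a time...
primes = []
for n in range(2, 20):
    if all(n % p for p in primes):
        primes.append(n)

# ...then freeze it once no more changes are expected.
frozen = tuple(primes)

# A tuple of numbers is hashable, so it can serve as a dict key,
# which a list cannot.
cache = {frozen: sum(frozen)}

# Attempts to change it fail loudly.
try:
    frozen[0] = 1
    mutated = True
except TypeError:
    mutated = False
```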

I'm repeating myself here, but the key point of this whole essay
is that you should understand the difference between homogeneous
and heterogeneous collections (it's a distinction in how they're
USED, not just what types they contain), and realize that this
distinction is MORE IMPORTANT than mutability, and that list and
tuple were designed around these uses.

-- Michael Chermside




