Why are tuples immutable?

Adam DePrince adam at cognitcorp.com
Thu Dec 16 05:40:29 CET 2004


On Mon, 2004-12-13 at 17:23, jfj wrote: 
> Yo.
> 
> Why can't we __setitem__ for tuples?
> The way I see it is that if we enable __setitem__ for tuples there
> doesn't seem to be any performance penalty if the users don't use it
> (aka, python performance independent of tuple mutability).
> 
> On the other hand, right now we have to use a list if we want to
> __setitem__ on a sequence. If we could use tuples in the cases where
> we want to modify items but not modify the length of the sequence,
> programs could be considerably faster. Yes?
> 
> Enlighten me.
> 
> G.
> 
> 


Forgive me ... my first copy of this was accidentally in html ... 

Lets forget discussions of performance of tuples vs. lists for a moment,
performance issues associated with allowing for changing keys in
dictionaries and consider the larger question.

Q: Why do we have immutable data types?

A: Abstraction, logical consistency and annoying theoretical computer
science.

Let us consider the Set type.  Ignore for now the fact that it is
basically a dictionary.   We can all agree that a Set is an unordered
collection of distinct objects.  Similar to their theoretical
counterparts we can add elements, remove elements and perform a variety
of set arithmetic operations on sets.  There is no theoretical set
operation called "transmute" - that is, you can't mutate an element to
something else.  It isn't part of the model.  You can remove an element
and add another, but that isn't the same.  This is probably a good
thing, because if you allow mutation of elements withing a set, look at
what can happen:

Set A has A, B and C.
Set B has C, D and E.

The intersection of the two sets is a Set containing C.  The cardinality
of the sets (len) is the same.

Now, lets "mutate" the B in Set A to C.  First of all, A ceases to be a
set, for we have the elements A C and C.   Do we keep this non-set like
set?  Or do we remove the element?  The first is out of the question;
Set carries with it a definition that is borrowed from mathematics. 
Keeping the extra C around makes our object a collection, not a set, and
should be called such.  I, for one, don't want to use a purposefully
misleading language.  

Perhaps we resolve our issue by defining the mutation of B to C to
result in set A containing the elements A and C.  This would still be a
set, but there is another problem.  We didn't *do* anything to Set A,
but it changed in a fundamental manner.  Because of a change in one
place, the mutation of B, the cardinality of our set has suddenly
changed without saying "remove B from Set A."  That is a Bad Programming
practice.  Touching something in one place shouldn't break something
else.  

The solution to this problem is to recognize that certain mathematical
data-structures basically require the use of parameters that cannot be
serendipitously changed.  But why kowtow to mathematics at the expense
of cool hack-ability? 

When you set your thoughts down to code, your writing serves two
purposes.  

1) Instruct the machine on your intent.
2) Argue to the reader the correctness of #1.

Around here, we call #2 "self documenting code."  Now, I've seen a lot
of discussion of this on python-list.  Frequently silly pieces of code
such as (forgive me):

flatten = sum
list of strings = flatten( list of lists of strings )

are written and argued for in the name of self documenting code. 
Personally, I think that is the "This line adds 1 to X and assigns it to
Y" variety of self documenting code.  

What is really meant by self documenting code is code that argues in a
semi-rigorous sense for its own correctness.  Programming is
fundamentally an engineering discipline.  It differs from the more
physical engineering disciplines - the parts are all abstract
mathematical entities, but their reliability as such is offset by the
sheer complexity of the constructed machine.   Much as a mechanical
engineer might discuss a tie rod end or pressure vessel in the language
of calculus and statics, a computer programmer really should wrap their
discussion in the language of discrete mathematics and computational
linguistics.  

A tuple has a specific mathematical meaning that admittedly differs
somewhat depending on the context.  Same for a Set, List and so forth. 
Cast your problem in terms of mathematics and keep your language's
nomenclature and behavior similar to existing mathematical constructs. 
Doing this will afford you the opportunity to stand upon the shoulders
of giants as you go about your work.  And to properly stand upon those
shoulders, we must match the theoretical models used.  That means we
need more, not fewer, immutable counterparts to most of our datatypes. 




Adam DePrince 




More information about the Python-list mailing list