Is 'everything' a reference or isn't it?

Steven D'Aprano steve at REMOVETHIScyber.com.au
Thu Jan 5 19:59:21 EST 2006


On Thu, 05 Jan 2006 05:21:24 +0000, Bryan Olson wrote:

> Steven D'Aprano wrote:
>> Mike Meyer wrote:
> [...]
>>> Correct. What's stored in a list is a reference.
>> 
>> Nonsense. What is stored in the list is an object.
> 
> According to the Python Language Reference:
> 
>      Some objects contain references to other objects; these are
>      called containers. Examples of containers are tuples, lists
>      and dictionaries.
>      [http://docs.python.org/ref/objects.html]


Is it so hard to understand that the word "reference" has a general,
imprecise meaning in common English (which is clearly how the quote
above is using the word), while in the context of assignment and
argument passing it also has a more precise meaning, one which is
dangerously misleading?

Words are important -- not only for what they mean, but for the
connotations they carry. For people who come to Python from C-like
languages, the word "reference" means something that is just not true in
the context of Python's behaviour. That's why people come to Python with a
frame that tells them what call by reference implies ("I can do this...")
and then discover that they often *can't* do that.

It is a crying shame, because reference is a nice, generic word that just
cries out to be used in a nice, generic way, as the Python doc you quoted
is doing. But unfortunately, the term "reference" has been co-opted by C
programmers to effectively mean "pointer to a value".

The aim of the Python doc authors is to communicate information about
Python effectively. That means they have to be aware of words'
connotations and the mental frames they evoke. In the same way a good
programmer must work around bugs in the operating system or libraries, a
good author must work around bugs in people's mental frames which will
cause misunderstanding. That's why "reference" is a bad word to use in the
context of Python containers and argument handling: far from communicating
correct information effectively, it gives many readers a misleading
assumption about how Python code will behave.

The proof of this is the number of times people write to this newsgroup
confused about "call by reference" -- not because they came to Python with
assumptions about its behaviour, but because somebody told them that Python
was "call by reference", or that lists contain "references" to other objects.

In English, I can say "I'm writing a book on Russian history. I've
included a reference to Smith's classic work on the tsars of Russia."
That's a good, generic use of the word "reference". Nobody thinks that
anything I do to modify my book will affect Smith's work. Books don't work
that way.

But in programming, things do work that way. If my class Book contains a
reference to Smith's classic work, I can modify it. (Unless the language
deliberately restricts my ability to modify certain objects, as Python
does with immutable objects.)
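
A minimal Python sketch of the Book example, assuming two hypothetical
classes, Work and Book (neither comes from the discussion above): a Book
that cites Smith's work can change it, and the change is visible to
anyone else holding the same object. Storing the chapters in a tuple
instead of a list shows the restriction Python places on immutable
objects.

```python
class Work:
    """Hypothetical published work whose chapters may be mutable or not."""

    def __init__(self, title, chapters):
        self.title = title
        self.chapters = chapters


class Book:
    """Hypothetical book that cites other works."""

    def __init__(self, title):
        self.title = title
        self.cited = []

    def cite(self, work):
        # Store the Work object itself; no copy is made.
        self.cited.append(work)


smith = Work("The Tsars of Russia", ["Ivan", "Peter"])
mine = Book("Russian History")
mine.cite(smith)

# Modifying the cited work through my book changes Smith's work too,
# because both names lead to the very same object.
mine.cited[0].chapters.append("Catherine")
print(smith.chapters)  # ['Ivan', 'Peter', 'Catherine']

# With an immutable tuple of chapters, the same attempt fails.
frozen = Work("Frozen Edition", ("Ivan", "Peter"))
mine.cite(frozen)
try:
    mine.cited[1].chapters.append("Catherine")
except AttributeError as exc:
    print("cannot modify:", exc)
```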

That's what programmers expect when you talk about references, especially
if they come from a C (or Pascal) background. In Python, sometimes that's
true, and sometimes it is not, and the only way to tell is by looking at
the object itself, not by thinking about Python's high-level behaviour.
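
To illustrate that the answer depends on the object, here is a short
Python sketch (the variable names are arbitrary): the statement
"b += ..." looks identical for a list and a tuple, yet only the list is
changed in place, so only in that case does the change show up through
the other name.

```python
# Two names bound to the same mutable list.
a = [1, 2]
b = a
b += [3]        # list supports in-place addition: the shared list is extended
print(a)        # [1, 2, 3]
print(a is b)   # True

# Two names bound to the same immutable tuple.
c = (1, 2)
d = c
d += (3,)       # tuples cannot change in place: d is rebound to a new tuple
print(c)        # (1, 2)
print(c is d)   # False
```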

Describing Python's behaviour in those terms ("it always passes references
to objects") will invoke misleading frames in many programmers' minds. The
word "reference" is misleading and should be avoided, because what the
average non-Python programmer understands by the word is different from
what the experienced Pythonista understands by it.

If we were writing academic papers, we could define "call by reference"
and "objects contain references" any way we liked, and it would be the
responsibility of the readers to ensure they understood *our* meaning. But
we're not -- we're communicating information to ordinary programmers, many
of whom are kids still in school, not academics. Many of them will be
coming to Python with preconceived ideas of the meaning of "call by
reference", "assign a variable", etc. It is *our* responsibility to use
language that will not be misleading, that will not invoke incorrect
frames, that will not lead them up the garden path only to be surprised
by Python's behaviour.

If we say "Python is call by reference" (or call by value, as many people
also say) we *know* the consequence will be newbies writing in saying "I
was told Python is call by reference, so I did this, and it didn't work,
is that a bug in Python? What is wrong?" It is not a bug in Python, it is
a bug in their mental model of how Python works, and we put that bug in
their head. Every time that happens, it is *our* fault, not theirs, for
using language guaranteed to mislead. If we use an unfamiliar term like
"call by object", the reader has no preconceived understanding of what it
means and won't be led to incorrect assumptions.
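
The classic instance of this trap is a swap function written by someone
who expects call by reference in the Pascal "var parameter" sense. The
minimal Python sketch below (all function names are hypothetical) shows
that rebinding a parameter inside a function has no effect on the
caller's names, while mutating the object the parameter is bound to is
visible to the caller.

```python
def swap(x, y):
    # Rebinds the local names only; the caller's variables are untouched.
    x, y = y, x


def append_item(items, value):
    # Mutates the list object that the caller also holds.
    items.append(value)


def replace_list(items, value):
    # Rebinds the local name to a brand-new list; the caller never sees it.
    items = [value]


p, q = 1, 2
swap(p, q)
print(p, q)    # 1 2  (the "swap" did nothing for the caller)

data = [1]
append_item(data, 2)
print(data)    # [1, 2]

replace_list(data, 99)
print(data)    # [1, 2]
```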

We know that this will happen because it has happened time and time again
in the past. Are we incapable of learning from experience? Are we
intelligent sentient beings or do we just parrot what was said in the
past with no concern for the consequences of what we say?


-- 
Steven.
