[Tutor] class variables

Sat Dec 21 08:14:32 CET 2013

On Fri, Dec 20, 2013 at 02:04:49AM -0500, Keith Winston wrote:
> I am a little confused about class variables: I feel like I've repeatedly
> seen statements like this:

I don't like the terms "class variable" and "instance variable". In the 
Python community, these are usually called class and instance attributes 
rather than variables or members.

(Sometimes, people will call them "members", especially if they are used 
to C#. The meaning here is member as in an arm or leg, as in 
"dismember", not member in the sense of belonging to a group.)

Normally, we say that a string variable is a variable holding a string, 
a float variable is a variable holding a float, an integer variable is a 
variable holding an integer. So a class variable ought to be a variable 
holding a class, and an instance variable ought to be a variable holding 
an instance. In Python we can have both of those things!

Unlike in Java, classes in Python are "first-class citizens" and can be 
treated exactly the same as strings, floats, ints and other values. So a 
"class variable" would be something like this:

for C in list_of_classes:
    # Here, C holds a class, and so we might call
    # it a "class variable", not a string variable
    do_something_with(C)
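
If you want something you can actually run, here is a minimal sketch of 
the same idea. The particular classes in the list, and what the loop 
does with them, are just stand-ins I've picked for illustration:

```python
# A list whose items are classes, not instances.
list_of_classes = [int, str, list]

results = []
for C in list_of_classes:
    # C is bound to a class each time round the loop,
    # so calling C() creates a new instance of that class.
    instance = C()
    results.append((C.__name__, instance))

print(results)  # [('int', 0), ('str', ''), ('list', [])]
```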

> There is only one copy of the class variable and when any one object makes a
> change to a class variable, that change will be seen by all the other
> instances.
> Object variables are owned by each individual object/instance of the class.
> In this case, each object has its own copy

Talking about copies is not a good way to understand this. It might make 
sense to talk about copies in some other languages, but not in Python. 
(Or any of many languages with similar behaviour, like Ruby or Java.) 

I'm going to give you a simple example demonstrating why thinking about 
copies is completely the wrong thing to do here. If you already 
understand why "copies" is wrong, you can skip ahead, but otherwise you 
need to understand this even though it doesn't directly answer your 
question.

Given a simple class, we can set an attribute on a couple of instances 
and see what happens. Copy and paste these lines into a Python 
interactive session, and see if you can guess what output the print will 
give:

class Test:
    pass

spam = Test()
eggs = Test()

obj = []
spam.attribute = obj
eggs.attribute = obj
spam.attribute.append("Surprise!")

print(eggs.attribute)

If you think about *copies*, you might think that spam and eggs have 
their own independent copies of the empty list. But that's not what 
Python does. You don't have two copies of the list, you have a single 
list, and two independent references to it. (Actually, there are three: 
obj, spam.attribute, eggs.attribute.) But only one list, with three 
different names.
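
You don't have to take my word for it. A small sketch, assuming the same 
Test class as before, lets you ask Python directly whether the three 
names refer to one object, using the "is" operator and the id() 
function:

```python
class Test:
    pass

spam = Test()
eggs = Test()

obj = []
spam.attribute = obj
eggs.attribute = obj

# "is" tests identity: are these the very same object?
same_object = spam.attribute is eggs.attribute is obj
print(same_object)  # True

# id() gives a number unique to each live object, so
# one object means exactly one distinct id.
distinct_ids = {id(obj), id(spam.attribute), id(eggs.attribute)}
print(len(distinct_ids))  # 1
```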

This is similar to people. For instance, the President of the USA is 
known as "Mr President" to his staff, "POTUS" to the military, "Barack" 
to his wife Michelle, "Mr Obama" to historians and journalists, "Dad" to 
his children, and so forth. But they all refer to the same person. In a 
few years, Barack Obama will stand down as president, and somebody else 
will be known as "Mr President" and "POTUS", but he'll still be 
"Barack" to Michelle.

Python treats objects exactly the same. You can have lots of names for 
the same object. Some objects, like lists, can be modified in place. 
Other objects, like strings and ints, cannot be.

In Python, we refer to this system as "name binding". You have things 
which are names, like "obj", and we associate an object to that name. 
Another term for this is a "reference", in the generic sense that we 
"refer" to things.

So we can bind an object to a name:

obj = []

We can *unbind* the name as well:

del obj
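
Note that del removes the *name*, not the object. As a rough 
illustration (the name "alias" is just something I've made up for the 
example), the list carries on existing as long as some other name still 
refers to it:

```python
obj = []
alias = obj        # a second name for the same list
del obj            # unbinds the name "obj" only

alias.append(1)    # the list itself is still alive
print(alias)       # [1]

try:
    obj            # the name no longer exists
except NameError:
    name_gone = True
else:
    name_gone = False
print(name_gone)   # True
```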

In Python, assignment with = is name binding, and not copying:

spam.attribute = obj

does not make a copy of the list, it just makes "spam.attribute" and 
"obj" two different names for the same list. And likewise for 
"eggs.attribute".

Hopefully now you can understand why it is wrong to talk about "copies" 
here. In Python, you only get copies when you explicitly call a function 
which makes a copy, and never from = assignment (name binding).
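
To make the contrast concrete, here is a short sketch comparing plain 
assignment with two of the usual ways of asking for a copy explicitly, 
copy.copy() and the list() constructor. The variable names are only for 
illustration:

```python
import copy

obj = ["spam"]
alias = obj                 # name binding: no copy is made
shallow = copy.copy(obj)    # an explicit (shallow) copy
also_copy = list(obj)       # list() also builds a new list

obj.append("eggs")

print(alias)       # ['spam', 'eggs']  same list as obj
print(shallow)     # ['spam']          independent list
print(also_copy)   # ['spam']          independent list
```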

Now let me get back to your original question:

> But when I test, I see some interesting things: first (and this is
> consistent with above) the class variables are created when the class is
> defined, and can be used even without any instances of the class being
> created.

Correct. Not only that, but class attributes will show up from instances 
as well:

py> class Parrot:
...     colour = "green"
...     def description(self):
...             return "You see a %s coloured bird." % self.colour
...
py> polly = Parrot()
py> polly.description()
'You see a green coloured bird.'
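
Here is a small sketch of both halves of that, using a cut-down Parrot 
with just the class attribute. vars() shows an object's own namespace, 
which makes it easy to see where the attribute actually lives:

```python
class Parrot:
    colour = "green"

# No instance has been created yet, but the attribute is usable.
print(Parrot.colour)              # green
print("colour" in vars(Parrot))   # True

polly = Parrot()
# The instance has no attributes of its own...
print(vars(polly))                # {}
# ...yet the lookup still succeeds, via the class.
print(polly.colour)               # green
```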

> Second, initially confusing but maybe I understand... there are pointers to
> the class variables associated with every instance of the object, 

Don't think about pointers. That's too specific. It just so happens that 
the version of Python you are using *does* use pointers under the hood, 
but that's not always the case. For instance, Jython is written in Java, 
and IronPython is written for dot-Net's CLR. Neither of those has 
pointers as such, but each has *something* that will do the same job as 
a pointer.

This is why we talk about references. The nature of the reference 
remains the same no matter what version of Python you use, regardless of 
how it works under the hood.

Putting that aside, you're actually mistaken about there being an 
association between the instance and the class attribute. There is no 
*direct* association between them, although of course there is an 
indirect one, by way of the instance's class. What actually happens is 
something rather like this:

Suppose we ask Python for "polly.colour". Python looks at the instance 
polly, and checks to see if it has an instance attribute called 
"colour". If it does, we're done. But if it doesn't, Python doesn't give 
up straight away; it next checks the class of polly, which is Parrot. 
Does Parrot have an attribute called "colour"? Yes it does, so that gets 
returned.

The actual process is quite complicated, but to drastically 
over-simplify, Python will check:

- the instance
- the class
- any super-classes of the class

and only raise an exception if none of these have an attribute of the 
right name.
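
The over-simplified version can be sketched in a few lines. To be clear, 
simple_lookup() and the Bird class are my own inventions for this 
example, not anything Python provides, and the sketch ignores 
descriptors, __getattr__, __slots__ and the rest of what makes the real 
process complicated:

```python
def simple_lookup(obj, name):
    """Very rough imitation of attribute lookup (illustration only)."""
    # 1. The instance's own namespace.
    if name in vars(obj):
        return vars(obj)[name]
    # 2 and 3. The class, then its super-classes, in
    # method resolution order.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    # Nothing had an attribute of that name.
    raise AttributeError(name)


class Bird:
    legs = 2

class Parrot(Bird):
    colour = "green"

polly = Parrot()
polly.name = "Polly"

print(simple_lookup(polly, "name"))    # Polly  (found on the instance)
print(simple_lookup(polly, "colour"))  # green  (found on the class)
print(simple_lookup(polly, "legs"))    # 2      (found on a super-class)
```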

> but if I
> assign THOSE variables new values, it creates new, "local"/instance
> variables.

When you ask for the polly.colour attribute, Python will search the 
instance, the class, and any super-classes for a match. What happens 
when you try to assign an attribute?

py> polly.colour = 'red'
py> polly.description()
'You see a red coloured bird.'
py> Parrot.colour
'green'

The assignment has created a new name-binding, creating the instance 
attribute "colour" which is specific to that one instance, polly. The 
class attribute remains untouched, as would any other instances (if we 
had any). No copies are made.

So unlike *getting* an attribute, which searches both the instance 
and the class, *setting* or *deleting* an attribute stops at the 
instance.
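
A short sketch of what that looks like in practice, again with a 
cut-down Parrot. Deleting the instance attribute lets the class 
attribute show through again, and trying to delete it a second time 
fails, because del never reaches up into the class:

```python
class Parrot:
    colour = "green"

polly = Parrot()

polly.colour = "red"     # setting: creates an instance attribute
print(vars(polly))       # {'colour': 'red'}
print(Parrot.colour)     # green  (class attribute untouched)

del polly.colour         # deleting: removes the instance attribute only
print(polly.colour)      # green  (the class attribute shows through)

try:
    del polly.colour     # nothing left on the instance to delete
except AttributeError:
    second_delete_failed = True
else:
    second_delete_failed = False
print(second_delete_failed)  # True
```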

I like to describe class attributes as *shared*. Unless shadowed by an 
instance attribute of the same name, a class attribute is seen by all 
instances, and its content is shared by all of them. Instance 
attributes, on the other hand, are distinct: each one belongs to a 
single instance.
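
As a small illustration of sharing and shadowing together (the second 
parrot, petey, is just a name I've made up for the example):

```python
class Parrot:
    colour = "green"

polly = Parrot()
petey = Parrot()

polly.colour = "red"      # shadows the class attribute, on polly only

Parrot.colour = "blue"    # rebind the shared class attribute
print(petey.colour)       # blue  (petey has no attribute of its own)
print(polly.colour)       # red   (polly's instance attribute wins)
```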

> So:
> Class.pi == 3.14  # defined/set in the class def
> instance.pi == 3.14  # initially
> instance.pi = 4  # oops, changed it
> Class.pi == 3.14  # still
> Class.pi = "rhubarb"  # oops, there I go again
> instance.pi == 4  # still
> 
> Sorry if I'm beating this to a pulp, I think I've got it... I'm just
> confused because the way they are described feels a little confusing, but
> maybe that's because I'm not taking into account how easy it is to create
> local variables...

Local variables are a whole different thing again. That is another 
reason why I dislike the habit, borrowed from languages like Java, of 
calling these things "instance variables".
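
As for your pi example, it does behave the way you describe. Here is a 
runnable version of it, with your comparisons turned into asserts, plus 
one extra instance ("other", my addition) to show what a fresh instance 
sees afterwards:

```python
class Class:
    pi = 3.14                  # defined in the class body

instance = Class()
assert instance.pi == 3.14     # initially found on the class
instance.pi = 4                # creates an instance attribute
assert Class.pi == 3.14        # class attribute untouched
Class.pi = "rhubarb"           # rebinds the class attribute
assert instance.pi == 4        # the instance attribute still shadows it

other = Class()
print(other.pi)                # rhubarb
```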

-- 
Steven