[Tutor] <var:data> value changes

Fri May 14 18:05:20 EDT 2004

At 16:04 2004-05-13 +0200, denis wrote:
>Some comments and corrections about the previous message (thanks to Magnus &
>Danny) pointed Python's feature called interning:

BTW I saw that PEP 237 uses the term interning for this handling
of small integers (although with double quotes on the first use). I
suppose that means it's ok to use that term also for non-strings.

>the first aspect of this
>feature is that integers in range(-5,100) (I didn't know about negative
>ones) are generated and stored as data by python; the second is that they
>won't
>be duplicated, but used 'in place' instead, each time the value is needed:
[lots of experiments removed]

I think the important thing to learn here is when to use "a==b"
and when to use "a is b". From a technical point of view we can
say that "a==b" is a test on whether the values that the names
a and b refer to are equal (which is not as strict as to say that
they are identical), and "a is b" is technically a test to see
whether "a" and "b" refer to the same location in memory.

I think we can ignore those technical details though. To put it
simply, we typically use the comparison "a == b" in our programs
when we want to check if two things are "the same". "a is b" is a
fairly rare bird. Forgetting about memory locations, equality and
identity are different things.
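
A minimal sketch of that difference, written for a current Python 3
interpreter (the names a, b and c are only illustrative). Lists are used
because they are never shared behind your back, so the result does not
depend on interning:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list with the same contents
c = a           # a second name for the first list

equal_values = (a == b)     # True: same contents
same_object = (a is b)      # False: two separate list objects
same_name_target = (a is c) # True: both names refer to one object
```

So "==" asks about the values, while "is" asks whether two names refer to
one and the same object.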

For instance, you never use the is-operator to compare numbers or
strings in normal code. It's really pure nonsense. It's like asking
the bank clerk if these are really the bank notes you deposited when
you make a withdrawal. (Only Goofy does things like that.)

Never rely on interning. If you need to check that a value is an
integer of value 0, and not a float of that value, you can't use
"a == 0", because 0.0 == 0 returns true, but even if "a is 0"
will actually work today (on CPython) there is no guarantee that
it always will. In this case, you should use "a == 0 and type(a)
is int".
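
As a rough sketch, that check could be wrapped in a small helper. The
function name is_int_zero is made up for this example, and the code
assumes Python 3, where bool is a subclass of int but type(False) is
bool, so False is rejected as well:

```python
def is_int_zero(value):
    """Return True only for an int with value 0 (not 0.0, not False)."""
    # Compare the value with ==, and the type with "is",
    # since type objects are singletons.
    return value == 0 and type(value) is int


results = [is_int_zero(0), is_int_zero(0.0), is_int_zero(False), is_int_zero(1)]
```

Nothing here depends on whether the interpreter happens to reuse the
object for 0.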

The most common use for the is-operator in normal Python code is
tests for singletons. For instance, there is only one None object
in a Python program. Type objects, such as int, str, float etc,
are also singletons. So, "if a is None:" or "if type(a) is not
int:" are statements that make sense.

You can also write "if a == None:" or "if type(a) != int:", but
that's a bit like asking me "Is your name Magnus Lyckå?" instead
of "Are you Magnus Lyckå?" if you meet me and are a bit uncertain
on whether that guy in front of you is that guy on the Tutor mailing
list. You're not really interested in whether I have a certain name,
you are interested in whether or not I am a certain person.
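
One way to see why the identity test is the safer question: a class can
define its own idea of equality. The class AlwaysEqual below is a
contrived example invented for this sketch (Python 3 syntax), not
something from the standard library:

```python
class AlwaysEqual:
    """Contrived class that claims to be equal to anything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

looks_like_none = (obj == None)   # True, because __eq__ says so
really_is_none = (obj is None)    # False, identity cannot be overridden
```

The "==" test asks the object what it thinks, while "is" can't be fooled.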

Python is obviously implemented to reuse immutable objects at times,
which means that some pairs of objects that we require to return
True on a test like "a == b", will also return true on "a is b". This
might not happen, but it *can* happen for immutables.

As usual, it's safer to assume as little as possible...

>As expected, basic interning is available for small strings (what are the
>criteria?); and the short-term interning form works too.
>What about sequences ?
>
> >>> x=('a')
> >>> y=('a')
> >>> x is y
>True

Those aren't sequences. Parentheses don't imply tuples. Commas imply tuples.
Sometimes you need to use parentheses around tuples to disambiguate things,
but in general parentheses have the same meaning in Python as in English or
in traditional mathematical notation.

 >>> a=2
 >>> a,
(2,)
 >>> (a)
2

See? You are always allowed to end list or tuple literals with a trailing
comma, but for tuples with only one member, it's compulsory.
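
The same point as a small script (Python 3 assumed; x, y and z are just
example names), checking the types instead of looking at the printed
values:

```python
x = ('a')     # parentheses only group: this is a plain string
y = ('a',)    # the trailing comma makes a one-element tuple
z = 'a',      # the parentheses are optional here

x_type = type(x)
y_type = type(y)
```

Here x_type is str and y_type is tuple, and y == z holds since both are
one-element tuples containing 'a'.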

x and y above are thus strings. We already know that strings that look
like identifiers are interned. Tuples are not interned.

 >>> a, b = (1,2), (1,2)
 >>> a is b
False

> >>> x=['a']
> >>> y=['a']
> >>> x is y
>False
> >>> x=['a']; y=['a'] ; x is y
>False
> >>> x=[]; y=[] ; x is y
>False
> >>> x=[True]; y=[True] ; x is y
>False
>
>Very probable conclusion : yes for immutables, no for mutables. It doesn't
>even work for empty lists or built-in constants.

It would be disastrous if Python interned mutable objects. That would for
instance mean that you can't create more than one empty list in a program.
That would really be stupid, since we often create empty lists which we
populate with values in some kind of loop or recursive process.
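
A small sketch of that everyday pattern (Python 3; the names evens and
odds are only for illustration). It works only because each "[]" gives
a brand new list:

```python
evens = []
odds = []   # a second, separate empty list

for n in range(6):
    if n % 2 == 0:
        evens.append(n)
    else:
        odds.append(n)

# evens is [0, 2, 4] and odds is [1, 3, 5]
```

If the two empty lists were one shared object, both names would end up
seeing all six numbers.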

>Now, I really want to explore this difference deeper. First, let's enlarge x
>a bit:
>
> >>> x+=[False]
> >>> x
>[True, False]
> >>> x+='a'
> >>> x
>[True, False, 'a']
[snip]
>I typed the last line just for fun, but I am surprised that python accepts
>x+='a', as
>'a' is no list, and refuses x+=1 (?). I would expect to be obliged to write
>x.append(item) for both strings and numbers, as both are immutable --but
>strings are sequences (what about tuples?). And with a dictionary ?

But you failed to test how it treats this "sequence" by using a list with
only one member. Does it convert the string to a list, or does it just
append the string?

 >>> a = []
 >>> a += 'hello'
 >>> a
['h', 'e', 'l', 'l', 'o']

Ok. That explains why "a += 1" won't work, but I'm still a bit surprised.
Does the old list.extend() method work like this?

 >>> a.extend(' there')
 >>> a
['h', 'e', 'l', 'l', 'o', ' ', 't', 'h', 'e', 'r', 'e']

Yes it does, so it's nothing new with augmented assignment ("a += b" behaves
just like "a.extend(b)"). I suppose this behaviour isn't so strange after
all. It's actually consistent with how Python regularly converts sequences
to lists, or rather how Python accepts any sequence as input in functions
that return lists. It works like this in other cases too:

 >>> map(None, ['h','e','l','l','o'], 'there')
[('h', 't'), ('e', 'h'), ('l', 'e'), ('l', 'r'), ('o', 'e')]
 >>> [x.upper() for x in 'a string']
['A', ' ', 'S', 'T', 'R', 'I', 'N', 'G']
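
To put the pieces side by side, here is a sketch in Python 3 syntax (where
map and zip return iterators, so list() is used to show the result). The
name int_rejected is just a flag made up for this example:

```python
a = []
a += 'hi'         # extends the list with each character
a.append('hi')    # adds the whole string as a single element

try:
    a += 1        # an int can't be iterated over, so this fails
except TypeError:
    int_rejected = True
else:
    int_rejected = False

# A string is accepted wherever something iterable is expected.
pairs = list(zip('ab', [1, 2]))
```

After this, a is ['h', 'i', 'hi'], the failed "a += 1" has left the list
untouched, and pairs is [('a', 1), ('b', 2)].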

> >>> d={}
> >>> x+=d
> >>> x
>[True, False, 'a', 'ciao!']    # no success ;-(

Try a non-empty dictionary instead.

 >>> d = {'first': 0, 'second': 1}
 >>> d
{'second': 1, 'first': 0}
 >>> a = []
 >>> a += d
 >>> a
['second', 'first']

Keys. Right. Same thing would happen if you did this:

 >>> a = []
 >>> for key in d:
         a.append(key)

 >>> a
['second', 'first']

I think you need to read a bit about iterators to get on top of this.

http://docs.python.org/tut/node11.html#SECTION0011900000000000000000
http://www-106.ibm.com/developerworks/library/l-pycon.html?n-l-9271
http://www.python.org/peps/pep-0234.html
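
As a rough illustration of what those texts describe (Python 3 assumed;
in recent versions a dict iterates in insertion order, which differs from
the ordering in the session above):

```python
d = {'first': 0, 'second': 1}

keys = []
keys += d                # iterating over a dict yields its keys

it = iter(d)             # roughly what a for-loop does behind the scenes
first_key = next(it)     # fetch one key at a time
remaining = list(it)     # consume whatever is left
```

Anything that can hand out an iterator like this can be used on the right
hand side of "+=" on a list.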

>True and False are rather close, but not aside; 1 and 2 are strangely far
>(?). All items are separated from the list, they're not "in" the list. Which
>means, as expected, that the list holds its elements' addresses -- only its
>elements' addresses.

Yep, it's the same with other builtin singletons such as None, int, str, file

 >>> map(id, (str, int, float, long, file, True, False, None))
[504166464, 504108944, 504077792, 504119480, 504076280, 504029048, 504029032, 504130904]

What about things like builtin functions?

 >>> map(id, (range, min, max))
[8107072, 8106912, 8106872]

Nope, they are allocated on the heap just as "normal" objects.

>It also clear that for a dictionary the dict. itself, its keys and its
>values are totally separated.

Certainly. All containers in Python are filled with references to
objects, not with the objects themselves. So, if you do,

 >>> l1 = [1,2,{'three':3}]
 >>> l2 = l1

you have just one list,

 >>> l1 is l2
True

and then if you do,

 >>> l3 = l1[:]

you get a copy of that list, so now you have two lists,

 >>> l1 is l3
False

but you still only have one dict, which both lists contain as
its last element.

 >>> l1[-1] is l3[-1]
True

So, if you change the first element of l1, the two lists won't
be equal any more,

 >>> l1[0]=1.1
 >>> l1==l3
False

but if you change the *content* of the last element, that will be
visible in both lists.

 >>> l1[-1]['four']=4
 >>> l1
[1.1000000000000001, 2, {'four': 4, 'three': 3}]
 >>> l3
[1, 2, {'four': 4, 'three': 3}]
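
If you want a copy that doesn't share the inner dict, the standard
library's copy.deepcopy copies the contained objects as well. A sketch in
Python 3 (the names shallow and deep are only for illustration):

```python
import copy

l1 = [1, 2, {'three': 3}]
shallow = l1[:]             # new list, same inner dict
deep = copy.deepcopy(l1)    # new list and a new inner dict

l1[-1]['four'] = 4          # change the content of the shared dict

shares_dict = shallow[-1] is l1[-1]
```

Here shares_dict is True and shallow[-1] shows the new key, while deep[-1]
is still {'three': 3}.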

>Just as a recall about value changes by immutable types:
>
> >>> t='Freude, schöner Götterfunken,...' ; id(t)
>6742216
> >>> t='Ô joie, divine étincelle de beauté...' ; id(t)
>10781656
>
>The (whole) variable's address changes. What happens when I change x, now?
>There are two kinds of changes:

I try not to use "variable" in Python, because that word is ambiguous.
Which is the variable? The name/reference or the actual object/value?

When you write

t = 'a string'

you create a string object which is automatically placed somewhere in
the memory that Python handles for us, and a name 't' is created in the
current scope. The name 't' is bound to the string object containing
the text 'a string'. If you then do

t = 'another string'

you will create another string object, and then rebind 't' to that
object. (This will mean that 'a string' will be garbage collected by
Python if no other name is bound to it, but that's another story.)
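
A short sketch of that rebinding (Python 3; the extra name alias is
introduced here only to keep the first string alive so the two objects
can be compared):

```python
t = 'a string'
alias = t                 # a second name bound to the same object
first_id = id(t)

t = 'another string'      # rebinds the name t; the old object is untouched

still_there = alias       # the first string lives on under its other name
rebound = id(t) != first_id
```

The assignment never modified the string object. It only made the name
't' refer to a different one, so rebound is True and alias is still
'a string'.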

--
Magnus Lycka (It's really Lyckå), magnus at thinkware.se
Thinkware AB, Sweden, www.thinkware.se
I code Python ~ The Agile Programming Language