Counterintuitive Python behavior

Wed Apr 17 10:01:17 EDT 2002

dominikush at yahoo.com wrote:

> Hello,
> 
> one thing I like very much about Python is that statements
> work like you would expect them to work. Take for example
> the use of dict.values() for dictionaries: If you store the
> result of dict.values(), and change the dictionary after-
> wards, the previously stored result remains untouched.

The .values() method of a dictionary is defined to return
a new list of the values.  That's more or less inevitable, 
since a dictionary doesn't _have_ a list of its value 
normally, so it must build it on the fly when you ask for it.
It's not a copy -- it's a new list object.

However, Python does NOT copy except in situations where
a copy is specifically defined to happen.  The .values()
method being in a vague sense such a situation, as mentioned...
a new object, rather than a copy of any existing one.

In general, whenever possible, Python returns references 
to the same objects it already had around, rather than 
copying; if you DO want a copy you ask for it -- see module 
copy if you want to do so in a general way.  Of course,
building new objects is a different case.

If this is counteintuitive, so be it -- there is really
no alternative in the general case without imposing huge
overhead, making copies of everything "just in case".
MUCH better to get copies only on explicit request (and
new objects, when there's no existing object that could
either be copied or referred-to).

Of course there are in-between cases -- such as slices.

The standard sequences give you a new object when you 
ask for a slice; this only matters for lists (for immutable
objects you shouldn't care if you get copies or what).
A list is not able to "share a part of itself", so when
asked for a slice it gives out a copy, a new list (for
generality, of course, it then also does when asked for
a slice-of-everything, thelist[:] -- so in that limit case
the new object can be seen as a copy of the existing one).

The justly popular Numeric package, on the other hand,
defines an array type which IS able to share some or all 
data among several array objects -- so a slice of a Numeric 
array does share data with the array it's sliced from.  It's
a new object, mind you:

>>> import Numeric
>>> a=Numeric.array(range(6))
>>> b=a[:]
>>> id(a)
136052568
>>> id(b)
136052728
>>>

but the two distinct objects a and b do share data:

>>> a
array([0, 1, 2, 3, 4, 5])
>>> b
array([0, 1, 2, 3, 4, 5])
>>> a[3]=23
>>> b
array([ 0,  1,  2, 23,  4,  5])
>>>

Each behavior has excellent pragmatics behind it -- lists 
are _way_ simpler by not having to worry about data sharing, 
arrays have different use cases by far -- but it's hard to 
be unsurprising when two somewhat similar objects differ 
in such details.

But all of the copies which do "happen", e.g. by the
limit case of list slicing or whatever else (with ONE 
exception of which more later) are always SHALLOW copies.

NEVER does Python embark on the HUGE task of _deep_ copying 
unless you very specifically ask it to -- specifically with 
function deepcopy of module copy.  DEEP copying is a serious 
matter -- function deepcopy has to watch out for cycles, 
reproduce any identity of references, potentially follow 
references to any depth, recursively -- it has to reproduce
faithfully a graph of objects referencing each other
with unbounded complexity.  It works, but of course it
can never be as fast as the mundane business of shallow
copying (which in turn is never as fast as just handing
out one more reference to an existing object, whenever
the latter course of action is feasible).

So, that's what has apparently snagged you here:

> However, if a dictionary has lists as value entries, I get
> a counterintuitive behavior (which, just recently, broke my
> code): If you change the dict, the list you previously
> created via dict.values() gets automagically updated. A nice
> feature, but nothing I would have expected!

Not really -- if you change _objects to which the dict refers_
(rather than changing the dict in se), then other references
to just-the-same-objects remain references to just the same
objects -- if the objects mutate, you see the mutated objects
from whatever references to them you may be using.

>>>> dict = {'a':[1],'b':[2]}
>>>> list = dict.values()
>>>> list
> [[1], [2]]

Don't use the names of built-in types as variables: you WILL
be burned one day if you do this.  dict, list, str, tuple, file,
int, long, float, unicode... do NOT use these identifiers for 
your own purposes, tempting though they may be (an "attractive
nuisance", to be sure).  If you don't get into the habit of
avoiding them, one day you'll be trying to (e.g.) build a
list with x=list('ciao') and get puzzling errors... because
you have rebound identifier 'list' to refer to a certain list
object rather than to the list type itself.

Use alist, somedict, myfile, whatever... nothing to do with
your problem here, just some other simple advice!-)

>>>> dict['a'].append(3)

This does not "change the dictionary" -- the dictionary object
still contains exactly the same references, to objects with
the same id's (two string objects, the keys, and two list
objects, the values).  You're changing (mutating) one of those
objects, but that's quite another issue.  You could be
mutating said list object through any reference to it
whatsoever, e.g.:

>>> alist=list('ciao')
>>> adict={'a':alist}
>>> adict
{'a': ['c', 'i', 'a', 'o']}
>>> alist.pop()
'o'
>>> adict
{'a': ['c', 'i', 'a']}
>>>

If you wanted dictionary adict to refer to a COPY (a "snapshot",
if you will) of the contents of alist, you could have said so:

>>> import copy
>>> alist=list('ciao')
>>> adict={'a':copy.copy(alist)}
>>> adict
{'a': ['c', 'i', 'a', 'o']}
>>> alist.pop()
'o'
>>> adict
{'a': ['c', 'i', 'a', 'o']}
>>>

and then the dictionary object's string-representation would
be isolated from whatever changes to the list to which name
alist refers.  The string representation delegates part of its
job to the objects to which the dictionary object refers, so,
if you want to isolate it, you do need copies -- maybe deep
ones, in fact (<shudder>... well no, not really, but...:-).

>>>> dict
> {'a': [1, 3], 'b': [2]}
>>>> list
> [[1, 3], [2]]
> 
> Looks like that in the first case a copy is returned while
> in the latter case list references are returned. Ok, but

Nope.  ALWAYS references.  .values() doesn't return a reference
to an existing object NOR a copy of an existing object, because
there's no "existing object" in this case -- so it always
returns a NEW object, suitably built as per its specs.

> according to Python's philosophy I shouldn't mind if I work
> with lists in the dictionary or anything else. If the
> behavior depends on the knowledge of the type of values I
> put into a dictionary, I find that somehow counterintuitive.

There is no such dependence.  Just a huge difference
between changing an object, and changing (mutating) some
OTHER object to which the first refers.

In Bologna over 100 years ago we had a statue of a local hero
depicted pointing forwards with his finger -- presumably to
the future, but given where exactly it was placed, the locals
soon identified it as "the statue that points to Hotel
Belfiore".  The one day some enterprising developer bought
the hotel's building and restructured it -- in particular,
where the hotel used to be was now a restaurant, Da Carlo.

So, "the statue that points to Hotel Belfiore" had suddenly
become "the statue that points to Da Carlo"...!  Amazing
isn't it?  Considering that marble isn't very fluid and the
statue had not been moved or disturbed in any way...?

This is a real anecdote, by the way (except that I'm not
sure of the names of the hotel and restaurant involved --
I could be wrong on those), but I think it can still help
here.  The dictionary, or statue, has not changed at all,
even though the objects it refers/points to may have been
mutated beyond recognition, and the name people know it
by (the dictionary's string-representation) may therefore
change.  That name or representation was and is referring
to a non-intrinsic, non-persistent, "happenstance"
characteristic of the statue, or dictionary...

> Who is wrong here: my intuition or Python (2.2)? If it's
> my intuition, how can I train my thinking about Python's
> execution model, so that my intuition get's better ;-)

Your intuition, which led you astray here (Python does just
what it should do), can be trained in several ways.  The
works of J. L. Borges and I. Calvino, if you like fiction
that's reasonably sophisticated but still quite pleasant,
are good bets.  If you like non-fiction written by
engineers fighting hard to dispell some of the errors of
philosophers, Wittgenstein and Korzibsky are excellent.

I'm not kidding, but I realize that many Pythonistas don't
really care for either genre.  In which case, this group
and its archives, essays by GvR and /F, and the Python
sources, may also prove interesting reading.

Alex