Copying dictionaries containing lists

Alex Martelli aleax at aleax.it
Fri Mar 21 04:44:47 EST 2003


<posted & mailed>

Harald Massa wrote:

> I read about dictionaries:
> 
> """If you want to modify a dictionary and keep a copy of the original,
> use the copy method. For example, opposites is a dictionary that
> contains pairs of opposites:
> 
>>>> opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}
>>>> alias = opposites
>>>> copy = opposites.copy()

Yep, this is the idiomatic way of getting a (shallow) copy of a dict.
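The quoted tutorial snippet contrasts two names, alias and copy. Here is a minimal sketch of what each one gives you. The name `dup` stands in for the tutorial's `copy`, so that it does not shadow the standard `copy` module mentioned later.

```python
opposites = {'up': 'down', 'right': 'wrong', 'true': 'false'}
alias = opposites          # just a second name bound to the same dict object
dup = opposites.copy()     # a new dict object holding the same keys and values

alias['up'] = 'sideways'   # mutates the single dict that both names refer to
print(opposites['up'])     # prints: sideways
print(dup['up'])           # prints: down  (the copy is a distinct dict object)
```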


> and then I did the following:
> 
> 
> ActivePython 2.2.2 Build 224 (ActiveState Corp.) based on
> Python 2.2.2 (#37, Nov 26 2002, 10:24:37) [MSC 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
>>>> karl={'int':[],'rep':[]}
>>>> mirja=karl.copy()
>>>> print karl
> {'int': [], 'rep': []}
>>>> print mirja
> {'int': [], 'rep': []}
> 
> #.... Now mirja should be a copy of karl, shouldn't it?

Yes, a shallow copy, to be precise -- the two dictionary objects
are distinct (you can mutate one dict object without in any way
affecting the other one) but they use the same key and value
objects.
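"Distinct dicts, shared values" can be checked directly. The following is a small sketch using the same karl/mirja dicts. Changing mirja *itself* (rebinding a key, adding a key) leaves karl alone, while each value is still one object seen through both dicts.

```python
karl = {'int': [], 'rep': []}
mirja = karl.copy()

print(mirja is karl)                 # prints: False  (two distinct dict objects)
print(mirja['rep'] is karl['rep'])   # prints: True   (one list object, seen from both)

# Mutating the dict mirja itself does not affect karl.
mirja['rep'] = ['fresh list']        # rebind the key to a brand-new list
mirja['new'] = 42                    # add a key
print(karl == {'int': [], 'rep': []})  # prints: True
```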


>>>> karl["rep"].append("Something special")
>>>> print karl
> {'int': [], 'rep': ['Something special']}

Here, you are not mutating the dict object itself, but rather one
list object that is the value for both an entry in dict object
'karl' and an entry in dict object 'mirja'.

> #works as expected... but:
> 
>>>> print mirja
> {'int': [], 'rep': ['Something special']}
> #Not at all expected. How did "Something special" get to mirja?

It's not in dictionary object mirja; it's in list object
mirja['rep'], which is the very same list object as karl['rep'].
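The sketch below uses operator `is` and built-in `id` to show that the "Something special" string lives in a single list. It was appended once, through karl, and is visible from mirja because both dicts hold that same list.

```python
karl = {'int': [], 'rep': []}
mirja = karl.copy()

karl['rep'].append("Something special")   # mutates the one shared list object

print(mirja['rep'] is karl['rep'])          # prints: True
print(id(mirja['rep']) == id(karl['rep']))  # prints: True
print(mirja['rep'])                         # prints: ['Something special']
```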

> #Let's put in tanja:
>>>> tanja={'int':[],'rep':[]}

Here, you're building a completely distinct third dict object,
whose values in particular have nothing to do with the value
objects in existing dictionaries karl and mirja.
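The difference between "equal" and "the same object" explains why tanja behaves. In this sketch, a fresh dict literal compares equal to karl but shares none of its lists. Only mirja, the shallow copy, sees karl's later append.

```python
karl = {'int': [], 'rep': []}
mirja = karl.copy()
tanja = {'int': [], 'rep': []}       # a new literal builds new, empty lists

print(tanja == karl)                 # prints: True   (equal contents)
print(tanja['int'] is karl['int'])   # prints: False  (no shared list objects)

karl['int'].append("Nothing usual")
print(mirja['int'])                  # prints: ['Nothing usual']
print(tanja['int'])                  # prints: []
```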


> #And append interesting things to karl
>>>> karl["int"].append("Nothing usual")
> 
>>>> print karl
> {'int': ['Nothing usual'], 'rep': ['Something special']}
> #as expected.
>>>> print mirja
> {'int': ['Nothing usual'], 'rep': ['Something special']}
> #the same problem again.
> 
>>>> print tanja
> {'int': [], 'rep': []}
> # only tanja is a good girl.
> 
> What is going wrong? What did I misunderstand?

You're missing the distinction between shallow and deep copies.

A shallow copy is the ordinary case -- quite fast, not too
expensive of memory, and perfectly adequate when what you want
to do is mutate one dictionary without affecting the other.

When what you want to do is "deeper" -- e.g., mutate some
ENTRIES in a dictionary object without mutating originally
equal entries in another -- then you may need the higher
memory consumption, slower speed, and additional complexities
of a *DEEP* copy.  In a deep copy, a whole "cloud" of objects
referring to each other is "cloned" so the original cloud and
the resulting one have NO mutable objects in common any more.

The Pythonic way to obtain a deep copy of such a "cloud of
objects" is as follows:

import copy
mirja = copy.deepcopy(karl)

that's it -- the deepcopy function in standard library module
copy does all the complicated work for you.  When it's done,
the set of mutable objects reachable by starting from karl and
navigating along the references, and the set of mutable objects
similarly reachable from mirja, are disjoint.
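Here is the original session redone with `copy.deepcopy`. This is a minimal sketch: appending to karl's lists now leaves mirja untouched, because mirja got lists of its own.

```python
import copy

karl = {'int': [], 'rep': []}
mirja = copy.deepcopy(karl)          # new dict AND new lists inside it

karl['rep'].append("Something special")
karl['int'].append("Nothing usual")

print(karl)    # prints: {'int': ['Nothing usual'], 'rep': ['Something special']}
print(mirja)   # prints: {'int': [], 'rep': []}
print(mirja['rep'] is karl['rep'])   # prints: False
```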

DEEP copying is not the default because of all the extra
complication and "expense" -- shallow copies are much simpler
and less onerous whenever they're sufficient (and in practical
programming, most of the time they are sufficient or even
sometimes semantically preferable).  But when you have
determined that deep copies are what you need, Python makes
it quite painless (from the point of view of the code you
have to write) to obtain deep copies.
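Part of the "complicated work" deepcopy does is to clone a cloud of objects that refer to each other without breaking its shape. It keeps a memo of objects already copied, so shared references and even cycles are reproduced in the clone. The dict `cloud` below is purely illustrative.

```python
import copy

shared = ['x']
cloud = {'a': shared, 'b': shared}   # two entries referring to one list
cloud['self'] = cloud                # a reference cycle back to the dict

clone = copy.deepcopy(cloud)

print(clone['a'] is clone['b'])      # prints: True   (sharing is preserved in the clone)
print(clone['self'] is clone)        # prints: True   (the cycle points at the new dict)
print(clone['a'] is cloud['a'])      # prints: False  (nothing mutable shared with the original)

clone['a'].append('y')
print(cloud['a'])                    # prints: ['x']
print(clone['b'])                    # prints: ['x', 'y']
```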


Incidentally, module copy also exposes a function named copy
which does shallow copies.  By using copy.copy(xx), you do
not have to care about what type of object xx might be -- you
will get a shallow copy whether xx is a dict, a list, or
something else entirely.  If xx is an immutable object,
copy.copy is even clever enough to sometimes return xx itself,
which is indistinguishable from a shallow copy as far as the
behavior of your application is concerned; if you're curious,
you can check for that behind-the-scenes stuff with operator
'is' and/or with built-in function 'id'.
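Here is a short sketch of copy.copy working across types, plus the identity check for the immutable case. The tuple behavior shown reflects the standard library's copy module as it is in current Python 3; it is an implementation detail, so do not rely on it in program logic.

```python
import copy

d = {'k': [1]}
lst = [[1], [2]]
t = (1, 2, 3)

d2 = copy.copy(d)
l2 = copy.copy(lst)
print(d2 == d, d2 is d)       # prints: True False
print(l2 == lst, l2 is lst)   # prints: True False
print(l2[0] is lst[0])        # prints: True  (still a shallow copy)

# For an immutable object such as this tuple, copy.copy
# simply hands back the original object.
print(copy.copy(t) is t)      # prints: True
```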


Alex