Legitimacy of deepcopy
db3l at fitlinxx.com
Mon Jun 7 23:38:11 CEST 2004
Scott David Daniels <Scott.Daniels at Acm.Org> writes:
> The problem with deepcopy is that it is impossible to do "correctly,"
> since "correctly" depends on where your data structure abstractions are.
Well, just because there's more than one way to do it doesn't mean
that a particular solution isn't a correct one. The semantics of
deepcopy are well-defined and, I think, perfectly valid. Now, if they
don't fit a particular need then deepcopy is the wrong tool for that
need, but that's up to the user to decide on a case-by-case basis.
To the original poster, we use deepcopy at a number of interface
points where we need to isolate our internal data structures from the
copies of that information returned to callers. For example, we can't
just return a reference to an internal list or dictionary, since that
would let the caller later mutate our internal object without our
knowledge. For that, deepcopy works just the way we need, since our
primary goal is to avoid mutation of our original objects.
In most of our use cases, the objects in question are providing data
storage (either in-memory or as a simulation for a database) and need
strict isolation between their internal storage and what callers see
as the objects being returned.
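
A minimal sketch of that kind of interface point follows. The
RecordStore class and its put/get methods are hypothetical names made
up for illustration, not code from any real system; the only library
call involved is copy.deepcopy.

```python
import copy


class RecordStore:
    """Hypothetical in-memory store that never hands out its internals."""

    def __init__(self):
        self._records = {}

    def put(self, key, record):
        # Copy on the way in, so later changes by the caller don't leak in.
        self._records[key] = copy.deepcopy(record)

    def get(self, key):
        # Copy on the way out, so the caller can't mutate internal state.
        return copy.deepcopy(self._records[key])


store = RecordStore()
store.put("user1", {"name": "Steve", "tags": ["admin"]})

result = store.get("user1")
result["tags"].append("changed-by-caller")  # mutates only the caller's copy

internal_tags = store.get("user1")["tags"]  # the stored record is untouched
```

A shallow copy.copy() would not be enough here, because the nested
"tags" list would still be shared between the store and the caller.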
In such cases, I really don't see any good solution in lieu of
deepcopy, and in fact deepcopy (as covered below) is probably already
a pretty good minimal solution in terms of actual work it does. It's
certainly a legitimate need and deepcopy is certainly a legitimate
implementation in my eyes. (I'll grant you that sometimes it feels
strange forcing object copies when you don't normally find yourself
needing to, but when you want copies, you want copies :-))
> A simple example:
> steve = People('Steve')
> joe = People(steve)
> mystruct = [(steve, joe, Dollars(5)), steve]
> Now, should a deepcopy have two distinct steves, or one? Is the $5.00
> in the new structure "the same as" another $5.00 or not? Sometime you
> mean the identity of an object, and sometimes you mean it as a value,
> and no general purpose function named deepcopy can guess where to stop
Valid questions, but just because they can have different answers need
not mean that deepcopy can't pick its own set of answers and stick to
them. In the deepcopy case, it will have one distinct steve
(referenced once from the new tuple, once directly from the new list,
and once from within the new joe) because deepcopy keeps a memo
dictionary of objects it has already copied (which also avoids
problems with recursive structures). By doing so it
achieves the primary goal of ensuring that the steve references in the
copy point to a distinct steve from the original (since it is the
original container object you are deep copying), but they remain
internally consistent with how they were used in the original object.
Likewise, the Dollars() instance in the copy will be distinct from
that in the original.
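
To make that concrete, here is a small sketch of the quoted example.
The People and Dollars classes below are stand-ins invented for
illustration (the original post never showed their definitions); each
just stores its constructor argument.

```python
import copy


class People:
    # Stand-in class: simply remembers whatever it was constructed with.
    def __init__(self, arg):
        self.arg = arg


class Dollars:
    # Stand-in class: an ordinary object wrapping an amount.
    def __init__(self, amount):
        self.amount = amount


steve = People('Steve')
joe = People(steve)
mystruct = [(steve, joe, Dollars(5)), steve]

dup = copy.deepcopy(mystruct)
new_steve = dup[1]

# One new steve, shared by every reference inside the copy ...
shared_in_tuple = dup[0][0] is new_steve
shared_in_joe = dup[0][1].arg is new_steve
# ... and distinct from the steve in the original structure.
distinct_from_original = new_steve is not steve
```

All three flags come out True: the copy has exactly one steve of its
own, and it is not the original steve.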
A subtlety (which I'm guessing you are referring to by your identity
comment) is that a deepcopy (IMHO) is aimed at ensuring that the new
copy does not have the ability to mutate objects in the original.
Towards that end, there's no real need to create copies of immutable
objects since they can't be mutated in the first place. So immutable
objects such as numbers and strings will simply have references to
the same object placed in the copy (unless overridden by
__deepcopy__). A tuple is treated the same way as long as everything
inside it is itself immutable; a tuple that holds mutable objects is
rebuilt around copies of its contents. This is no different from how
a copy.copy() of an immutable object will return a reference to the
same object (barring an override of __copy__).
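
A quick way to see this is to compare identities before and after
copying. This sketch reflects how CPython's copy module behaves for
built-in types; the variable names are just for illustration.

```python
import copy

number = 12345
text = "five dollars"
plain_tuple = (1, "two", 3.0)   # only immutable contents
tuple_with_list = (1, [2, 3])   # holds a mutable list

# Immutable objects are not duplicated; the "copy" is the same object.
same_number = copy.deepcopy(number) is number
same_text = copy.deepcopy(text) is text
same_tuple = copy.deepcopy(plain_tuple) is plain_tuple
same_shallow = copy.copy(plain_tuple) is plain_tuple

# A tuple containing something mutable has to be rebuilt, since the
# list inside it must be replaced by a fresh copy.
dup = copy.deepcopy(tuple_with_list)
tuple_rebuilt = dup is not tuple_with_list
list_copied = dup[1] is not tuple_with_list[1]
```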
You could probably view the general deepcopy guideline as making the
minimum number of actual copies needed to ensure that no references
to mutable objects from the original object remain in the copy, while
maintaining the internal reference structure of the original. In
general, I find that to be a very practical approach.
Of course, if that's not what you wanted, then deepcopy isn't going to
fit the bill. If you have control of the objects involved, the
__copy__ and __deepcopy__ hooks are provided to let you make some
changes, but it certainly may not fit all cases. But that hardly
makes it "incorrect" or less useful for cases where it does fit (which
to be honest, I'd guess is the majority of them).
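
As an illustration of the hook, here is a sketch in which a class opts
out of being duplicated by returning itself from __deepcopy__. The
Dollars and Account classes are hypothetical; whether a given type
should really be treated as a shareable value is a design decision
for whoever owns that type.

```python
import copy


class Dollars:
    """Hypothetical value type that we choose to treat as immutable."""

    def __init__(self, amount):
        self.amount = amount

    def __deepcopy__(self, memo):
        # Tell deepcopy to share this object instead of duplicating it.
        return self


class Account:
    """Hypothetical ordinary mutable object; deepcopy handles it by default."""

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance
        self.history = []


five = Dollars(5)
acct = Account("Steve", five)
acct.history.append("opened")

dup = copy.deepcopy(acct)

balance_shared = dup.balance is five                 # hook was honoured
history_copied = dup.history is not acct.history     # default behaviour
```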