[Numpy-discussion] copy on demand

Perry Greenfield perry at stsci.edu
Thu Jun 13 14:17:04 EDT 2002


> > Copy-on-demand requires the maintenance of a global list of all the
> > active views associated with a particular array buffer.  Here is a
> > simple example:
> >
> >     >>> a = zeros((5000,5000))
> >     >>> b = a[49:51,50]
> >     >>> c = a[51:53,50]
> >     >>> a[50,50] = 1
> >
> > The assignment to a[50,50] must trigger a copy of the array b;
> > otherwise b also changes.  On the other hand, array c does not need to
> > be copied since its view does not include element 50,50.  You could
> > instead copy the array a -- but that means copying a 100 Mbyte array
> > while leaving the original around (since b and c are still using it) --
> > not a good idea!
>
> Sure, if one wants to perform only the *minimum* amount of copying,
> things can get rather tricky, but wouldn't it be satisfactory for most
> cases if attempted modification of the original triggered the delayed
> copying of the "views" (lazy copies)?  In those cases where it isn't
> satisfactory the user could still explicitly create real (i.e.
> alias-only) views.
>
I'm not sure what you mean. Are you saying that if anything in the
buffer changes, force all views of the buffer to generate copies
(rather than try to determine if the change affected only selected
views)? If so, yes, it is easier, but it still is a non-trivial
capability to implement.
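
To make the simpler policy concrete, here is a minimal sketch in plain
Python. It is not numarray code: the names Buffer, LazyCopy and
lazy_copy are hypothetical, a list stands in for the array buffer, and
writes made through a lazy copy are left out entirely.

```python
import weakref


class Buffer:
    """Hypothetical shared storage that tracks its outstanding lazy copies."""

    def __init__(self, data):
        self.data = list(data)
        # Weak references, so a lazy copy that is dropped needs no bookkeeping.
        self._lazy = weakref.WeakSet()

    def lazy_copy(self, start, stop):
        view = LazyCopy(self, start, stop)
        self._lazy.add(view)
        return view

    def __setitem__(self, index, value):
        # Simple policy: any write detaches every lazy copy, with no
        # attempt to check whether the write overlaps a given one.
        for view in list(self._lazy):
            view._materialize()
        self._lazy.clear()
        self.data[index] = value


class LazyCopy:
    """Aliases a slice of the buffer until the buffer is about to change."""

    def __init__(self, buffer, start, stop):
        self._buffer = buffer
        self._start, self._stop = start, stop
        self._own = None  # private data once the copy has really been made

    def _materialize(self):
        if self._own is None:
            self._own = self._buffer.data[self._start:self._stop]
            self._buffer = None  # stop keeping the big buffer alive

    @property
    def is_copied(self):
        return self._own is not None

    def tolist(self):
        if self._own is not None:
            return list(self._own)
        return self._buffer.data[self._start:self._stop]


a = Buffer([0] * 100)
b = a.lazy_copy(49, 51)   # overlaps element 50
c = a.lazy_copy(51, 53)   # does not overlap element 50
a[50] = 1                 # both b and c are copied before the write
```

Even this toy version needs a registry on every buffer and a hook on
every write path, and c gets copied although the write never touched
it.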

> >
> > The bookkeeping can get pretty messy (if you care about memory usage,
> > which we definitely do).  Consider this case:
> >
> >     >>> a = zeros((5000,5000))
> >     >>> b = a[0:-10,0:-10]
> >     >>> c = a[49:51,50]
> >     >>> del a
> >     >>> b[50,50] = 1
> >
> > Now what happens?  Either we can copy the array for b (which means two
>
> ``b`` and ``c`` are copied and then ``a`` is deleted.
>
> What does numarray currently keep of a if I do something like the
> above or:
>
> >>> b = a.flat[::-10000]
> >>> del a
>
> ?
>
The whole buffer remains in both cases.
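
As a rough analogy only (this uses Python's built-in memoryview rather
than numarray, and the sizes are made up), a small strided view keeps
the entire underlying buffer alive after the original name is deleted:

```python
buf = bytearray(1_000_000)        # stands in for a large array buffer
view = memoryview(buf)[::10000]   # a small strided view: 100 elements

del buf                           # the name is gone, the storage is not

print(len(view))                  # 100 elements are visible through the view
print(len(view.obj))              # but all 1000000 bytes are still held

# An explicit copy is what lets the large buffer be released.
compact = view.tobytes()          # copies only the 100 selected bytes
view.release()
```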

> > copies of the huge (5000,5000) array exist, one used by c and the new
> > version used by b), or we can be clever and copy c instead.
> >
> > Even keeping track of the views associated with a buffer doesn't solve
> > the problem of an array that is passed to a C extension and is modified
> > in place.  It would seem that passing an array into a C extension would
> > always require all the associated views to be turned into copies.
> > Otherwise we can't guarantee that views won't be modified.
>
> Yes -- but only if the C extension is destructive. In that case the
> user might well be making a mistake in current Numeric if he has views
> and doesn't want them to be modified by the operation (of course he
> might know that the in-place operation does not affect the view(s) --
> but wouldn't such cases be rather rare?). If he *does* want the views
> to be modified, he would obviously have to explicitly specify them as
> such in a copy-on-demand scheme, and in the other case he has most
> likely been prevented from making an error (and can still explicitly
> use real views if he knows that the in-place operation on the original
> will not have undesired effects on the "views").
>
If the point is that views are susceptible to unexpected changes
made in place by a C extension, yes, certainly (just as they
are for changes made in place in Python). But I'm not sure what
that has to do with the implied copy (even if delayed) being
broken by extensions written in C. Promising a copy and then not
honoring it is not the same as never promising it in the first
place. But I may be misunderstanding your point.
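
To illustrate what I mean by a broken promise, here is a small sketch
in plain Python. Everything in it is hypothetical: LazyCopy is a toy
lazy copy, and fill_in_place stands in for a C extension that writes
through the raw buffer without going through any copy-on-demand hooks.

```python
class LazyCopy:
    """Hypothetical lazy copy: aliases `source` until told to detach."""

    def __init__(self, source):
        self._source = source
        self._own = None

    def detach(self):
        # Make the promised copy for real.
        if self._own is None:
            self._own = bytes(self._source)

    def tobytes(self):
        return self._own if self._own is not None else bytes(self._source)


def fill_in_place(raw, value):
    """Stand-in for a C extension: writes to the buffer directly."""
    mv = memoryview(raw)
    mv[:] = bytes([value]) * len(mv)


# Case 1: nothing forces the copy before the in-place write.
storage = bytearray(4)
promised = LazyCopy(storage)      # the user was promised a copy of zeros
fill_in_place(storage, 7)
broken = promised.tobytes()       # the "copy" now holds sevens

# Case 2: the copy is forced before the buffer is handed over.
storage2 = bytearray(4)
kept = LazyCopy(storage2)
kept.detach()
fill_in_place(storage2, 7)
honored = kept.tobytes()          # still zeros, as promised
```

In the first case the user asked for a copy and silently got an alias,
which is a different situation from a plain view that never claimed to
be a copy.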

Perry




