[Numpy-discussion] copy on demand

Rick White rlw at stsci.edu
Wed Jun 12 09:27:03 EDT 2002

Here is what I see as the fundamental problem with implementing slicing
in numarray using copy-on-demand instead of views.

Copy-on-demand requires the maintenance of a global list of all the
active views associated with a particular array buffer.  Here is a
simple example:

    >>> a = zeros((5000,5000))
    >>> b = a[49:51,50]
    >>> c = a[51:53,50]
    >>> a[50,50] = 1

The assignment to a[50,50] must trigger a copy of the array b;
otherwise b also changes.  On the other hand, array c does not need to
be copied since its view does not include element 50,50.  You could
instead copy the array a -- but that means copying a 100 Mbyte array
while leaving the original around (since b and c are still using it) --
not a good idea!
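
To make the check concrete, here is a rough sketch of the overlap test that
copy-on-demand would have to run on every assignment.  It uses present-day
NumPy as a stand-in for numarray (an assumption; np.shares_memory is a NumPy
function, not part of numarray), and a much smaller array than the
(5000,5000) one above so that it runs quickly:

```python
import numpy as np

# Small stand-in for the (5000,5000) array discussed above.
a = np.zeros((100, 100), dtype=np.int32)
b = a[49:51, 50]   # rows 49 and 50 of column 50
c = a[51:53, 50]   # rows 51 and 52 of column 50

# The single element that "a[50,50] = 1" is about to overwrite.
target = a[50:51, 50:51]

# A copy-on-demand scheme would have to ask this of every live view.
b_needs_copy = np.shares_memory(target, b)
c_needs_copy = np.shares_memory(target, c)

print("b overlaps the write:", b_needs_copy)
print("c overlaps the write:", c_needs_copy)
```

Only b overlaps the written element, so only b would need to be copied.  The
catch is that the system has to know about b and c in the first place, which
is the global list of views mentioned above.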

The bookkeeping can get pretty messy (if you care about memory usage,
which we definitely do).  Consider this case:

    >>> a = zeros((5000,5000))
    >>> b = a[0:-10,0:-10]
    >>> c = a[49:51,50]
    >>> del a
    >>> b[50,50] = 1

Now what happens?  Either we can copy the array for b (which means two
copies of the huge (5000,5000) array exist, one used by c and the new
version used by b), or we can be clever and copy c instead.
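
The "clever" choice can be sketched as a cost comparison.  This is only an
illustration in present-day NumPy (an assumption, not numarray code); the
variable names and the decision rule are hypothetical, and the array is
shrunk to (100,100) so the example runs quickly:

```python
import numpy as np

a = np.zeros((100, 100), dtype=np.int32)
b = a[0:-10, 0:-10]
c = a[49:51, 50]
del a  # the buffer stays alive because b and c still refer to it

# The element that "b[50,50] = 1" is about to overwrite.
target = b[50:51, 50:51]

# Hypothetical bookkeeping: every other live view on the same buffer.
other_views = [c]
overlapping = [v for v in other_views if np.shares_memory(target, v)]

# Compare the bytes copied under each strategy.
cost_copy_writer = b.nbytes
cost_copy_others = sum(v.nbytes for v in overlapping)

if overlapping and cost_copy_others < cost_copy_writer:
    choice = "copy the other views"
else:
    choice = "copy the writer"

print(choice, cost_copy_writer, cost_copy_others)
```

Here copying c costs a few bytes while copying b duplicates nearly the whole
buffer, so detaching c is the cheaper option.  Making that decision correctly
in general, for any number of views, is the messy bookkeeping in question.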

Even keeping track of the views associated with a buffer doesn't solve
the problem of an array that is passed to a C extension and is modified
in place.  It would seem that passing an array into a C extension would
always require all the associated views to be turned into copies.
Otherwise we can't guarantee that views won't be modified.
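
A small illustration of why the bookkeeping cannot see such writes.  This
uses present-day NumPy and a Python memoryview as a stand-in for the raw
pointer a C extension would receive (both are assumptions made for the sake
of a runnable example):

```python
import numpy as np

a = np.zeros(10, dtype=np.uint8)
b = a[2:5]  # covers elements 2, 3 and 4 of a

# Stand-in for the raw buffer pointer handed to a C extension.
raw = memoryview(a)

# This write goes straight to memory; it never passes through the
# array's indexing machinery, so no copy-on-demand hook could run.
raw[3] = 7

b_middle = int(b[1])  # b[1] is the same memory location as a[3]
print("b[1] after the raw write:", b_middle)
```

Any code that holds the buffer pointer can change what b sees without the
array layer being told, which is why every associated view would have to be
copied before the array is handed over.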

This kind of state information with side effects leads to a system that
is hard to develop, hard to debug, and really messes up the behavior of
the program (IMHO).  It is *highly* desirable to avoid it if possible.

This is not to deny that copy-on-demand (with explicit views available
on request) would have some desirable advantages for the behavior of
the system.  But we've worried these issues to death, and in the end
were convinced that slices == views provided the best compromise
between the desired behavior and a clean implementation.
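
For comparison, this is what slices == views looks like in practice, with a
copy made only when it is asked for.  The sketch uses present-day NumPy,
which follows the same convention (an assumption as far as numarray's exact
API is concerned):

```python
import numpy as np

a = np.arange(6)          # [0, 1, 2, 3, 4, 5]
view = a[1:4]             # a slice is a view onto a's buffer
snapshot = a[1:4].copy()  # an independent copy, made explicitly

a[2] = 99

view_value = int(view[1])          # the view sees the new value
snapshot_value = int(snapshot[1])  # the copy keeps the old value

print(view_value, snapshot_value)
```

No hidden state is involved: whether an array shares memory is decided once,
when it is created, and does not change as a side effect of later
assignments.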

Richard L. White    rlw at stsci.edu    http://sundog.stsci.edu/rick/
Space Telescope Science Institute
Baltimore, MD
