Rick White
Here is what I see as the fundamental problem with implementing slicing in numarray using copy-on-demand instead of views.
Copy-on-demand requires the maintenance of a global list of all the active views associated with a particular array buffer. Here is a simple example:
>>> a = zeros((5000,5000))
>>> b = a[49:51,50]
>>> c = a[51:53,50]
>>> a[50,50] = 1
The assignment to a[50,50] must trigger a copy of the array b; otherwise b also changes. On the other hand, array c does not need to be copied since its view does not include element 50,50. You could instead copy the array a -- but that means copying a 100 Mbyte array while leaving the original around (since b and c are still using it) -- not a good idea!
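To make the bookkeeping concrete, here is a minimal pure-Python sketch of the idea, not numarray's implementation. It is simplified to one dimension (think of it as column 50 of the example), and the names Buffer, LazySlice, lazy_slice and detach are hypothetical, invented only for this illustration. The buffer keeps a registry of its live slices and, before a write, copies only those whose range covers the written index.

```python
import weakref


class Buffer:
    """Toy 1-D array that owns its data and tracks its live lazy slices."""

    def __init__(self, n):
        self.data = [0] * n
        # Registry of slices that still share self.data. Weak references let
        # a slice the user has dropped disappear from the registry.
        self._views = weakref.WeakSet()

    def lazy_slice(self, start, stop):
        view = LazySlice(self, start, stop)
        self._views.add(view)
        return view

    def __setitem__(self, i, value):
        # Detach only the slices whose range covers index i, then write.
        for view in list(self._views):
            if view.start <= i < view.stop:
                view.detach()
        self.data[i] = value


class LazySlice:
    """Read-only slice that shares the base data until it is detached."""

    def __init__(self, base, start, stop):
        self.base, self.start, self.stop = base, start, stop
        self.own = None  # private copy, created on demand

    def detach(self):
        if self.own is None:
            self.own = self.base.data[self.start:self.stop]
            self.base._views.discard(self)
            self.base = None

    def tolist(self):
        if self.own is not None:
            return list(self.own)
        return self.base.data[self.start:self.stop]


a = Buffer(100)
b = a.lazy_slice(49, 51)   # covers index 50
c = a.lazy_slice(51, 53)   # does not cover index 50
a[50] = 1                  # forces a copy of b only
```

After the assignment, b holds a private copy with its old values, while c still shares a's data. The sketch deliberately ignores writes made through the slices themselves and the case where a is deleted first.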
Sure, if one wants to perform only the *minimum* amount of copying, things can get rather tricky, but wouldn't it be satisfactory for most cases if attempted modification of the original triggered the delayed copying of the "views" (lazy copies)? In those cases where it isn't satisfactory the user could still explicitly create real (i.e. alias-only) views.
The bookkeeping can get pretty messy (if you care about memory usage, which we definitely do). Consider this case:
>>> a = zeros((5000,5000))
>>> b = a[0:-10,0:-10]
>>> c = a[49:51,50]
>>> del a
>>> b[50,50] = 1
Now what happens? Either we can copy the array for b (which means two copies of the huge (5000,5000) array exist, one used by c and the new version used by b), or we can be clever and copy c instead.
``b`` and ``c`` are copied and then ``a`` is deleted. What does numarray currently keep of a if I do something like the above or:
b = a.flat[::-10000]
del a
?
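To put rough numbers on the two options in the `del a` example (copy the array for b, or copy c), the following sketch counts the elements each slice selects. The helper names slice_len and view_elements are hypothetical, and the 4-byte element size is an assumption chosen to match the 100 Mbyte figure quoted earlier for the (5000,5000) array.

```python
def slice_len(s, n):
    """Number of indices that a slice (or a single int) selects on an axis of length n."""
    if isinstance(s, int):
        return 1
    return len(range(*s.indices(n)))


def view_elements(shape, index):
    """Number of elements selected by one index entry per axis."""
    count = 1
    for n, s in zip(shape, index):
        count *= slice_len(s, n)
    return count


shape = (5000, 5000)
ITEMSIZE = 4  # assumption: 4-byte elements

b_elems = view_elements(shape, (slice(0, -10), slice(0, -10)))  # a[0:-10,0:-10]
c_elems = view_elements(shape, (slice(49, 51), 50))             # a[49:51,50]

b_bytes = b_elems * ITEMSIZE
c_bytes = c_elems * ITEMSIZE
```

Under these assumptions, copying for b duplicates 24,900,100 elements (just under 100 Mbyte), while copying c duplicates 2 elements (8 bytes). Choosing the cheap option requires knowing about every view on the buffer, which is exactly the bookkeeping in question.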
Even keeping track of the views associated with a buffer doesn't solve the problem of an array that is passed to a C extension and is modified in place. It would seem that passing an array into a C extension would always require all the associated views to be turned into copies. Otherwise we can't guarantee that views won't be modified.
Yes -- but only if the C extension is destructive. In that case the user might well be making a mistake in current Numeric if he has views and doesn't want them to be modified by the operation (of course he might know that the in-place operation does not affect the view(s) -- but wouldn't such cases be rather rare?). If he *does* want the views to be modified, he would obviously have to explicitly specify them as such in a copy-on-demand scheme; in the other case he has most likely been prevented from making an error (and can still explicitly use real views if he knows that the in-place operation on the original will not have undesired effects on the "views").
This kind of state information with side effects leads to a system that is hard to develop, hard to debug, and really messes up the behavior of the program (IMHO). It is *highly* desirable to avoid it if possible.
Sure, copy-on-demand is an optimization and optimizations always mess up things. On the other hand, some optimizations also make "nicer" (e.g. less error-prone) semantics computationally viable, so it's often a question of ease and clarity of the implementation vs. ease and clarity of the code that uses it. I'm not denying that too much complexity in the implementation also adversely affects users in the form of bugs, and that in the particular case of delayed copying the user can also be affected directly by harder-to-understand resource usage (e.g. a[0] = 1 triggering a monstrous copying operation). Just out of curiosity, has anyone already asked the octave people how much trouble it has caused them to implement copy-on-demand, and whether matlab/octave users in practice experience difficulties because of the harder-to-predict runtime behavior (I think, like matlab, octave does copy-on-demand)?
This is not to deny that copy-on-demand (with explicit views available on request) would have some desirable advantages for the behavior of the system. But we've worried these issues to death, and in the end were convinced that slices == views provided the best compromise between the desired behavior and a clean implementation.
If implementing copy-on-demand is too difficult and the resulting code would be too messy, then this is certainly a valid reason to compromise on the current slicing behavior (especially since people like me who'd like to see copy-on-demand are unlikely to volunteer to implement it :)
Rick
------------------------------------------------------------------
Richard L. White    rlw@stsci.edu    http://sundog.stsci.edu/rick/
Space Telescope Science Institute
Baltimore, MD
alex
--
Alexander Schmolck
Postgraduate Research Student
Department of Computer Science
University of Exeter
A.Schmolck@gmx.net    http://www.dcs.ex.ac.uk/people/aschmolc/