FW: [Numpy-discussion] Bug: extremely misleading array behavior

Alexander Schmolck a.schmolck at gmx.net
Wed Jun 12 08:44:04 EDT 2002


"eric jones" <eric at enthought.com> writes:

> > Couldn't one have both consistency *and* efficiency by implementing a
> > copy-on-demand scheme (which is what matlab does, if I'm not entirely
> > mistaken; a real copy gets only created if either the original or the
> > 'copy'
> > is modified)? 
> 
> Well, slices creating copies is definitely a bad idea (which is what I
> have heard proposed before) -- finite difference calculations (and
> others) would be very slow with this approach.  Your copy-on-demand
> suggestion might work though.  Its implementation would be more complex,
> but I don't think it would require cooperation from the Python core.
> It could be handled in the ufunc code.  It would also require extension
> modules to make copies before they modified any values.  
> 
> Copy-on-demand doesn't really fit with python's 'assignments are
> references' approach to things though, does it?  Using foo = bar in
> Python and then changing an element of foo will also change bar.  So, I

My suggestion wouldn't conflict with any standard python behavior -- indeed
the main motivation would be to have numarray conform to standard python
behavior -- ``foo = bar`` and ``foo = bar[20:30]`` would behave exactly as for
other sequences in python. The first one creates an alias to bar and in the
second one the indexing operation creates a copy of part of the sequence which
is then aliased to foo. Sequences are atomic in python, in the sense that
indexing them creates a new object, which I think is not in contradiction to
python's nice and consistent 'assignments are references' behavior.
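
To make the distinction concrete, here is a minimal sketch using only the
standard library. The list part shows standard sequence behavior; the
``memoryview`` part is merely a stand-in (my assumption, chosen because it
needs no extension module) for the view-on-slice behavior of Numeric arrays:

```python
from array import array

# Standard Python sequence: plain assignment aliases, slicing copies.
bar = list(range(10))
foo = bar            # alias: foo and bar are the same object
part = bar[2:5]      # indexing creates a new list holding a copy
part[0] = 99         # only the copy changes
foo[9] = -1          # changes bar too, because foo *is* bar

# View semantics, illustrated with a memoryview over an array buffer.
buf = array('d', [0.0, 1.0, 2.0, 3.0, 4.0])
view = memoryview(buf)[1:4]   # slicing a memoryview yields another view
view[0] = 99.0                # writes through to buf[1]
```

After this runs, ``bar[2]`` is still 2 while ``buf[1]`` has become 99.0,
which is exactly the difference under discussion.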


> guess there would have to be a distinction made here.  This adds a
> little more complexity.
> 
> Personally, I like being able to pass views around because it allows for
> efficient implementations.  The option to pass arrays into extension
> function and edit them in-place is very nice.  Copy-on-demand might
> allow for equal efficiency -- I'm not sure.

I don't know how much of a performance drawback copy-on-demand would have
compared to views -- I'd suspect it would not be significant. The fact
that the runtime behavior becomes a bit more difficult to predict might be
more of a drawback (but then I haven't heard matlab users complain, and one
could always force an eager copy). Another reason why I think a copy-on-demand
scheme for slicing operations might be attractive is that I'd suspect one
could gain significant benefits from doing other operations lazily, too
(plus optionally caching some results); transposing, for instance, seems to
cause copies that are unnecessary in principle, at least in some cases at the
moment.
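
A rough sketch of what I mean by copy-on-demand, in pure python. The class
``CowArray`` and all of its methods are hypothetical names made up for
illustration; this is not how numarray or matlab implement anything, it only
handles contiguous 1-D slices, and the sharing counter is never decremented
when a slice is garbage collected (so it may copy more often than strictly
needed):

```python
class CowArray:
    """Hypothetical 1-D array whose slices copy lazily, on first write."""

    def __init__(self, values):
        self._buf = list(values)
        self._start, self._stop = 0, len(self._buf)
        # One-element list: a counter shared by every array using _buf.
        self._sharers = [1]

    def __len__(self):
        return self._stop - self._start

    def _index(self, i):
        # Translate a (possibly negative) index into a buffer position.
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("index out of range")
        return self._start + i

    def __getitem__(self, key):
        if isinstance(key, slice):
            start, stop, step = key.indices(len(self))
            if step != 1:
                raise NotImplementedError("sketch: contiguous slices only")
            new = CowArray.__new__(CowArray)
            new._buf = self._buf                  # share, do not copy
            new._start = self._start + start
            new._stop = self._start + max(start, stop)
            new._sharers = self._sharers
            self._sharers[0] += 1
            return new
        return self._buf[self._index(key)]

    def __setitem__(self, i, value):
        self._index(i)                            # validate before copying
        if self._sharers[0] > 1:
            # Buffer is shared: detach by taking a private copy first.
            self._sharers[0] -= 1
            self._buf = self._buf[self._start:self._stop]
            self._start, self._stop = 0, len(self._buf)
            self._sharers = [1]
        self._buf[self._index(i)] = value

    def shares_buffer_with(self, other):
        return self._buf is other._buf

    def tolist(self):
        return self._buf[self._start:self._stop]


a = CowArray(range(6))
b = a[1:4]                       # no data copied yet
shared_before = a.shares_buffer_with(b)
b[0] = 99                        # first write triggers the real copy
shared_after = a.shares_buffer_with(b)
```

The real copy happens whichever side is written to first, so neither the
original nor the slice ever observes the other's modifications.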

> 
> I haven't found the current behavior very problematic in practice and
> haven't seen it as a major stumbling block to new users.  I'm happy

From my experience not even all people who use Numeric quite a lot are *aware*
that the slicing behavior differs from python sequences. You might be right
that in practice aliasing doesn't cause too many problems (as long as one
sticks to arrays -- it certainly makes it harder to write code that operates
on slices of generic sequence types) -- I'd really be interested to know
whether there are cases where people have spent a long time to track down a
bug caused by the view behavior.
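
As an illustration of the generic-sequence problem, consider the following
sketch. ``scaled_head`` is a hypothetical helper, and ``memoryview`` again
stands in for a view-slicing array type (an assumption made to keep the
example within the standard library):

```python
from array import array


def scaled_head(seq, n, factor):
    """Hypothetical helper: return the first n items of seq, scaled."""
    head = seq[:n]                 # copy for lists, view for memoryviews
    for i in range(len(head)):
        head[i] = head[i] * factor
    return head


data = [1.0, 2.0, 3.0, 4.0]
scaled_head(data, 2, 10.0)                 # data is left untouched

buf = array('d', [1.0, 2.0, 3.0, 4.0])
scaled_head(memoryview(buf), 2, 10.0)      # buf is modified as a side effect
```

The same function silently mutates its argument for one type and not for the
other; writing ``head = list(seq[:n])`` would make it behave uniformly, at the
price of always copying.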


> with status quo on this. But, if copy-on-demand is truly efficient and
> didn't make extension writing a nightmare, I wouldn't complain about the
> change either.  I have a feeling the implementers of numarray would
> though. :-)  And talk about having to modify legacy code...

Since the vast majority of slicing operations are currently not done to
create views that are dependently modified, the backward incompatibility might
not affect that much code. You are right, though, that if Perry and the other
numarray implementors don't think that copy-on-demand is worth the bother,
then it's unlikely to happen.

> 
> > forwards-compatibility). I would also suspect that this would make it
> > *a lot* easier to get numarray (or parts of it) into the core, but this
> > is just a guess.
> 
> I think the two things Guido wants for inclusion of numarray are a
> consensus from our community on what we want, and (more importantly) a
> comprehensible code base. :-)  If Numeric satisfied this 2nd condition,
> it might already be slated for inclusion...  The 1st is never easy with
> such varied opinions -- I've about concluded that Konrad and I are
> anti-particles :-) -- but I hope it will happen. 

As I said, I can only guess about the politics involved, but I would think that
before a significant piece of code such as numarray is incorporated into the
core, a relevant PEP will be discussed in the newsgroup, and that many people
will feel more comfortable about incorporating something into core python that
doesn't deviate significantly from standard behavior (i.e. doesn't
view-slice), especially if it mainly caters to a rather specialized
audience. But Guido obviously has the last word on those issues, and if he
doesn't have a problem either way, then as long as the community is undivided
it shouldn't be an obstacle for inclusion.

I agree that division of the community might pose the most significant
problems -- MA for example *does* create copies on indexing if I'm not
mistaken and the (desirable) transition process from Numeric to numarray also
poses not insignificant difficulties and risks, especially since there now are
quite a few important projects (not least of them scipy) that are built on top
of Numeric and will have to be incorporated in the transition if numarray is
to take over. Everything seems in a bit of a limbo right now. I'm currently
working on a (fully-featured) matrix class that I'd like to work with both
Numeric and numarray (and also scipy where available) more or less
transparently for the user, which turns out to be much more difficult than I
would have thought.

alex

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/




