[MATRIX-SIG] A proposal (LONG) - "reverse of take" revisited
P.S.Craig@durham.ac.uk
P.S.Craig@durham.ac.uk
Tue, 1 Jul 1997 23:56:47 +0100 (BST)
Hi all,
I have decided to stick my neck out and try to bring a little order to
the discussion about extending NumPy's indexing that has been going on
over the last week or so. There are clearly a number of schools of
thought about what should be done, ranging from Aaron Watter's minimal
suggestion that we have a put function corresponding to the existing
take function to Andrew Mullhaupt's and my desires for fairly exotic
indexing semantics based on ideas found in S and APL. I would like to
suggest the following program of development:
a) implement a give function which is the converse of the current take
function. It's still not entirely clear what this means. I would
tentatively propose:
give(a, b, seq, axis=0)
where a and b are NumPy arrays and seq is an integer value sequence
such that take(a, seq, axis) is an array of the same shape as
b. Formally, having checked the arguments, the function would do
the equivalent of
axisoffset = (slice(None,None,None),)*axis
for i in range(len(seq)):
a[axisoffset+(seq[i],)] = b[axisoffset+(i,)]
This has the nicely symmetric property that having done
give(a,b,seq,axis) then b and take(a, seq, axis) are arrays of the
same shape with the same contents.
I wonder how much efficiency gain there would be with a C implementation?
b) For most, if not all, of the indexing ideas that have been thrown
about, we want to be able to efficiently index more than one axis
simultaneously. So we appear to want versions of take and give that
operate on more than one axis simultaneously.
A natural extension to the current take function would be
multitake(a, (seq1,seq2,...,seqm), axes=range(m))
where m is the number of axes being simultaneously indexed. The
effect of assigning the result to b would be the same as
b = a
seqs = (seq1,seq2,...,seqm)
for i in range(m):
b = take(b, seqs[i], axes[i])
but could be much more efficiently implemented.
multigive is more challenging to define. In the spirit of my
proposal for the give function, this might be
multigive(a, b, (seq1,seq2,...,seqm), axes=range(m))
where multitake(a,(seq1,seq2,...,seqm), axes) would be an array of
the same shape as b
c) The simplest extension to current NumPy indexing is to allow an
arbitrary integer valued sequence as well as integers or slices as
an index. Thus, for example,
a[:,seq1,:,seq2,seq3]
would be the same as
multitake(a,(seq1,seq2,seq3),(1,3,4))
and
a[:,seq1:,seq2,seq3] = b
would be the same as
multigive(a,b,(seq1,seq2,seq3),(1,3,4))
Note that, according to the proposal so far,
a[seq1][seq2] = b
would be syntactically correct but would fail to alter a as a[seq1]
would have a copy of the apropriate part of a's data and would not
be a reference to a's data.
Mixing of slice and sequence indexing seems fairly straightforward
to implement.
d) It is interesting to note that if the take (and presumably also
multitake) function returned a reference object instead of copying
data to a new array, then the give (and multigive) function would
be trivial to implement.
At present,indexing in NumPy using slices is done by reference and
the resulting object is a highly efficient array implementation
because the memory location for a[i1,i2,...,ij ,...,in] is the
memory location for a[i1,i2,...,0,...,in] + ij *sj where sj is the
stride length for axis j. In other words, the reference object is
no different from a directly created array except for the value of
sj.
The difficulty with a reference implementation for indexing by
arbitrary sequences is that the resulting array will be less
efficient than the basic NumPy array. The implementation is fairly
straightforward and would have the advantage that a statement such
as
a[[1,3,4]][2]=1
would actually have the desired effect of changing a[4].
The other difficulty with a reference implementation for take is
that the default behaviour would have to remain copying. Otherwise,
existing code could well break.
Actually, once the basics of a reference implementation of take
have been sorted, multitake is a trivial function to write
efficiently by repeated use of take.
e) More exotic indexing ideas, such as those found in S and advocated
by Andrew Mullhaupt and me, would be the subject of individual or
collective experimentation using python classes for the time
being. I think that the ideas in a)-c) and possibly also d) would
provide a good basis for playing with more powerful index
notation. If a concensus emerged, we could then take the process
further.
Now, some of this is fairly straightforward. A start on code for a) is
Zane Mosteller's array_set function. However, this probably needs some
work. b) can either be tackled directly once a) has been completed or
can be tackled by doing d). c) is trivial once b) is done.
To my mind, the big question is whether we think d) is a good idea. If
it is, it really means having two different array objects, one of
which is Jim Hugunin's current object and the other of which
implements a more general reference array type. The latter is not
difficult to implement, I think. The problem is that the two
implementations cannot really be separated if we want efficiency. I
don't think we can implement as a separate module. Unfortunately, my
impression from Jim H's last message was that he does not have any
time to give to NumPy for the time being. So we would have to make
changes to NumPy code and hope that we could keep the changes up to
date if NumPy evolved. Of course, if we could get the changes accepted
into NumPy then all would be well.
I'm sorry to have been so long-winded. If anyone is still reading at
this point and is still interested, please comment. Otherwise, I might
just go ahead by myself anyway sometime in the next couple of months.
Exhaustedly yours,
Peter Craig
#--------------------------------------------------------------------#
| E-mail: P.S.Craig@durham.ac.uk Telephone: +44-91-3742376 (Work) |
| Fax: +44-91-3747388 +44-91-3860448 (Home) |
| |
| WWW: http://fourier.dur.ac.uk:8000/stats/psc.html |
| |
| Snail: Peter Craig, Dept. of Math. Sciences, Univ. of Durham, |
| South Road, Durham DH1 3LE, England |
#--------------------------------------------------------------------#
_______________
MATRIX-SIG - SIG on Matrix Math for Python
send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
_______________