[PYTHON MATRIX-SIG] A problem with slicing
Guido van Rossum
guido@CNRI.Reston.VA.US
Thu, 14 Sep 1995 09:20:28 -0400
[The third in a series of short essays on subjects raised in the
Matrix discussion.]
Here's a problem where I have neither a strong opinion nor a perfect
solution...
Jim Fulton proposes an elegant indexing syntax for matrix objects
which doesn't require any changes to the language:
M[i][j]
references the element at column i and row j (or was that column j and
row i? Never mind...).
This nicely generalizes to slicing, so you can write:
M[i][j1:j2]
meaning the column vector at column i with row indices j1...j2-1.
Unfortunately, the analogous expression for a row vector won't work:
M[i1:i2][j]
The reason for this is that it works by interpreting M as a sequence
of columns (and it's all evaluated one thing at a time -- M[i][j]
means (M[i])[j], and so on). M[i] is column i, so M[i][j] is the
element at row j thereof. But slice semantics imply that of M is a
sequence of X'es, then M[i1:j1] is still a sequence of X'es -- just
shorter. So M[p:q][r] is really the same as M[p+r] (assuming r<q-p).
One way out of this is to adopt the syntax
M[i, j]
for simple indexing. This would require only a minor tweaking of the
grammar I believe. This could be extended to support
M[i1:i2, j]
M[i1:i2, j1:j2]
M[i, j1:j2]
(and of course higher-dimensional equivalents).
This would require considerable changes of the run-time architecture
of slicing and indexing, and since currently everything is geared
towards one-dimensional indexing/slicing, but I suppose it would be
doable.
(Funny how I'm accepting this possibility of changing the language
here, while I'm violently opposed to it for operator definitions. I
guess with adding operators there is no end to the number of new
operators you could dream up, so there would be no end to the change;
while here there's a clear-cut one-time change.)
Of course adopting such a change would completely ruin any possbility
of using things like
M[3, 4, 7] = [1, 10, 100]
as roughly equivalent to
M[3] = 1
M[4] = 10
M[7] = 100
but then again I'm not too fond of that anyway (as a matter of fact,
I'd oppose it strongly).
Some other things that I haven't completely followed through, and that
may cause complications for the theoretical foundation of it all:
- Allowing M[i, j] for (multidimensional) sequence types would also
meaning that D[i, j] would be equivalent to D[(i, j)] for
dictionaries.
- Should M[i][j] still be equivalent to M[i, j]?
- Now we have multidimensional sequence types, should be have a
multidimensional equivalent of len()? Some ideas:
- len(a, i) would return a's length in dimension i; len(a, i) == len(a)
- dim(a) (or rank(a)?) would return the number of dimensions
- shape(a) would return a tuple giving a's dimensions, e.g. for a
3x4 matrix it would return (3, 4), and for a one-dimensional
sequence such as a string or list, it would return a singleton
tuple: (len(a),).
- How about multidimensional map(), filter(), reduce()?
- map() seems easy (except there seems to be no easy way to specify
the rank of the operator): it returns a similarly shaped
multidimensional object whose elements are the function results for
the corresponding elements of the input matrix
- filter() is problematic since the columns won't be of the same
length
- reduce()??? -- someone who knows APL tell me what it should mean
- Multidimensional for loops? Or should for iterate over the first
dimension only?
One sees there are many potential consequences of a seemingly simple
change -- that's why I insist that language changes be thought through
in extreme detail before being introduced...
--Guido van Rossum <guido@CNRI.Reston.VA.US>
URL: <http://www.python.org/~guido/>
=================
MATRIX-SIG - SIG on Matrix Math for Python
send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
=================