[PYTHON MATRIX-SIG] A problem with slicing

Thu, 14 Sep 1995 09:20:28 -0400

[The third in a series of short essays on subjects raised in the
Matrix discussion.]

Here's a problem where I have neither a strong opinion nor a perfect
solution...

Jim Fulton proposes an elegant indexing syntax for matrix objects
which doesn't require any changes to the language:

	M[i][j]

references the element at column i and row j (or was that column j and
row i?  Never mind...).

This nicely generalizes to slicing, so you can write:

	M[i][j1:j2]

meaning the column vector at column i with row indices j1...j2-1.

Unfortunately, the analogous expression for a row vector won't work:

	M[i1:i2][j]

The reason for this is that it works by interpreting M as a sequence
of columns (and it's all evaluated one thing at a time -- M[i][j]
means (M[i])[j], and so on).  M[i] is column i, so M[i][j] is the
element at row j thereof.  But slice semantics imply that of M is a
sequence of X'es, then M[i1:j1] is still a sequence of X'es -- just
shorter.  So M[p:q][r] is really the same as M[p+r] (assuming r<q-p).

One way out of this is to adopt the syntax

	M[i, j]

for simple indexing.  This would require only a minor tweaking of the
grammar I believe.  This could be extended to support

	M[i1:i2, j]
	M[i1:i2, j1:j2]
	M[i, j1:j2]

(and of course higher-dimensional equivalents).

This would require considerable changes of the run-time architecture
of slicing and indexing, and since currently everything is geared
towards one-dimensional indexing/slicing, but I suppose it would be
doable.

(Funny how I'm accepting this possibility of changing the language
here, while I'm violently opposed to it for operator definitions.  I
guess with adding operators there is no end to the number of new
operators you could dream up, so there would be no end to the change;
while here there's a clear-cut one-time change.)

Of course adopting such a change would completely ruin any possbility
of using things like

	M[3, 4, 7] = [1, 10, 100]

as roughly equivalent to

	M[3] = 1
	M[4] = 10
	M[7] = 100

but then again I'm not too fond of that anyway (as a matter of fact,
I'd oppose it strongly).

Some other things that I haven't completely followed through, and that
may cause complications for the theoretical foundation of it all:

- Allowing M[i, j] for (multidimensional) sequence types would also
meaning that D[i, j] would be equivalent to D[(i, j)] for
dictionaries.

- Should M[i][j] still be equivalent to M[i, j]?

- Now we have multidimensional sequence types, should be have a
multidimensional equivalent of len()?  Some ideas:

  - len(a, i) would return a's length in dimension i; len(a, i) == len(a)

  - dim(a) (or rank(a)?) would return the number of dimensions

  - shape(a) would return a tuple giving a's dimensions, e.g. for a
  3x4 matrix it would return (3, 4), and for a one-dimensional
  sequence such as a string or list, it would return a singleton
  tuple: (len(a),).

- How about multidimensional map(), filter(), reduce()?

  - map() seems easy (except there seems to be no easy way to specify
  the rank of the operator): it returns a similarly shaped
  multidimensional object whose elements are the function results for
  the corresponding elements of the input matrix

  - filter() is problematic since the columns won't be of the same
  length

  - reduce()??? -- someone who knows APL tell me what it should mean

- Multidimensional for loops?  Or should for iterate over the first
dimension only?

One sees there are many potential consequences of a seemingly simple
change -- that's why I insist that language changes be thought through
in extreme detail before being introduced...

--Guido van Rossum <guido@CNRI.Reston.VA.US>
URL: <http://www.python.org/~guido/>

=================
MATRIX-SIG  - SIG on Matrix Math for Python

send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
=================