[PYTHON MATRIX-SIG] Time for recap?
Guido van Rossum
guido@CNRI.Reston.VA.US
Sat, 30 Sep 1995 14:07:39 -0400
> > Likewise, instead of my earlier wild ideas about multidimensional
> > slices, I propose to use a slice() method that can produce arbitrary
> > slices; e.g. a[k][i:j] would be equivalent to a.slice(k, (i,j)).
>
> I disagree slightly here. I still prefer a[(k, range(i,j))] for Guido's
> example. This makes a[(k, [x,y,z])] possible.
Ah. I was writing from memory, and forgot this feature. I don't like
it. How many times do you have to select a more or less arbitrary
group of rows/columns from a matrix? It makes the slice representation
bulkier -- contiguous slices can be stored (in C) as 4 ints per
dimension, while random selections will require a variable-length
list of indices per dimension. (It is also wasteful to have to
generate the full range of numbers when what you mean is a contiguous
slice.)
Perhaps a more important reason why I don't like this is that I see
problems with various notations. In the current version of the
language, you can't write a[2,3] -- you have to write a[(2,3)] and it
will be used as a mapping index (not as a sequence index!). The
proposal is to make this mean a[2][3]. That's fine with me.
There are certain ambiguities with the use of parentheses in Python
that I can't completely get rid of without breaking the type
structure: (1) means the same as 1 -- a scalar, while (1,2) is a tuple
of length 2. To write a tuple of length one, you can't write (1) --
you have to write (1,). (A tuple of length 0 is () -- easy.) For
consistency, I think I'll have to make a[(1,)] mean the same as a[1],
if a[(1,2)] is the same as a[1,2]. Finally, notice that to the object
being indexed, there is no way to tell a[(1,2)] from x=(1,2); a[x].
A tuple is just one example of a sequence in Python. Lists are
another example. In many situations, any sequence is acceptable and
the results are the same (e.g. for loops). (And in situations where
only lists or only tuples are accepted by the current version of the
language, Steve Majewski has often made the point that there is no
inherent reason why only one type should be accepted and that this
should be fixed. I agree in most cases.)
If we extend this rule to N-dimensional indexing, a[sequence], for
some sequence of integers whose elements are i, j, k, ..., should be
equivalent to a[i][j][k]..., and we can't make a[(1,2,3)] mean
a[1][2][3] while at the same time interpreting a[[1,2,3]] as a[1:4].
(Sooner or later, you'll be passing a vector to the index. Then the
question will arise, should this have the same meaning as a tuple or
as a list. It's better if they all three mean the same.)
Instead of supporting a[[2,3,5]] to select elements 2, 3 and 5 from a,
I would propose to use filter() or a multi-dimensional extension
thereof if you want to access selected subarrays. Or perhaps just a
method a.select() where each argument is either an index (meaning a
reduction of dimensionality in this dimension by picking just the
sub-array with this index) or a sequence of indices (meaning selecting
the set of sub-arrays with the indices in the sequence).
How about it? Is this acceptable?
> The one thing that I've been doing differently is that slices (ie. a[1:5])
> are returned by value rather than by reference. This was Jim Fulton's
> original implementation and I kept it because it was similar to the notion
> of slicing a list. Conceptually, I have no problems with treating slices
> the same as any other discontinuous index (ie. a[(range(1,5), range(4,6))])
> and returning them by reference. Actually, I like the simplicity of being
> able to think of every index into a matrix returning by reference.
It's somewhat odd that slices are returned by reference (since they
return a new value for lists), but not against the rules or silent
assumptions of the language, and I think it is necessary to make the
most of the flexibility that you get by using the notion of separating
the indexing and the representation object.
> I would be interested in hearing other opinions on this issue.
Me too.
> > If we do things just right, it may be possible to pass in the sequence
> > to be used as the representation -- it could be a Python list, tuple
> > or array (from the array module).
>
> This is an interesting notion, but I can't see what it would gain
> over using a 1d C-array of basic types as the representation.
I like this idea because the list or array containing the
representation may already be available in memory -- so why copy it?
Also by using an immutable underlying sequence (e.g. a tuple) it is
easy to create immutable N-dimensional arrays without the need for a
read-only flag. Finally it makes it possible to use a representation
where the actual values are stored in disk and only fetched into
memory when needed, using a cache -- this way you can implement your
own virtual memory system, persistent matrices, etc.
There could still be a "default" underlying representation that is
highly optimized and that the indexing object knows about, for speedier
access.
> It wouldn't be too hard to expand the definition of basic type to
> include (PyObject *)'s if you'd like to have the possibility of a
> matrix of "real" python objects.
Yes, the latter is definitely something that should be possible even
if my idea doesn't find the acceptance I hope it will get.
> I should have time to finalize some of these things this weekend so that I
> can make an alpha version of a matrix object written in C available to
> interested members of this group on Monday. It is surprisingly similar to
> Konrad's sample implementation in python (though a lot bigger, uglier and
> yet faster owing to it's C-implementation).
Cool!
--Guido van Rossum <guido@CNRI.Reston.VA.US>
URL: <http://www.python.org/~guido/>
=================
MATRIX-SIG - SIG on Matrix Math for Python
send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
=================