I am having some problems relating to the current function dot
(identical to matrixmultiply, though I haven't seen the equivalence in
any documentation). Here is what the docs say:
dot(a,b)
Will return the dot product of a and b. This is equivalent to matrix
multiply for 2d arrays (without the transpose). Somebody who does
more linear algebra really needs to do this function right some day!
Or the builtin doc string:
>>> print Numeric.dot.__doc__
dot(a,b) returns matrix-multiplication between a and b. The product-sum
is over the last dimension of a and the second-to-last dimension of b.
First, this is misleading. It seems to me to indicate that b must
have rank at least 2, which experiments indicate is not necessary.
Instead, the rule appears to be to use the only axis of b if b has
rank 1, and otherwise to use the second-to-last one.
Frankly, I think this convention is ill motivated, hard to remember,
and even harder to justify. As a mathematician, I can see only one
reasonable default choice: One should sum over the last index of a,
and the first index of b. Using the Einstein summation convention
[*], that would mean that
dot(a,b)[j,k,...,m,n,...] = a[j,k,...,i] * b[i,m,n,...]
[*] that is, summing over repeated indices -- i in this example
This would of course yield the current behaviour in the important
cases where the rank of b is 1 or 2.
But we could do better than this: Why not leave the choice up to the
user? We could allow an optional third parameter which should be a
pair of indices, indicating the axes to be summed over. The default
value of this parameter would be (-1, 0). Returning to my example
above, the user could then easily compute, for example,
dot(a,b,(1,2))[j,k,...,m,n,...] = a[j,i,k,...] * b[m,n,i,...]
while the current behaviour of dot would correspond to the new
behaviour of dot(a,b,(-1,-2)) whenever b has rank at least 2.
Actually, there is probably a lot of code out there that uses the
current behaviour of dot. So I would propose leaving dot untouched,
and introducing inner instead, with the behaviour I outlined above.
We could even allow any number of pairs of axes to be summed over, for
example
inner(a,b,(1,2),(2,0))[k,l,...,m,n,...] = a[k,i,j,l,..] * b[j,m,i,n,...]
With this notation, one can for example write the Hilbert-Schmidt
inner product of two real 2x2 matrices (the sum of a[i,j]b[j,i] over
all i and j) as inner(a,b,(0,1),(1,0)).
If my proposal is accepted, the documentation should probably declare
dot (and its alias matrixmultiply?) as deprecated and due to disappear
in the future, with a pointer to its replacement inner. In the
meantime, dot could in fact be replaced by a simple wrapper to inner:
def dot(a,b):
if len(b.shape) > 1:
return inner(a,b,(-1,-2)
else:
return inner(a,b)
(with the proper embellishments to allow it to be used with python
sequences, of course).
- Harald