[Numpy-discussion] numarray interface and performance issues (for dot product and transpose)

Thu Feb 28 13:13:17 EST 2002

Hi Alexander,

[SNIP]

> Two essential matrix operations (matrix-multiplication and transposition
> (which is what I am mainly using) are both considerably
>
> a) less efficient and
> b) less notationally elegant

[Interesting stuff about notation and efficiency]

> Or, even worse if one doesn't want to pollute the namespace:
>
>   Numeric.dot(Numeric.dot(Numeric.dot(Numeric.M,
>               Numeric.dot(Numeric.transpose(C), C)),
Numeric.transpose(v)), u)

I compromise and use np.dot, etc. myself, but that's not really relavant to
the issue at hand.

[More snippage]

> 2. Numeric performs unnecessary transpose operations (prior to 20.3, I
think,
>    more about this later). The transpose operation is really damaging with
big
>    matrices, because it creates a complete copy, rather than trying to do
>    something lazy (if your memory is already almost half filled up with
>    (matrix) C, then creating a (in principle superfluous) transposed copy
is
>    not going to do you any good). The above C' * C actually creates,
AFAIK,
>    _3_ versions of C, 2 of them transposed (prior to 20.3;

I think you're a little off track here. The transpose operation doesn't
normally make a copy, it just creates a new object that points to the same
data, but with different stride values. So the transpose shouldn't be slow
or take up more space.

Numarray may well make a copy on transpose, I haven't looked into that, but
I assume that at this point your are still talking about the old Numeric
from the look of the code you posted.

>
>      dot(a,b)
>
>    translates into
>
>      innerproduct(a, swapaxes(b, -1, -2))
>
>    In newer versions of Numeric, this is replaced by
>
>      multiarray.matrixproduct(a, b)
>
>    which has the considerable advantage that it doesn't create an
unnecessary
>    copy and the considerable disadvantage that it seems to be factor 3 or
so
>    slower than the old (already not blazingly fast) version for large
Matrix x
>    Matrix multiplication, (see timing results [1])).

Like I said, I don't think either of these should be making an extra copy
unless it's happening inside multiarray.innerproduct or
multiarray.matrixproduct. I haven't looked at the code for those in a _long_
time and then only glancingly, so I have no idea about that.

[Faster! with Atlas]

Sounds very cool.

>
>
> As I said,
>
>   dot(dot(dot(M, dot(transpose(C), C)), transpose(v)), u)
>
> is pretty obscure compared to
>
>   M * (C' * C) * V' * u)

Of the options that don't require new operators I'm somewhat fond of
defining __call__ to be matrix multiply. If you toss in .t notation that you
mention below, you get:

(M)( (C.t)(C) ) (V.t)(u)

Not perfect, but not too bad either. Note that I've tossed in some extra
parentheses to make the above look better. It could actually be written:

M( C.t(C) )(V.t)(u) )

But I think that's more confusing as it looks too much like a function call.
(Although there is some mathematical precedent for viewing matrix
multiplication as a function.)

I'm a little iffy on the '.t' notation as it could get out of hand.
Personally I could use cunjugate as much as transpose, and it's a similar
operation -- would we also add '.c'? And possibly '.s' and '.h' for skew and
Hermetian matrices? That might be a little much.

The __call__ idea was not particularly popular last time, but I figured I'd
toss at it out again as an easy to implement possibility.

-tim