
Boost : 
From: Benedikt Weber (weber_at_[hidden])
Date: 20020629 17:02:07
I would like to discuss one important aspect of expression templates for
matrix algebra, that has not been touched so far. The documentation and
examples shown in uBLAS give the impression that expression templates always
lead to more efficient calculations. That is, however, only true for some
expressions. For many expressions, expression templates perform much worse
than the regular evaluation. Incidentally, all examples in uBLAS are taken
from BLAS and these do indeed perform well with expression templates. uBLAS,
however, is very flexible and allows for many more complicated expressions
and I am sure users want to take advantage of that possibility.
Expression templates are a clever technique to avoid temporaries during the
evaluation of matrix/vector expressions. While avoiding temporaries can
speed up the evaluation, it can also increase the number of operations
dramatically. Consider the example A*B*v, where A and B are matrices and v
is a vector. First take the case without expression templates. The
expression should be evaluated as A*(B*v), because then you have only two
matrix/vector products. Taking all dimensions as N, this will lead to 2*N^2
flops. If you evaluate it as (A*B)*v, you have one matrix/matrix product and
one matrix/vector product, which is much worse in terms of number of
operations, namely N^3+N^2. Now do the same thing with expression templates.
Evaluating A*(B*v) will lead to N^3 flops, because the vector B*v is not
stored as a temporary but recalculated N times. Evaluating (A*B)*v also
gives N^3 flops, because each matrix row in (A*B) is evaluated N times. So
with expression templates the performance is much worse than without. On my
Pentium 3, taking N=100 and evaluating the expression 1000 times takes
_with_ expression templates (#define NUMERICS_USE_ET):
A*(B*v): 6.59 s
(A*B)*v): 7.39 s
and _without_ expression templates:
A*(B*v): 0.21 s
(A*B)*v): 13.62 s
Considering only the number of operations, the last number should
theoretically be of the same order as the calculations with expression
templates, so we gain a factor of 2 here from expression templates. But this
depends on how efficient the evaluation is done without expression
templates. It's quite tricky to understand what's going on exactly from the
code, and I did not get it yet up to now. I did the calulations with
Codewarrior because MSVC gives internal compiler errors, as also indicated
in config.h. (For MSVC, NUMERICS_USE_ET is therefore always defined in
config.h, so be careful.)
As I understand it, and looking at a few tests, neither matrix/matrix
multiplication nor matrix/vector multiplication gain much efficiency from
expression templates. A factor of two was sometimes observed as in the
examples above, but this could be due to an inefficient implementation of
the the case without expression templates (correct me, if I am wrong). So my
first guess is, you should use expression templates only for addition,
subtraction, assignment and scalar multiplication with vectors or matrices
(and, of course, the corresponding +=, =, *=). I hope it is possible to
have only some of the operations evaluated by expression templates. And I
hope the problem can be solved by just having some operations being treated
differently. If you would have to analyse the whole expression and optimize
it, this would probably be much more involved.
I think this is a serious problem and should be considered in more detail. I
really like the library, especially for its flexibility to evaluate
expressions with matrices of different shapes and different types of entries
and it would be a shame if users could not write down more complicated
expressions without loosing efficiency.
Benedikt
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk