Hi, Numeric is an impressively powerful and in many respects easy and comfortable to use package (e.g. it's sophisticated slicing operations, not to mention the power and elegance of the underlying python language) and one would hope that it can one day replace Matlab (which is both expensive and a nightmare as a programming language) as a standard platform for numerical calculations. Two important drawbacks of Numeric (python) compared to Matlab currently seem to be the plotting functionality and the availability of specialist libraries (say for pattern recognition problems etc.). However python seems to be quickly catching up in this area (e.g. scipy and other projects are bustling with activity and, as far as I know, there is currently a promising undergraduate project on the way at my university to provide powerful plotting facilities for python). There is however a problem that, for the use to which I want to put Numeric, runs deeper and provides me with quite a headache: Two essential matrix operations (matrix-multiplication and transposition (which is what I am mainly using) are both considerably a) less efficient and b) less notationally elegant under Numeric than under Matlab. Before I expound on that, I'd better mention two things: Firstly, I, have to confess largely ignorant about the gritty details of implementing efficient numerical computation (and indeed, I can't even claim to possess a good grasp of the underlying Linear Algebra). Secondly, and partly as a consequence, I realize that I'm likely to view this from a rather narrow perspective. Nonetheless, I need to solve these problems at least for myself, even if such a solution might be inappropriate for a more general audience, so I'd be very keen to hear what suggestions or experiences from other Numeric users are. I also want to share what my phd supervisor, Richard Everson, and I have come up with so far in terms of a), as I think it might be quite useful for other people, too (see [2], more about this later). Ok, so now let me proceed to state clearly what I mean by a) and b): The following Matlab fragment M * (C' * C) * V' * u currently has to be written like this in python: dot(dot(dot(M, dot(transpose(C), C)), transpose(v)), u) Or, even worse if one doesn't want to pollute the namespace: Numeric.dot(Numeric.dot(Numeric.dot(Numeric.M, Numeric.dot(Numeric.transpose(C), C)), Numeric.transpose(v)), u) Part a) (efficiency) -------------------- Apart from the syntactic inconveniences, the Matlab above will execute __considerably__ faster if C is a rather larger matrix (say 1000x1000). There are AFAIK two chief reasons for this: 1. Matlab uses a highly optimized ATLAS[3] BLAS routine to compute the dot-product, whereas Numeric uses a straight-forward home-brewn solution. 2. Numeric performs unnecessary transpose operations (prior to 20.3, I think, more about this later). The transpose operation is really damaging with big matrices, because it creates a complete copy, rather than trying to do something lazy (if your memory is already almost half filled up with (matrix) C, then creating a (in principle superfluous) transposed copy is not going to do you any good). The above C' * C actually creates, AFAIK, _3_ versions of C, 2 of them transposed (prior to 20.3; dot(a,b) translates into innerproduct(a, swapaxes(b, -1, -2)) In newer versions of Numeric, this is replaced by multiarray.matrixproduct(a, b) which has the considerable advantage that it doesn't create an unnecessary copy and the considerable disadvantage that it seems to be factor 3 or so slower than the old (already not blazingly fast) version for large Matrix x Matrix multiplication, (see timing results [1])). Now fortunately, we made some significant progress on the performance issues. My supervisor valiantly patched Numeric's multiarrayobject.c to use ATLAS for {scalar,vector,matrix} * {scalar,vector,matrix} multiplications. As a consequence the multiplication of pair of 1000x1000 matrices is computed more than _40_ times faster on my athlon machine (see [1], it seems to work fine but both the timing results as well as the code should be treated with a bit of caution). The BLAS-enabled dot takes almost exactly the same time as Matlab doing the same thing. In my case, this makes the difference between waiting several minutes and a day, and between deciding to use python or something faster but more painful. Although to benefit from this patch, one has to install ATLAS this is straight-forward (see [2] and [3] for pointers). In addition it is easy to install Lapack at the same time, as ATLAS provides significantly optimised versions of some of the Lapack routines too. Using Lapack rather than the lapack_lite that comes with Numeric also has the advantage that it is extensively tested and works very reliably (e.g. the lapack_lite routine that is called by Heigenvalues is pretty broken and searching the net suggests that this has been the case for some time now). This still leaves point 2), transposes, as a potential source for improvement. As far as I understand, the Lapack routines operate on flattened arrays, i.e. the only difference on calling some routine on a matrix and its transpose is that one has to pass a different additional parameters in the second case that specifies the structure of the matrix. Therefore if one were to program C' * C in fortran, only C itself needed to exist in memory. Would it therefore be possible that operations like transpose, slicing etc. be carried out lazily (i.e. so that only _modifying_ the result of the transpose or slice or passing it to some function for some reason needs a real copy resulted in an actual copy operation being performed)? As far as I understand the slicing in numarray is no longer simple aliasing (which is a good thing, since it make arrays more compatible with lists), are there any plans for implementing such an on-demand scheme? Part b) (syntax) ________________ As I said, dot(dot(dot(M, dot(transpose(C), C)), transpose(v)), u) is pretty obscure compared to M * (C' * C) * V' * u) Although linear algebra is not the be all and end all of numerical calculations, many numerical algorithms end up involving a good deal of linear algebra. The awkwardness of python/Numeric's notation for common linear algebra operation severely impairs, in my opinion, the utility of writing, using and debugging algorithms in python/Numeric. This is a particular shame given python's generally elegant and expresssive syntax. So far, I've thought of the following possible solutions: 0. Do nothing: Just live with the awkward syntax. 1. Comment: add a comment like the Matlab line above above complex expressions. 2. Wait for fix. Hope that either: 2.1. numarray will overload '*' as matrix multiplication, rather than elementwise multiplication (highly unlikely, I suspect). 2.2. the core python team accepts one of the proposed operator additions (or even, fond phantasy, allows users to add new operators) 3. Wrap: create a DotMatrix class that overloads '*' to be dot and maybe self.t to return the transpose -- this also means that all the numerical libraries I frequently use need to be wrapped. 4. Preprocess: Write some preprocessor that translates e.g. A ~* B in dot(A,B). 5. Hack: create own customized version of array that overloads e.g. '<<' to mean dot-product. The possible downsides: 0. Do nothing: Seems unacceptable (unless as a temporary solution), because the code becomes much less readable (and consequently harder to understand and more error-prone), thus forgoing one of the main reasons to choose python in the first place. 1. Comment: Unsatisfactory, because there is no way to gurantee that comment and code are in synch, which very likely will lead to difficult to find bugs. 2. Wait for a fix: Either would be nice (I could live with having to write multiply(a,b) -- I suppose however some other people can't, because I can't think of another reason why Numeric didn't overload * for matrixproduct in the first place, moreover it would mean a significant interface change (and code breakage), so I guess it is rather unlikely). From what I gather from searching the net and looking at peps there is also not much of a chance to get a new operator anytime soon. 3. Wrap: Promises to be a maintenance nightmare (BTW: will the new array class be subclassable?), but otherwise looks feasible. Has anyone done this? 4. Preprocess: Would have the advantage that I could always get back to "standard" python/Numeric code, but would involve quite a bit of work and might also break tools that parse python. 5. Hack array: Seems evil, but in principle feasible, because AFAIK '<<' isn't used in Numeric itself and hopefully it wouldn't involve too much work. However, given a free choice '<<' is hardly the operator I would choose to represent the dot product. I am not completely sure what the policy and rationale behind the current division of labor between functions, methods and attributes in numarray is but as far as the lack of a missing transposition postfix operator is concerned, one reasonable approach to make the transpose operation more readable (without the need to change anything in python itself) would seem to me to provide arrays with an attribute .t or .T so that: a.t == transpose(a) While this would admittedly be a bit of a syntactic hack, the operation is so commonplace and the gain in readability (in my eyes) is so significant that it would seem to me to merit serious consideration (maybe a.cc or so might also be an idea for complex conjugate notation, but I'm less convinced about that?). If the addition of a single operator for the exclusive benefit of Numeric users is rejected by the core python team, maybe it's worthwhile lobbying for some mechanism that allows users to define new operators (like e.g. in Haskell)... OK, that's all -- thanks for bearing with me through this long email. Suggestions and comments about the patch [2] and possible solutions to issues raised are greatly appreciated, regards, alexander schmolck Footnotes: [1] timing results for the patched version of Numeric[2] comparing new and old 'dot' performance: http://www.dcs.ex.ac.uk/~aschmolc/Numeric/TimingsForAtlasDot.txt [2] A patch for Numeric 21.1b that speeds up Numeric's dot function by several factors for large matrices can be found here: http://www.dcs.ex.ac.uk/~aschmolc/Numeric/ [3] ATLAS (http://math-atlas.sourceforge.net/) is a project to provide a highly optimized version of BLAS (Basic Linear Algebra Subroutines, a standard and thouroughly tested implementation of basic linear algebra operations) and some LAPACK routines. The charm of ATLAS is that it is platform-independent and yet highly optimized, which it achieves by essentially fine tuning a number of parameters until optimum performance for the _particular_ machine on which it is built is reached. As a consequence complete builds can take some time, but binary versions for ATLAS for common processors are available from http://www.netlib.org/atlas/archives (moreover, even if one decides to build ATLAS oneself, the search space can be considerably cut down if one accepts the suggested "experience" values offered during the make process). -- Alexander Schmolck Postgraduate Research Student Department of Computer Science University of Exeter A.Schmolck@gmx.net http://www.dcs.ex.ac.uk/people/aschmolc/