[PYTHON MATRIX-SIG] a few thoughts

P. Dubois dubois@kristen.llnl.gov
Thu, 19 Oct 1995 10:38:09 +0800


I have only begun to peruse the archive but I did see a couple of discussions
worth some comments which I would like to make. For ten years I have been
in charge of a Fortran extension language called Basis, which has now been
used in perhaps 150-200 codes. I designed this language to look like an
array Fortran. It has been extremely popular with users. So what follows is
a synthesis of a whole lot of experience. 

Let me begin with a short synopsis of Basis' array rules:

Arrays can be of any of the standard Fortran types, including complex.
They consist of a single block of storage with auxillary "shape" information.
They can be up to seven dimensional (the Fortran limit) but in practice
five is the highest I have seen in use. Had I been implementing them
in a modern language of course no limit would have been necessary.

Operations are element-wise but we have a separate operator for matrix
multiply and matrix divide (the latter means, a /! b is that x such
that b *! x = a), where /! and *! are the matrix operators). All
functions such as sin(x) operate element wise and do the right thing
depending on the type of x. It is important to be able to pass the
address of the storage area off to compiled code. Scalars broadcast
but we have an explicit function for "dup'ing" something to make it
match something higher dimensional, rather than do this automagically

Now some comments:

1. Usage of the matrix operators is perhaps 1% or less of the usage of
the element-wise numbers. This is because when two-dimensional
matrices do arise, they usually represent the spatial or other type of
discretization, far more often than they represent operators.  If I
had it to do over again, I would not have special operators, just a
function call, since the light usage is not worth the trouble. 

2. Speed is crucial. The basic operations must take place in compiled
loops without function calls, with as little overhead as possible. 
In other words, x + y must be lead to a compiled loop doing the 
corresponding operation on the blocks of storage. Yes, maybe one has
broadcast/type coercion/shape checks or operations first, but if both
x and y are really double arrays of length n we want to be into a 
C loop doing xa[i] + ya[i] where xa and ya point to the storage areas.

This is because when a code is written as a programmable application
with an interpreter over compiled routines, the codes tend to evolve
to having more and more parts of the code in the interpreter. Also,
the interpreter is used to calculate information that is derivable
from the compiled state variables rather than add new compiled code,
but sometimes these calculations are quite intensive.

3. I think it is mistaken to try to reduce the implementor's job by
doing many types in one like the "array" built-in object does.  
Having basic double/integer/complex stuff work fast should be the
primary consideration, even if it means some tedious and not terribly
elegant coding. A completely Python class like the J-array posting is
beautiful and suitable for cases where uniformity of representation is
of value, but it is a completely different question than having something
fast that can talk to C or Fortran. I timed this class doing x+x where
x was 100,000 elements long, and it was about 1000 times slower than
a simple C extension to Python and even ten times slower than doing a for 
loop in Python (but boy, I learned a lot about Python from it!).

4. In Basis, sqrt(-1.) is an error not a complex; we actually went
through a time when it was complex and found it to be a disadvantage.
One must balance the need to avoid/find errors against the need for easy 
expression.

5. One might want to consider having a very fast, very raw vector class
on which to base higher level classes that have concepts like shape, etc.

6. Banging malloc too hard can be a problem in these beasts. You probably
   couldn't afford to represent a big matrix with independently malloc'ed
   pointers for each row.

Some historical notes:

a. I noticed some discussion of representing things always as complex numbers.
The original Matlab (when Cleve Moler wrote it to 
teach students linear algebra) represented everything as a complex number.
There was a limit of I think 5000 numbers in a system, period, because it
used a fixed array. When I ported it to a Cray I replaced that part of it
with a memory manager so that you could have a lot of complex numbers.
But of course having all the numbers be secretly complex is completely 
crazy from an efficiency point of view, not to mention storage.

b. Yes, I implemented an extension language in Fortran. It was 1984 and
it was the only language available on a Cray. The computer center manager
said that I didn't need C, you can do everything in Fortran.

c. The documentation for the Basis language is available on line 
at http://www-phys.llnl.gov/X_Div/htdocs/basis.html. I have decided to base
my next generation system on Python.

d. Reference for some philosophy: Dubois, P.F., 
"Making Applications Programmable", Computers in Physics Jan/Feb 1994.

Paul F. Dubois, L-472				(510)-422-5426
Lawrence Livermore National Laboratory		FAX (510)-423-9969
Livermore, CA 94550				dubois1@llnl.gov
Consulting: PaulDubois@aol.com
Editor, Scientific Programming Department, Computers in Physics


=================
MATRIX-SIG  - SIG on Matrix Math for Python

send messages to: matrix-sig@python.org
administrivia to: matrix-sig-request@python.org
=================