[long and fussy] But what do these operators _mean_?

Huaiyu Zhu hzhu at localhost.localdomain
Fri Jul 21 16:21:38 EDT 2000


I tend to agree with what you said about error bounds.  Would you like to
summarize what you think should go into the specifications?

On the other hand, I think a lot of this should go into the packages rather
than the operators.  The specification should be broad enough that even
special subclasses like sparse matrices or specially shaped matrices, or
speed-greedy algorithms, do not have to conflict with it.

My idea is to specify a collection of parameters to be guaranteed by the
numerical packages, with each package specifying its own values for these
parameters.  What about the derivative packages?  If they are intended for
serious number crunching they should also specify their own parameters, and
say which packages or algorithms they use in their implementation.

Does this somehow alleviate your concern about untrackable implementations?
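The parameter idea above might be sketched like this (all class names,
package names, and values are invented purely for illustration, not an
actual NumPy or LAPACK API):

```python
# Hypothetical sketch: each numerical package publishes the parameters it
# guarantees, so users and test suites can inspect them.  All names and
# numbers here are invented for illustration.

class LinAlgGuarantees:
    """Parameters a package promises its operators will honor."""
    def __init__(self, relative_error_bound, element_type, tested_against):
        self.relative_error_bound = relative_error_bound  # e.g. c*n*eps
        self.element_type = element_type                  # e.g. "float64"
        self.tested_against = tested_against              # reference test suite

# A speed-greedy package trades accuracy for speed and says so explicitly:
fast_pkg = LinAlgGuarantees(1e-8, "float64", "none")
# A careful package built on LAPACK advertises tighter bounds:
careful_pkg = LinAlgGuarantees(1e-14, "float64", "LAPACK test suite")

def meets(pkg, required_bound):
    """Check whether a package's declared bound satisfies a user's need."""
    return pkg.relative_error_bound <= required_bound

print(meets(careful_pkg, 1e-12))  # True: acceptable for a 1e-12 requirement
print(meets(fast_pkg, 1e-12))     # False: the speed-greedy one is rejected
```

A derivative package would simply carry the guarantees of whichever
implementation it wraps, so the chain stays trackable.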

On 21 Jul 2000 18:04:40 GMT, Edward Jason Riedy <ejr at lotus.CS.Berkeley.EDU>
wrote: 
> - >Quick summary:  Maybe there needs to be a numerics-sig before any 
> - >of these changes are adopted...
>So how do we get one started? 

Good question.  Any volunteers?

> - Concerning the semantics of these operators, I think we only need a 
> - general agreement, leaving out algorithmic specifications to various 
> - packages.
>
>I'm not so sure.  My experience on the periphery of the BLAST group 
>has shown that general agreement has to be backed up with some
>guaranteed, TESTABLE error bounds to be reliable.

This could be done whenever a particular implementation gets included with
the language distribution.  I don't think the language should make any
guarantee on packages not distributed with it.  When NumPy gets included, it
should come with the specs.

>You should not rely on the pure math definition because it assumes 
>exactness.
[snip]
>
>What I'd shoot for is a statement of exactly what problem is solved 
>along with a guaranteed error bound.

Agreed.  But the question is whether it goes into the specs of the operators
or into the specs of the packages.  What is the definition of + in Python
anyway?  It is different for numbers and lists.  Does / mean divide?  It's
different for floats and ints (not that this is a good thing).
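The point can be seen directly at the interpreter (in the Python of this
era `/` on two ints truncates; today's `//` shows the same behavior):

```python
# '+' already means different things depending on the operand types:
print(1 + 2)        # numeric addition -> 3
print([1] + [2])    # list concatenation -> [1, 2]
print("a" + "b")    # string concatenation -> 'ab'

# '/' likewise: truncating integer division differs from float division.
print(7 // 2)       # truncating division -> 3
print(7 / 2.0)      # true division -> 3.5
```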

The point is, for any serious numerical work, users should check the package
they are importing, rather than a generic description of the operators.

>And I'm not looking for the best possible error bounds, just currently 
>feasible ones.  Those are available, and any better algorithm must
>beat them to be considered better.  Also, the error bounds given in
>the LAPACK and BLAST docs have been achieved with very fast
>implementations, so it's not that much of a burden.

The problem is that there are trade-offs.  An algorithm may be considered
better simply because it is faster.  Would such packages be banned from
using these operators?  Or from doing things clearly against the specs?
Where should users look for definitive specs?

>However, I can see the sense in providing a common minimum environment
>that's good for many, so long as you do not restrict access to better 
>features.  The goal is not to turn everyone into numerical analysts,
>but rather to make numerical analysts as unnecessary to daily life as
>possible.  That takes guaranteed, testable error bounds.  Even though
>most users won't know what they mean, test suites can keep 
>implementors honest.

The common minimum should clearly be restricted to one package.  If a user
just imports this package, he knows that he enters a care-free environment
where everything is guaranteed to be correct, but often not optimal.  The
specs should still go with this package.
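A "guaranteed, testable error bound" of the kind quoted above could be
enforced by a test suite along these lines (a pure-Python sketch; the
slack constant and the backward-error style check are illustrative, not
taken from any actual spec):

```python
# Illustrative sketch: a test suite keeps an implementor honest by
# checking the residual of a linear solve against a declared bound.
EPS = 2.0 ** -52  # double-precision machine epsilon

def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

# Solve 2x + y = 5, x + 3y = 10 (exact answer: x = 1, y = 3).
a, b, c, d, e, f = 2.0, 1.0, 1.0, 3.0, 5.0, 10.0
x, y = solve_2x2(a, b, c, d, e, f)

# Backward-error style check: the residual should be a small multiple of
# EPS relative to the data (the constant 16 is an illustrative slack).
residual = max(abs(a * x + b * y - e), abs(c * x + d * y - f))
scale = max(abs(v) for v in (a, b, c, d, e, f))
assert residual <= 16 * EPS * scale, "implementation violates its error bound"
print("residual:", residual)
```

Most users would never read such a test, but its existence is what makes
the guarantee in the specs mean something.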

>And, imho, the default should always be accuracy before speed.  You
>can always compute a wrong answer very quickly.

Agreed.

> - Unlike matlab that is not object oriented, [...]
>
>It has been for the past few years.  The competition is not standing 
>still.

We'd better hurry. :-)

>Matlab's popular because people don't need to think about many of the 
>lower-level details.  Unfortunately, the lower-level details frequently 
>come up and bite people.  (Ok, I don't have any numbers to back up 
>`frequently', and I only hear of the cases where things fail, but 
>still...)  That and the huge number of supporting packages.

I would still think that's mainly caused by matlab's monolithic nature.
Python with its clean namespace separation would greatly minimize this risk.

Here's my idea of "grand unification of numerical computation in python" :-)

There are many packages that implement various numerical routines.  They are
closely related to lapack, linpack, eispack, cephes or whatever names their
underlying fortran or C packages have.   

There are also several interface packages, which implement interfaces that
emphasize multiarray or matrix algebra, or other specific conveniences like
sparse matrices, etc.

They are designed in such a way that users can easily assemble numerical
packages that have guaranteed performance with a desirable interface.  The
advantage of separating implementation from interface is that it gives
users many more choices without increasing the load on developers.

There can also be one or two assembled packages for dummies only, with the
easiest interface and the most foolproof implementation.
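The separation might look something like this (all names are invented; a
sketch of the idea, not a proposed API):

```python
# Hypothetical sketch of the interface/implementation split: an interface
# class defines what the operators mean, while implementation packages
# plug in as backends with declared guarantees.

class ReferenceBackend:
    """An implementation package: supplies matmul with stated guarantees."""
    name = "reference"
    relative_error_bound = 1e-14

    def matmul(self, A, B):
        # Plain triple loop: slow but predictable (the "foolproof" choice).
        n, k, m = len(A), len(B), len(B[0])
        return [[sum(A[i][p] * B[p][j] for p in range(k))
                 for j in range(m)] for i in range(n)]

class Matrix:
    """An interface package: users pick semantics here, backend separately."""
    backend = ReferenceBackend()   # foolproof default; swappable per need

    def __init__(self, rows):
        self.rows = rows

    def __mul__(self, other):
        return Matrix(self.backend.matmul(self.rows, other.rows))

A = Matrix([[1, 2], [3, 4]])
B = Matrix([[5, 6], [7, 8]])
print((A * B).rows)   # -> [[19, 22], [43, 50]]
```

Swapping in a faster backend (a LAPACK wrapper, a sparse specialist)
changes the guarantees but not the user's code, which is the whole point.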

How's that for a long term project?  :-)

Huaiyu


