[long and fussy] But what do these operators _mean_?

Sat Jul 22 22:56:16 EDT 2000

And Huaiyu Zhu writes:
 - I tend to agree on what you said about error bounds.  Would you like to
 - summarize what you consider should go into the specifications? 

Um, the same as in the BLAS and LAPACK?  ;)  Seriously, though,
I need to think about this some more.  And do some back-reading,
like checking some of the papers on language support for complex,
etc.  No on-line copies are available, but I need to make a 
library run soon anyways.

 - On the other hand I think a lot of this should go to the packages rather
 - than the operators.

I don't.  Historically, unless people have required error bounds
from the very beginning, package providers have gone the fast-but-
wrong route.  That's why the Alpha happened at all.  See yet 
another of Dr. Kahan's writings for examples:
  http://www.cs.berkeley.edu/~wkahan/ieee754status/baleful.ps

I'm not always citing his papers just because I know him; they're
really the most lucid, easily available writings I've seen on the 
topic.  If anyone can point at others, please do.

 - The specification should be broad enough so even
 - special subclasses like sparse matrices, specially shaped matrices, 
 - etc, or speed-greedy algorithms would not have to conflict with it.

Special matrices always have _better_ bounds, so there's no issue 
there.  And speed-over-accuracy algorithms should not be encouraged
for general use at all.  Like I said, they may be appropriate for
very special uses, but those people are likely to use their own, 
special packages for those environments.  And they're likely to
implement most of it in C, anyways.

 - My idea is to specify a collection of parameters to be guaranteed by the
 - numerical packages, each specifying its own values for these paramters.

Hm.  I don't think that's too far from what I want.  LAPACK and the
new BLAST standard give ``environmental query routines'' to get the
underflow, overflow, eps, etc. values, then the standard specifies
the error in terms of those values.  Given that that's the only thing
even possible with varying hardware and arithmetics...

 - Does this somehow alleviate your concern about untrackable 
 - implementations?

Maybe.  Unfortunately, it's been a nice, relaxing Saturday, so I
may not be thinking properly.  ;)

 - Good question.  Any volunteers?

I'm going to subscribe to the Numerical Python list tomorrow or
so, then start looking over the {float,complex}object.c code.

 - I don't think the language should make any guarantee on packages not 
 - distributed with it.

I see your point, but there are the famous ``language design is
library design'' and ``library design is language design'' quotes.

I would not feel at all bad if the language guaranteed that results
are reasonably accurate.  I would feel horribly if it didn't, and
if that lack affected people.

 - The point is, for any serious numerical work, users should check the 
 - package they are importing, rather than a generic description of the 
 - operators.

Yes, but how many will?

 - There can be better algorithms that are considered better simply 
 - because they are faster.

I disagree.  You can _always_ produce a mostly wrong answer quickly.
Would you be making the same arguments for faster but inaccurate
_integer_ arithmetic?

Now, there are cases where a few particular calculations trade the
most-accurate position in different regions.  The general solution
there is to add a few tests before them to find the right region.
Unfortunately, that's not always feasible...

 - Would such packages be banned from using these operators?

>From supplying implementations for them?  Sure.  Keep in mind that
we cannot (and should not) legally ban them (the way a certain
other language does).  We can state clearly that they do _not_ 
meet the minimum requirements.  How woul you feel about a Python
implementation that had 3 == 1 + 1?

 - Where should users look for definite specs?

www.python.org comes to mind.  ;)

 - The common minimum should clearly be restricted to one package.  

I can't agree to this.  Especially since I'm also talking about
things that aren't in packages, like complex arithmetic.

AUGH!  I just looked at complex division:
> Py_complex c_quot(Py_complex a, Py_complex b)
> {
>         Py_complex r;
>         double d = b.real*b.real + b.imag*b.imag;
>         if (d == 0.)
>                 errno = EDOM;
>         r.real = (a.real*b.real + a.imag*b.imag)/d;
>         r.imag = (a.imag*b.real - a.real*b.imag)/d;
>         return r;
> }

The calculation of d can overflow, destroying a computation that
should have been just fine.  The standard solution is to scale
it by 1/b.real (consider it as a dot product).  I need to double-
check this, but I don't think the general accuracy is changed if
you compute 1/d and multiply...

I'll try to fix and test this sometime this week.  sigh.  Time.
The best fix (assuming you're not trapping on overflow) is to
simply compute it much like it is, test for the error afterwards,
and re-compute it if it was screwed up.

 - I would still think that's mainly caused by matlab's monolithic nature.

I agree with Travis Oliphant that the availability of packages is
also a huge deal...

 - There are many packages that implement various numerical routines.  They are
 - closely related to lapack, linpack, eispack, cephes or whatever names their
 - underlying fortran or C packages have.   

LINPACK and EISPACK are dead.  They are both slower and more error-
prone than the latest LAPACK.  Cephes is for elementary functions,
and it provides and documents appropriately good errors.  This all
from a discussion over linear algebra operators, so I've mainly
been thinking of the impact of the error bounds on the operators,
not necessarily of those in functions.

Here's another idea:  Consider the interface-provider idea.  It's
used pretty frequently in Java, at least.  The standard interface
for various numerical routines can have guaranteed error bounds.  
The interface can have various drivers behind it, chosen by specifying
in the installation or by user routine.  Something like ``import Numeric''
would import the default implementation.  All implementations that
could be the default must obey the bounds.

I think we're arguing the same point but with different words...
Like I said, it's been a nice Saturday, so I'm probably not thinking
too clearly on these issues.  Gotta run out to a nice dinner, too,
so I'm rushing.  I'll put more thought into this soon.

But we should try to move this to the correct mailing list, imho.

 - There can be also one or two assembled packages for dummies-only, with the
 - easiest interface and most foolproof implementation.

The ``import Numeric'' + implementation thingy...

Jason