Future floating point directions? [was Re: floating point in 2.0]

Tim Peters tim.one at home.com
Tue Jun 12 03:17:44 EDT 2001


[Tim]
>> Python has no 754 story -- it doesn't try.

[Edward Jason Riedy]
> Which is really astonishing for a language aiming at ease of use
> and teaching.

Astonishing?  Don't know about that.  ABC (Python's closest predecessor) and
DrScheme are definitely in the teaching biz, and they take e.g. 6.02e-23 as
meaning the exact rational 301/(5*10L**24).  That's more astonishing to me,
but it's a position reached by more than one serious newbie-friendly
language project.

> Yes, I know, it's someone else's fault.

Not really.  Who would that be?  Trying to tease precise x-platform sense
out of 754 HW given the current state of C environments is a major project,
unless we give up on using C's facilities and do it all in software instead.
Which is why I expect Python will sooner grow an 854 story, indeed doing
decimal fp entirely in software.

> (Not that I've been jumping up and down with code. sigh...  C++ sucks
> my will to code, but I don't know any more flexible language that
> compiles on both a T3E and an SP.)

Python is written in straight C (of the C89 flavor; BTW, it wasn't until
about a year ago that Python finally required an ANSI C compiler; new stds
are adopted slowly, which is why you shouldn't be so timid about requiring
what you really want in the end -- cf. 754 punting on correct
decimal<->binary rounding for fear of rocking a boat that sank anyway a few
years later).

>> [on whether C99 is likely to be implemented in full]

> I can think of three companies with test vectors being developed,
> so I assume there's something to test.  Of course, all three also
> sell hardware with these features.  gcc's coming along, but with
> much more finely stretched resources.

gcc is important, but before Python can rely on a thing literally dozens of
compilers have to support it, and correctly.  OTOH, the more Platforms from
Mars Python lands on, the more I'm inclined to say "oh, screw it -- if we
can bash it into shape under gcc and WinTel, that's 90% of the benefit".
Guido doesn't listen to me, though <wink>.

>> [RS 6000 fused mul-add]

> Nope.  They happened to have a chunk of space left, someone
> suggested fma for correctly rounding an iterative division, and
> an engineer did it over a weekend, fitting it into the add, mul,
> and extra space.  Can't remember his name, but I can dig through
> my notebook if necessary.  Story from Dr. Kahan, who was there.
> Go fig.
> ...
>
> Actually, it was for getting a division without a divider.  But
> your point's mostly valid.  The primary push was on getting some
> extra speed (software-ish, thus pipelined, divide),

This is a spicier story than I read in the IBM Journal at the time
(thanks!), but can't say it's fundamentally different:  they wanted faster
division and didn't have the real estate for a monster divider.  That's just
pragmatic.  I expect they got at least as much market benefit from the math
library improvements as from pipelining division, too.

> but the group also wanted to compete on reliability features.  Thus
> the extra breakdown of the invalid flag.
>
> Most of the features we want to add will address reliability /
> debuggability in some way.  Things like trapping methods that
> don't require precise interrupts, etc.

All Good Things!

[about "goodness of their hearts" motivations]
> Yes, but he's not the only one at Sun.  They're mostly ex-students
> of Dr. Kahan's, though.
>
> And I could have drawn an example from HP, and that wouldn't be
> because of a Kahan student.  Surprising, but true.  ;)

It happens.  Had KSR survived, you could have picked me as an example too --
they spent several hundred thousand dollars grafting better 754 support into
the hardware at my obnoxious urging (not to mention sweat, blood and
ultimately tears), but all it bought them in the end was financial disgrace
and bankruptcy.  That's 754's fault, because if we had rotten addition they
could have blamed the accounting discrepancies on the hardware <wink>.

[on directed rounding modes]
>> Virtually all HW FPUs support them, and only because they're required.

> And because rounding modes are nigh impossible to implement in
> software.

The paying market for these modes isn't worth the bother.  What saved them
is that they're (a) easy to implement in HW, and (b) required.  Either one
makes a decent case, but the combination is compelling.  Worrying about how
hard it would be to do in SW instead is a legitimate concern for a stds
committee, but cuts no ice with the people who pay for chip design -- not
unless legions of customers, or key internal developers, insist they need
it.  If they insist they need it and it turns out they don't use it, they
don't get to play that game again.

> I do wish software had access to the three extra bits necessary to
> round correctly.

So mandate them <0.7 wink>.

>> Without that C99 wouldn't have even considered it (part of X3J11's
>> mandate is not to require things that can't be done w/ reasonable
>> efficiency, so w/o near-universal HW support already in place, they
>> would have punted on this too).

> Um, fma is required in C99.

I'm glad of that, too, but it's unique among C99 features in being bundled
with inquiry macros that say whether or not the implementation "is fast".
So they didn't ignore efficiency there, and the existence of the macros is
(unfortunately) a red flag that will scare non-experts away.

> It certainly lacks near-universal hardware support...  And portability
> pressures have a way of making optional features required.  Look at
> how many vendors have Linux syscall translation layers.

The scientific market-- unlike the Linux market --is a shrinking piece of
the overall pie, though.  I'd say binary fp is perceived by the bulk of the
market as a necessary evil for now, but I'm not sure that will last either.

>> Tossing "recommended" out the window is the best tool a std has.

> Considering the groups being targeted have completely ignored
> REQUIRED pieces, I don't know if it matters.

Which modern chips do you have in mind that punt on required 754 pieces?  I
know you view 754 as "a systems standard", but most of the world doesn't:
if the bulk of it isn't in HW, most language committees aren't going to
agree to fill the gaps.  The long struggle with C alone should be evidence
enough of that.

>> You said you're introducing a quad format, and I assume that's
>> "the usual" 15-bit exponent + 113-bit mantissa one, so stop at
>> required correct rounding to and from no more than 36 decimal
>> digits, and everyone will be happy (this can be done with code
>> much simpler than Gay's, and also without bigint arithmetic).

> Well, if the only defined destinations are single, double, and
> quad, then you only have to support conversions to those
> destinations.  The standard only defines things in the standard.

I could repeat that paragraph (and did <wink>), but don't know how to make
it clearer.  Correct rounding to+from no more than 36 decimal digits can be
done economically and with relative ease, and covers formats up through and
including the quad format I guessed at there (which is the IEEE-like quad
format Sun and KSR both implemented in the early 90's, so I assume it's
still in favor).
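
For the record, the 36 comes from the standard round-trip bound:  a binary
format with p significand bits survives binary -> decimal -> binary provided
the decimal string carries ceil(p*log10(2)) + 1 digits.  A quick check in
plain Python:

    import math

    for name, p in [("single", 24), ("double", 53), ("quad", 113)]:
        digits = math.ceil(p * math.log10(2)) + 1
        print(name, digits)   # single 9, double 17, quad 36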

> Binary to decimal is constrained so that binary -> decimal -> binary
> produces the same number if both binary numbers are in the same
> precision.

>> Users want that, but IIRC 754 already requires it.

> Not quite.  It's required for a smaller range than is now
> possible.

I believe you're talking about the range over which 754 requires correct
rounding.  I'm not -- just about binary -> decimal -> binary identity.
They're different (albeit related) issues in 754.  From section 5.6
(apparently after the point you stopped reading <wink>):

    When rounding to nearest, conversion from binary to decimal
    and back to binary shall be the identity as long as the decimal
    string is carried to the maximum precision specified in Table 2,
    namely, 9 digits for single and 17 digits for double.

There are no weasel words restricting that to the range where correct
rounding is required; correct rounding is sufficient for identity (given
enough precision in the decimal string), but not necessary.
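
A quick demonstration of the identity guarantee (%.17g carries the 17
digits Table 2 calls for with doubles):

    x = 0.1
    s = "%.17g" % x       # decimal string with 17 significant digits
    print(s)              # 0.10000000000000001
    print(float(s) == x)  # True:  binary -> decimal -> binary is the identity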

> Ok.  Calculate in double precision with single precision inputs
> and output.

There's not enough precision in a single to enter some programmers' salaries
without catastrophic loss of information <0.9 wink>.  It's simply too small
for general use.
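
To make the salary example concrete, round-trip a double through the 754
single format (a sketch using the struct module):

    import struct

    salary = 1234567.89
    as_single = struct.unpack('f', struct.pack('f', salary))[0]
    print(as_single)            # 1234567.875 -- the cents didn't survive
    print(as_single == salary)  # False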

> You'll get _much_ more reliable answers.  But unfortunately, Python
> doesn't make that easy.  A calculator with that feature (the HP 12C)
> does use extra precision in the calculation.

HP should get Major Awards for usable numerics!  I'm sure the 12C offered at
least 10 decimal digits of displayed precision, though, which is a whale of
a lot more practical than 754 single's "7 -- kinda, it's hard to explain".

>> Single is a storage-efficiency gimmick for big problems with
>> low-precision data.

> Nonsense.

Not in my world.  In 15 years at Cray and KSR, neither company built a box
with HW support for any fp format other than 64 bit.  There wasn't
sufficient interest to justify the development cost, and both businesses
lived and died on fp arithmetic.  Some specific market segments wanted it,
such as weather forecasting and the oil industry; what they had in common
was truly massive quantities of low-precision data, for which single
sufficed and saved half their tape, disk and in-memory storage costs.  Math
libraries could be a little cheaper for single too; that's it.

> Single precision is for limiting the effects of rounding.

Why do I need a distinct storage format for that?  I'd rather compute in
double extended all the time, and explicitly round back to exactly the # of
bits I want when I explicitly want to cut back.

> Single precision without double precision is incredibly silly (although
> some DSPs seem to survive).

Some DSPs get by with 16-bit integer "block fp" -- but they know what
they're doing, and are designed with specific algorithms in mind.

>> "Number-like types" include things like Dates and Times in Python;
>> there is no widest numeric type, and the set of numeric types isn't
>> even known at compile-time (users can add new ones on the fly).

> At the very worst, you can find the widest common type of all
> floating-point numbers in a given statement.  Consider a
> statement containing arithmetic-looking operations.  The
> interpreter can evaluate all the methods appearing in the
> statement first.  Then it can examine the objects involved
> and find anything that's a floating-point type.  Dates and
> times aren't floating-point types.  All the floating-point
> types can be coerced up once.  This does require that the
> floating-point types be comparable, but that's reasonable.

I look forward to the PEP <wink>.  Seriously, I don't know how to implement
this in Python.  You at least need to invent and enforce a new protocol by
means of which an object can be queried as to whether or not "it's fp", and
if so what its precision and dynamic range is.  The only PEP even vaguely
related (and it's vague at its strongest) on the table now is Paul Dubois's
for Fortran-like Kinds:

    http://python.sourceforge.net/peps/pep-0242.html

Sounds like you need first to define a few new methods in the std
tp_as_number type slot.
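
To give the flavor, a minimal sketch of such a protocol -- every name below
is hypothetical, invented for illustration, and nothing like it exists in
Python today:

    class FPLike:
        # hypothetical base:  fp-aware types report their characteristics
        def fp_info(self):
            # return (significand_bits, min_exponent, max_exponent)
            raise NotImplementedError

    class Double(FPLike):
        def fp_info(self):
            return (53, -1021, 1024)      # IEEE double, <float.h> conventions

    class Quad(FPLike):
        def fp_info(self):
            return (113, -16381, 16384)   # the IEEE-like quad guessed at above

    def widest(operands):
        # pick the widest fp participant; Dates and Times simply don't play
        infos = [x.fp_info() for x in operands if isinstance(x, FPLike)]
        return max(infos) if infos else None

    print(widest([Double(), Quad(), "a Date"]))   # (113, -16381, 16384)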

> Yes, this does change the semantics slightly.  Anyone relying
> on both the order and results of subexpression evaluation to
> pass information silently between objects should be shot, so I
> don't think it's a big loss.

I'm sure someone will, but until it's fleshed out a real debate won't begin.

> It also changes the implementation a great deal, which is why
> there are no patches attached.  ;)

Whew!  I was afraid my mail client lost them <wink>.

> I'm trying to find a sensible extension outside of statement-
> level, but it might be hopeless.

It may also be more promising.  For example, a new kind of block,

with fp_context=double_extended:
    all intermediates in the block use double extended

IBM's Standard Decimal Arithmetic Proposal requires user-settable precision
(akin to, e.g., x86's "precision control" fpu bits) as part of numeric
context, and it's also a puzzle how to work that into Python smoothly.  In
the Java version it looks like a context argument is explicitly passed to
every operation -- brrrrrr.
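
To show why the Java flavor makes one shiver, a toy sketch of explicit
context-passing -- DecimalContext and round_sig are made up here, and
rounding through binary floats makes this only a cartoon of IBM's real
decimal proposal:

    def round_sig(x, digits):
        # round x to `digits` significant decimal digits
        return float("%.*e" % (digits - 1, x))

    class DecimalContext:
        def __init__(self, precision):
            self.precision = precision
        def add(self, a, b):
            return round_sig(a + b, self.precision)
        def mul(self, a, b):
            return round_sig(a * b, self.precision)

    ctx = DecimalContext(precision=5)
    # every operation threads the context explicitly -- brrrrrr
    print(ctx.add(ctx.mul(1.0005, 3.0), 2.0))   # 5.0015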

> It certainly can't be perfect, as there is no way to assume that
> the types involved are at all related to the types of the inputs, but
there may be a reasonable compromise for 90% of the code.  A run-
> time type inferencer?  Think that could be useful well beyond
> floating-point expressions.

Ya, people have been saying that for a decade, but I'm afraid I lost their
patches too <wink>.

> (IMHO, static and dynamic typing is simply a matter of time
> scale.  Everything is statically typed when it is actually
> executed (run through the processor) and dynamically typed
> during development.)

I like that view; it has merit.

>> Even restricted to floating types, at least two Python GMP
>> wrappers exist now, at least one of which exposes GMP's
>> unbounded (but dynamically specified) precision binary fp type.

> This should _not_ be mixable with typical floating-point numbers
> without the programmer saying it's ok.  The semantics are entirely
> different.  If you want them mixed, you should be promoting
> everything they touch to the arbitrary length types.

They're not arbitrary-length, they're fixed-length, but of user-settable
length (fall back on your own view of static vs dynamic here!).  "Entirely
different" seems a stretch, as they're as much "as if to infinite precision
with one final rounding" as 754 semantics.
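
A sketch of the settable-length flavor, using the gmpy2 package (a later
descendant of those GMP wrappers; API current as of gmpy2, not 2001):

    import gmpy2

    gmpy2.get_context().precision = 100   # 100-bit significands from here on
    x = gmpy2.mpfr(1) / 3                 # computed, then rounded once, to 100 bits
    print(x)                              # ~0.3333... to roughly 30 decimal digits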

As to mixing them at all, I'm much less a fan of silent coercions than most,
but I usually get out-voted on such things.  If you do write a PEP, it's
something to argue for there.

> Think of the user whose problem is actually in the decimal->binary
> conversion.  Or a case like z = a + b + c, where a is a gmp type.
> Is it [(a + b) :: gmp-type + c] :: gmp-type, or [a + (b + c) ::
> Python float] :: gmp-type?  Yes, _I_ know it's left-to-right, but
> must everyone be a language lawyer?

If they want to use a language w/o surprises, yes.  Left-to-right is a
Sacred Rule in Python, and I don't know of any language other than Icon that
makes expression evaluation so easy to predict.
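
A toy illustration of how left-to-right settles it -- GmpFloat is a
hypothetical stand-in for a real GMP wrapper type:

    class GmpFloat:
        def __init__(self, value):
            self.value = float(value)
        def __add__(self, other):
            v = other.value if isinstance(other, GmpFloat) else other
            return GmpFloat(self.value + v)
        __radd__ = __add__   # addition commutes, so reuse it for float + GmpFloat
        def __repr__(self):
            return "GmpFloat(%r)" % self.value

    a, b, c = GmpFloat(1.0), 2.0, 3.0
    print(a + b + c)   # (a + b) is a GmpFloat, so both adds are GmpFloat adds
    print(b + c + a)   # (b + c) happens in plain Python float first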

> What if a user of a routine had input c as a gmp-type, intending
> for the internals to use arbitrary precision?  The routine would
> have silently used fixed precision intermediates.

That depends on the internal implementation of the routine, and the rules
you're hypothesizing for your version of Python.  Presumably under "widest
available", the routine would adjust to the precision of its inputs.

> ...
> Consider too interval arithmetic.  For intervals to be really
> useful, even the decimal->binary conversions touching an interval
> need to return intervals.  Otherwise, you've violated the
> properties guaranteeing that the answer lies within the result
> interval.  Requiring explicit casts to intervals doesn't mesh
> well with a dynamically typed language like Python.  That makes
> the code much less reusable.

I don't see a reason "in theory" that intervals couldn't be contagious too.
But this is all so far from current Python it's too unreal to get picky
about.
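
"In theory" it's the same trick as any other contagion -- a toy sketch (no
directed rounding here, so unlike a real interval package the bounds aren't
rigorous):

    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
        def __add__(self, other):
            if not isinstance(other, Interval):
                other = Interval(other, other)   # scalars become degenerate
            return Interval(self.lo + other.lo, self.hi + other.hi)
        __radd__ = __add__
        def __repr__(self):
            return "[%g, %g]" % (self.lo, self.hi)

    print(1.0 + Interval(2.0, 3.0))   # contagion:  plain float in, Interval out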

> Actually, with a widest-feasible evaluation strategy, you _can_
> mix these pretty well.  You'll get the most expected results.
> I'm still not comfortable with mixing random-length floats with
> fixed-length floats, especially when division, square root, and
> elementary functions are involved, but intervals and further
> fixed-length floats work beautifully.

I'm happy to hypothesize arbitrary-precision math libraries too.  They're
doable (from Macsyma to Emacs calc), and quite handy.

> On the flip side, complex really shouldn't be silently mixed
> with real at all.

Since the way to spell a complex number in Python does mix them (like
3+4j:  that's an honest-to-god + there, adding an int to the imaginary
literal 4j), that's a hard rule to adopt <wink>.
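
To see it (plain Python behavior, nothing hypothetical):

    z = 3 + 4j                         # a genuine int + imaginary-literal add
    print(type(3), type(4j), type(z))  # int and complex, mixed silently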

> Imagine a beginner trying Cholesky factorization in Python.

I can't.  I can picture a beginner using a bad algorithm for computing std
deviation, though, and there Python does stop them from trying to take the
sqrt of their negative real variance (newbies generally don't find out about
cmath.sqrt for a long time).
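
That guard in action (real math/cmath behavior, not hypothetical):

    import math, cmath

    variance = -1e-12          # negative only through accumulated rounding
    try:
        math.sqrt(variance)
    except ValueError:
        print("math.sqrt refuses")   # the newbie gets stopped right here
    print(cmath.sqrt(variance))      # ~1e-06j -- silently imaginary instead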

> It'll seem to run just fine.  Any solve with the factors will give
> only real numbers from real numbers.  But some results are completely,
> SILENTLY wrong.  Why?  The matrix wasn't positive definite.  It should
> have choked, but entries silently went imaginary.  Solves will still
> produce real numbers.

Is this special case important enough to stop mountains of code that mix
real and complex without incident?

> Now imagine debugging a program that has a symmetric solve
> somewhere.  Would you spend much time looking at one of the best-
> understood portions?  Or would you end up blowing a huge amount
> of time checking input, data structure conversions, etc.?  Most
> programmers would assume it's a ``problem with floating-point
> arithmetic.''
>
> There are similar problems with many textbook formulas and
> complex arithmetic.  Perhaps people should know better, but how
> often do they?  Wouldn't it be better for Python to help them
> learn?

Guido's view of matrix algebra, and complex numbers, be they in combination
or isolation, is that they're not on a newbie's radar.  "Don't ask, don't
tell."  This is something to take up with the NumPy folks.

> It somewhat does by separating math and cmath.  This one isn't
> something that can be handled with a `widest' evaluation scheme;
> silently mixing these types in any way can lead to horrible bugs.

In the background Guido is reworking Python's type and class model.  It
should be possible after that's done to subclass from the builtin types.  If
so, then you should be able to subclass the complex type and override its
coercion policies.  Any change to Python's core binary fp numerics would
likely require near-universal consensus -- and I just don't see that
happening.  I'm afraid there's no behavior that doesn't have a vocal
constituency when threatened, and when the primary users of a thing can't
reach agreement the status quo wins by default.

life-in-the-same-old-same-old-lane-ly y'rs  - tim
